<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Paul Leclercq</title>
    <description>The latest articles on Forem by Paul Leclercq (@paulleclercq).</description>
    <link>https://forem.com/paulleclercq</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F11258%2Fa6d39170-48e9-43a0-8a45-205fa950bf95.png</url>
      <title>Forem: Paul Leclercq</title>
      <link>https://forem.com/paulleclercq</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/paulleclercq"/>
    <language>en</language>
    <item>
      <title>Meetups, the right way</title>
      <dc:creator>Paul Leclercq</dc:creator>
      <pubDate>Sun, 19 May 2019 15:04:55 +0000</pubDate>
      <link>https://forem.com/paulleclercq/meetups-the-right-way-38bi</link>
      <guid>https://forem.com/paulleclercq/meetups-the-right-way-38bi</guid>
      <description>&lt;h1&gt;
  
  
  Meetups, the right way
&lt;/h1&gt;

&lt;p&gt;Under this provocative, clickbait title, I hope this feedback can help meetup organizers better target their audience and make meetups more valuable for everyone.&lt;br&gt;
This feedback comes from meetups with different people, in different cities and countries: Montpellier, Paris, Marseille, New York, Montréal and San Francisco.&lt;/p&gt;
&lt;h2&gt;
  
  
  Rule #1: No Q&amp;amp;A (Question &amp;amp; Answer)
&lt;/h2&gt;

&lt;p&gt;I'm a simple person: I hate wars, I hate global warming deniers, I hate poverty, and… I hate Q&amp;amp;A at the end of conferences or meetups. I wonder why this is an absolute norm.&lt;br&gt;
During a meetup, you have the great power of &lt;strong&gt;rare collective time&lt;/strong&gt;, so please use it wisely! Instead of a Q&amp;amp;A session (often as long as the speaker's talk), add a tiny talk with no slides from someone in the audience. &lt;br&gt;
Example: "I am working on this interesting project at my company because we did this thing this particular way."&lt;/p&gt;
&lt;h3&gt;
  
  
  Everyone should be able to express his/her opinion
&lt;/h3&gt;

&lt;p&gt;It's hard to express ourselves in front of an audience, and let's be honest, the tech community has more introverts than other communities. Having an open mic Q&amp;amp;A is not fair to everyone.&lt;/p&gt;
&lt;h3&gt;
  
  
  Alternatives
&lt;/h3&gt;

&lt;p&gt;Propose an interactive quiz with &lt;a href="https://kahoot.com/"&gt;Kahoot&lt;/a&gt;, or an interactive survey on a slide at the beginning/end to get to know your audience. Have an &lt;a href="https://en.wikipedia.org/wiki/Master_of_ceremonies"&gt;MC&lt;/a&gt; to engage the audience and group/filter questions, or a person who has already prepared questions for a round-table discussion. They can also highlight someone they know in the audience with a 3-minute interview about what they are working on.&lt;/p&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--rd0JzpLs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1052902112332574722/PkV0qZPC_normal.jpg" alt="Katia Aresti profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Katia Aresti
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @karesti
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--P4t6ys1m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      I loved the idea of being able to send questions to the speaker through a form during the talk and someone from the organization asking at the end of the talk for us instead of passing microphones. More questions, more efficient, less trolling  &lt;a href="https://twitter.com/hashtag/bilbostack2019"&gt;#bilbostack2019&lt;/a&gt; &lt;a href="https://twitter.com/BilboStack"&gt;@BilboStack&lt;/a&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      12:58 PM - 26 Jan 2019
    &lt;/div&gt;


  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  Rule #2: No Q&amp;amp;A
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Just to be sure&lt;/strong&gt; 😉 &lt;br&gt;
Plus, by following these rules, you will be able to filter out assholes who waste collective time by asking a specific question about version 6.2.X of some obscure software. They just want to look smart in front of everybody by saying something that nobody can understand, and it makes you feel shitty about not knowing it. &lt;br&gt;
We have all experienced something like this during meetups; it's time to say no more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comfort and location
&lt;/h2&gt;

&lt;p&gt;To attract more people, the venue must be able to host them. That may seem obvious, but it isn't: I'm sure we've all been to a venue without enough (personal) space. A university lecture hall is ideal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Food
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;We want beers and pizzas 🍕&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No. Developers are normal people who eat different types of food. Be sure to offer something that does not contain meat so that everyone feels welcome.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set and say your rules
&lt;/h2&gt;

&lt;p&gt;Have an agenda of the evening.&lt;/p&gt;

&lt;p&gt;Say that it's totally fine to leave the room at any time and that it will not be taken as an insult to the speaker. Attendees can also wait for a break to leave the room if they want to show more respect to the speaker.&lt;br&gt;
It's totally OK not to applaud at the end of a talk. If you liked the talk, say so directly to the speaker, or contact her/him on Twitter or by email later.&lt;/p&gt;

&lt;p&gt;With some rules set, people held back by the stress of answering live questions might dare to present a subject, which would be great for the meetup's diversity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start early, finish early.
&lt;/h2&gt;

&lt;p&gt;Companies should support employees leaving early when a free meetup happens; it's in their own interest.&lt;/p&gt;

&lt;p&gt;I do not want to and I cannot be attentive for several hours after 6pm, especially for technical subjects, and especially if I'm hungry.&lt;br&gt;
Make it short, and be sure to… (read below 😛)&lt;/p&gt;

&lt;h2&gt;
  
  
  Have enough time for networking
&lt;/h2&gt;

&lt;p&gt;What I mean by networking is not sharing business cards and recruiting people by talking about how disruptive your company is. It's making sure to say hi, be friendly and smile at other people. They share the same passion as you, or at least one interest. That's rare in this world, so meet new people! &lt;/p&gt;

&lt;p&gt;A great icebreaker is to ask all attendees to talk for 4 minutes with the person next to them before the conference begins.&lt;br&gt;
Nothing is more painful than seeing an HR person at a 2-hour technical meetup when they clearly want to be somewhere else. Be empathetic, say hi to them, and if you are looking for some fresh air, you can help each other by exchanging information 😄&lt;/p&gt;

&lt;h2&gt;
  
  
  Record talks
&lt;/h2&gt;

&lt;p&gt;It's the best advertising you can have to attract more people next time. I recommend &lt;a href="https://methylbro.fr/aventure/captation-video-des-meetups-au-live-streaming/"&gt;this article by Thomas Gasc @meltybro (in French)&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Organizing meetups is no joke; thanks to everyone who does it. I would not be the dev I am today without them: &lt;a href="https://en.wikipedia.org/wiki/Ubuntu_philosophy"&gt;Ubuntu&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is your top advice for meetup organizers?&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Special thanks to: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://twitter.com/NDuforet"&gt;https://twitter.com/NDuforet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://twitter.com/JDarsel"&gt;https://twitter.com/JDarsel&lt;/a&gt;
&lt;div class="ltag__user ltag__user__id__7256"&gt;
  
  
    &lt;a href="/aurel_tyson" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jxKgetnA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://res.cloudinary.com/practicaldev/image/fetch/s--HlNnNTUy--/c_fill%2Cf_auto%2Cfl_progressive%2Ch_150%2Cq_auto%2Cw_150/https://dev-to-uploads.s3.amazonaws.com/uploads/user/profile_image/7256/-sXVkmzZ.jpg" alt="aurel_tyson image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/aurel_tyson"&gt;Aurel Tyson&lt;/a&gt;
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/aurel_tyson"&gt;/aurel_tyson&lt;/a&gt;
    &lt;/div&gt;
    &lt;p class="ltag__user__social"&gt;
        &lt;a href="https://twitter.com/Aurel_Tyson" rel="noopener"&gt;
          &lt;img class="icon-img" alt="twitter logo" src="https://res.cloudinary.com/practicaldev/image/fetch/s--oEHrSmvE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/twitter-logo.svg"&gt;Aurel_Tyson
        &lt;/a&gt;
    &lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>meetup</category>
    </item>
    <item>
      <title>Twitter Without Short Links </title>
      <dc:creator>Paul Leclercq</dc:creator>
      <pubDate>Tue, 13 Feb 2018 03:44:49 +0000</pubDate>
      <link>https://forem.com/paulleclercq/twitter-without-short-links--1cig</link>
      <guid>https://forem.com/paulleclercq/twitter-without-short-links--1cig</guid>
      <description>

&lt;p&gt;Like &lt;a href="https://chrome.google.com/webstore/detail/devtwitter/fhlipionhojfohecgljcljbpblojlaef?hl=en-US"&gt;dev.to&lt;/a&gt;, I created my own Chrome extension (sorry, Firefox) because I was tired of not seeing the complete URL in a tweet. I feel I have the right to know where I'm going to land. Also, a lot of information can be found in a URL (&lt;a href="https://www.nngroup.com/articles/url-as-ui/"&gt;URLs can be UI&lt;/a&gt;), and whether I click can depend on it. Example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tu-spQlk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/polomarcus/faster-links/master/app/images/TweetWithoutFasterLink.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tu-spQlk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/polomarcus/faster-links/master/app/images/TweetWithoutFasterLink.png" alt="Without fasterLinks"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;center&gt;&lt;em&gt;What's the complete name of the repo?&lt;/em&gt;&lt;/center&gt;
&lt;br&gt;
I've been using my extension, called &lt;strong&gt;&lt;a href="https://chrome.google.com/webstore/detail/fasterlinks/ojggkiabpbjlckhpaphgdhhojgcpimah"&gt;fasterLinks&lt;/a&gt;&lt;/strong&gt; (worst name ever?), for &lt;strong&gt;3 years now&lt;/strong&gt;, and I still have trouble using Twitter without it, so I wanted to share it with my favorite online community! It replaces every short URL (bitly, etc.) and displays URLs that are too long for Twitter at their full length. &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UTsMOGzE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/polomarcus/faster-links/master/app/images/TweetWithFasterLink.png" alt="With fasterLinks"&gt;&lt;center&gt;&lt;em&gt;fasterLinks!&lt;/em&gt;&lt;/center&gt;

&lt;p&gt;Marketers love to add several &lt;a href="https://en.wikipedia.org/wiki/UTM_parameters"&gt;UTMs&lt;/a&gt; to track tweet performance. Since they are visual pollution, the extension removes them as well.&lt;/p&gt;
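
&lt;p&gt;To illustrate what removing UTMs means, here is a minimal sketch in Python. It is not the extension's actual code (fasterLinks is a Chrome extension); the function name strip_utm and the example.com URL are made up for the illustration, and it assumes that tracking parameters are exactly those whose name starts with utm_.&lt;/p&gt;

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_utm(url: str) -> str:
    """Return url without the query parameters whose name starts with utm_."""
    parts = urlsplit(url)
    # Keep every (name, value) pair that is not a UTM tracking parameter.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith("utm_")
    ]
    # urlencode re-encodes the remaining pairs; the rest of the URL is untouched.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical tweeted link: one useful parameter plus two tracking ones.
query = urlencode([("id", "42"), ("utm_source", "twitter"), ("utm_medium", "social")])
url = "https://example.com/post?" + query
print(strip_utm(url))  # https://example.com/post?id=42
```

&lt;p&gt;Filtering on the parameter name rather than on a fixed list keeps the sketch short, at the cost of also dropping any legitimate parameter that happens to start with utm_.&lt;/p&gt;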

&lt;p&gt;&lt;strong&gt;fasterLinks&lt;/strong&gt; is free and &lt;a href="https://github.com/polomarcus/faster-links"&gt;open source&lt;/a&gt;. It does not contain any trackers, except the one used to count active users (28!!) on the Chrome store.&lt;/p&gt;

&lt;p&gt;Fork it &lt;a href="https://github.com/polomarcus/faster-links"&gt;https://github.com/polomarcus/faster-links&lt;/a&gt; and/or try it &lt;a href="https://chrome.google.com/webstore/detail/fasterlinks/ojggkiabpbjlckhpaphgdhhojgcpimah"&gt;https://chrome.google.com/webstore/detail/fasterlinks/ojggkiabpbjlckhpaphgdhhojgcpimah&lt;/a&gt;&lt;/p&gt;


</description>
      <category>twitter</category>
      <category>chromeextension</category>
      <category>tool</category>
      <category>showdev</category>
    </item>
    <item>
      <title>What I Like About Data Engineering</title>
      <dc:creator>Paul Leclercq</dc:creator>
      <pubDate>Tue, 23 Jan 2018 03:51:33 +0000</pubDate>
      <link>https://forem.com/paulleclercq/what-i-like-about-data-engineering-2hhe</link>
      <guid>https://forem.com/paulleclercq/what-i-like-about-data-engineering-2hhe</guid>
      <description>&lt;p&gt;I've recently been asked why I chose to specialize in data engineering and what I liked about it. &lt;/p&gt;

&lt;p&gt;As a developer, I'm more used to technical questions/articles/tweets based on facts and long hours of development, and it actually felt good to remember &lt;strong&gt;why&lt;/strong&gt; I spend so much time reading and developing by answering this question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; To build trustworthy analytical dashboards, whereas most data engineers I know seem to prefer building systems that scale well.&lt;/p&gt;

&lt;h2&gt;
  
  
  The beginning
&lt;/h2&gt;

&lt;p&gt;I started working as a front-end dev/technical support in a fast-paced advertising start-up (from 50 to 500+ people in a few years), dealing with a lot of advertising &lt;em&gt;openRTB&lt;/em&gt; events such as click, view, complete... I got interested in how the business actually runs and why customers use the platform, &lt;a href="https://www.strikemag.org/bullshit-jobs"&gt;simply because I want my work to make sense to me.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I noticed that &lt;strong&gt;trust&lt;/strong&gt; was the number 1 reason, besides a wide network of publishers, why customers invested in advertising campaigns on the platform, and dashboards were the key to it. Late, changing, or simply wrong metrics happen and can hurt business badly. And don't think it only happens to others: &lt;a href="https://www.forbes.com/sites/greatspeculations/2016/11/17/more-bugs-found-in-facebooks-ad-metrics-to-the-dismay-of-advertisers/#505ddbfe2a85"&gt;even Facebook keeps having "bugs" with its ad metrics&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Having to support several cases of metric discrepancies with remorseless customers led me to understand how metrics were calculated (&lt;em&gt;Hadoop MapReduce, Spark jobs&lt;/em&gt;), where those metrics were stored (&lt;em&gt;Cassandra, Postgres, Hadoop Distributed File System&lt;/em&gt;), what we could do if something went wrong (&lt;em&gt;Lambda/Kappa architecture&lt;/em&gt;), how to send data between services (&lt;em&gt;Kafka&lt;/em&gt;), and what format data should be in (&lt;em&gt;Avro, Parquet&lt;/em&gt;), &lt;strong&gt;to be sure to provide the best possible metrics that benefit the business.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Thanks to this willingness to learn, another startup trusted me to become their first data engineer. I'll be forever thankful for everything I learned there. &lt;em&gt;I know I sound cliché :p&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My advice to start Data Engineering
&lt;/h2&gt;

&lt;p&gt;Data engineering is still software development, to the point where I increasingly want to call myself a software engineer/developer/your favorite word to describe a developer, instead of a data engineer.&lt;br&gt;
The only difference is that you might spend a bit more time in your database/system config files or documentation. One of my running jokes is that I'm not a data engineer but a configuration engineer, so &lt;strong&gt;don't be afraid to start, you can do it.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Learn
&lt;/h2&gt;

&lt;p&gt;You can start writing your first Spark code today using Scala/Java/Python/SQL or R on &lt;a href="https://community.cloud.databricks.com/"&gt;Databricks&lt;/a&gt;, which provides a notebook platform with a free 6 GB server to analyze your favorite dataset or use one of their introductory notebooks. Or play with &lt;a href="https://cloud.google.com/bigquery/public-data/"&gt;BigQuery, Google's data warehouse for analytics, to analyse open datasets&lt;/a&gt; like &lt;a href="https://bigquery.cloud.google.com/dataset/bigquery-public-data:github_repos"&gt;GitHub.&lt;/a&gt; &lt;/p&gt;

&lt;h2&gt;
  
  
  Learn more
&lt;/h2&gt;

&lt;p&gt;Maxime Beauchemin (Airbnb/Lyft) writes beautiful &lt;a href="https://medium.freecodecamp.org/the-rise-of-the-data-engineer-91be18f1e603"&gt;a&lt;/a&gt;rticle&lt;a href="https://medium.com/@maximebeauchemin/the-downfall-of-the-data-engineer-5bfb701e5d6b"&gt;s&lt;/a&gt; about data engineering.&lt;/p&gt;

&lt;p&gt;I name-dropped a lot of technologies that look fancy and complicated. &lt;a href="https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html"&gt;Martin Kleppmann's Designing Data-Intensive Applications&lt;/a&gt; book contains everything you need to know about the subject. Remember, deep down it's all about simple config files.&lt;/p&gt;

&lt;p&gt;Now, I'm also going to ask you a simple question:&lt;/p&gt;

&lt;h3&gt;
  
  
  Why do you do what you do?
&lt;/h3&gt;

</description>
      <category>data</category>
      <category>love</category>
      <category>work</category>
      <category>thankful</category>
    </item>
    <item>
      <title>I Suck At Whiteboard Interviews</title>
      <dc:creator>Paul Leclercq</dc:creator>
      <pubDate>Fri, 07 Jul 2017 09:18:08 +0000</pubDate>
      <link>https://forem.com/paulleclercq/i-suck-at-whiteboard-interviews</link>
      <guid>https://forem.com/paulleclercq/i-suck-at-whiteboard-interviews</guid>
      <description>&lt;p&gt;&lt;em&gt;Also published on &lt;a href="https://medium.com/@polomarcus/i-suck-at-whiteboards-interviews-809e9927d2d6" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;/em&gt; &lt;br&gt;
Nowadays my main objective is to find my dream job. Why? Simply because I am young, don’t have any kids, don’t have any loans, and I think I have the right positive energy and knowledge to work on anything, so why not focus on this kind of job?&lt;br&gt;
I’ve recently done interviews for Spotify in their Stockholm office for a Data Engineer position thanks to 2 blog articles (&lt;a href="https://medium.com/@polomarcus/analyze-one-year-of-radio-station-songs-aired-with-sql-spark-spotify-and-databricks-835fcf73df6" rel="noopener noreferrer"&gt;first&lt;/a&gt;, and &lt;a href="https://medium.com/@polomarcus/music-recommendation-service-with-the-spotify-api-spark-mllib-and-databricks-7cde9b16d35d" rel="noopener noreferrer"&gt;second&lt;/a&gt;) I wrote about their API. I had my first video interview in March 2017 and my last on-site interview on May 31st.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1500%2F1%2Agwe2Kiq_Z_Z6ECfgyXdO5g.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1500%2F1%2Agwe2Kiq_Z_Z6ECfgyXdO5g.jpeg" alt="Stockholm, Sweden (credit Unsplash)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Why I consider this position as a possible dream job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They’ve worked a lot &lt;a href="https://vimeo.com/85490944" rel="noopener noreferrer"&gt;on their engineering culture&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Excellent reads on &lt;a href="https://labs.spotify.com/" rel="noopener noreferrer"&gt;their tech blog&lt;/a&gt;, meaning that you can collaborate with some of the best engineers and learn from them&lt;/li&gt;
&lt;li&gt;They use a managed solution on the Google Cloud Platform, which means spending more time on your product and less time managing servers&lt;/li&gt;
&lt;li&gt;
They &lt;a href="https://github.com/spotify/scio" rel="noopener noreferrer"&gt;build the tools to work efficiently on top of GCP&lt;/a&gt; and &lt;a href="https://beam.apache.org/get-started/beam-overview/" rel="noopener noreferrer"&gt;use new tech&lt;/a&gt; for defining batch and streaming data processing jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This kind of job makes you grow as an engineer.&lt;/p&gt;
&lt;h1&gt;
  
  
  Preparation
&lt;/h1&gt;

&lt;p&gt;I’ve read tons of articles, read &lt;a href="https://library.oreilly.com/book/0636920032175/designing-data-intensive-applications/26.xhtml?ref=toc#idm140605782689984" rel="noopener noreferrer"&gt;Martin Kleppmann’s “Designing Data-Intensive Applications”&lt;/a&gt;, &lt;a href="http://shop.oreilly.com/product/0636920046967.do" rel="noopener noreferrer"&gt;Holden Karau’s “High Performance Spark”&lt;/a&gt;, tried a lot of new tech, and presented &lt;a href="https://www.slideshare.net/PaulLeclercq2/analyze-one-year-of-radio-station-songs-aired-with-spark-sql-spotify-and-databricks" rel="noopener noreferrer"&gt;my first talk&lt;/a&gt; at 2 meetups, all thanks to the interview pressure I felt.&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-483688700447326209-881" src="https://platform.twitter.com/embed/Tweet.html?id=483688700447326209"&gt;
&lt;/iframe&gt;




&lt;/p&gt;

&lt;center&gt;&lt;em&gt;The more you know the less you know, Trump version&lt;/em&gt;&lt;/center&gt;

&lt;p&gt;I also had a few technical interviews with other companies to practice my presentation skills (&lt;em&gt;well… that’s the game, and you might also get lucky enough to find excellent opportunities&lt;/em&gt;), where I was sometimes weirdly questioned about things I learned during my computer science studies: &lt;a href="https://stackoverflow.com/questions/4980757/how-do-hashtables-deal-with-collisions/4980797#4980797" rel="noopener noreferrer"&gt;how collisions are handled within a hashtable&lt;/a&gt;, or exercises about &lt;a href="https://www.interviewcake.com/concept/python/linked-list?" rel="noopener noreferrer"&gt;linked lists&lt;/a&gt;. I spent a lot of time preparing for it, it made me a better engineer, and it feels good.&lt;br&gt;
I also wrote down everything I could during interviews, to be sure to work on my weak points for the next ones.&lt;br&gt;
Also, to boost my confidence, I spent the whole day before my on-site interview at the National Library of Stockholm, studying for it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F1%2AsN6zs7mGAG2wIi91T1V2nQ.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F1%2AsN6zs7mGAG2wIi91T1V2nQ.jpeg" alt="Stockholm, Sweden (credit Unsplash)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  On-site Interviews
&lt;/h1&gt;

&lt;p&gt;I was welcomed by the HR person I had spoken to on the phone, who kindly explained what was going to happen and how I should act during the interviews. During the chit-chat he asked me how I was, and I sincerely answered that I felt stressed and hadn’t had much sleep the night before, but that I was really excited and curious about these interviews.&lt;/p&gt;

&lt;p&gt;They use easygoing-style interviews to make sure you act as normally as possible, or at least as normally as you can in front of a whiteboard. A lot of smiles, soft drinks, offers of bathroom breaks…&lt;/p&gt;

&lt;h2&gt;
  
  
  Programming interview and system design
&lt;/h2&gt;

&lt;p&gt;I always have this weird feeling when I am asked something that the interviewer already knows; I actually have the same feeling when I speak English to a French speaker who speaks better English than I do. The advice I have, and that I have trouble following, is to imagine that, as a data engineer, you are explaining your solution to a colleague who doesn’t work in your team, for example a front-end developer: he has the potential to understand it, but hasn’t spent as much time thinking about it as you have.&lt;br&gt;
So we talked about how we could design a real-time dashboard for the most played songs, and did an exercise about removing duplicates from a list while maintaining the order. I had the great feeling that I had done almost everything right when leaving this interview, and it gave me confidence for the others.&lt;/p&gt;
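
&lt;p&gt;For illustration, one common way to approach that kind of exercise is to remember the items already seen in a set while building the result. This is a minimal Python sketch, not necessarily the answer given on the whiteboard; the function name dedupe_keep_order is made up, and it assumes the items are hashable.&lt;/p&gt;

```python
def dedupe_keep_order(items):
    """Return a new list without duplicates, keeping each first occurrence in order."""
    seen = set()   # items already emitted, for constant-time membership checks
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(dedupe_keep_order(["a", "b", "a", "c", "b"]))  # ['a', 'b', 'c']
```

&lt;p&gt;The set keeps the whole pass linear in the length of the list, instead of re-scanning the result list for every item.&lt;/p&gt;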

&lt;h2&gt;
  
  
  Lunch
&lt;/h2&gt;

&lt;p&gt;I had a one-hour break that I thought I was going to spend alone, so I took a book with me, but the engineers who were my interviewers for the day actually ate with me. I really enjoyed it, and I felt like a true team member, even if only for one hour.&lt;br&gt;
&lt;em&gt;Fun anecdote: I was once interviewed by a huge company where my 2 interviewers, who had apparently already eaten, couldn’t leave me alone while I was having my lunch, so I had the strangest lunch of my entire life: 2 people I didn’t know trying to make small talk while watching me eat.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Culture interview
&lt;/h2&gt;

&lt;p&gt;I would rather have had a written culture interview, because some questions cannot be answered without thinking them through: what is the best advice you ever gave? &lt;em&gt;Hmm.. Well.. Exercise and eat healthy, I guess?&lt;/em&gt; Other questions were: what is a team? What is a successful one? And you should answer these questions with concrete examples.&lt;br&gt;
I try to be honest when answering these questions; for example, I said that I need a team to back up the solution I propose, otherwise I don’t feel 100% sure. But I guess it can be quite easy to embellish answers. My advice would be to try to do so, because that’s simply the interview game.&lt;br&gt;
Another piece of advice: you should have no doubts about which team you want to work with, and every answer you give should say "I am the right fit for this particular team". To know this, you need to do your homework by looking up on LinkedIn what people actually do in their team.&lt;/p&gt;

&lt;h2&gt;
  
  
  Case study interview
&lt;/h2&gt;

&lt;p&gt;Using a real-life problem they had, they want to know how you would debug an issue hidden somewhere inside monitoring tools. It was a role-play kind of interview, and by the end I had fun with the 2 interviewers. One of them knew that I had built the #1 online alarm clock. It feels good when interviewers know your profile. You could say that’s an obvious thing to do when interviewing people, but it’s really rare. You would be surprised by the number of times I have had to present my profile during different interviews at the same company; this is one of the reasons I currently feel interview fatigue, although I love talking about data engineering. Once, during a 5-hour on-site interview, I had to answer a question that I had already been asked in a previous interview on the same day; even though I told the interviewer I had already been asked this, he suggested that I try to give a different answer. After those on-site interviews, I knew I wouldn’t accept any job there.&lt;br&gt;
Some advice for interviewers: please be agile and act like a human being, as every interviewee is different.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;After 5 hours of being interviewed at Spotify’s office, I felt weirdly great and full of energy, meaning that I had a good time. I summed up my day with the HR person I had met on arrival, and he very kindly offered to show me around the whole office: I’ve never seen so many microwaves in my life haha&lt;br&gt;
2 days later, after discovering Stockholm, the city where I was going to move because I did so great, I received this email from the same HR person:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am sorry that I have not been able to get in touch earlier with the outcome of the onsite. It took longer than expected. Unfortunately, I have to let you know that we will not proceed with you for this role.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F1%2A4xR6bhFYug1LMRArIP7N0Q.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F600%2F1%2A4xR6bhFYug1LMRArIP7N0Q.jpeg" alt="Boy, that hurts"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;He also offered to go through each interview with me by phone so that I could understand their final answer, which I really appreciated. To sum up their decision: I tend to lack confidence and don’t sound sure when I explain my solution. But I guess this is what whiteboard interviews are all about: seeing how you handle a new problem without your usual tools, without a proper amount of time to check your different solutions, and with 2 people observing your every move. I would have loved to be asked a bit more about the solutions I presented, to show that I could answer most questions. But also, to comfort me, the HR person told me that all the interviewers said I was friendly (&lt;em&gt;it’s a good point for Gryffindor, I guess?&lt;/em&gt;), and he also gave me hope that I could be a good fit in one year.&lt;/p&gt;

&lt;h1&gt;
  
  
  My thoughts on 5-hour on-site interviews
&lt;/h1&gt;

&lt;p&gt;The on-site interviews happened after being selected among tons of CVs, two 45-minute phone/video interviews with HR and engineers to make sure you fit the role they offer, and a 1.5-hour technical video interview with 2 engineers. Added to the 5 hours of on-site interviews, that’s 8 hours in total.&lt;br&gt;
If I compare that with my experience of working with new hires, I could say in 30 minutes whether the person knows how to code, whether I will get along well with him/her, and most importantly, whether the person is passionate about what she/he does. &lt;strong&gt;&lt;a href="https://medium.com/javascript-scene/tech-hiring-has-always-been-broken-heres-how-i-survived-it-for-decades-b7ac33088de6" rel="noopener noreferrer"&gt;How could we need 8 hours to see that&lt;/a&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Although I had a great time being interviewed at Spotify, I have to say that for a moment I couldn’t help feeling frustrated and seriously questioning my professional skills. Especially when you think you are going to get an offer because you were quite happy with all the answers you gave, and sometimes even proud of them. And also when you went the extra mile by writing 2 showcases of your skills based on their API.&lt;/p&gt;

&lt;p&gt;On the other hand, &lt;em&gt;how would you handle 10 000 applicants for a position?&lt;/em&gt; You have to select on details, like Spotify did with me, and I can totally understand that.&lt;/p&gt;

&lt;p&gt;In my opinion, on-site interviews should be casual, with no coding needed, since coding is already covered during the video interviews. They should consist only of case studies or architecture design, to see your point of view on different subjects and what you could bring to the company, with the whole team you are interviewing for, followed by a comfort meal all together to make sure you fit in well with the team.&lt;/p&gt;

&lt;h1&gt;
  
  
  Try.
&lt;/h1&gt;

&lt;p&gt;To finish this article, and to encourage you to take a proper amount of time to apply for your dream/better job, have a look at &lt;a href="https://dev.to/ben/embrace-how-random-the-programming-interview-is"&gt;this quote from Ben of the Practical Dev&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;But if your goal is to land your dream job – or just get your first job – play the game and take more risks. It is up to the hiring firm to disqualify you, it’s not your job. Somebody is going to be the benefactor of randomness, it may as well be you.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>interview</category>
      <category>spotify</category>
      <category>whiteboard</category>
    </item>
    <item>
      <title>Analyze one year of radio station songs aired with SQL, Spark, Spotify, and Databricks</title>
      <dc:creator>Paul Leclercq</dc:creator>
      <pubDate>Mon, 20 Mar 2017 15:50:16 +0000</pubDate>
      <link>https://forem.com/paulleclercq/analyze-one-year-of-radio-station-songs-aired-with-sql-spark-spotify-and-databricks</link>
      <guid>https://forem.com/paulleclercq/analyze-one-year-of-radio-station-songs-aired-with-sql-spark-spotify-and-databricks</guid>
      <description>&lt;p&gt;&lt;em&gt;Note: This post was originally published on &lt;a href="https://medium.com/@polomarcus/analyze-one-year-of-radio-station-songs-aired-with-sql-spark-spotify-and-databricks-835fcf73df6#.7o1uwd9vc" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Whenever I drive or code, I listen to music. As this happens a lot, I always need new songs, so I listen to the radio or to Spotify's Discover Weekly playlist, &lt;em&gt;which made me like Mondays (because they release it every Monday)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A French &lt;em&gt;old-school&lt;/em&gt; institute called &lt;a href="http://www.mediametrie.fr/radio/" rel="noopener noreferrer"&gt;Médiamétrie&lt;/a&gt; analyzes radio stations' songs. Ever since I saw their study (&lt;em&gt;that I can't find anymore&lt;/em&gt;) some years ago, I have been obsessed with creating my own.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2Aaic7vTL46SYNhhkdHgir9Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2Aaic7vTL46SYNhhkdHgir9Q.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Mediametrie's image you can find online → old school*&lt;/center&gt;

&lt;p&gt;This article will present the year 2016 for 4 main French radio stations through &lt;em&gt;fun&lt;/em&gt; SQL queries, then we will connect each song to the Spotify API to create the radio stations' musical profiles.&lt;/p&gt;

&lt;p&gt;We will use the &lt;a href="https://community.cloud.databricks.com/" rel="noopener noreferrer"&gt;Databricks community version&lt;/a&gt; to visualize our data. &lt;a href="https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6937750999095841/1807085967979471/6197123402747553/latest.html" rel="noopener noreferrer"&gt;All SQL queries and all results are available on this notebook.&lt;/a&gt; It's the “backstage” of this article, where the magic happens, so to speak.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Protip&lt;/strong&gt;: don't miss the bonuses at the end of the article&lt;/p&gt;

&lt;h3&gt;
  
  
  Radio stations introduction
&lt;/h3&gt;

&lt;p&gt;We all have a favorite radio station. Mine is &lt;a href="https://en.wikipedia.org/wiki/Radio_Nova_%28France%29" rel="noopener noreferrer"&gt;Radio Nova&lt;/a&gt;, for its diversity and its humor, and because, as a hip hop fan, I find it is the only national radio where we can hear &lt;em&gt;listenable&lt;/em&gt; hip hop songs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F0%2AnKeR6DQGjRIQj5a7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F0%2AnKeR6DQGjRIQj5a7.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Radio Nova had &lt;a href="http://www.mediametrie.fr/radio/communiques/telecharger.php?f=b132ecc1609bfcf302615847c1caa69a" rel="noopener noreferrer"&gt;1.4% of the audience in September 2016 (PDF to download from Mediametrie)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2A_g363O12zm0KugNa_qcQ0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2A_g363O12zm0KugNa_qcQ0g.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Top 5 most broadcast songs on Nova in 2016*&lt;/center&gt;

&lt;p&gt;In order to see how a radio becomes number 1, we are also going to analyze the number 1 music radio, &lt;a href="http://www.nrj.fr/" rel="noopener noreferrer"&gt;NRJ&lt;/a&gt;, which has 10.8% of the audience, and 2 others: Virgin (5%), which, as we'll see, sounds like NRJ, and Skyrock (6%), &lt;em&gt;don't mind the name, it's a rap radio… haha&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AiKd8P0iu3oe6C_iOYeYeAA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AiKd8P0iu3oe6C_iOYeYeAA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Top 5 most broadcast songs on NRJ in 2016*&lt;/center&gt;

&lt;p&gt;The main question is: once we have compared these radios, should we give Radio Nova tips on how to become number one, based on our analysis of NRJ? &lt;em&gt;What do you say, Nova? Learn from the best, right?!&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting the radio stations' song data
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2Al8ljCalKR3Tpnpp9ZM9Kbg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2Al8ljCalKR3Tpnpp9ZM9Kbg.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*“What was this title?” Nova page*&lt;/center&gt;

&lt;p&gt;In order to extract the list of songs (artist, song title and timestamp), we are going to parse each radio station's “What was this song?” HTML pages, except for Skyrock, which &lt;a href="http://skyrock.fm/api/v3/sound?search_date=2016-08-15&amp;amp;search_hour=04:59" rel="noopener noreferrer"&gt;has a handy RESTful web service&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Every song extracted will be converted into this Song class to query them easily with (Spark) SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;    &lt;span class="nc"&gt;Song&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;timestamp:&lt;/span&gt;&lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;humanDate:&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;year:&lt;/span&gt;&lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;month:&lt;/span&gt;&lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;day:&lt;/span&gt;&lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;hour:&lt;/span&gt;&lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;minute:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;artist:&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;allArtists:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;title:&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For 2016, about 300K broadcasts were collected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nova : 95K broadcasts of 5000 different songs&lt;/li&gt;
&lt;li&gt;NRJ : 50K broadcasts of 800 different songs&lt;/li&gt;
&lt;li&gt;Virgin: 60K broadcasts of 1200 different songs&lt;/li&gt;
&lt;li&gt;Skyrock: 100K broadcasts of 1000 different songs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every song is stored in the &lt;a href="http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files" rel="noopener noreferrer"&gt;parquet format&lt;/a&gt; so that the data is extracted only once (&lt;em&gt;you're welcome, radio servers :p&lt;/em&gt;) and &lt;a href="https://blog.cloudera.com/blog/2016/04/benchmarking-apache-parquet-the-allstate-experience/" rel="noopener noreferrer"&gt;to speed up SparkSQL queries&lt;/a&gt;. &lt;em&gt;Btw, if you are interested in the file, I can export it for you in CSV or parquet.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Remember that if you have to use the same SQL table (or Dataset/DataFrame) again and again, the best way to speed up queries (the &lt;a href="http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence" rel="noopener noreferrer"&gt;Spark doc&lt;/a&gt; says often by more than 10x) is to cache it in memory (&lt;em&gt;Thanks Databricks for the 6 GB RAM server!&lt;/em&gt;) with the &lt;code&gt;dataframe.cache()&lt;/code&gt; method.&lt;/p&gt;

&lt;p&gt;Let's dive into our analysis now!&lt;/p&gt;

&lt;h3&gt;
  
  
  How many songs by day?
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Some days were not recorded by the radios' history system, so the real numbers should be a bit higher.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A0tYTwHCCN6r2nq6EqJojdA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A0tYTwHCCN6r2nq6EqJojdA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Songs broadcasted by day*&lt;/center&gt;

&lt;p&gt;It's fun to see that both radio stations broadcast more songs during summer (if we set aside Radio Nova's one-week bug in August, shown in blue). This is probably due to the summer holidays. &lt;em&gt;They do a good job all year long, so, it's OK to take some days off, I guess!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We can see that Skyrock and Nova broadcast the same number of songs each day, whereas NRJ and Virgin broadcast a bit fewer, probably due to more talk shows or untracked night DJ shows.&lt;/p&gt;

&lt;h3&gt;
  
  
  How many different songs by day?
&lt;/h3&gt;

&lt;p&gt;The real difference comes from the number of different songs played. See for yourself the number of different tracks per day:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AOmx2QwvONBJMBLmUOtD5Pw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AOmx2QwvONBJMBLmUOtD5Pw.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Different songs by day*&lt;/center&gt;

&lt;p&gt;More mainstream radios such as NRJ, Virgin and Skyrock top out at 100 to 120 different songs a day, whereas Nova plays around 280. If you want to discover more songs, Nova is clearly the place.&lt;/p&gt;
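&lt;p&gt;The count behind this chart can be sketched in a few lines. This is a plain Python illustration rather than the notebook's actual Spark SQL query, and the sample rows and the function name &lt;code&gt;distinct_songs_per_day&lt;/code&gt; are made up for the example:&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical sample: (station, day, artist, title) tuples, not real data.
broadcasts = [
    ("nova", "2016-06-01", "artist a", "song 1"),
    ("nova", "2016-06-01", "artist a", "song 1"),
    ("nova", "2016-06-01", "artist b", "song 2"),
    ("nrj", "2016-06-01", "artist b", "song 2"),
]


def distinct_songs_per_day(rows):
    """Map (station, day) to the number of different songs aired that day."""
    songs = defaultdict(set)
    for station, day, artist, title in rows:
        # A song is identified by its artist and title, ignoring case.
        songs[(station, day)].add((artist.lower(), title.lower()))
    return {key: len(value) for key, value in songs.items()}


print(distinct_songs_per_day(broadcasts))
```

&lt;p&gt;A repeated broadcast of the same song adds nothing to the set, which is why a station can air many songs a day and still show a low number of different ones.&lt;/p&gt;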

&lt;h3&gt;
  
  
  How many different songs by month?
&lt;/h3&gt;

&lt;p&gt;If we look at the number of different songs per month, the gap between radios is even bigger.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A7WL3--Hb00WNPvjeM5if8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A7WL3--Hb00WNPvjeM5if8w.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Different songs by month in 2016*&lt;/center&gt;

&lt;h3&gt;
  
  
  Top 10 played titles by each radio station
&lt;/h3&gt;

&lt;p&gt;It's interesting to see how “hits” are played through the year. &lt;/p&gt;

&lt;p&gt;We can notice summer hits: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://open.spotify.com/track/0EM0yABJzbFOvZQkfvuvCy" rel="noopener noreferrer"&gt;Kaytranada&lt;/a&gt; for Nova &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://open.spotify.com/track/6YZdkObH88npeKrrkb8Ggf" rel="noopener noreferrer"&gt;Enrique Iglesias&lt;/a&gt; for NRJ&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://open.spotify.com/track/27PmvZoffODNFW2p7ehZTQ" rel="noopener noreferrer"&gt;Kent Jones&lt;/a&gt; and &lt;a href="https://open.spotify.com/track/1xznGGDReH1oQq0xzbwXa3" rel="noopener noreferrer"&gt;Drake&lt;/a&gt; for Skyrock &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://open.spotify.com/track/1ZdWNpOXCJT1nmt40UuxWS" rel="noopener noreferrer"&gt;Imany&lt;/a&gt; and &lt;a href="https://open.spotify.com/track/0cAuqPI1R8RlFsXXWWO039" rel="noopener noreferrer"&gt;Kungs&lt;/a&gt; for Virgin.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can tell that the most broadcast songs are mostly aired during summer. So artists, play it smart and release your songs between February and June to have a better chance of becoming number one, &lt;em&gt;or to have more people hating your music because they heard it too many times?&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Nova
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A9mum9beXiQln5uSyfy2yww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A9mum9beXiQln5uSyfy2yww.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  NRJ
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AawiIMX1Bwu5rdS-_rkeeYg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AawiIMX1Bwu5rdS-_rkeeYg.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Skyrock
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AZ4KvMZzWL7ltwtjwMM5Nkw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AZ4KvMZzWL7ltwtjwMM5Nkw.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Virgin
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AnZMYiYJttXEOaV7KugtjHA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AnZMYiYJttXEOaV7KugtjHA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Percentage of music by day
&lt;/h3&gt;

&lt;p&gt;If we take the average number of songs broadcast per day and the mean duration of a song, 3.30 minutes, we can estimate the percentage of music per day. The remainder is likely to be talk shows, advertising or untracked songs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AxAIRqQ96Leq1zSD--o4akA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AxAIRqQ96Leq1zSD--o4akA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Percentage of music per day*&lt;/center&gt;
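&lt;p&gt;As a rough sketch of that estimate: the snippet below assumes “3.30 minutes” means 3 min 30 s (3.5 minutes), and the input of 300 songs a day is a made-up figure, not one taken from the data.&lt;/p&gt;

```python
MINUTES_PER_DAY = 24 * 60
# Assumption: "3.30 minutes" is read as 3 min 30 s, i.e. 3.5 minutes.
MEAN_SONG_MINUTES = 3.5


def music_share(songs_per_day):
    """Estimated fraction of a day filled with music."""
    return songs_per_day * MEAN_SONG_MINUTES / MINUTES_PER_DAY


# Hypothetical input: 300 songs aired in one day.
print(round(music_share(300) * 100, 1))  # prints 72.9
```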

&lt;p&gt;To better understand these percentages, we should see what a normal day looks like for the radio stations we analyzed.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is a typical Monday for our radio stations?
&lt;/h3&gt;

&lt;p&gt;Let's have a look at the average number of songs on Mondays for all radio stations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2Ae3ev4GNUCtKPcnq9SmgoFA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2Ae3ev4GNUCtKPcnq9SmgoFA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Average number of songs for Mondays*&lt;/center&gt;

&lt;p&gt;We can distinguish 2 gaps, during the morning and evening shows, for every radio station. &lt;em&gt;Amazing.&lt;/em&gt; More seriously, no discovery here: it's a known fact that most radios have morning and evening shows during which there is less music and more talk.&lt;/p&gt;

&lt;h4&gt;
  
  
  Advertising time
&lt;/h4&gt;

&lt;p&gt;If we recalculate the average percentage of music at noon, when no radio station has a show on, we can estimate the amount of advertising per hour for each radio. We assume that the radio hosts speak for 5 minutes during the whole hour. &lt;em&gt;Note that radios may advertise more during prime time, when they have a larger audience.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For 60 minutes, we get from 7 minutes of advertising time for Skyrock to 15 minutes for Virgin. In detail, we have this table: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F1%2AVcuOm-luXT_e_q6d-5q9Zg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F1%2AVcuOm-luXT_e_q6d-5q9Zg.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Average minutes of music and advertising for every Monday at noon*&lt;/center&gt;
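&lt;p&gt;The advertising estimate is simple subtraction: whatever is left of the hour once music and the 5 minutes of host talk are removed. A minimal sketch, where the 3.5-minute mean song length and the 12 songs in the hour are assumptions chosen for illustration:&lt;/p&gt;

```python
HOST_TALK_MINUTES = 5  # the article's estimate for one hour


def advertising_minutes(songs_in_hour, mean_song_minutes=3.5):
    """Minutes of the hour left once music and host talk are removed."""
    music = songs_in_hour * mean_song_minutes
    # Clamp at zero: more music than fits means no room for advertising.
    return max(0.0, 60 - music - HOST_TALK_MINUTES)


# Hypothetical input: 12 songs aired between noon and 1 pm.
print(advertising_minutes(12))  # prints 13.0
```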

&lt;h3&gt;
  
  
  Radio brainwashing?
&lt;/h3&gt;

&lt;p&gt;An annoying feeling we sometimes have with radios is that we keep hearing the same songs over and over. As we are men and women who believe in science and not in our instinct, we are going to use basic statistics to check this weird feeling.&lt;/p&gt;

&lt;h4&gt;
  
  
  How many times is the same song aired on the same day?
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AAhz21it10spW3txOSnPJew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AAhz21it10spW3txOSnPJew.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Average number of times the same song is broadcast by day*&lt;/center&gt;

&lt;p&gt;The pie charts below tell us a lot about radio stations' habits: more mainstream radios such as Virgin, NRJ or Skyrock are more likely to broadcast the same songs multiple times.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F1%2At-xKlfHx8j5fWBSG5epeaw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F1%2At-xKlfHx8j5fWBSG5epeaw.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*How many times is the same song broadcast?*&lt;/center&gt;

&lt;h4&gt;
  
  
  When is the next time we will listen to the same song during the same day?
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F1%2AS-baYaxkztVV7znKPKYRhQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F1%2AS-baYaxkztVV7znKPKYRhQ.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Minimum difference in hours between broadcasts of the same song on the same day*&lt;/center&gt;

&lt;p&gt;Again, the most mainstream radios, NRJ, Skyrock and Virgin, tend to replay the same song most often 2 to 3 hours after it was first aired. For Nova, it is closer to 7 to 8 hours.&lt;/p&gt;

&lt;p&gt;While the distributions differ, the average for our 4 radios is between 7 and 8 hours.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AVe3UMz8tRiDVmVnca4vVUw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AVe3UMz8tRiDVmVnca4vVUw.png"&gt;&lt;/a&gt;&lt;/p&gt;
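&lt;p&gt;One way to compute this minimum gap is sketched below in plain Python. It is not the notebook's query, and the sample plays and the name &lt;code&gt;min_replay_gap_hours&lt;/code&gt; are invented for the example:&lt;/p&gt;

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample: (aired_at, artist, title) for one station.
plays = [
    ("2016-06-01 08:00", "artist a", "song 1"),
    ("2016-06-01 10:30", "artist a", "song 1"),
    ("2016-06-01 18:00", "artist a", "song 1"),
    ("2016-06-01 09:00", "artist b", "song 2"),
]


def min_replay_gap_hours(rows):
    """Map (day, artist, title) to the smallest gap in hours between replays."""
    times = defaultdict(list)
    for aired_at, artist, title in rows:
        moment = datetime.strptime(aired_at, "%Y-%m-%d %H:%M")
        times[(moment.date().isoformat(), artist, title)].append(moment)
    gaps = {}
    for key, moments in times.items():
        moments.sort()
        deltas = [
            (later - earlier).total_seconds() / 3600
            for earlier, later in zip(moments, moments[1:])
        ]
        if deltas:  # songs aired only once that day have no gap
            gaps[key] = min(deltas)
    return gaps


print(min_replay_gap_hours(plays))
```

&lt;p&gt;Here “song 1” is replayed after 2.5 hours and again after 7.5 hours, so its minimum gap is 2.5 hours, while “song 2” is aired only once and does not appear in the result.&lt;/p&gt;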

&lt;h4&gt;
  
  
  How many new songs* are added and when?
&lt;/h4&gt;

&lt;p&gt;*“New songs” means songs that had not yet been broadcast in 2016.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A5PC0QuOvyDTStuoDK7zzXw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A5PC0QuOvyDTStuoDK7zzXw.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*New songs by month distribution*&lt;/center&gt;
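&lt;p&gt;Under that definition, a song counts as new in the month it first appears in the 2016 data. A minimal sketch with made-up rows (the function name is hypothetical):&lt;/p&gt;

```python
from collections import Counter

# Hypothetical sample: (month, artist, title) rows for one station.
plays = [
    (1, "artist a", "song 1"),
    (1, "artist b", "song 2"),
    (2, "artist a", "song 1"),
    (2, "artist c", "song 3"),
]


def new_songs_per_month(rows):
    """Count songs by the month in which they were first aired."""
    seen = set()
    counts = Counter()
    for month, artist, title in sorted(rows):  # earliest months first
        song = (artist, title)
        if song not in seen:
            seen.add(song)
            counts[month] += 1
    return dict(counts)


print(new_songs_per_month(plays))
```

&lt;p&gt;With this definition, every song aired in January counts as new, so the first months are inflated by the existing playlist. That is presumably why it makes sense to look at the average over later months only.&lt;/p&gt;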

&lt;p&gt;If we look at the average after April 2016, we see that Nova is ahead, but don't forget that Nova plays 2500 different songs each month, so this is statistically to be expected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2ANZfug_TEvZQhUR3zargkmA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2ANZfug_TEvZQhUR3zargkmA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Average new songs by month*&lt;/center&gt;

&lt;p&gt;New songs are distributed evenly across the week for all radios.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AhwwdZoj1GRNnWuW7LokE7w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AhwwdZoj1GRNnWuW7LokE7w.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*New songs per weekday*&lt;/center&gt;

&lt;h3&gt;
  
  
  Common songs between radio stations
&lt;/h3&gt;

&lt;p&gt;In the table below, we can see that NRJ has 25% of its songs in common with Virgin and 12% with Skyrock. &lt;/p&gt;

&lt;p&gt;Virgin has 18% of its songs in common with NRJ, while Skyrock has 9% in common with NRJ.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2Aw1sB-Z8zzoP19Lsg2V-U8A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2Aw1sB-Z8zzoP19Lsg2V-U8A.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Number of same songs broadcasted between radios*&lt;/center&gt;
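&lt;p&gt;The percentages are not symmetric because each one is relative to a station's own catalogue: the same set of shared songs weighs more for the station with fewer different songs. A small sketch with invented catalogues (the sizes are chosen only to keep the arithmetic easy):&lt;/p&gt;

```python
def share_in_common(own_songs, other_songs):
    """Percentage of own_songs that also appear in other_songs."""
    common = own_songs.intersection(other_songs)
    return 100 * len(common) / len(own_songs)


# Hypothetical catalogues, sized to keep the arithmetic easy to follow.
small = {"song %d" % i for i in range(8)}        # 8 songs
large = {"song %d" % i for i in range(6, 18)}    # 12 songs, 2 shared

print(share_in_common(small, large))  # 2 of 8
print(share_in_common(large, small))  # 2 of 12
```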

&lt;p&gt;Nova has a few songs in common with the other radios, mostly by legendary artists such as Bob Marley, Daft Punk, Aloe Blacc, Kavinsky, Beyoncé… If you are interested in the full list, look for the “Similar songs between radios” cell in &lt;a href="https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6937750999095841/1807085967979471/6197123402747553/latest.html" rel="noopener noreferrer"&gt;the “backstage” AKA the blog article's Databricks notebook&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our 4 radio stations are different, for sure, but are there songs that all of them play? Surprisingly, the answer is yes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://open.spotify.com/track/28S2K2KIvnZ9H6AyhRtenm" rel="noopener noreferrer"&gt;Prince – Kiss&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://open.spotify.com/track/255uSDEuvkWp1QyYnm82VJ" rel="noopener noreferrer"&gt;C2C – Happy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://open.spotify.com/track/4z70Px77quweOupQRiaG2Q" rel="noopener noreferrer"&gt;Stromae – Formidable&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I would classify these songs as songs that everybody likes, &lt;em&gt;you can play them at your party without any stress of being booed&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Visualized, our previous table looks like this: the blue bar is the number of similar songs, and the orange and green bars are the totals of different songs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AXnAgFeqhykf0mhYurSKXJg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AXnAgFeqhykf0mhYurSKXJg.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*Similar songs*&lt;/center&gt;

&lt;h3&gt;
  
  
  What are the secrets to be #1?
&lt;/h3&gt;

&lt;p&gt;We have analyzed 4 radio stations based on the artist name, the song title and the day and time the songs were broadcast. Beyond letters and numbers, these 3 values tell us nothing. If we want a deeper analysis, we have to learn more about the songs played: How popular is the song right now? What is the genre of the song? How many followers does the artist have?&lt;/p&gt;

&lt;p&gt;Luckily, by connecting each song to the Spotify API, we will get a lot of data we can play with:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://api.spotify.com/v1/search?q=$encodedQuery&amp;amp;type=track&amp;amp;limit=1" rel="noopener noreferrer"&gt;https://api.spotify.com/v1/search?q=ARTISTTITLE&amp;amp;type=track&amp;amp;limit=1&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
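&lt;p&gt;Building that search URL for a given song could look like the sketch below. The helper name &lt;code&gt;search_url&lt;/code&gt; and the choice to join artist and title with a space are assumptions. Sending the request, including any authentication the API requires, is left out:&lt;/p&gt;

```python
from urllib.parse import urlencode


def search_url(artist, title):
    """Build a track search URL shaped like the one quoted above."""
    # urlencode escapes spaces and special characters in the query text.
    query = urlencode({"q": artist + " " + title, "type": "track", "limit": 1})
    return "https://api.spotify.com/v1/search?" + query


print(search_url("Prince", "Kiss"))
```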

&lt;p&gt;For 2016, we collected 8000 different songs from the radios, so to get the artist, the track and the track's audio features from the Spotify API, we have to make: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Number of songs x (&lt;a href="https://developer.spotify.com/web-api/console/get-artist/" rel="noopener noreferrer"&gt;Artist&lt;/a&gt; + &lt;a href="https://developer.spotify.com/web-api/console/get-track/" rel="noopener noreferrer"&gt;track&lt;/a&gt; + &lt;a href="https://developer.spotify.com/web-api/console/get-audio-features-track/" rel="noopener noreferrer"&gt;audiofeatures&lt;/a&gt;) = 24K requests&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F0%2Aj0EeLA19ElJmN_ae.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F0%2Aj0EeLA19ElJmN_ae.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;*From [HTTP STATUS DOGS](https://httpstatusdogs.com/)*&lt;/center&gt;

&lt;p&gt;That's a lot. Plus, Spotify limits the number of requests over time, so we have to go slowly: 20 requests every 2 seconds, &lt;em&gt;why not you know.&lt;/em&gt;&lt;/p&gt;
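&lt;p&gt;To get a feel for what that pace means, here is a back-of-the-envelope calculation using the figures above. It only counts the pauses between batches and ignores network latency, so it is a lower bound rather than a measurement:&lt;/p&gt;

```python
SONGS = 8000
CALLS_PER_SONG = 3          # artist, track and audio features
BATCH_SIZE = 20             # requests sent per batch
BATCH_INTERVAL_SECONDS = 2  # pause between batches

total_requests = SONGS * CALLS_PER_SONG
batches = -(-total_requests // BATCH_SIZE)  # ceiling division
minutes = batches * BATCH_INTERVAL_SECONDS / 60

print(total_requests, batches, minutes)  # prints 24000 1200 40.0
```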

&lt;p&gt;BUT, one thing I didn't plan for with this slow rate is that an artist's number of followers could change between two requests. As most artists had multiple songs broadcast, the same artist's information was requested from 2 to 10 times. &lt;em&gt;No problemo, right?&lt;/em&gt; No… This messed up the SQL join between artist and track data later on, because the DISTINCT on the artist information &lt;a href="https://www.youtube.com/watch?v=1IDF-8khS3w" rel="noopener noreferrer"&gt;was fake&lt;/a&gt;: rows for the same artist differed by &lt;code&gt;followers.total&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I have to say this drove me crazy, because I had more songs after my join than before haha&lt;/em&gt;&lt;/p&gt;
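
&lt;p&gt;Here is a small Python sketch of that trap, with made-up rows (the ids, names and follower counts are invented for illustration, and this is not the Spark SQL code from the project). One way around it, assuming the artist id is a reliable key, is to keep a single row per artist id before joining.&lt;/p&gt;

```python
# Hypothetical artist snapshots: artist "a1" was fetched twice and its
# follower count changed between the two requests.
artist_rows = [
    {"artist_id": "a1", "name": "Artist One", "followers_total": 1000},
    {"artist_id": "a1", "name": "Artist One", "followers_total": 1003},
    {"artist_id": "a2", "name": "Artist Two", "followers_total": 50},
]
track_rows = [
    {"track_id": "t1", "artist_id": "a1"},
    {"track_id": "t2", "artist_id": "a1"},
    {"track_id": "t3", "artist_id": "a2"},
]


def join(tracks, artists):
    """Inner join of tracks and artists on artist_id."""
    return [
        {**t, **a}
        for t in tracks
        for a in artists
        if t["artist_id"] == a["artist_id"]
    ]


# DISTINCT over whole rows keeps both "a1" snapshots, because
# followers_total differs between them.
distinct_rows = [dict(items) for items in {tuple(sorted(r.items())) for r in artist_rows}]
naive = join(track_rows, distinct_rows)

# Keeping one row per artist_id (here, the last snapshot seen) fixes the join.
one_per_artist = list({r["artist_id"]: r for r in artist_rows}.values())
fixed = join(track_rows, one_per_artist)

print(len(track_rows), len(naive), len(fixed))   # 3 5 3
```

&lt;p&gt;Three tracks go in, five rows come out of the naive join, and three come out once the artists are deduplicated on their id.&lt;/p&gt;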

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AUlBp9dA-JJyY3aJ-klrsBg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AUlBp9dA-JJyY3aJ-klrsBg.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;Spotify API Stats from &lt;a href="https://developer.spotify.com/my-applications" rel="noopener noreferrer"&gt;https://developer.spotify.com/my-applications&lt;/a&gt;&lt;/em&gt;&lt;/center&gt;

&lt;h3&gt;
  
  
  Songs Popularity By Radio
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Definition by Spotify
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AvVSuuElWOqbngrTlj4SNdw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AvVSuuElWOqbngrTlj4SNdw.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;Song popularity distribution by radio, as a percentage of songs&lt;/em&gt;&lt;/center&gt;

&lt;p&gt;No surprises here: the mainstream radios, NRJ, Virgin and Skyrock, tend to play more popular songs, &lt;em&gt;that's why I use the term mainstream, clever, right?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AFwnhWdDxbYIQ_YhIjkLspA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AFwnhWdDxbYIQ_YhIjkLspA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;Popularity average&lt;/em&gt;&lt;/center&gt;

&lt;p&gt;&lt;em&gt;But the real question is: was the song popular before it was broadcast on the radio?&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Audio features
&lt;/h3&gt;

&lt;p&gt;The Spotify API gives &lt;a href="https://developer.spotify.com/web-api/get-audio-features/#tablepress-215" rel="noopener noreferrer"&gt;audio features extracted from the song's soundwaves&lt;/a&gt;. Thanks to these, we can display a musical profile of each radio.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=pWdd6_ZxX8c" rel="noopener noreferrer"&gt;In my opinion&lt;/a&gt;, the most meaningful audio features are :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;danceability &lt;em&gt;describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;energy &lt;em&gt;is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;valence &lt;em&gt;describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's look at their average and their distribution among the radios' tracks, &lt;em&gt;as an average alone can sometimes be misleading&lt;/em&gt;. As Nova played more distinct songs than the others, we are going to use percentages to compare our radios and add more context to our stats.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@polomarcus/my-facebook-interview-journey-5205e111155f#.r3w705wo3" rel="noopener noreferrer"&gt;If you have read my Facebook Interview Journey,&lt;/a&gt; you know this is where I failed during my SQL interview, this code is specially for you, dear Mr. Interviewer, no hard feelings though :p&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;subTotal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_radio&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;percentage_of_songs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;subTotal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_radio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;popularity&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;popularity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;radio&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;AudioFeatureArtistTrackRadios&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_radio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;radio&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;AudioFeatureArtistTrackRadios&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;radio&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;subTotal&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;subTotal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;radio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;radio&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;subTotal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_radio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;popularity&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;radio&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;popularity&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;center&gt;&lt;em&gt;Brilliant SQL code&lt;/em&gt;&lt;/center&gt;

&lt;h4&gt;
  
  
  Energy
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AY20xDfW6m6x4PQ-gdtMouA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AY20xDfW6m6x4PQ-gdtMouA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;Energy distribution by radio&lt;/em&gt;&lt;/center&gt;

&lt;p&gt;Mainstream radios tend to play more energetic songs, &lt;em&gt;I guess they are easier to listen to?&lt;/em&gt; Some examples of songs with a lot of energy are &lt;a href="https://open.spotify.com/track/6Unf6X6PC7KZtrRXH3dFHg" rel="noopener noreferrer"&gt;We Are Your Friends – JUSTICE&lt;/a&gt;, &lt;a href="https://open.spotify.com/track/7yI4qKGEFHNId1B893XopS" rel="noopener noreferrer"&gt;Steppin' stone – Davy Jones&lt;/a&gt; and, of course, the classic of classics &lt;a href="https://open.spotify.com/track/1bx7OUl2UmAnA5oZkm9If7" rel="noopener noreferrer"&gt;Jerk It Out – Caesars&lt;/a&gt;, which I first heard while playing &lt;a href="https://www.youtube.com/watch?v=-0x3aZEPcMA" rel="noopener noreferrer"&gt;SSX3 on GameCube&lt;/a&gt; 8)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AIySsIzTaReMzd1OZ9PxXWw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AIySsIzTaReMzd1OZ9PxXWw.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Danceability
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AP4erZSV3quz3__oR2c24bA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AP4erZSV3quz3__oR2c24bA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This chart tells us both radios broadcast the same kind of danceable songs. Some examples of danceable songs are &lt;a href="https://open.spotify.com/track/3aImJnJlAgcE7bJ1NxthGt" rel="noopener noreferrer"&gt;Trick Me – Kelis&lt;/a&gt;, &lt;a href="https://open.spotify.com/track/1pKYYY0dkg23sQQXi0Q5zN" rel="noopener noreferrer"&gt;Around the World – Daft Punk&lt;/a&gt; or &lt;a href="https://www.youtube.com/watch?v=LDZX4ooRsWs" rel="noopener noreferrer"&gt;Anaconda – Nicki Minaj&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AJpPrbAzb0U13iMZK7PihXQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AJpPrbAzb0U13iMZK7PihXQ.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;Danceability average&lt;/em&gt;&lt;/center&gt;

&lt;h4&gt;
  
  
  Valence / Positiveness
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AujgMyDVxmhj2YlgfIM11HA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AujgMyDVxmhj2YlgfIM11HA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;Valence / Positiveness distribution by radio&lt;/em&gt;&lt;/center&gt;

&lt;p&gt;&lt;a href="" class="article-body-image-wrapper"&gt;&lt;img&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As with danceability, both radios broadcast the same kind of positive tracks. Some examples: &lt;a href="https://open.spotify.com/track/5nNmj1cLH3r4aA4XDJ2bgY" rel="noopener noreferrer"&gt;September – Earth, Wind &amp;amp; Fire&lt;/a&gt;, &lt;a href="https://open.spotify.com/track/4BLu47sbjr3aJZwxZujgXT" rel="noopener noreferrer"&gt;Ska-Boo-Da-Ba – The Skatalites&lt;/a&gt; or &lt;a href="https://open.spotify.com/track/5WQ1hIc5d2EVbRQ8qsj8Uh" rel="noopener noreferrer"&gt;Hey Ya! – OutKast&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AJaa8mmCC84LsxskMJjbuNg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AJaa8mmCC84LsxskMJjbuNg.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;Valence / Positiveness average by radio&lt;/em&gt;&lt;/center&gt;

&lt;p&gt;Two other interesting data points, which are not specific to Spotify (&lt;a href="http://the.echonest.com/" rel="noopener noreferrer"&gt;Echo Nest&lt;/a&gt;), are the BPM (beats per minute) and the song duration.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tempo / Beats per minute
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AXkq3Q3yjJaNowJkhzXP1FA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AXkq3Q3yjJaNowJkhzXP1FA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Duration
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A67Kr4JwQOCrc2u6Obz3nMg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A67Kr4JwQOCrc2u6Obz3nMg.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;Duration distribution&lt;/em&gt;&lt;/center&gt;

&lt;p&gt;Nova seems to be a bit different from the other radios by playing shorter or longer tracks. Virgin, NRJ and Skyrock are really into 3-minute tracks.&lt;/p&gt;

&lt;p&gt;When I first saw this graph, I couldn't help thinking about the Hocus Pocus song &lt;a href="https://open.spotify.com/track/1DhpyURGQ8gAQCYo8dOLQo" rel="noopener noreferrer"&gt;“Voyage immobile”&lt;/a&gt; (motionless journey) and this line about our undiversified musical environment:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Je ne voyais que blocs longs de 3 minutes taillé dans le roc et dans le même but”&lt;/p&gt;

&lt;p&gt;“I could only see 3-minute blocks from the same base with the same goal”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2Afpld7Zi9lOJDPKWDdrcFpw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2Afpld7Zi9lOJDPKWDdrcFpw.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;Average duration in minutes by radio&lt;/em&gt;&lt;/center&gt;

&lt;h3&gt;
  
  
  Music genres
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F1%2AsgT6naW21Wu0M7sYaL2qMQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F1%2AsgT6naW21Wu0M7sYaL2qMQ.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Spotify has some pretty weird music genres: have you noticed “post-teen pop” or “pop christmas”? &lt;em&gt;Pop songs you listen to during Christmas, I guess? haha&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We can clearly see that NRJ and Virgin, &lt;em&gt;which are very alike&lt;/em&gt;, are more about pop/dance/electro music: their top 3 genres are pop, dance pop and tropical house. Nova is about soul, funk and indie music, and Skyrock is more about rap, dance and pop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2A_5E7highTXSqJ8rS7oFI0Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2A_5E7highTXSqJ8rS7oFI0Q.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;Number of different music genres by radio&lt;/em&gt;&lt;/center&gt;

&lt;h4&gt;
  
  
  Hip hop genres
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F0%2A58-RGcDqjhx8XLD3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F0%2A58-RGcDqjhx8XLD3.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;First on rap&lt;/em&gt;&lt;/center&gt;

&lt;p&gt;Skyrock is famous for its motto “1st on Rap”, so let's compare its hip hop/rap genres (genres with “rap”, “hip” or “hop” in the name) with those of the other radios.&lt;/p&gt;
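
&lt;p&gt;The filter itself can be as simple as a substring check. A minimal Python sketch, where the sample genre names and the helper &lt;code&gt;is_hip_hop_genre&lt;/code&gt; are only illustrative and not taken from the project's code:&lt;/p&gt;

```python
# Markers quoted in the article: a genre counts as hip hop/rap
# when its name contains one of these substrings.
HIP_HOP_MARKERS = ("rap", "hip", "hop")


def is_hip_hop_genre(genre):
    """Return True when the genre name contains one of the markers."""
    name = genre.lower()
    return any(marker in name for marker in HIP_HOP_MARKERS)


# Sample genre names, for illustration only.
genres = ["french hip hop", "pop rap", "tropical house", "trap francais", "indie soul"]
print([g for g in genres if is_hip_hop_genre(g)])
# ['french hip hop', 'pop rap', 'trap francais']
```

&lt;p&gt;Note that a plain substring match also catches trap genres, since “trap” contains “rap”.&lt;/p&gt;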

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AOGs_m824iH2tMza0ziY8JQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AOGs_m824iH2tMza0ziY8JQ.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;Number of hip hop and rap songs by radio&lt;/em&gt;&lt;/center&gt;

&lt;p&gt;OK, that's a close match between Skyrock and Nova, let's compare the internal hip hop genres now.&lt;/p&gt;

&lt;p&gt;I don't really care about genres, but there is a lot of confusion between hip hop, which is a culture, and rap, which is the actual act of rapping. If you want to learn more, check this &lt;a href="https://en.wikipedia.org/wiki/Hip_hop#Culture" rel="noopener noreferrer"&gt;Wikipedia chapter&lt;/a&gt;. I also recommend the excellent &lt;a href="https://www.netflix.com/title/80141782" rel="noopener noreferrer"&gt;Netflix documentary “Hip-Hop Evolution”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AAsTuELi8z5QUmnG25FfUmA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AAsTuELi8z5QUmnG25FfUmA.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;em&gt;Genres with “hip”, “hop” or “rap” in the name, by radio&lt;/em&gt;&lt;/center&gt;

&lt;p&gt;Nova, in orange, is more about indie/alternative/underground hip hop music, and Skyrock, in blue, is really more into French rap/trap/hip hop and also popular rap. So let's fix Skyrock's motto to “1st on French rap” haha&lt;/p&gt;

&lt;h3&gt;
  
  
  Music classifier for Radios' selection idea
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://medium.com/@polomarcus/music-recommendation-service-with-the-spotify-api-spark-mllib-and-databricks-7cde9b16d35d" rel="noopener noreferrer"&gt;In my last article&lt;/a&gt;, I explained how to create your own music recommendation system thanks to these audio features.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://en.coursera.org/learn/progfun1" rel="noopener noreferrer"&gt;fun project&lt;/a&gt; (&lt;em&gt;the link is a tribute to the Scala Guru &lt;a href="https://medium.com/u/9a80872a98bb" rel="noopener noreferrer"&gt;Martin Odersky&lt;/a&gt;, he tends to say too many times that his Scala exercises are fun whereas they are brain melt haha&lt;/em&gt;) would be to create an algorithm that will help music selectors to find radios style's songs.&lt;/p&gt;

&lt;h4&gt;
  
  
  Spotify recommendation system
&lt;/h4&gt;

&lt;p&gt;Spotify's system is not only based on the audio features we saw earlier. It also analyzes what other, similar users listen to. This slide contains a &lt;a href="http://www.slideshare.net/sinisalyh/scala-data-pipelines-spotify/5-Recommendation_systems" rel="noopener noreferrer"&gt;nice diagram that explains their whole system.&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What's next?
&lt;/h3&gt;

&lt;p&gt;Thanks to this project, I have built solid foundations to query the Spotify API in Scala, process the data with Spark SQL, and visualize it with Databricks. I think more projects are on the way: Spotify has just released (March 2017) a new endpoint, &lt;a href="https://developer.spotify.com/news-stories/2017/03/01/new-endpoint-recently-played-tracks/" rel="noopener noreferrer"&gt;“Recently Played Tracks”&lt;/a&gt;, and ideas are coming.&lt;/p&gt;

&lt;h3&gt;
  
  
  Databricks pros and cons
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Pros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Free community edition with a server that has 6 GB of RAM&lt;/li&gt;
&lt;li&gt;Awesome and easy-to-use &lt;a href="https://docs.databricks.com/user-guide/visualizations/index.html" rel="noopener noreferrer"&gt;Data Viz&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons (or more, what can be better)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Can only visualize a maximum of 10 elements when using a GROUP BY; the other elements go into one category called “Others”&lt;/li&gt;
&lt;li&gt;Not possible to choose the color of an entity, so a radio can be blue on one graph and red on another, which can sometimes be confusing&lt;/li&gt;
&lt;li&gt;Cannot export a graph as an iframe, so we have to export pictures of the interactive graphs&lt;/li&gt;
&lt;li&gt;Cannot modify SQL on the Data Viz interface&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Thanks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Databricks&lt;/strong&gt;, for their awesome platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spotify&lt;/strong&gt;, for their easy-to-use API and their &lt;strong&gt;human-readable&lt;/strong&gt; documentation&lt;/li&gt;
&lt;li&gt;
&lt;a href="http://www.novaplanet.com/" rel="noopener noreferrer"&gt;Radio Nova&lt;/a&gt; for being a top music &lt;a href="http://www.urbandictionary.com/define.php?term=selecta" rel="noopener noreferrer"&gt;selecta&lt;/a&gt;, I would not listen to the same music that I listen to today without you&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://twitter.com/m_hlimi" rel="noopener noreferrer"&gt;Marc H'LIMI&lt;/a&gt;, Radio Nova's advisor, for our exchanges&lt;/li&gt;
&lt;li&gt;Pierre Trussart, engineer and DJ, &lt;a href="https://twitter.com/b_thuillier" rel="noopener noreferrer"&gt;Benjamin Thuillier&lt;/a&gt;, Scala rockstar, Nicolas Duforet, data science master, Justine Mouron, engineer&lt;/li&gt;
&lt;li&gt;My friends, for listening to me talk about this project too often&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Bonus – Spotify Playlists
&lt;/h3&gt;

&lt;p&gt;To thank you for reading, I created 4 playlists of the ~200 most-broadcast songs, sorted by number of broadcasts, for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://open.spotify.com/user/cpolos/playlist/64DWZ46dFb50FaWs9eALIu" rel="noopener noreferrer"&gt;Nova with Calipso Rose, Brisa RochÃ©, Kaytranada, The Roots, M.I.A, The Virgins…&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://open.spotify.com/user/cpolos/playlist/0x2SJgvfcBzxfiMTDkTuC4" rel="noopener noreferrer"&gt;Skyrock with Drake, Alonzo, Major Lazer, Timberlake, Soprano, PNL, Jul&lt;/a&gt;…&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://open.spotify.com/user/cpolos/playlist/3ySBdFmKboUlMWfqSpLZio" rel="noopener noreferrer"&gt;Virgin with Imany, Twenty One Pilots, Sia, Kungs, Julian Perretta&lt;/a&gt;…&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://open.spotify.com/user/cpolos/playlist/7ECHPNZ5fjLggN2rg0poK3" rel="noopener noreferrer"&gt;NRJ with Enrique Iglesias, Soprano, Coldplay, Kungs, Amir, MHD, Tal&lt;/a&gt;…&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>spark</category>
      <category>databricks</category>
      <category>spotify</category>
      <category>sql</category>
    </item>
    <item>
      <title>Bonjour, I'm Paul Leclercq</title>
      <dc:creator>Paul Leclercq</dc:creator>
      <pubDate>Thu, 16 Mar 2017 23:25:32 +0000</pubDate>
      <link>https://forem.com/paulleclercq/bonjour-im-paul-leclercq</link>
      <guid>https://forem.com/paulleclercq/bonjour-im-paul-leclercq</guid>
      <description>&lt;p&gt;I have been coding for 8 years.&lt;/p&gt;

&lt;p&gt;You can find me on Twitter as &lt;a href="https://twitter.com/polomarcus" rel="noopener noreferrer"&gt;@polomarcus&lt;/a&gt; or Medium as ... &lt;a href="https://medium.com/@polomarcus" rel="noopener noreferrer"&gt;@polomarcus&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I live in &lt;a href="https://www.theguardian.com/cities/2017/mar/13/montpellier-spotlight-development-mania-france-fastest-growing-city" rel="noopener noreferrer"&gt;Montpellier, in the sunny South of France&lt;/a&gt;, but I am currently looking for a data position in North America.&lt;/p&gt;

&lt;p&gt;I mostly program in these languages: Scala (with Spark), SQL, or JS&lt;/p&gt;

&lt;p&gt;I am currently learning more about Statistics and Data science.&lt;/p&gt;

&lt;p&gt;Fun facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I created the #1 &lt;a href="http://wake-me-up.co/" rel="noopener noreferrer"&gt;online alarm clock&lt;/a&gt; in France when I was a student&lt;/li&gt;
&lt;li&gt;the "cq" in my family name is silent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nice to meet you :)&lt;/p&gt;

</description>
      <category>introduction</category>
    </item>
  </channel>
</rss>
