<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Vicki Boykis</title>
    <description>The latest articles on Forem by Vicki Boykis (@vboykis).</description>
    <link>https://forem.com/vboykis</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F16820%2F63fdca06-047e-44a2-a9bc-5f303d810dd2.jpeg</url>
      <title>Forem: Vicki Boykis</title>
      <link>https://forem.com/vboykis</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vboykis"/>
    <language>en</language>
    <item>
      <title>Building a Twitter bot with Python, AWS, and art</title>
      <dc:creator>Vicki Boykis</dc:creator>
      <pubDate>Mon, 19 Feb 2018 18:45:18 +0000</pubDate>
      <link>https://forem.com/vboykis/building-a-twitter-art-bot-with-python-aws-and-art--74p</link>
      <guid>https://forem.com/vboykis/building-a-twitter-art-bot-with-python-aws-and-art--74p</guid>
      <description>


&lt;p&gt;Original post &lt;a href="http://veekaybee.github.io/2018/02/19/creating-a-twitter-art-bot/"&gt;here on my blog.&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLDR:&lt;/strong&gt; I built a Twitter bot, &lt;a href="https://twitter.com/SovietArtBot"&gt;@SovietArtBot&lt;/a&gt; that tweets paintings from the WikiArt socialist realism category every 6 hours using Python and AWS Lambdas. Check out the bot's &lt;a href="https://veekaybee.github.io/soviet-art-bot/"&gt;website and code here&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The post outlines why I decided to build it, the architecture decisions I made, technical details of how the bot works, and my next steps for the bot.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table of Contents&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Why build an art bot?

&lt;ul&gt;
&lt;li&gt;Technical Goals&lt;/li&gt;
&lt;li&gt;Personal Goals&lt;/li&gt;
&lt;li&gt;Why Socialist Realism&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Breaking a Project into Chunks&lt;/li&gt;
&lt;li&gt;Requirements and Design: High-Level Bot Architecture&lt;/li&gt;
&lt;li&gt;Development: Pulling Paintings from WikiArt&lt;/li&gt;
&lt;li&gt;Development: Processing Paintings and Metadata Locally&lt;/li&gt;
&lt;li&gt;
Development: Using S3 and Lambdas

&lt;ul&gt;
&lt;li&gt;Development: Scheduling the Lambda&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Deployment: Bot Tweets!&lt;/li&gt;
&lt;li&gt;
Where to Next?

&lt;ul&gt;
&lt;li&gt;Testing and Maintenance&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  Why build an art bot?
&lt;/h1&gt;

&lt;p&gt;Often when you're starting out as a data scientist or developer, people will give you the well-intentioned advice to "just pick a project and do it" as a way of learning the skills you need. &lt;/p&gt;

&lt;p&gt;That advice can be vague and hard to act on, particularly when you don't have a lot of experience to draw from to figure out what's even feasible given how much you know, or how that whole process should work.&lt;/p&gt;

&lt;p&gt;By writing out my process in detail, I'm hoping it helps more people understand:&lt;/p&gt;

&lt;p&gt;1) The steps of a software project from beginning to end. &lt;/p&gt;

&lt;p&gt;2) The process of putting out a minimum viable project that's "good enough," then iterating on your existing code to add features.&lt;/p&gt;

&lt;p&gt;3) Picking a project that you're going to enjoy working on.&lt;/p&gt;

&lt;p&gt;4) The joy of socialist realism art.&lt;/p&gt;


&lt;h2&gt;
  Technical Goals
&lt;/h2&gt;

&lt;p&gt;I've been doing more software development as part of my data science workflows lately, and I've found that: &lt;/p&gt;

&lt;p&gt;1) I really enjoy doing both the analytical and development pieces of a data science project.&lt;br&gt;
2) The more development skills a data scientist is familiar with, the more valuable they are, because they can prototype production workflows and push their models into production more quickly, instead of waiting for a data engineer.  &lt;/p&gt;

&lt;p&gt;A goal I've had recently is being able to take a full software development project from end-to-end, focusing on understanding modern production best practices, particularly in the cloud. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TwOXl7QM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/squadgoals.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TwOXl7QM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/squadgoals.png" alt="high-level"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  Personal Goals
&lt;/h2&gt;

&lt;p&gt;But, a project that's just about "cloud architecture delivery" is really boring. In fact, I fell asleep just reading that last sentence. When I do a project, it has to have an interesting, concrete goal.&lt;/p&gt;

&lt;p&gt;To that end, I've been extremely interested in Twitter as a development platform. I wrote recently that one of the most important ways we can fix the internet is to &lt;a href="http://blog.vickiboykis.com/2016/11/20/fix-the-internet/"&gt;get off Twitter. &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Easier said than done, because Twitter is still one of my favorite places on the internet. It's where I get most of my news, where I find out about new blog posts, engage in &lt;a href="https://twitter.com/vboykis/status/963098810695258117"&gt;discussions about data science&lt;/a&gt;, and a place where &lt;a href="https://twitter.com/vboykis/status/959493939555422208"&gt;I've made a lot of friends&lt;/a&gt; that I've met in real life. &lt;/p&gt;

&lt;p&gt;But, Twitter is extremely noisy, lately to the point of being toxic. There are systemic ways that Twitter can address this problem, but I decided to try to tackle it on my own by starting &lt;a href="https://twitter.com/search?q=%23devart&amp;amp;src=typd"&gt;#devart&lt;/a&gt;, a hashtag where people post classical works of art with their own tech-related captions to break up stressful content.&lt;/p&gt;

&lt;p&gt;There's something extremely cathartic about being able to state a problem in technology well enough to ascribe a visual metaphor to it, then sharing it with other people who also appreciate that visual metaphor and find it funny and relatable. &lt;/p&gt;


&lt;p&gt;And, sometimes you just want to break up the angry monotony of text with art that moves you. Turns out I'm not the only one. &lt;/p&gt;


&lt;p&gt;As I posted more #devart, I realized that I enjoyed looking at the source art almost as much as figuring out a caption, and that I enjoyed accounts like &lt;a href="https://twitter.com/archillect?lang=en"&gt;Archillect&lt;/a&gt;, &lt;a href="https://twitter.com/rabihalameddine?lang=en"&gt;Rabih Alameddine's&lt;/a&gt;, and &lt;a href="https://twitter.com/sovietvisuals?lang=en"&gt;Soviet Visuals&lt;/a&gt;, which all tweet a lot of beautiful visual content with at least some level of explanation. &lt;/p&gt;

&lt;p&gt;I decided I wanted to build a bot that tweets out paintings. Particularly, I was interested in socialist realism artworks. &lt;/p&gt;

&lt;h2&gt;
  Why Socialist Realism
&lt;/h2&gt;

&lt;p&gt;Socialist realism is an art form that developed after the Russian Revolution. As the Russian monarchy fell, social boundaries dissolved, and people began experimenting with all kinds of new art forms, including futurism and &lt;a href="http://www.wassilykandinsky.net/"&gt;abstractionism.&lt;/a&gt;  I've previously written &lt;a href="http://blog.vickiboykis.com/2015/06/reddit-was-amazing/"&gt;about this shift here.&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;As the Bolsheviks consolidated power, they established &lt;a href="https://en.wikipedia.org/wiki/People%27s_Commissariat_for_Education"&gt;Narkompros&lt;/a&gt;, a body to control education and cultural values under the new regime, and the government laid out new criteria for what counted as acceptable Soviet art.&lt;/p&gt;

&lt;p&gt;In looking at socialist realism art, it's obvious that the underlying goal is to promote communism. But, just because the works are blatant propaganda doesn't discount what I love about the genre, which is that it is indeed representative of what real people do in real life.  &lt;/p&gt;


&lt;p&gt;These are people working, sleeping, laughing, frowning, arguing, and showing real emotion we don't often see in art. They are relatable and humane, and reflect our humanity back to us. What I also strongly love about this genre of art is that women are depicted doing things other than sitting still to meet the artist's gaze. &lt;/p&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;
      &lt;div class="ltag__twitter-tweet__media"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VfiUNxNV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/media/DLadjHVW0AAE4du.jpg" alt="unknown tweet media content"&gt;
      &lt;/div&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--RiOhXAdr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/463852826435674115/yjewjpOI_normal.jpeg" alt="Vicki Boykis profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Vicki Boykis
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        &lt;a class="mentioned-user" href="https://dev.to/vboykis"&gt;@vboykis&lt;/a&gt;

      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      "Young, idealistic data scientists harvesting their first models for pickling"&lt;br&gt;Tetyana Yablonska, 1966 
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      00:09 AM - 06 Oct 2017
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=916092943575986176" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=916092943575986176" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=916092943575986176" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;p&gt;So, what I decided is that I'd make a Twitter bot that tweets out one work every couple of hours. &lt;/p&gt;

&lt;p&gt;Here's the final result: &lt;/p&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;
      &lt;div class="ltag__twitter-tweet__media"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v8rYyxep--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/media/DWEoQDrXUAAq1jQ.jpg" alt="unknown tweet media content"&gt;
      &lt;/div&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--2FoBIS-_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/961438671365812224/W1Z2Eg1G_normal.jpg" alt="SovietArtBot profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        SovietArtBot
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @sovietartbot
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      "Young Naturalists"&lt;br&gt;Sergiy Grigoriev, 1948 
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      11:16 AM - 15 Feb 2018
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=964096054089134081" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=964096054089134081" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=964096054089134081" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;p&gt;There are several steps in traditional software development: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Requirements&lt;/li&gt;
&lt;li&gt;Design&lt;/li&gt;
&lt;li&gt;Development&lt;/li&gt;
&lt;li&gt;Testing&lt;/li&gt;
&lt;li&gt;Deployment&lt;/li&gt;
&lt;li&gt;Maintenance&lt;/li&gt;
&lt;/ol&gt;


&lt;h1&gt;
  Breaking a Project into Chunks
&lt;/h1&gt;

&lt;p&gt;This is a LOT to take in. When I first started, I made a list of everything that needed to be done: setting up AWS credentials, roles, and permissions, version control, writing the actual code, learning how to download images with requests, how to make the bot tweet on a schedule, and more. &lt;/p&gt;

&lt;p&gt;When you look at it from the top-down, it's overwhelming. But in &lt;a href="https://www.brainpickings.org/2013/11/22/bird-by-bird-anne-lamott/"&gt;"Bird by Bird,"&lt;/a&gt; one of my absolute favorite books about the writing process (but really about any creative process), Anne Lamott writes: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Thirty years ago my older brother, who was ten years old at the time, was trying to get a report on birds written that he’d had three months to write, which was due the next day. We were out at our family cabin in Bolinas, and he was at the kitchen table close to tears, surrounded by binder paper and pencils and unopened books on birds, immobilized by the hugeness of the task ahead. Then my father sat down beside him, put his arm around my brother’s shoulder, and said, “Bird by bird, buddy. Just take it bird by bird.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that's how I view software development, too. One thing at a time, until you finish that, and then move on to the next piece. So, with that in mind, I decided I'd take the steps above from the traditional waterfall approach and combine them with the agile concept of making many small, quick cycles through those steps to get closer to the end result. &lt;/p&gt;


&lt;h1&gt;
  Requirements and Design: High-Level Bot Architecture
&lt;/h1&gt;

&lt;p&gt;I started building the app by working backwards from my requirements:&lt;/p&gt;

&lt;p&gt;a bot on Twitter, pulling painting images and metadata from some kind of database, on a timed schedule, either &lt;a href="https://help.ubuntu.com/community/CronHowto"&gt;cron&lt;/a&gt; or something similar. &lt;/p&gt;

&lt;p&gt;This helped me figure out the design. Since I would be posting to Twitter as my last step, it made sense to have the data already some place in the cloud. I also knew I'd eventually want to incorporate AWS because I didn't want the code and data to be dependent on my local machine being on. &lt;/p&gt;

&lt;p&gt;I knew that I'd also need version control and continuous integration to make sure the bot was stable both on my local machine as I was developing it and on AWS as I pushed my code through, and so that I didn't have to manually paste code into the AWS console. &lt;/p&gt;

&lt;p&gt;Finally, I knew I'd be using Python, because &lt;a href="http://veekaybee.github.io/2017/09/26/python-packaging/"&gt;I like Python&lt;/a&gt;, and also because it has good hooks into Twitter through the Twython API (thanks to Timo for pointing me to &lt;a href="https://twitter.com/tkoola/status/963480840574590978"&gt;Twython&lt;/a&gt; over Tweepy, which is deprecated) and AWS through the &lt;a href="https://twitter.com/vboykis/status/963224376056434688"&gt;Boto library&lt;/a&gt;. &lt;br&gt;
I'd start by getting the paintings and their metadata from a website that had a lot of good socialist realism paintings not bound by copyright. Then, I'd process those paintings to extract the title, the painter, and the year so I could tweet all of that out. Then, I'd do the rest of the work in AWS. &lt;/p&gt;

&lt;p&gt;So my high-level flow went something like this: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TyTKU4Ir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/high-level-flow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TyTKU4Ir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/high-level-flow.png" alt="high-level"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Eventually, I'd refactor out the dependency on my local machine entirely and push everything to S3, but I didn't want to spend any money in AWS before I figured out what kind of metadata the JSON returned. &lt;/p&gt;

&lt;p&gt;Beyond that, I didn't have a specific idea of the tools I'd need, and made design and architecture choices as my intermediate goals became clearer to me. &lt;/p&gt;

&lt;h1&gt;
  Development: Pulling Paintings from WikiArt
&lt;/h1&gt;

&lt;p&gt;Now, the development work began. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.wikiart.org/"&gt;WikiArt&lt;/a&gt; has an amazing, well-catalogued collection of artworks in every genre you can think of. It's so well-done that &lt;a href="http://www.lamsade.dauphine.fr/~bnegrevergne/webpage/documents/2017_rasta.pdf"&gt;some researchers use&lt;/a&gt; the catalog &lt;a href="http://cs231n.stanford.edu/reports/2017/pdfs/406.pdf"&gt;for their papers&lt;/a&gt; on &lt;a href="https://arxiv.org/pdf/1605.09612.pdf"&gt;deep learning&lt;/a&gt;, as well.  &lt;/p&gt;

&lt;p&gt;Some days, I go just to browse what's new and get lost in some art. (Please &lt;a href="https://www.wikiart.org/en/donate"&gt;donate&lt;/a&gt; to them if you enjoy them.)&lt;/p&gt;

&lt;p&gt;WikiArt also has two aspects that were important to the project: &lt;/p&gt;

&lt;p&gt;1) They have &lt;a href="https://www.wikiart.org/en/paintings-by-style/socialist-realism?select=featured"&gt;an explicit category&lt;/a&gt; for socialist realism art with a good number of works: around 500, which was not a large amount (if I wanted to tweet more than one image a day), but good enough to start with. &lt;/p&gt;

&lt;p&gt;2) Every work has an image, title, artist, and year, which would be important for properly crediting it on Twitter. &lt;/p&gt;

&lt;p&gt;My first step was to see if there was a way to access the site through an API, the most common way to pull any kind of content from websites programmatically these days. The problem with WikiArt is that it technically doesn't have a readily-available public API, so people have resorted to &lt;a href="https://github.com/lucasdavid/wikiart"&gt;really creative ways&lt;/a&gt; of scraping the site. &lt;/p&gt;

&lt;p&gt;But, I really, really didn't want to scrape, especially because the site has infinite scroll Javascript elements, which are annoying to pick up in &lt;a href="https://www.crummy.com/software/BeautifulSoup/"&gt;BeautifulSoup&lt;/a&gt;, the tool most people use for scraping in Python.&lt;/p&gt;

&lt;p&gt;So I did some sleuthing, and found that &lt;a href="https://docs.google.com/document/d/1Vxi5lQnMCA21dvNm_7JVd6nQkDS3whV3YjRjbwWPfQU/edit#!"&gt;WikiArt does have an API&lt;/a&gt;, even if it's not official and, at this point, somewhat out of date. &lt;/p&gt;

&lt;p&gt;It had some important information on API rate limits, which tell us how often you can access the API without the site getting angry and kicking you out: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;API calls: 10 requests per 2.5 seconds&lt;/p&gt;

&lt;p&gt;Images downloading: 20 requests per second&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and, even more importantly, on how to access a specific category through JSON-based &lt;a href="https://en.wikipedia.org/wiki/Query_string"&gt;query parameters&lt;/a&gt;. The documentation they had, though, was mostly at the artist level:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;http://www.wikiart.org/en/salvador-dali/by-style/Neoclassicism&amp;amp;json=2&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;so I had to do some trial and error to figure out the correct link I wanted, which was:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;https://www.wikiart.org/en/paintings-by-style/socialist-realism?json=2&amp;amp;page=1&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;And with that, I was ready to pull the data.&lt;/p&gt;

&lt;p&gt;I started by using the Python &lt;a href="http://docs.python-requests.org/en/master/"&gt;Requests library&lt;/a&gt; to connect to the site and pull two things: &lt;/p&gt;

&lt;p&gt;1) A JSON file that has all the metadata &lt;br&gt;
2) All of the actual paintings as &lt;code&gt;png/jpg/jpeg&lt;/code&gt; files &lt;/p&gt;
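The pull can be sketched roughly as follows: page through the category's JSON endpoint, sleeping between calls to stay under the rate limit quoted earlier. This is a minimal sketch under stated assumptions, not the bot's actual code; `BASE_URL`, `fetch_all_paintings`, and `extract_paintings` are illustrative names.

```python
import time

BASE_URL = "https://www.wikiart.org/en/paintings-by-style/socialist-realism"

def extract_paintings(payload):
    """Pull the Paintings array out of one page of the category JSON."""
    return payload.get("Paintings") or []

def fetch_all_paintings():
    """Page through ?json=2&page=N until no more paintings come back."""
    import requests  # the post uses the Requests library for HTTP calls

    paintings, page = [], 1
    while True:
        resp = requests.get(BASE_URL, params={"json": 2, "page": page})
        resp.raise_for_status()
        batch = extract_paintings(resp.json())
        if not batch:
            break
        paintings.extend(batch)
        page += 1
        time.sleep(0.25)  # ~10 requests per 2.5 seconds, per the rate limit
    return paintings
```

The sleep keeps the loop comfortably inside WikiArt's stated limit even if every page returns instantly.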

&lt;h1&gt;
  Development: Processing Paintings and Metadata Locally
&lt;/h1&gt;

&lt;p&gt;The JSON I got back looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;ArtistsHtml:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;CanLoadMoreArtists:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Paintings:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Artists:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;AllArtistsCount:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;PaintingsHtml:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;PaintingsHtmlBeta:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;AllPaintingsCount:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;PageSize:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;TimeLog:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
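One useful thing in this response: `AllPaintingsCount` and `PageSize` together tell you how many pages to request. A quick back-of-the-envelope check (the `page_meta` dict just restates the two fields above):

```python
import math

# Restating the two fields from the category JSON above
page_meta = {"AllPaintingsCount": 512, "PageSize": 60}

# 512 works at 60 per page means 9 requests cover the whole category
num_pages = math.ceil(page_meta["AllPaintingsCount"] / page_meta["PageSize"])
print(num_pages)  # 9
```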



&lt;p&gt;Within the paintings array, each painting looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"577271cfedc2cb3880c2de61"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Winter in Kursk"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1916"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"width"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;634&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"height"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;750&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"artistName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Aleksandr Deyneka"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://use2-uploads8.wikiart.org/images/aleksandr-deyneka/winter-in-kursk-1916.jpg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"map"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0123**67*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"paintingUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/en/aleksandr-deyneka/winter-in-kursk-1916"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"artistUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/en/aleksandr-deyneka"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"albums"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"flags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"images"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also downloaded all the image files by streaming each request and writing &lt;code&gt;response.raw&lt;/code&gt; to disk with &lt;code&gt;shutil.copyfileobj&lt;/code&gt;. &lt;/p&gt;
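&lt;p&gt;A minimal sketch of that download step (the function names here are my own, not the ones in the original script):&lt;/p&gt;

```python
import shutil

def save_stream(stream, dest_path):
    # Copy any file-like object to disk in chunks, without
    # holding the whole image in memory at once
    with open(dest_path, "wb") as out_file:
        shutil.copyfileobj(stream, out_file)

def download_image(url, dest_path):
    # requests is imported here so save_stream stays dependency-free
    import requests
    response = requests.get(url, stream=True)
    response.raise_for_status()
    response.raw.decode_content = True  # transparently handle gzipped responses
    save_stream(response.raw, dest_path)
```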

&lt;p&gt;I decided not to do any more processing locally, since my goal was to eventually move everything to the cloud anyway. But I now had the files available for testing, so I didn't need to keep hitting WikiArt and risk overloading the website. &lt;/p&gt;

&lt;p&gt;I then uploaded both the JSON and the image files to the same S3 bucket with the &lt;a href="https://github.com/boto/boto3"&gt;boto client&lt;/a&gt;, which lets you write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_images_to_s3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterdir&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="s"&gt;'.png'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'.jpg'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'.jpeg'&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
            &lt;span class="n"&gt;full_file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"/"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;file_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BASE_BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;"put"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As an aside, the &lt;code&gt;.iterdir()&lt;/code&gt; method here is from the pretty great &lt;code&gt;pathlib&lt;/code&gt; library, new in Python 3, which handles file operations more cleanly than &lt;code&gt;os&lt;/code&gt;. Check out more about it &lt;a href="https://github.com/arogozhnikov/python3_with_pleasure"&gt;here.&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h1&gt;
  
  
  Development: Using S3 and Lambdas
&lt;/h1&gt;

&lt;p&gt;Now that I had my files in S3, I needed some way for Twitter to read them at a regular time interval, so I decided on an AWS Lambda function (not to be confused with Python &lt;code&gt;lambda&lt;/code&gt; functions, a completely different animal). Because I was already familiar with Lambdas and their capabilities (&lt;a href="http://veekaybee.github.io/2018/01/28/working-with-aws/"&gt;see my previous post on AWS&lt;/a&gt;), they were a tool I could use without a lot of ramp-up time, which is a key component of architectural decisions.&lt;/p&gt;

&lt;p&gt;Lambdas are snippets of code that you can run without needing to know anything about the machine that runs them. They're triggered by other events firing in the AWS ecosystem, or they can run on a &lt;a href="http://www.tothenew.com/blog/schedule-lambda-on-cron-expression-triggers/"&gt;cron-like schedule&lt;/a&gt;, which was exactly what I needed, since the bot has to post at a regular interval. &lt;/p&gt;

&lt;p&gt;Lambdas look &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/python-programming-model-handler-types.html"&gt;like this&lt;/a&gt; in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; 
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;some_value&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;event&lt;/code&gt; is what you decide to do to trigger the function and the context &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/python-context-object.html"&gt;sets up all the runtime information&lt;/a&gt; needed to interact with AWS and run the function.&lt;/p&gt;

&lt;p&gt;Because I wanted my bot to tweet both the artwork and some context around it, I needed a way to match each picture with its metadata. &lt;/p&gt;

&lt;p&gt;To do this, I'd need to create &lt;a href="https://en.wikipedia.org/wiki/Attribute%E2%80%93value_pair"&gt;key-value pairs&lt;/a&gt;, a common programming data model,  where the key was the filename part of the &lt;code&gt;image&lt;/code&gt; attribute, and the value was the &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;year&lt;/code&gt;, and &lt;code&gt;artistName&lt;/code&gt;, so that I could match the two, like this: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MX-3FVC_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/file_match.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MX-3FVC_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/file_match.png" alt="high-level"&gt;&lt;/a&gt;&lt;/p&gt;
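&lt;p&gt;Using the "Winter in Kursk" record from the sample JSON above, the shape I was after looks something like this (the dictionary name mirrors the one in my script; the value ordering is one reasonable choice):&lt;/p&gt;

```python
# Key: the filename at the end of the "image" URL
# Value: [artistName, title, year] pulled from the same record
indexed_json = {
    "winter-in-kursk-1916.jpg": ["Aleksandr Deyneka", "Winter in Kursk", "1916"],
}

# Looking up a filename hands back everything needed for the tweet text
artist, title, year = indexed_json["winter-in-kursk-1916.jpg"]
```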

&lt;p&gt;So, all in all, I wanted my Lambda function to do several things. All of the code I wrote for this section is &lt;a href="https://github.com/veekaybee/soviet-art-bot/blob/master/soviet_art_bot/lambda_function.py"&gt;here.&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;1) Open the S3 bucket object and inspect the contents of the metadata file &lt;/p&gt;

&lt;p&gt;Opening an S3 bucket within a lambda usually looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Records'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'s3'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'bucket'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'s3'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'object'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'key'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; 
        &lt;span class="n"&gt;download_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'/tmp/{}{}'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;download_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;download_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where the event is the JSON payload that Lambda passes in to signify that a trigger has occurred. Since our trigger is a timed event, our &lt;a href="https://github.com/veekaybee/soviet-art-bot/blob/master/soviet_art_bot/event.json"&gt;JSON file&lt;/a&gt; doesn't carry any information about a specific event or bucket, so we can ignore the event and write a function that simply opens a given bucket and key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;json_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Body'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'utf-8'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2) Pull out the metadata and load it into a dictionary with the filename as the key and the metadata as the value. We can pull it into a &lt;code&gt;defaultdict&lt;/code&gt; (all dictionaries &lt;a href="https://mail.python.org/pipermail/python-dev/2016-September/146327.html"&gt;will be ordered&lt;/a&gt; as of 3.6, but we're still playing it safe here.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;indexed_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;json_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;artist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'artistName'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'title'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'year'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;artist&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# return only image name at end of URL
&lt;/span&gt;        &lt;span class="n"&gt;find_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'image'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;rfind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'/'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;img_suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'image'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;find_index&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
        &lt;span class="n"&gt;img_link&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img_suffix&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;indexed_json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;img_link&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;KeyError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;indexed_json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;img_link&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(By the way, a neat Python string utility that I didn't know before, which really helped with the filename parsing, was &lt;a href="http://python-reference.readthedocs.io/en/latest/docs/str/rsplit.html"&gt;&lt;code&gt;rsplit&lt;/code&gt;&lt;/a&gt;.)&lt;/p&gt;
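&lt;p&gt;For instance, splitting from the right makes pulling the filename off the end of a WikiArt image URL a one-liner:&lt;/p&gt;

```python
url = "https://use2-uploads8.wikiart.org/images/aleksandr-deyneka/winter-in-kursk-1916.jpg"

# rsplit starts splitting from the right; maxsplit=1 gives
# [everything-before-last-slash, filename]
filename = url.rsplit("/", 1)[-1]
print(filename)  # winter-in-kursk-1916.jpg
```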

&lt;p&gt;3) Pick a random filename to tweet  (&lt;code&gt;single_image_metadata = random.choice(list(indexed_json.items()))&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;4) Tweet the image and associated metadata&lt;/p&gt;

&lt;p&gt;There are a couple of Python libraries for working with the Twitter API. I initially started using Tweepy, but much to my sadness, I found out &lt;a href="https://github.com/tweepy/tweepy/issues/803"&gt;it was no longer being maintained.&lt;/a&gt; (Thanks for the tip, &lt;a href="https://twitter.com/tkoola/status/963480840574590978"&gt;Timo.&lt;/a&gt; )&lt;/p&gt;

&lt;p&gt;So I switched to &lt;a href="https://twython.readthedocs.io/en/latest/"&gt;Twython&lt;/a&gt;, which is a tad more convoluted, but is up-to-date. &lt;/p&gt;

&lt;p&gt;The final piece of code that actually ended up sending out the tweet is here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;twitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Twython&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CONSUMER_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CONSUMER_SECRET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ACCESS_TOKEN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ACCESS_SECRET&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="n"&gt;tmp_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gettempdir&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="c1"&gt;#clears out lambda dir from previous attempt, in case testing lambdas keeps previous lambda state
&lt;/span&gt;        &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'rm -rf /tmp/*'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shell&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
        &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;s3_resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;download_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"file moved to /tmp"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp_dir&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'rb'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Path"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;twit_resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;twitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;upload_media&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;media&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;twitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;%s&lt;/span&gt;&lt;span class="se"&gt;\"\n&lt;/span&gt;&lt;span class="s"&gt;%s, %s"&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;painter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;media_ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;twit_resp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'media_id'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;TwythonError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What this does is take advantage of a Lambda's temp space: &lt;/p&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--RiOhXAdr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/463852826435674115/yjewjpOI_normal.jpeg" alt="Vicki Boykis profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Vicki Boykis
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        &lt;a class="mentioned-user" href="https://dev.to/vboykis"&gt;@vboykis&lt;/a&gt;

      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      TIL that AWS Lambda Functions have miniature file systems that you can use as temporary storage (&lt;a href="https://t.co/egCKwu6GJB"&gt;stackoverflow.com/questions/3564…&lt;/a&gt;).
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      21:03 PM - 03 Jan 2018
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=948661197011914757" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=948661197011914757" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=948661197011914757" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;p&gt;It pulls the file from S3 into the Lambda's &lt;code&gt;/tmp/&lt;/code&gt; folder and matches it by filename with the metadata, which at this point is in key-value format. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;twitter.upload_media&lt;/code&gt; method uploads the image and gets back a media id that is then passed into the &lt;code&gt;update_status&lt;/code&gt; method with the &lt;code&gt;twit_resp['media_id']&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;And that's it. The image and text are posted. &lt;/p&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h2&gt;
  
  
  Development: Scheduling the Lambda
&lt;/h2&gt;

&lt;p&gt;The second part was configuring the function to run on a schedule. Lambdas can be triggered by two things: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An event occurring&lt;/li&gt;
&lt;li&gt;A timed schedule. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Events &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/invoking-lambda-function.html"&gt;can be anything&lt;/a&gt; from a file landing in an S3 bucket, to polling a Kinesis stream.  &lt;/p&gt;

&lt;p&gt;Scheduled events can be written either as cron expressions or at a fixed rate. I started out writing &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html"&gt;cron rules&lt;/a&gt;, but since my bot's only requirement was to post every six hours, a fixed rate turned out to be enough for what I needed: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--D5eqI_Dn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/cron-rule.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--D5eqI_Dn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/cron-rule.png" alt="high-level"&gt;&lt;/a&gt;&lt;/p&gt;
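&lt;p&gt;For reference, a six-hour schedule can be expressed either way in CloudWatch (shown here as a sketch of the two syntaxes, not copied from my console; CloudWatch cron uses six fields and runs in UTC):&lt;/p&gt;

```
rate(6 hours)          # fixed-rate: fire every six hours
cron(0 0/6 * * ? *)    # roughly equivalent cron expression
```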

&lt;p&gt;Finally, I needed to package the Lambda for distribution. Lambdas run on Linux machines that don't have many Python libraries pre-installed beyond the standard library and boto3, the AWS Python client library I used previously, which connects the Lambda to other parts of the AWS ecosystem.&lt;/p&gt;

&lt;p&gt;In my script, I have a lot of library imports. Of these, Twython is an external library that needs to be &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html"&gt;packaged with the lambda and uploaded.&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;from twython import Twython, TwythonError&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h1&gt;
  
  
  Deployment: Bot Tweets!
&lt;/h1&gt;

&lt;p&gt;So I packaged the Lambda based on those instructions, manually the first time, by uploading a zip file to the Lambda console.  &lt;/p&gt;

&lt;p&gt;And, that's it! My two one-off scripts were ready, and my bot was up and running. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3xYvEhoC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/lambda.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3xYvEhoC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/lambda.png" alt="high-level"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--soYZPFmD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/cronedjob.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--soYZPFmD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/cronedjob.png" alt="high-level"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here's the final flow I ended up with: &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VVtJfylZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/architecture.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VVtJfylZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/images/architecture.png" alt="architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt; &lt;/p&gt;
&lt;h1&gt;
  
  
  Where to Next?
&lt;/h1&gt;

&lt;p&gt;There's a lot I still want to get to with Soviet Art Bot. &lt;/p&gt;

&lt;p&gt;The most important first step is tweaking the code so that no painting repeats more than once a week. That seems like the right amount of time for Twitter followers to not get annoyed.&lt;/p&gt;

&lt;p&gt;In parallel, I want to focus on testing and maintenance. &lt;/p&gt;

&lt;h2&gt;
  
  
  Testing and Maintenance
&lt;/h2&gt;

&lt;p&gt;The first time I worked through the entire flow, I started by working in a local Python project I had started in PyCharm and had version-controlled on &lt;a href="https://github.com/veekaybee/soviet-art-bot"&gt;GitHub&lt;/a&gt;. &lt;/p&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;
      &lt;div class="ltag__twitter-tweet__media"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--timVMRoG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/media/DTyiZBSW4AA0ljc.jpg" alt="unknown tweet media content"&gt;
      &lt;/div&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--RiOhXAdr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/463852826435674115/yjewjpOI_normal.jpeg" alt="Vicki Boykis profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Vicki Boykis
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        &lt;a class="mentioned-user" href="https://dev.to/vboykis"&gt;@vboykis&lt;/a&gt;

      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      Me, trying to explain the cases when I use PyCharm, when I use Sublime Text, and when I use Jupyter Notebooks for development. 
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      02:26 AM - 18 Jan 2018
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=953815826393595904" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=953815826393595904" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=953815826393595904" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;p&gt;So, when I made changes to any part of the process, my execution flow would be: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run Wikiart download functionality locally &lt;/li&gt;
&lt;li&gt;Test the &lt;a href="https://medium.com/@bezdelev/how-to-test-a-python-aws-lambda-function-locally-with-pycharm-run-configurations-6de8efc4b206"&gt;lambda "locally"&lt;/a&gt; with &lt;code&gt;python-lambda-local&lt;/code&gt; &lt;/li&gt;
&lt;li&gt;Zip up the lambda and upload to Lambda&lt;/li&gt;
&lt;li&gt;Make mistakes in the Lambda code&lt;/li&gt;
&lt;li&gt;Zip up the lambda and run again. &lt;/li&gt;
&lt;/ol&gt;
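&lt;p&gt;For step 2, even before reaching for &lt;code&gt;python-lambda-local&lt;/code&gt;, you can smoke-test the handler by calling it directly with a stubbed event and context. This is just a sketch; the handler body and names are illustrative, not the bot's real code:&lt;/p&gt;

```python
def handler(event, context):
    """Lambda entry point; the real version fetches a painting and tweets it."""
    painting = event.get("painting", "unknown")
    return {"statusCode": 200, "painting": painting}

class FakeContext:
    """Stub of the Lambda context object, just enough for a local call."""
    function_name = "soviet-art-bot"

# Invoke the handler exactly as Lambda would, but locally
result = handler({"painting": "harvest_festival.jpg"}, FakeContext())
```

&lt;p&gt;Tools like &lt;code&gt;python-lambda-local&lt;/code&gt; do essentially this, plus timeout handling and a more faithful context object.&lt;/p&gt;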

&lt;p&gt;This was not really an ideal workflow for me, because I didn't want to have to manually re-upload the lambda every time, so I decided to use Travis CI, which &lt;a href="https://gizmodo.com/the-mess-at-meetup-1822243738"&gt;integrates with GitHub really well&lt;/a&gt;. The problem is that there's a lot of setup involved: virtualenvs, syncing AWS credentials, setting up IAM roles and profiles that allow Travis to access the lambda, setting up test Twitter and AWS environments to test the Travis integration, and more. &lt;/p&gt;

&lt;p&gt;For now, the bot is working in production, and while it works, I'm going to continue to automate more and more parts of deployment in my &lt;a href="https://github.com/veekaybee/soviet-art-bot/tree/dev"&gt;dev branch&lt;/a&gt;. (&lt;a href="https://joarleymoraes.com/hassle-free-python-lambda-deployment/"&gt;This post&lt;/a&gt; was particularly helpful in zipping up a lambda, and my &lt;a href="https://github.com/veekaybee/soviet-art-bot/blob/dev/lambda_deploy.sh"&gt;deploy script is here&lt;/a&gt;.)  &lt;/p&gt;
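&lt;p&gt;The zipping step itself is easy to script with the standard library. Here's a sketch (the paths and function name are placeholders), with the upload itself left to the AWS CLI:&lt;/p&gt;

```python
import zipfile
from pathlib import Path

def package_lambda(source_dir, zip_path):
    """Zip every file under source_dir at the archive root, as Lambda expects."""
    src = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in src.rglob("*"):
            if f.is_file():
                zf.write(f, f.relative_to(src))
    return zip_path

# Then something like:
#   aws lambda update-function-code --function-name soviet-art-bot \
#       --zip-file fileb://lambda.zip
```

&lt;p&gt;Keeping the archive paths relative to the project root matters: Lambda looks for the handler module at the top of the zip, not inside a nested folder.&lt;/p&gt;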

&lt;p&gt;After these two are complete, I want to: &lt;/p&gt;

&lt;p&gt;1) Refactor the lambda code to take advantage of &lt;code&gt;pathlib&lt;/code&gt; instead of the &lt;code&gt;os&lt;/code&gt; module so my code is standardized (this should be a pretty small change).&lt;/p&gt;
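&lt;p&gt;For a taste of what that refactor looks like, here's the same path built both ways (the directory names are made up for illustration):&lt;/p&gt;

```python
import os
from pathlib import Path

# os.path style, as the lambda code does today
download_dir = os.path.join(os.path.expanduser("~"), "scraped", "wikiart")

# pathlib style: the same path, plus easy globbing of the downloaded images
downloads = Path.home() / "scraped" / "wikiart"
jpgs = sorted(downloads.glob("*.jpg"))
```

&lt;p&gt;Both spellings produce the same path; &lt;code&gt;pathlib&lt;/code&gt; just bundles the joining, globbing, and file checks into one object instead of a handful of &lt;code&gt;os.path&lt;/code&gt; calls.&lt;/p&gt;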

&lt;p&gt;2) Source more paintings. WikiArt is fantastic, but has only about 500 paintings available in the socialist realism category. I'd like to find more sources with high-quality metadata and significant collections of artworks. Then, I'd like to: &lt;/p&gt;

&lt;p&gt;3) Create a front-end where anyone can upload a work of socialist realism for the bot to tweet out. This would probably be easier than customizing a scraper and would allow me to crowdsource data. As part of this process, I'd need a way to screen content before it got to my final S3 bucket. &lt;/p&gt;

&lt;p&gt;Which leads to: &lt;/p&gt;

&lt;p&gt;4) Go through the current collection and make sure all the artwork is relevant and SFW, and see if there's a way I can do that programmatically. &lt;/p&gt;

&lt;p&gt;And: &lt;/p&gt;

&lt;p&gt;5) Explore machine learning and deep learning possibilities: look for a classifier to filter out artworks with nudity or questionable content, and figure out how to decide what "questionable" means. Potentially with &lt;a href="https://aws.amazon.com/about-aws/whats-new/2017/04/detect-explicit-or-suggestive-adult-using-amazon-rekognition/"&gt;AWS Rekognition&lt;/a&gt;, or by building my own CNN. &lt;/p&gt;
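&lt;p&gt;If I go the Rekognition route, the filtering decision itself is simple to express. In this sketch, the confidence threshold and bucket name are assumptions; only the pure filtering function is shown as runnable, with the actual API call left as a comment:&lt;/p&gt;

```python
def is_safe(moderation_response, threshold=80.0):
    """True if no moderation label meets the confidence threshold.

    Expects the response shape of Rekognition's detect_moderation_labels:
    {"ModerationLabels": [{"Name": ..., "Confidence": ...}, ...]}
    """
    return all(label["Confidence"] < threshold
               for label in moderation_response.get("ModerationLabels", []))

# The call itself would look roughly like (untested sketch):
#   import boto3
#   rekognition = boto3.client("rekognition")
#   response = rekognition.detect_moderation_labels(
#       Image={"S3Object": {"Bucket": "soviet-art-bot", "Name": key}})
#   if is_safe(response):
#       ...promote the painting to the final bucket...
```

&lt;p&gt;Picking the threshold is the real work: too low and I'd reject harmless classical nudes in paintings, too high and questionable uploads slip through.&lt;/p&gt;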

&lt;p&gt;Other machine learning opportunities: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Mash with #devart to see if the bot can create fun headlines for paintings based on painting content&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extract colors from artworks by genre and see how they differ between genres and decades&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Software development can be a long, exhausting process with a lot of moving parts and decision-making involved, but it becomes much easier and more interesting if you break a project into byte-sized chunks you can work on continuously, so you don't get overwhelmed by the entire task at hand. The other part, of course, is that the project has to be fun and interesting for you, so that you make it through all of the craziness with a finished product at the end. &lt;/p&gt;

</description>
      <category>python</category>
      <category>aws</category>
      <category>programming</category>
      <category>showdev</category>
    </item>
    <item>
      <title>So, you want to data science</title>
      <dc:creator>Vicki Boykis</dc:creator>
      <pubDate>Mon, 26 Jun 2017 15:43:22 +0000</pubDate>
      <link>https://forem.com/vboykis/so-you-want-to-data-science</link>
      <guid>https://forem.com/vboykis/so-you-want-to-data-science</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fveekaybee%2Fveekaybee.github.io%2Fmaster%2Fimages%2Frake.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fveekaybee%2Fveekaybee.github.io%2Fmaster%2Fimages%2Frake.jpg" alt="rake"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The original article was posted on &lt;a href="http://veekaybee.github.io/" rel="noopener noreferrer"&gt;veekaybee.github.io&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The hype of data science has dominated the tech scene for the last five years, an eternity in an industry where the average interest cycle for, say, a &lt;a href="https://encrypted.google.com/search?q=javascript+framework+popularity+graph&amp;amp;hl=en&amp;amp;source=lnms&amp;amp;tbm=isch&amp;amp;sa=X&amp;amp;ved=0ahUKEwjbyLDtssrUAhVDNz4KHawsD9AQ_AUIBigB&amp;amp;biw=1217&amp;amp;bih=780#hl=en&amp;amp;tbm=isch&amp;amp;q=javascript+framework+popularity+chart&amp;amp;imgrc=Mp99AGGqggqtPM:" rel="noopener noreferrer"&gt;JavaScript framework&lt;/a&gt; is 3.7 days.&lt;/p&gt;

&lt;p&gt;And yet, data science (and data engineering) is still a &lt;a href="http://veekaybee.github.io/data-strategy/" rel="noopener noreferrer"&gt;really young field&lt;/a&gt;, one not easily parsed by recruiters, managers making department budgeting decisions, or even data science practitioners themselves.  &lt;/p&gt;

&lt;p&gt;Add into the mix the latest buzz around AI and deep learning, and there are a lot of misconceptions about what data science is and what data scientists do.&lt;/p&gt;

&lt;p&gt;Inspired by the machine learning myths &lt;a href="https://dev.to/kasperfred/the-machine-learning-myth"&gt;post on Dev.to&lt;/a&gt;, I decided to write about some of the issues I've seen in the field over the past couple of years.&lt;/p&gt;

&lt;p&gt;1) &lt;strong&gt;The data scientist is a Swiss Army knife.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;A data scientist is some kind of &lt;a href="https://aphyr.com/posts/341-hexing-the-technical-interview" rel="noopener noreferrer"&gt;otherworldly super-being&lt;/a&gt; who can write production-quality engineering code, produce PhD-level statistical analysis, display Pulitzer-winning visualizations, and understand the business logic underpinning it all, with equal aplomb.&lt;/p&gt;

&lt;p&gt;It's true that, in theory, data science is an interdisciplinary job where tech and business knowledge have to meet in order to process data, then make sense of it.   &lt;/p&gt;

&lt;p&gt;But the high expectations that have stemmed from the original &lt;a href="http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram" rel="noopener noreferrer"&gt;data science Venn diagram&lt;/a&gt; have led companies and recruiters to believe that unless a candidate can write Java code AND do Bayesian methods AND create charts in D3, they're no good.&lt;/p&gt;

&lt;p&gt;More likely, data scientists' skills are t-shaped, meaning they're broadly exposed to a variety of skills and tools, but focused deeply on a few. Most fall somewhere on the spectrum between what's known as &lt;a href="https://medium.com/@rchang/my-two-year-journey-as-a-data-scientist-at-twitter-f0c13298aee6" rel="noopener noreferrer"&gt;A and B&lt;/a&gt; - analysis and building - and are either better at creating pipelines, or at working with them and surfacing insights.&lt;/p&gt;

&lt;p&gt;When someone asks what that means, I bring out this chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fveekaybee%2Fveekaybee.github.io%2Fmaster%2Fimages%2Fengsciflow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fveekaybee%2Fveekaybee.github.io%2Fmaster%2Fimages%2Fengsciflow.png" alt="engsciflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What kind of role you need will depend on what you need to do in a data pipeline.  All of these roles are slightly different and require different levels of skill and expertise, depending on the kind of data you have and what kind of shape it's in.&lt;/p&gt;

&lt;p&gt;I, personally, have not seen anyone who is good at everything in this diagram. Either you know stats really well, or you grok distributed systems, or you can present well to executives. Pick two.  The people who know how to do all three are &lt;a href="https://research.google.com/pubs/jeff.html" rel="noopener noreferrer"&gt;Jeff Dean&lt;/a&gt;, and you can't afford him.&lt;/p&gt;

&lt;p&gt;2) &lt;strong&gt;A data scientist works alone&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Along with the myth that one person knows everything, there is a myth that you only need one data scientist working in isolation, to do everything. In reality, since a data scientist specializes, as do most people, you need to have a solid team of people who &lt;a href="https://www.coursera.org/learn/build-data-science-team" rel="noopener noreferrer"&gt;complement each other's skill sets&lt;/a&gt; to build a successful data product or analysis pipeline.&lt;/p&gt;

&lt;p&gt;Ideally, you'll have a data engineer, a data scientist, and a product owner/UX specialist (a role that includes communicating with the business, documentation, and architecture), or some variation of those three roles spread across no more than five people.  &lt;/p&gt;

&lt;p&gt;A data scientist also needs to be in touch with other parts of the business as much as possible, by sitting in on business meetings to understand where the questions directed to them are coming from. Organizations that see data scientists as an asset as opposed to sitting them in a corner and handing them one-off questions to answer like a report generation machine are organizations that do well.&lt;/p&gt;

&lt;p&gt;3) &lt;strong&gt;A good data scientist can only do analysis&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;It's true that a single person can't do absolutely everything in the data science toolchain. But from the perspective of an individual practitioner, being able to do only statistical analysis is not as valuable as being able to carry that analysis through to other environments and to the final destination of the data and the model.&lt;/p&gt;

&lt;p&gt;It doesn't mean that you need to be able to write production-quality code. But it does mean you need to understand the considerations your model will need to run under, and to be able to understand the constraints people further down in the data pipeline face.   &lt;/p&gt;

&lt;p&gt;4) &lt;strong&gt;The data scientist as the genius:&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Even though companies often don't understand what they need from a data scientist, I often see many positions asking for a PhD in statistics or machine learning.&lt;/p&gt;

&lt;p&gt;Unless you are doing ground-breaking research at SpaceX or CERN, you probably don't need a PhD, because most of the problems in industry revolve around questions like these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We have customers, but we don't understand them. How do we understand them better?&lt;/li&gt;
&lt;li&gt;How do we get more people to click on thing X or Y?&lt;/li&gt;
&lt;li&gt;How do we move our data to Hadoop? Should we move our data to Hadoop? How much will it cost?&lt;/li&gt;
&lt;li&gt;How do we count the number of products we sell? How do we increase the number of products we sell?&lt;/li&gt;
&lt;li&gt;We have data in two different places. How do we get it into one place?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, here's a typical ad I see these days (a composite of several I've seen lately):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Qualifications:&lt;/p&gt;

&lt;p&gt;Masters or PhD in a quantitative/analytical discipline required (e.g. Computer Science, Statistics, Economics, Mathematics, Finance, Operations Research or similar field)&lt;/p&gt;

&lt;p&gt;3+ years in a Data Scientist role&lt;/p&gt;

&lt;p&gt;Highly skilled with Python, Java, and SQL.&lt;/p&gt;

&lt;p&gt;Demonstrable experience in developing Machine Learning and Deep Learning algorithms&lt;/p&gt;

&lt;p&gt;Proven track record of developing, maintaining, and deploying data services.&lt;/p&gt;

&lt;p&gt;Experience with Hadoop stack (Hive, Pig, Hadoop Streaming) and MapReduce.&lt;/p&gt;

&lt;p&gt;Experience with Spark.&lt;/p&gt;

&lt;p&gt;Develop dashboards, reports, charts, graphs and tables displaying the outcomes of analyses for use by internal and external stakeholders.&lt;/p&gt;

&lt;p&gt;Ability to work autonomously and take ownership of a project.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you find a data scientist who fulfills the union of all of these circles, they're either lying or you can't afford them, because there is no way someone can have spent enough time to be an expert at everything listed. If they have a PhD, they probably haven't spent as much time in industry and therefore can't be highly-skilled at Python or Java. If they have a proven track record of deploying data services, they probably don't have much machine learning experience. If they're developing dashboards, they're not doing deep learning and statistical analysis, they are essentially a report builder.&lt;/p&gt;

&lt;p&gt;The real problem here is that the company doesn't understand what they're looking for, and therefore has lumped everything all together.&lt;/p&gt;

&lt;p&gt;The reason you hire a data scientist, fundamentally, is to increase your company's revenue in some way, by optimizing some data process you had no insight into before.&lt;/p&gt;

&lt;p&gt;Unless you're at a company whose revenue is dependent on cars driving themselves or on breakthroughs in medical research, you don't need a deep learning specialist, or even deep learning itself.&lt;/p&gt;

&lt;p&gt;There was a &lt;a href="http://partiallyderivative.com/podcast/2017/05/30/dont-gatekeep-me-bro" rel="noopener noreferrer"&gt;great podcast recently&lt;/a&gt; about how good data scientists come from all sorts of backgrounds, which don't all entail writing papers about deep learning.&lt;/p&gt;

&lt;p&gt;If you're only looking for PhDs OR people who have had ten years in industry OR Java developers OR R experts OR ex-Googlers, you'll miss the people working in all corners of industry and across academic disciplines who have synthesized analyses and who intimately know the pain of trying to make sense of data, which is all data science is, at its core.&lt;/p&gt;

&lt;p&gt;5) &lt;strong&gt;You need a data scientist&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;This brings me to the question of whether you even need a data scientist at all. Remember, you're hiring a data scientist because you want them to provide some value from your datasets by answering questions: should we build out product x? Will City Y be a competitive market for us?  How many widgets do we make a day, and is that an efficient number? How happy are our customers?&lt;/p&gt;

&lt;p&gt;If you can't figure out whether you have enough data available to answer these questions, and whether the data is well-organized enough to do so, the data scientist won't be able to, either.&lt;/p&gt;

&lt;p&gt;Or rather, the data scientist will, but they'll spend all their time doing janitorial data anthropology,  which is extremely important work, but probably not why you hired a data scientist.  &lt;/p&gt;

&lt;p&gt;If that is the kind of work you need, make it clear up-front, or your hire, who was ready to solve business problems, will become extremely frustrated trying to figure out why they're labeling training data for the fifth week in a row.&lt;/p&gt;

&lt;p&gt;6) &lt;strong&gt;You need a data scientist to work on deep learning with big data&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;This is probably the most pervasive myth of the hype cycle. Just as you don't always need a software developer to work with Go and microservices on containers, you don't need a machine learning engineer specializing in Julia who works on neural nets.&lt;/p&gt;

&lt;p&gt;Remember what I wrote before - most data science problems are similar and simple, and both Ferraris and Civics can travel on the same highway, Civics sometimes more efficiently.  I've also written before about &lt;a href="http://veekaybee.github.io/hadoop-or-laptop/" rel="noopener noreferrer"&gt;big data problems&lt;/a&gt;, and why you should think long and hard about whether you need that Hadoop cluster.&lt;/p&gt;

&lt;p&gt;7) &lt;strong&gt;Data science is a growing field and there is a lot of opportunity.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Unfortunately, &lt;a href="http://blog.indeed.com/2016/08/16/do-you-need-a-data-scientist/" rel="noopener noreferrer"&gt;the same hype cycle&lt;/a&gt; that has driven up the demand for data scientists has also &lt;a href="https://www.stitchdata.com/resources/reports/the-state-of-data-science/" rel="noopener noreferrer"&gt;driven up the supply&lt;/a&gt;. If you're just entering the field, it's going to be harder for you to break through and get a job than it would have been in 2012.&lt;/p&gt;

&lt;p&gt;That doesn't mean you shouldn't try, it just means you're up against 50, 60, even upwards of 200 people for a junior-level position, which is something you want to keep in mind.&lt;/p&gt;

&lt;p&gt;That said, there is a lot of opportunity in data - data analysis is not going to go away for companies. But it now comes in different forms, particularly data engineering. The more engineering you know and understand how to do, the more valuable an addition you are to any data team.&lt;/p&gt;

&lt;p&gt;8) &lt;strong&gt;Data scientists, engineers, and AI will make all our jobs obsolete.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;There has been an endless barrage of articles in the media lately about how AI is going to take all of our jobs.&lt;/p&gt;

&lt;p&gt;This is pretty much at odds with how the data industry works today, because I, and most of the data scientists I know, spend an inordinate amount of time cleaning data. Even when the data is clean, it gives wrong answers, weird answers you didn't expect, and results in things like &lt;a href="https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist" rel="noopener noreferrer"&gt;TayBot&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is just the nature of data: a byproduct of messy, human-generated processes that will take a long time to resolve and an even longer time to fully replace.&lt;/p&gt;

&lt;p&gt;As an industry, we have a really long way to go to understand what our data means, how to analyze it, and even how we, as data practitioners, fit into what data science is these days. Rest assured, it will take us a long time to figure it out.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>dataengineering</category>
      <category>architecture</category>
      <category>strategy</category>
    </item>
  </channel>
</rss>
