<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Amit Bendor </title>
    <description>The latest articles on Forem by Amit Bendor  (@amitbendorartlist).</description>
    <link>https://forem.com/amitbendorartlist</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F722738%2F12ee4991-33af-4952-952a-c45f9fd45722.png</url>
      <title>Forem: Amit Bendor </title>
      <link>https://forem.com/amitbendorartlist</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/amitbendorartlist"/>
    <language>en</language>
    <item>
      <title>Lessons learned on the road to MLOps</title>
      <dc:creator>Amit Bendor </dc:creator>
      <pubDate>Thu, 07 Jul 2022 09:49:45 +0000</pubDate>
      <link>https://forem.com/artlist/lessons-learned-on-the-road-to-mlops-22lj</link>
      <guid>https://forem.com/artlist/lessons-learned-on-the-road-to-mlops-22lj</guid>
      <description>&lt;h2&gt;
  
  
  The start
&lt;/h2&gt;

&lt;p&gt;Hello and welcome to our series of posts on MLOps at Artlist!&lt;/p&gt;

&lt;p&gt;The goal of this series is to share our MLOps journey (which is still ongoing) and how we applied our vision and perspective to a &lt;strong&gt;real-world production infrastructure&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the very early days of our department, when I (Amit) had just joined the company as its first data science employee, I had the rare opportunity to stop and think about how to "build things right" this time.&lt;/p&gt;

&lt;p&gt;After managing a few data science teams and projects, and seeing both failures and success stories, I felt I had a fairly clear vision of the “values” that should guide us as we build our infrastructure and practices.&lt;/p&gt;

&lt;p&gt;It took me about two days of distilling all of my thoughts into a Notion page, and we were ready to go.&lt;/p&gt;

&lt;p&gt;At that time the buzzword “MLOps” was rising in popularity, and we felt it shared many ideas with ours, although some of them were vague in terms of implementation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Core values
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A battle of ultimate goals
&lt;/h3&gt;

&lt;p&gt;Before starting with values, we need to understand our main goals. They will be our "guiding star" whenever we have a decision to make.&lt;/p&gt;

&lt;p&gt;If I had to choose one overriding goal, it would be “&lt;strong&gt;business impact&lt;/strong&gt;”. It is quite high level, but eventually we want to bring value to our users and to impact the company’s bottom line.&lt;/p&gt;

&lt;p&gt;We can also measure it for every decision we make. For example: which features should we focus on next? Which implementation should we choose for our pipeline tool? We can answer all of those with an estimate of the business impact.&lt;/p&gt;

&lt;p&gt;To reach this goal, we’d like to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Independent&lt;/strong&gt; - able to bring value without relying on the development teams or data engineers (ideally)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Focused on DS&lt;/strong&gt; - in order to bring our unique value to the company, we need to do mostly data science/research work and not general engineering tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you might guess, these two goals conflict with each other, and this is exactly the fine line each team needs to define. We defined our own guidelines, as you’ll see below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0iq1luiaubcd4mk9q798.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0iq1luiaubcd4mk9q798.jpg" alt="Complex"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The modern Data Science team
&lt;/h3&gt;

&lt;p&gt;One of our first decisions was to build the team the &lt;strong&gt;“modern way”&lt;/strong&gt;. By that I mean we decided not to follow the traditional approach, in which algorithm teams hand over some dirty research code and leave a software engineer to decipher it and convert it into running production code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz608p6bu5hewpni4dei2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz608p6bu5hewpni4dei2.png" alt="Team Goals"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We decided instead to strive towards a “&lt;a href="https://towardsdatascience.com/fcds-b2d2e6b08d34" rel="noopener noreferrer"&gt;Full Cycle Data Science&lt;/a&gt;” paradigm, where we take ownership of every activity related to our core data science work.&lt;/p&gt;

&lt;p&gt;What does it mean exactly? What are the bounds?&lt;/p&gt;

&lt;p&gt;We call it the “&lt;strong&gt;Up to the API&lt;/strong&gt;” approach: all activities from research, through automation using pipelines, to hosting the APIs that expose our predictions.&lt;/p&gt;

&lt;p&gt;What is out of scope?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Any operational backend/frontend work&lt;/li&gt;
&lt;li&gt;Data engineering ETLs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Turning “values” into infrastructure
&lt;/h2&gt;

&lt;p&gt;OK, so now let's move on to the most important subject - values.&lt;br&gt;
You might be asking: is the rest of the article just going to be a bunch of clichés? I promise not. Let’s see how we took every “value” and turned it into &lt;strong&gt;actionable decisions&lt;/strong&gt; &amp;amp; &lt;strong&gt;guidelines&lt;/strong&gt; we use &lt;strong&gt;every day&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3ry5envzt94d37g8zf5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3ry5envzt94d37g8zf5.png" alt="Core values"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Enabling creativity
&lt;/h3&gt;
&lt;h4&gt;
  
  
  What does it mean?
&lt;/h4&gt;

&lt;p&gt;Inspired by our company, we believe that researchers should “&lt;strong&gt;push the boundaries&lt;/strong&gt;” -  and that our MLOps infrastructure should &lt;strong&gt;enable&lt;/strong&gt; it.&lt;/p&gt;

&lt;p&gt;We truly believe we should enable &lt;strong&gt;maximum&lt;/strong&gt; flexibility on the research side of our work, because this is our &lt;strong&gt;core&lt;/strong&gt; &lt;strong&gt;value&lt;/strong&gt; in the company.&lt;/p&gt;
&lt;h4&gt;
  
  
  Decisions
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Research with notebooks/scripts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;👉 As long as it supports fast iterations - use Jupyter notebooks, scripts, or a combination of both&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;CYOF (Choose your own framework)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;👉 Prefer TensorFlow over PyTorch? Not a problem. Use the framework/tools that make the most sense to you.&lt;/p&gt;


&lt;h3&gt;
  
  
  Simplicity
&lt;/h3&gt;
&lt;h4&gt;
  
  
  What does it mean?
&lt;/h4&gt;

&lt;p&gt;We've already talked about conflicts, so here is another one.&lt;/p&gt;

&lt;p&gt;We don’t want to spend our human decision-making effort on operations that are outside of our core contribution.&lt;/p&gt;

&lt;p&gt;Therefore, we decided to allow &lt;strong&gt;minimal&lt;/strong&gt; flexibility on the “engineering side” of our work.&lt;/p&gt;
&lt;h4&gt;
  
  
  Decisions
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;One implementation &lt;sup&gt;TM &lt;/sup&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;👉 A minimal set of tools and just one implementation for each of our three main component types: pipeline, API, and Python package.&lt;/p&gt;

&lt;p&gt;Example: for APIs, use only FastAPI.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Write once &lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  How is it being reflected?
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Write code templates (cookiecutters in Python) - to provide a fully functional environment - full of nifty automations&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use it when starting every project&lt;/li&gt;
&lt;li&gt;See the links section below for a few examples&lt;/li&gt;
&lt;li&gt;A deep-dive post is coming soon&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Standard operations - a library we developed covering the “must-haves” of any script or notebook (production or research).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Establishes a good standard&lt;/li&gt;
&lt;li&gt;Includes logging, configuration, and tracking modules&lt;/li&gt;
&lt;li&gt;A deep-dive post is coming soon&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Everything is containerized&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A standard way to deploy our code in different services without changing anything (!)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5b75soryjgficxcm1xd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5b75soryjgficxcm1xd.png" alt="Standard ops"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Transparency (of tools &amp;amp; infra)
&lt;/h3&gt;
&lt;h4&gt;
  
  
  What does it mean?
&lt;/h4&gt;

&lt;p&gt;This one is connected to the previous value (simplicity).&lt;/p&gt;

&lt;p&gt;We want things to “just work” without touching them when possible.&lt;/p&gt;

&lt;p&gt;We would rather sacrifice some flexibility and pay a higher cost to get that.&lt;/p&gt;
&lt;h4&gt;
  
  
  Decisions
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;No DevOps&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;👉 We prefer serverless, fully managed, cloud-native solutions, even at the expense of flexibility or operating cost.&lt;/p&gt;

&lt;p&gt;👉 For example, we chose Vertex AI Pipelines as our main pipeline tool and GCP-native monitoring for those features.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Fewest interactions - we chose libraries and implementations that require the fewest lines of code to run&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;from aistdops.logging import logger; logger.info()&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;over&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;logger = Logger(); logger.info()&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h4&gt;
  
  
  How is it being reflected?
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Cloud - prefer managed services, serverless&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Choice of transparent libraries - for example, ClearML for experiment tracking, as it logs most things &lt;strong&gt;implicitly&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0qajuaddu7g399cws0l.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0qajuaddu7g399cws0l.jpg" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjsh6szo557g68bbelbp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjsh6szo557g68bbelbp.png" alt="Transparent implementation"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Robustness + Well engineered
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What does it mean?
&lt;/h4&gt;

&lt;p&gt;Frankly, we really don’t like being woken up in the middle of the night by a pager duty call.&lt;/p&gt;

&lt;p&gt;Therefore, we do all we can to provide testable, robust solutions.&lt;/p&gt;

&lt;h4&gt;
  
  
  Decisions
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Invest in testing code, data validation, and model monitoring&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Choose production-ready tools&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Track everything we can&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;To build a lineage map, reproducibility&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Reduce production risks as much as possible&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prefer batch jobs over online APIs&lt;/li&gt;
&lt;li&gt;Always have a fallback&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;
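
&lt;p&gt;As a rough illustration of "always have a fallback", the sketch below wraps a prediction call so that any failure degrades to a safe default instead of an error. The function names, the toy model and the default recommendations are all hypothetical and exist only for this example.&lt;/p&gt;

```python
def predict_with_fallback(model_predict, features, fallback):
    """Return the model's prediction, or the fallback if the model fails."""
    try:
        return model_predict(features)
    except Exception:
        # Any model failure degrades to a safe default instead of an outage.
        return fallback


def popular_items_model(features):
    # Hypothetical model call; it raises KeyError when "history" is missing.
    return ["track-%d" % i for i in features["history"]]


# Hypothetical precomputed default, e.g. refreshed by a batch job.
DEFAULT_RECOMMENDATIONS = ["top-1", "top-2"]

ok = predict_with_fallback(
    popular_items_model, {"history": [7, 9]}, DEFAULT_RECOMMENDATIONS
)
degraded = predict_with_fallback(popular_items_model, {}, DEFAULT_RECOMMENDATIONS)
```

&lt;p&gt;In a real service you would probably also log the exception and track how often the fallback is served, so that a silently failing model does not go unnoticed.&lt;/p&gt;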

&lt;h4&gt;
  
  
  How is it being reflected?
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Data validation - with Great Expectations&lt;/li&gt;
&lt;li&gt;Tracking experiments with ClearML&lt;/li&gt;
&lt;li&gt;Embrace best practices in core components - configuration management, logging, dataset and model management&lt;/li&gt;
&lt;/ul&gt;
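
&lt;p&gt;To make the data validation point concrete, here is a small plain-Python sketch of the kind of checks involved: required columns and non-null values. It deliberately does not use the Great Expectations API; the function name, column names and sample rows are assumptions made up for this illustration.&lt;/p&gt;

```python
def validate_rows(rows, required_columns, not_null_columns):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for index, row in enumerate(rows):
        missing = [c for c in required_columns if c not in row]
        if missing:
            problems.append("row %d: missing columns %s" % (index, missing))
        # Only report nulls for columns that are actually present.
        nulls = [c for c in not_null_columns if c in row and row[c] is None]
        if nulls:
            problems.append("row %d: null values in %s" % (index, nulls))
    return problems


# Hypothetical sample batch: one good row, one null id, one missing column.
sample = [
    {"user_id": 1, "plays": 3},
    {"user_id": None, "plays": 2},
    {"plays": 5},
]

problems = validate_rows(sample, ["user_id", "plays"], ["user_id"])
for problem in problems:
    print(problem)
```

&lt;p&gt;A pipeline step could stop, or switch to its fallback, whenever the returned list is not empty.&lt;/p&gt;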

&lt;h3&gt;
  
  
  What's next?
&lt;/h3&gt;

&lt;p&gt;In the next articles in this series, we'll go deeper into how we implemented those values in our infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  About the writer
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0axvjvefm4exu5z9lrm5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0axvjvefm4exu5z9lrm5.png" alt="Amit Bendor"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Amit is the head of data science at Artlist. He’s an active contributor to the Association of software architecture and Cloud security alliance organizations.&lt;/p&gt;

&lt;p&gt;If you google his name, you’ll see he is talking about technology at every given chance.&lt;/p&gt;

&lt;p&gt;He co-hosts the award-winning podcast “Osim Tochna” in Israel, records videocasts, and speaks at conferences and meetups.&lt;/p&gt;

&lt;p&gt;When he is not running into walls with the VR headset, he is spreading the word about AI to developers with &lt;a href="//cloudaiworld.com"&gt;cloudaiworld.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>cloud</category>
      <category>ai</category>
      <category>leadership</category>
    </item>
  </channel>
</rss>
