<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: omarkhater</title>
    <description>The latest articles on Forem by omarkhater (@omarkhater).</description>
    <link>https://forem.com/omarkhater</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F840982%2Ff97cecc1-9bb0-48c7-89d0-4caa463ff71c.jpeg</url>
      <title>Forem: omarkhater</title>
      <link>https://forem.com/omarkhater</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/omarkhater"/>
    <language>en</language>
    <item>
      <title>Service in review: Sagemaker Modeling Pipelines</title>
      <dc:creator>omarkhater</dc:creator>
      <pubDate>Sat, 18 Mar 2023 09:26:39 +0000</pubDate>
      <link>https://forem.com/aws-builders/service-in-review-sagemaker-modeling-pipelines-83a</link>
      <guid>https://forem.com/aws-builders/service-in-review-sagemaker-modeling-pipelines-83a</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;




&lt;p&gt;Welcome back to my blog, where I share insights and tips on machine learning workflows using Sagemaker Pipelines. If you're new here, I recommend checking out my &lt;a href="https://dev.to/omarkhater/revolutionize-your-machine-learning-workflow-with-sagemaker-pipelines-build-train-and-deploy-models-with-ease-2c8d"&gt;first post&lt;/a&gt; to learn more about this AWS fully managed machine learning service. In my &lt;a href="https://dev.to/aws-builders/unlocking-flexibility-and-efficiency-how-to-leverage-sagemaker-pipelines-parameterization-3jl0"&gt;second post&lt;/a&gt;, I discussed how parameterization can help you customize the workflow and make it more flexible and efficient.&lt;/p&gt;

&lt;p&gt;After using Sagemaker Pipelines extensively in real-life projects, I've gained a comprehensive understanding of the service. In this post, I'll summarize the key benefits of using Sagemaker Pipelines and the limitations you should consider before implementing it. Whether you're a newcomer to the service or a seasoned user, you'll gain valuable insights from this concise review.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;




&lt;p&gt;1. &lt;strong&gt;Sagemaker Integration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This service is integrated directly with Sagemaker, so the user doesn't have to deal with other AWS services. Pipelines can also be created programmatically via the Sagemaker Python SDK. Further, the service can be used from the console thanks to the seamless integration with Sagemaker Studio. &lt;/p&gt;

&lt;p&gt;2. &lt;strong&gt;Data Lineage Tracking&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data lineage is the process of tracking and documenting the origin, movement, transformation, and destination of data throughout its lifecycle. It refers to the ability to trace the path of data from its creation to its current state and provides a complete view of data movement, including data sources, transformations, and destinations.&lt;/p&gt;
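&lt;p&gt;To make the idea concrete, lineage can be modeled as a directed graph of artifacts. The toy sketch below is plain Python, not the Sagemaker lineage API; every artifact name is invented for illustration. It traces a model artifact back to its original data sources:&lt;/p&gt;

```python
# Toy illustration of data lineage as a directed graph (not the Sagemaker API).
# Each artifact maps to the list of artifacts it was derived from.
lineage = {
    "model.tar.gz": ["train.csv"],
    "train.csv": ["cleaned.csv"],
    "cleaned.csv": ["raw_s3_dump.csv"],
    "raw_s3_dump.csv": [],
}

def trace_origins(artifact, graph):
    """Walk upstream edges to find the original sources of an artifact."""
    parents = graph.get(artifact, [])
    if not parents:
        return [artifact]  # no parents: this artifact is an original source
    origins = []
    for parent in parents:
        origins.extend(trace_origins(parent, graph))
    return origins

print(trace_origins("model.tar.gz", lineage))  # -> ['raw_s3_dump.csv']
```

&lt;p&gt;Sagemaker maintains this kind of graph for you automatically across processing, training, and model artifacts.&lt;/p&gt;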

&lt;p&gt;Sagemaker Pipelines makes this process easier, as visualized below &lt;sup&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/lineage-tracking.html" rel="noopener noreferrer"&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbb0yekynd8d3eoetr9b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbb0yekynd8d3eoetr9b.png" alt="Lineage Metadata"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3. &lt;strong&gt;Curated list of steps for the whole ML life cycle&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sagemaker Pipelines provides a convenient way to manage the highly iterative process of ML development through constructs called steps. This enables easier development and maintenance, either individually or within a team. The service currently supports the following &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#build-and-manage-steps-types" rel="noopener noreferrer"&gt;step types&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A more comprehensive guide to these steps is articulated in &lt;a href="https://dev.to/omarkhater/revolutionize-your-machine-learning-workflow-with-sagemaker-pipelines-build-train-and-deploy-models-with-ease-2c8d"&gt;this post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3j8lsbpelq2aa0tdbryj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3j8lsbpelq2aa0tdbryj.png" alt="Steps By Functionalityn"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;4. &lt;strong&gt;Parallelism&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are several ways to run ML workflows in parallel using Sagemaker Pipelines. For example, parallel executions can vary the data, the algorithm, or both. The ability to integrate smoothly with other Sagemaker capabilities greatly simplifies the process of creating repeatable and well-organized machine learning workflows.&lt;/p&gt;

&lt;p&gt;A simple Python program to launch many pipelines in parallel can look like the code snippet below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.workflow.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;multiprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Process&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;concurrent.futures&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ThreadPoolExecutor&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Pipeline_Parameters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;execution_parameters&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ct_start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Executing pipeline: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;execution_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipeline_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; with the following parameters:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Pipeline_Parameters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;Pipeline_Parameters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;execution_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipeline_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="n"&gt;execution&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;execution_display_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;execution_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;disp_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                                       &lt;span class="n"&gt;execution_description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;execution_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execution_description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                                       &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Pipeline_Parameters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;execution_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wait&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Waiting for the pipeline to finish...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

            &lt;span class="c1"&gt;## Wait for maximum 8.3 (30 seconds * 1000 attempts)  hours before raising waiter error. 
&lt;/span&gt;            &lt;span class="n"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# The polling interval
&lt;/span&gt;                           &lt;span class="n"&gt;max_attempts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="c1"&gt;# The maximum number of polling attempts. (Defaults to 60 polling attempts)
&lt;/span&gt;                          &lt;span class="p"&gt;)&lt;/span&gt; 
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_steps&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Executing the pipeline without waiting to finish...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Executing pipeline: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;execution_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipeline_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; done&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;ct_end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;ET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct_end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ct_start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Time Elapsed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ET&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (hh:mm:ss.ms)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;execution&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Couldn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t run pipeline: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;execution_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;disp_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; due to:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;worer_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exitcode&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;proc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="c1"&gt;# List of all required executions such as display name. Each configuation should be a dictionary
&lt;/span&gt;    &lt;span class="n"&gt;Execution_args_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; 
    &lt;span class="c1"&gt;# List of parameters per execution. Each configuation should be a dictionary
&lt;/span&gt;    &lt;span class="n"&gt;pipeline_parameters_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;Execution_args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pipeline_parameters&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Execution_args_list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pipeline_parameters_list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;start_pipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pipeline_parameters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Execution_args&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;proc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;proc&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;proc&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;worer_func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;as_completed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;proc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;terminate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;PermissionError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
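&lt;p&gt;For illustration, the two configuration lists in the snippet above might be populated as follows. Every pipeline name, display name, and parameter here is a hypothetical example, not an API requirement:&lt;/p&gt;

```python
# Hypothetical configurations for the parallel launcher; one execution-config
# dictionary must line up with one parameter dictionary.
Execution_args_list = [
    {
        "pipeline_name": "churn-pipeline",
        "disp_name": "churn-xgboost-run",
        "execution_description": "XGBoost baseline",
        "wait": True,
    },
    {
        "pipeline_name": "churn-pipeline",
        "disp_name": "churn-linear-run",
        "execution_description": "Linear learner baseline",
        "wait": True,
    },
]
pipeline_parameters_list = [
    {"TrainingInstanceType": "ml.m5.xlarge", "MaxDepth": 6},
    {"TrainingInstanceType": "ml.m5.xlarge", "MaxDepth": 2},
]

# Sanity check before launching: the two lists are zipped together.
assert len(Execution_args_list) == len(pipeline_parameters_list)
```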



&lt;h2&gt;
  
  
  Limitations and areas of improvement
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-condition" rel="noopener noreferrer"&gt;Pipelines with conditions&lt;/a&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SageMaker Pipelines doesn't support the use of nested condition steps. You can't pass a condition step as the input for another condition step.&lt;/li&gt;
&lt;li&gt;A condition step can't use identical steps in both branches. If you need the same step functionality in both branches, duplicate the step and give it a different name.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Loops&lt;/strong&gt;: &lt;/p&gt;

&lt;p&gt;Sagemaker Pipelines doesn't provide a direct way to iterate over a subset of the steps in an ML flow. For example, if you need to repeat data processing and model training until a certain accuracy is met, you have to implement this logic yourself. &lt;/p&gt;
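&lt;p&gt;One common workaround is a small driver script that re-executes the pipeline until the target metric is reached. In the sketch below, the stubbed run_training function stands in for starting a pipeline execution and reading back its evaluation metric, and the parameter name is invented:&lt;/p&gt;

```python
# Sketch of an iterate-until-target driver around a pipeline execution.
# The canned metrics simulate a model that improves across rounds.
canned_metrics = iter([0.78, 0.85, 0.92])

def run_training(params):
    # Stub: a real implementation would start a pipeline execution with
    # `params`, wait for it, and read the evaluation metric from its output.
    return next(canned_metrics)

def train_until(target_accuracy, max_rounds=5):
    params = {"MaxDepth": 4}  # hypothetical pipeline parameter
    accuracy = 0.0
    for _ in range(max_rounds):
        accuracy = run_training(params)
        if accuracy >= target_accuracy:
            break
        params["MaxDepth"] += 2  # adjust a knob and re-run the "pipeline"
    return accuracy, params

final_accuracy, final_params = train_until(0.9)
print(final_accuracy, final_params)  # -> 0.92 {'MaxDepth': 8}
```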


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Passing data between steps&lt;/strong&gt;: &lt;/p&gt;

&lt;p&gt;It is typical in ML development to pass many data arrays between different steps. While pipelines can be customized so that the developer saves and loads data files from S3, this creates a development bottleneck during rapid prototyping. Reading and writing data files is error-prone by nature, and the developer needs to handle the errors of this process effectively to avoid failed pipeline executions. &lt;/p&gt;
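&lt;p&gt;A common mitigation is to centralize the save and load logic in one small, defensive helper instead of scattering reads and writes across steps. The sketch below uses local JSON files and simple retries as a stand-in for S3 I/O; in a real pipeline the same pattern would wrap the S3 calls:&lt;/p&gt;

```python
import json
import tempfile
import time
from pathlib import Path

def save_payload(path, payload, retries=3):
    """Persist a step's output, retrying transient I/O failures."""
    for attempt in range(retries):
        try:
            Path(path).write_text(json.dumps(payload))
            return
        except OSError as exc:
            if attempt == retries - 1:
                raise RuntimeError(f"could not persist {path}") from exc
            time.sleep(0.1 * (attempt + 1))  # back off, then retry

def load_payload(path, retries=3):
    """Read a step's input back, failing loudly with a clear message."""
    for attempt in range(retries):
        try:
            return json.loads(Path(path).read_text())
        except (OSError, json.JSONDecodeError) as exc:
            if attempt == retries - 1:
                raise RuntimeError(f"could not load {path}") from exc
            time.sleep(0.1 * (attempt + 1))

out = Path(tempfile.gettempdir()) / "step_output.json"
save_payload(out, {"rows": 1000, "split": "train"})
restored = load_payload(out)
print(restored)  # -> {'rows': 1000, 'split': 'train'}
```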


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Operations on pipeline parameters&lt;/strong&gt;: &lt;/p&gt;

&lt;p&gt;Sagemaker enables using variables within the pipeline that can be changed at run time via &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-parameters.html" rel="noopener noreferrer"&gt;Pipeline Parameters&lt;/a&gt;. However, parameters can't be manipulated with ordinary Python operations (for example, string concatenation) when defining the pipeline; the SDK provides dedicated constructs such as Join for this purpose. I dedicated &lt;a href="https://dev.to/aws-builders/unlocking-flexibility-and-efficiency-how-to-leverage-sagemaker-pipelines-parameterization-3jl0"&gt;this post&lt;/a&gt; to an in-depth summary of this key feature. &lt;/p&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;




&lt;p&gt;In conclusion, Sagemaker Model Building Pipelines is a valuable service that simplifies the creation, management, and monitoring of machine learning workflows. Its integration with Sagemaker makes it easy to use without the need to deal with other AWS services, and the availability of a Python SDK enables the creation of pipelines programmatically. The service provides a curated list of steps for all stages of the ML life cycle and enables data lineage tracking, making it easier to trace the path of data throughout its lifecycle. Additionally, the service supports the parallel execution of ML workflows, which is helpful when processing large amounts of data. However, there are still some limitations to address, such as the inability to loop over specific parts of a pipeline's steps. Overall, Sagemaker Model Building Pipelines is a powerful tool for data scientists and machine learning engineers, and its many features make it a valuable addition to the machine learning ecosystem.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>machinelearning</category>
      <category>automation</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Unlocking Flexibility and Efficiency: How to Leverage Sagemaker Pipelines Parameterization</title>
      <dc:creator>omarkhater</dc:creator>
      <pubDate>Sun, 05 Mar 2023 07:09:45 +0000</pubDate>
      <link>https://forem.com/aws-builders/unlocking-flexibility-and-efficiency-how-to-leverage-sagemaker-pipelines-parameterization-3jl0</link>
      <guid>https://forem.com/aws-builders/unlocking-flexibility-and-efficiency-how-to-leverage-sagemaker-pipelines-parameterization-3jl0</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;




&lt;p&gt;Are you using &lt;a href="https://dev.to/omarkhater/revolutionize-your-machine-learning-workflow-with-sagemaker-pipelines-build-train-and-deploy-models-with-ease-2c8d"&gt;Sagemaker Pipelines for your machine learning workflows&lt;/a&gt; yet? If not, you're missing out on a powerful tool that simplifies the entire process from building to deployment. But even if you're already familiar with this service, you may not know about one of its key features: parameterization. With parameterization, you can customize the whole workflow, making it more flexible and dynamic. In this article, we'll take a deep dive into this feature, share some real-world examples, and discuss the pros and cons. We'll even offer some workarounds for addressing the limitations of this service. So grab your coffee and let's get started!&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview of Sagemaker Pipelines Parameterization
&lt;/h2&gt;




&lt;p&gt;Before we dive into the pros and cons of Sagemaker Pipelines parameterization, let's take a quick look at how it works. Sagemaker Pipelines allows users to specify input parameters using the Parameter class. These parameters can be specified when defining the pipeline, and their values can be set at execution. Parameterization allows users to create flexible pipelines that can be customized for different use cases.&lt;/p&gt;

&lt;p&gt;In addition, Sagemaker Studio provides a stunning GUI to execute the pipeline with different values for the defined parameters. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3laheaw3eayh3uv93yfs.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3laheaw3eayh3uv93yfs.gif" alt="Parameterization overview" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Generally speaking&lt;/em&gt;, Sagemaker Modeling Pipelines &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-parameters.html" rel="noopener noreferrer"&gt;supports&lt;/a&gt; four different types of parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ParameterString&lt;/strong&gt; – Representing a string parameter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ParameterInteger&lt;/strong&gt; – Representing an integer parameter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ParameterFloat&lt;/strong&gt; – Representing a float parameter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ParameterBoolean&lt;/strong&gt; – Representing a Boolean parameter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and the syntax is as simple as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;parameter&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;parameter_type&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;parameter_name&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;default_value&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;default_value&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-world Examples
&lt;/h2&gt;




&lt;p&gt;The Amazon Sagemaker Example Notebooks include several complete walkthroughs of Sagemaker Pipelines parameterization. The posts below are really helpful for getting started: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipeline-parameterization/parameterized-pipeline.html" rel="noopener noreferrer"&gt;Parameterize Sagemaker Pipelines (Introductory example)&lt;/a&gt;: shows how to create a parameterized Sagemaker Pipeline using the Amazon Sagemaker SDK.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipeline-compare-model-versions/notebook.html" rel="noopener noreferrer"&gt;Comparing model metrics with Sagemaker Pipelines and Sagemaker Model Registry (Advanced)&lt;/a&gt;: provides an example of how to use Sagemaker Pipelines in deploying the model based on a parmeterized performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Benefits of Sagemaker Pipelines Parameterization
&lt;/h2&gt;




&lt;p&gt;Clearly, parameterization is a key advantage of using Sagemaker Pipelines in automating ML workflows. &lt;/p&gt;

&lt;p&gt;There are numerous advantages such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GUI-based executions&lt;/strong&gt;: Typically, one can define the pipeline once and then execute the whole workflow smoothly. This is a significant benefit if you work with a colleague data scientist who prefers low-code solutions. You can still execute it using the &lt;a href="https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipeline-parameterization/parameterized-pipeline.html#Starting-the-pipeline-with-the-SDK" rel="noopener noreferrer"&gt;Sagemaker SDK&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rapid prototyping&lt;/strong&gt;: Evidently, it enables more efficient experimentation and testing by allowing for easy modification of pipeline components without the need for extensive manual changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Collaboration&lt;/strong&gt;: By dividing the ML workflow into modular, parameterized parts, teamwork becomes more practical and efficient. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automation&lt;/strong&gt;: It facilitates automation by enabling the use of scripts and code to modify pipeline parameters, allowing for fully automated end-to-end machine learning workflows.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The list goes on and on. &lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations of Sagemaker Pipelines Parameterization
&lt;/h2&gt;




&lt;p&gt;While parameterization is a useful feature of Sagemaker Pipelines, it also has some limitations that can make it difficult to use in certain situations. Here are some common limitations to be aware of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Limited support for dynamic or runtime parameters&lt;/strong&gt;: Sagemaker Pipelines only supports static parameters that are set during pipeline definition. There is no support for runtime or dynamic parameters that can be set during pipeline execution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Limited support for nested parameters&lt;/strong&gt;: Sagemaker Pipelines does not support nested parameters or hierarchical parameters, which can be limiting in more complex pipeline use cases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Limited parameter validation&lt;/strong&gt;: Sagemaker Pipelines does not provide extensive parameter validation capabilities, which can make it harder to catch errors or issues during pipeline execution. For example, it may not automatically validate the format or type of the input data, or ensure that the parameters are within acceptable ranges or limits. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quota limitations&lt;/strong&gt;: AWS sets a non-adjustable quota of 200 on the maximum number of parameters in a pipeline. This might be a problem in large-scale pipelines. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, the &lt;a href="https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#limitations" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt; lists some other limitations: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not 100% compatible with other Sagemaker Python SDK modules&lt;/strong&gt;: For example, pipeline parameters can't be used to pass &lt;code&gt;image_uri&lt;/code&gt; for &lt;a href="https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.FrameworkProcessor" rel="noopener noreferrer"&gt;Framework Processors&lt;/a&gt; but can be used with &lt;a href="https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.Processor" rel="noopener noreferrer"&gt;Processor&lt;/a&gt;. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not all arguments can be parameterized&lt;/strong&gt;: Remember to read the documentation carefully to see whether a certain parameter can be a Pipeline variable or not. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, the &lt;code&gt;role&lt;/code&gt; can be parameterized while the &lt;code&gt;base_job_name&lt;/code&gt; cannot be parameterized in the &lt;a href="https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.Processor" rel="noopener noreferrer"&gt;Processor&lt;/a&gt; API, as shown below. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj107x7o1r34tijhbu0pp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj107x7o1r34tijhbu0pp.png" alt="readdocs" width="800" height="802"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not all built-in Python operations can be applied to parameters&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# An example of what not to do
&lt;/span&gt;&lt;span class="n"&gt;my_string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://{}/training&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ParameterString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MyBucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Another example of what not to do
&lt;/span&gt;&lt;span class="n"&gt;int_param&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ParameterInteger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MyBucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Instead, if you want to convert the parameter to string type, do
&lt;/span&gt;&lt;span class="n"&gt;int_param&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Useful Workarounds
&lt;/h2&gt;




&lt;p&gt;While these limitations can make parameterization in Sagemaker Pipelines challenging, there are solutions for overcoming some of them. Here are a few examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Using Lambda functions for dynamic parameters&lt;/strong&gt;: To work around the limitation of static parameters, you can use a Lambda function to determine a parameter value dynamically based on other pipeline inputs or external data. For example, a Lambda function could calculate the minimum star rating to include in your analysis based on the average star rating of all customer reviews. The Lambda step fits this purpose; all the step types are summarized in &lt;a href="https://dev.to/omarkhater/revolutionize-your-machine-learning-workflow-with-sagemaker-pipelines-build-train-and-deploy-models-with-ease-2c8d"&gt;this post&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Using &lt;a href="https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#property-file" rel="noopener noreferrer"&gt;PropertyFiles&lt;/a&gt; for nested parameters&lt;/strong&gt;: If you need to specify nested or hierarchical parameters, you can write a JSON file and consume it within the Pipeline using both JsonGet and PropertyFile, as shown in the code snippet below:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.workflow.properties&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PropertyFile&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.workflow.steps&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ProcessingStep&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.processing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt;  &lt;span class="n"&gt;FrameworkProcessor&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.workflow.functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;JsonGet&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.sklearn.estimator&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SKLearn&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.workflow.pipeline_context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PipelineSession&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.workflow.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.processing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt;  &lt;span class="n"&gt;ProcessingOutput&lt;/span&gt;

&lt;span class="n"&gt;pipeline_session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PipelineSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;pp_outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;ProcessingOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paths&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                               &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/opt/ml/processing/output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="n"&gt;Paths_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PropertyFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NestedParameter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paths&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nested_parameter.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FrameworkProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_execution_role&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                        &lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ml.t3.medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;instance_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;estimator_cls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SKLearn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;framework_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.23-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;sagemaker_session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline_session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;step_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DoNothing.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp_outputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;step_process&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ProcessingStep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dummystep&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;step_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step_args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;property_files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Paths_file&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;train_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;JsonGet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;step_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step_process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;property_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Paths_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;json_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paths.train.URI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;test_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;JsonGet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;step_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step_process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;property_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Paths_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;json_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paths.test.URI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;   
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;The JSON file could be something like this:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"paths"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"train"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"URI"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"s3://&amp;lt;path_to_train_data&amp;gt;"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"test"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"URI"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"s3://&amp;lt;path_to_test_data&amp;gt;"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Building custom validation scripts&lt;/strong&gt;: To catch errors or issues with your pipeline parameters, you can build custom validation scripts that check the parameter values before the pipeline runs. This can help catch errors early and prevent pipeline failures due to invalid parameters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Careful design&lt;/strong&gt;: All in all, design the exposed parameters so that you stay within the quota. If you need a further increase, you can contact AWS Support about your case. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
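&lt;p&gt;The custom-validation idea above can be sketched as a small pre-flight check that runs before the pipeline is created or executed. This is a hypothetical, self-contained example; the rule format and the parameter names are illustrative, and the quota figure mirrors the 200-parameter limit discussed earlier:&lt;/p&gt;

```python
import operator

MAX_PIPELINE_PARAMETERS = 200  # current non-adjustable quota

def validate_parameters(params, rules):
    """params: {name: value}; rules: {name: (expected_type, low, high)}."""
    errors = []
    # Catch quota overruns before any AWS call is made.
    if operator.gt(len(params), MAX_PIPELINE_PARAMETERS):
        errors.append("parameter count exceeds the quota of 200")
    for name, (expected_type, low, high) in rules.items():
        if name not in params:
            errors.append("missing parameter: " + name)
            continue
        value = params[name]
        # Type check, then range checks against the declared bounds.
        if not isinstance(value, expected_type):
            errors.append(name + ": wrong type " + type(value).__name__)
        elif low is not None and operator.lt(value, low):
            errors.append(name + ": value below minimum")
        elif high is not None and operator.gt(value, high):
            errors.append(name + ": value above maximum")
    return errors

# Example: an invalid instance count is caught locally, before execution.
issues = validate_parameters(
    {"ProcessingInstanceCount": 0},
    {"ProcessingInstanceCount": (int, 1, 10)},
)
```

Running such a check in CI or right before &lt;code&gt;pipeline.start()&lt;/code&gt; prevents failures that would otherwise only surface mid-execution.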

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;




&lt;p&gt;Parameterization is a useful feature in Sagemaker Pipelines, but it does have some limitations that can make it challenging to use in certain situations. By using Lambda functions, PropertyFiles, and custom validation scripts, you can work around some of these limitations and create more flexible pipelines. By following best practices for parameterization, you can also ensure that your pipelines are well-organized and easy to use. With these tips and tricks, you'll be able to make the most of Sagemaker Pipelines parameterization and create powerful machine learning workflows.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>aws</category>
    </item>
    <item>
      <title>Revolutionize Your Machine Learning Workflow with SageMaker Pipelines: Build, Train, and Deploy Models with Ease!</title>
      <dc:creator>omarkhater</dc:creator>
      <pubDate>Mon, 27 Feb 2023 03:39:02 +0000</pubDate>
      <link>https://forem.com/omarkhater/revolutionize-your-machine-learning-workflow-with-sagemaker-pipelines-build-train-and-deploy-models-with-ease-2c8d</link>
      <guid>https://forem.com/omarkhater/revolutionize-your-machine-learning-workflow-with-sagemaker-pipelines-build-train-and-deploy-models-with-ease-2c8d</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;




&lt;p&gt;Are you interested in machine learning and looking for ways to optimize your workflow? Look no further than Sagemaker, Amazon Web Services' (AWS) fully managed machine learning service. With Sagemaker, one can develop, train, and deploy machine learning models at scale. Impressively, &lt;strong&gt;Sagemaker Pipelines&lt;/strong&gt; helps automate the highly iterative process of training, tuning, and deploying models.&lt;/p&gt;

&lt;p&gt;In recent months, I had the opportunity to use &lt;strong&gt;Sagemaker Pipelines&lt;/strong&gt; extensively within my team. So, I decided to start a series of posts to review this service in depth. In this first post, we break down the basic elements of this service, called &lt;strong&gt;Steps&lt;/strong&gt;, and provide some real-world illustrations of combining these steps elegantly. &lt;/p&gt;

&lt;h2&gt;
  
  
  Steps Summary
&lt;/h2&gt;




&lt;p&gt;Are you ready to supercharge your machine-learning workflow with Sagemaker pipelines? With 15 different step types at your fingertips, organizing your pipeline has never been easier. Plus, these steps are grouped based on functionality for effortless recall. Keep reading for a summary of the steps below. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybyrxmn7p6p29pxe9a4s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybyrxmn7p6p29pxe9a4s.png" alt="Steps Summary" width="800" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Data Processing
&lt;/h3&gt;

&lt;p&gt;There are two ways to process data in SageMaker Pipelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Processing&lt;/strong&gt;: This step offers the flexibility to preprocess data using a custom script or a pre-built container. The resulting output can be utilized as an input to the Training step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EMR&lt;/strong&gt;: This step enables data processing using Amazon Elastic MapReduce (EMR) clusters. EMR offers a managed Hadoop framework that allows processing of massive amounts of data quickly and efficiently.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Modeling
&lt;/h3&gt;

&lt;p&gt;Under the Training category, the following steps are included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Training&lt;/strong&gt;: This step trains a machine learning model using the data that was preprocessed in the previous Processing step. You can specify the algorithm to be used, along with input/output channels and hyperparameters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tuning&lt;/strong&gt;: This step tunes the hyperparameters of the model generated in the previous Training step. It tries a range of values for each hyperparameter and selects the combination of hyperparameters that yields the best performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AutoML&lt;/strong&gt;: This step automatically selects the optimal machine learning algorithm and hyperparameters for a specific problem. It uses various techniques, such as feature engineering and hyperparameter tuning, to generate the most effective model.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Monitoring
&lt;/h3&gt;

&lt;p&gt;This category includes the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ClarifyCheck&lt;/strong&gt;: This step examines the input data to detect any instances of bias and fairness problems. It generates a report that can assist in improving the data's quality and guaranteeing that machine learning models are impartial and equitable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;QualityCheck&lt;/strong&gt;: This step examines the results of the Training step to confirm that the model fulfills predefined quality standards. The standards can be based on metrics like accuracy, precision, and recall.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Deployment
&lt;/h3&gt;

&lt;p&gt;The Deployment category includes the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model&lt;/strong&gt;: This step creates a model artifact that can be used for inference. The artifact includes the trained model parameters, as well as any additional files or libraries required for inference.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CreateModel&lt;/strong&gt;: This step creates an Amazon Sagemaker model using the model artifact generated by the Model step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RegisterModel&lt;/strong&gt;: This step registers the created model with Amazon Sagemaker, which allows you to easily deploy the model to different endpoints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transform&lt;/strong&gt;: This step runs a batch transform job to generate predictions from the registered model on an entire dataset.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Control Flow
&lt;/h3&gt;

&lt;p&gt;The Control Flow category includes the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Condition&lt;/strong&gt;: This step allows you to define conditional logic in the pipeline. It can be used to branch the pipeline based on a specified condition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail&lt;/strong&gt;: This step allows you to intentionally fail the pipeline if certain conditions are met. It can be used to ensure that the pipeline stops if something unexpected happens.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Custom Functionalities
&lt;/h3&gt;

&lt;p&gt;The Custom Functionalities category includes the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Callback&lt;/strong&gt;: This step enables designating a custom script to be executed during the pipeline operation. It can be utilized for executing custom actions, like dispatching notifications or running extra processing steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda&lt;/strong&gt;: This step allows you to execute an AWS Lambda function during pipeline operation. It can be utilized to accomplish various tasks, such as extracting metadata or transforming data.&lt;/li&gt;
&lt;/ul&gt;
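&lt;p&gt;As a concrete illustration, the function invoked by a Lambda step is an ordinary Lambda handler. The sketch below derives a value at runtime that later steps can read from the step's output properties; the event fields and the derivation rule are hypothetical:&lt;/p&gt;

```python
# Hypothetical handler for a Lambda step. The step passes its inputs in
# `event`; keys of the returned dict become output properties that
# downstream steps can reference.
def lambda_handler(event, context):
    # Illustrative rule: derive a minimum star rating from the average
    # rating supplied by an earlier step (defaults to 3.0 if absent).
    avg_rating = float(event.get("average_star_rating", 3.0))
    min_star_rating = max(1, round(avg_rating))
    return {"statusCode": 200, "min_star_rating": min_star_rating}
```

The returned values would then be wired into subsequent steps through the Lambda step's declared outputs.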

&lt;h2&gt;
  
  
  Example Scenarios
&lt;/h2&gt;




&lt;p&gt;The &lt;a href="https://sagemaker-examples.readthedocs.io/en/latest/index.html" rel="noopener noreferrer"&gt;Sagemaker official documentation&lt;/a&gt; contains several great examples on using Sagemaker pipelines. We shed the light on few examples that demonstrate how one can use these steps together, effectively. &lt;/p&gt;

&lt;p&gt;These examples show just a few of the many ways Sagemaker Pipeline Steps can be combined to create end-to-end machine learning workflows. With the flexibility and scalability of Sagemaker, the possibilities are endless!&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 1: Create a Sagemaker Pipeline to Automate All the Steps from Data Preparation to Model Deployment
&lt;/h3&gt;

&lt;p&gt;In the &lt;a href="https://sagemaker-examples.readthedocs.io/en/latest/end_to_end/fraud_detection/pipeline-e2e.html" rel="noopener noreferrer"&gt;Fraud Detection for Automobile Claim&lt;/a&gt; example, Sagemaker Pipelines is used to facilitate collaboration within a team of a data scientist, a machine learning engineer, and an ML Ops engineer. &lt;/p&gt;

&lt;p&gt;The architecture below is used to automate all the steps from data preparation to model deployment. (&lt;a href="https://sagemaker-examples.readthedocs.io/en/latest/end_to_end/fraud_detection/pipeline-e2e.html" rel="noopener noreferrer"&gt;source&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fvheqalec42vroc0oz0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fvheqalec42vroc0oz0.png" alt="Scenario 1 architecture" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The actual pipeline used in this demo after completion would be as shown below. (&lt;a href="https://sagemaker-examples.readthedocs.io/en/latest/end_to_end/fraud_detection/pipeline-e2e.html" rel="noopener noreferrer"&gt;source&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5qxeie4krbe3z3ysl55.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5qxeie4krbe3z3ysl55.png" alt="Executed Pipeline for scenario 1" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The mapping between each block name and type is described as well. &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step Number&lt;/th&gt;
&lt;th&gt;Step Name&lt;/th&gt;
&lt;th&gt;Step Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;ClaimsDataWranglerProcessingStep&lt;/td&gt;
&lt;td&gt;Processing Step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;CustomerDataWranglerProcessingStep&lt;/td&gt;
&lt;td&gt;Processing Step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;CreateDataset&lt;/td&gt;
&lt;td&gt;Processing Step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;XGBoostTrain&lt;/td&gt;
&lt;td&gt;Training Step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;ClarifyProcessor&lt;/td&gt;
&lt;td&gt;Processing Step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;ModelPreDeployment&lt;/td&gt;
&lt;td&gt;CreateModel Step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;XgboostRegisterModel&lt;/td&gt;
&lt;td&gt;Register Model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;DeployModel&lt;/td&gt;
&lt;td&gt;Processing Step&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note that this example creates several processing steps using different classes (e.g., sagemaker.processing.Processor, SKLearnProcessor). This is an extremely powerful feature of Sagemaker, as it enables processing data with either pre-built containers provided by Sagemaker or fully custom Docker images. The same applies to the training step: while this example uses the pre-built XGBoost container, it is also possible to use a built-in algorithm, extend an existing container, or build a custom Docker image from scratch. &lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 2: Orchestrate Jobs to Train and Evaluate Models with Amazon Sagemaker Pipelines
&lt;/h3&gt;

&lt;p&gt;In this &lt;a href="https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.html" rel="noopener noreferrer"&gt;abalone age regression problem&lt;/a&gt;, Sagemaker Pipelines are used to conditionally approve trained models for deployment or flag an error. &lt;/p&gt;

&lt;p&gt;The pipeline for this scenario is shown below (&lt;a href="https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform_outputs.html" rel="noopener noreferrer"&gt;source&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcckozhthstvv6026nyyz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcckozhthstvv6026nyyz.png" alt="Architecture for problem: Age of an abalone snail from its physical measurements" width="800" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, Sagemaker pipelines offer a powerful and flexible solution for building, training, and deploying machine learning models at scale. By automating the end-to-end machine learning process, Sagemaker pipelines can save significant time and effort for data scientists, machine learning engineers, and ML Ops engineers. With a wide range of steps available, including data processing, training, monitoring, deployment, control flow, and custom functionalities, Sagemaker pipelines provide a comprehensive solution for building complex machine learning workflows. The example scenarios provided in this post demonstrate just a few of the many ways in which Sagemaker pipeline steps can be combined to create end-to-end machine learning workflows. Overall, Sagemaker pipelines are an excellent tool for teams looking to collaborate efficiently and deploy machine learning models with ease.&lt;/p&gt;

&lt;p&gt;Stay tuned for future updates in this series, where I'll cover practical, up-to-date tips and tricks for this service, cost optimization, and more. Would you like me to include or review a specific part of the service? Let's connect and discuss.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Amazon CodeWhisperer in review: The newest AI code companion</title>
      <dc:creator>omarkhater</dc:creator>
      <pubDate>Sat, 22 Oct 2022 23:50:30 +0000</pubDate>
      <link>https://forem.com/omarkhater/amazon-codewhisperer-the-newest-ai-code-companion-1ji4</link>
      <guid>https://forem.com/omarkhater/amazon-codewhisperer-the-newest-ai-code-companion-1ji4</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;




&lt;p&gt;Recently, AWS announced a &lt;a href="https://aws.amazon.com/about-aws/whats-new/2022/06/aws-announces-amazon-codewhisperer-preview/#:~:text=Amazon%20CodeWhisperer%20is%20a%20machine,integrated%20development%20environment%20(IDE)"&gt;machine-learning-powered service&lt;/a&gt; that helps improve developer productivity by generating code recommendations based on developers' natural-language comments and their existing code in the integrated development environment. The service, &lt;strong&gt;called Amazon CodeWhisperer&lt;/strong&gt;, is still in preview and can be used at no cost. It is similar to &lt;a href="https://github.com/features/copilot"&gt;GitHub Copilot&lt;/a&gt;, which Microsoft launched last year. &lt;/p&gt;

&lt;p&gt;In the past few months, I had a chance to experiment with this new service in a few use cases. As a machine learning (ML) developer, I had the advantage of utilizing ML to help develop ML solutions. Here, I share some observations from my early access to the service. In addition, I offer specific suggestions on how to make it &lt;em&gt;smarter&lt;/em&gt; and more accessible. &lt;/p&gt;

&lt;h2&gt;
  
  
  The service in action
&lt;/h2&gt;




&lt;p&gt;The service provides real-time code suggestions based on comments in the code editor and existing code in the same document. It may suggest single-line completions or complete code blocks (e.g., whole methods). &lt;/p&gt;

&lt;p&gt;In Visual Studio Code, some handy shortcuts make using the service more convenient. While the extension is enabled, the service provides online inference similar to the auto-complete feature supported by many IDEs. However, the user can press Alt+C to request recommendations on demand without waiting for an automatic response.  &lt;/p&gt;

&lt;p&gt;Below is an example of writing the well-known binary search method.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XHYv1Zgt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dfg4uvghd6etlegdofrr.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XHYv1Zgt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dfg4uvghd6etlegdofrr.gif" alt="Binary Search" width="880" height="414"&gt;&lt;/a&gt;&lt;/p&gt;
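&lt;p&gt;For readers who cannot view the animation, the kind of function the service suggests for this prompt typically looks like the classic iterative binary search below (a hand-written sketch, not the service's verbatim output):&lt;/p&gt;

```python
def binary_search(arr, target):
    """Return the index of target in the sorted list arr, or -1 if absent."""
    low, high = 0, len(arr) - 1
    while high >= low:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        if target > arr[mid]:
            low = mid + 1      # search the upper half
        else:
            high = mid - 1     # search the lower half
    return -1

print(binary_search([1, 3, 5, 7, 9], 7))  # 3
```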

&lt;p&gt;Interestingly, the service may suggest multiple code snippets that could be easily navigated (with left/right arrows) to choose the most suitable recommendation. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xuzVgr00--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/i28l2a6go5smg96vrypw.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xuzVgr00--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/i28l2a6go5smg96vrypw.gif" alt="Recursive BS" width="880" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Amazon CodeWhisperer is like a companion that tries to whisper the right code in your ear. It is a fancy and highly descriptive name; good job on naming the service. &lt;/p&gt;

&lt;h2&gt;
  
  
  A deep dive: How to get the most out of the service?
&lt;/h2&gt;




&lt;p&gt;An AI code companion is a powerful tool that can boost developer productivity. Although some argue that such a tool could replace developers in the future, it is still too early to jump to that conclusion: like any other service, it is subject to &lt;strong&gt;&lt;em&gt;garbage in, garbage out&lt;/em&gt;&lt;/strong&gt;. That is, it depends heavily on the quality of its input to return good results. Below is an example of how input quality directly affects output quality. &lt;/p&gt;

&lt;p&gt;Here, the provided description was vague and lacked clear requirements, so after a relatively long wait the output was a chaotic set of imports. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QEXOF9R2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8jvm6ub03ge2h37qrw0n.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QEXOF9R2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8jvm6ub03ge2h37qrw0n.gif" alt="Vague Input" width="880" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the input description becomes clearer, the output improves considerably, as shown below for a similar but better-specified problem. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--S9wPAGw0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pfyjulpxile0wpio60qv.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--S9wPAGw0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pfyjulpxile0wpio60qv.gif" alt="Better Input" width="880" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In addition, the quality of the recommendations improves significantly as the user adds more context, i.e., as the developer writes more code. For example, you can expect faster and more personalized results deep into a single project than for isolated tasks in the same document or in the early stages of a project, when there is simply not enough context yet. &lt;/p&gt;

&lt;p&gt;Nevertheless, the service is not expected to return helpful answers for uncommon, custom tasks. Below is the same binary search problem, but with a slight modification to the input format. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vDAUyDAW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qq6h1jooktu30grt5yly.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vDAUyDAW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qq6h1jooktu30grt5yly.gif" alt="Binary Search Duplicates" width="880" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Apparently, the engine could not understand the slight modification to the problem (i.e., allowing duplicated elements) and still produced the same code it suggested earlier.&lt;/p&gt;
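&lt;p&gt;For comparison, handling duplicates correctly (for instance, returning the leftmost occurrence of the target) requires a small but deliberate change that the engine missed. One common hand-written sketch of that variant:&lt;/p&gt;

```python
def binary_search_leftmost(arr, target):
    """Return the index of the first occurrence of target in sorted arr, or -1."""
    low, high = 0, len(arr)
    while high > low:
        mid = (low + high) // 2
        if target > arr[mid]:
            low = mid + 1   # target is strictly to the right
        else:
            high = mid      # keep narrowing toward the leftmost match
    if low != len(arr) and arr[low] == target:
        return low
    return -1

print(binary_search_leftmost([1, 3, 3, 3, 5], 3))  # 1
```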

&lt;p&gt;As the service is still in preview, it naturally has room for improvement. Below is a curated list of changes that could make it much better. &lt;/p&gt;

&lt;h3&gt;
  
  
  Inference speed:
&lt;/h3&gt;

&lt;p&gt;As the examples above show, the service takes non-trivial time to produce recommendations. I believe there is considerable room for improvement in this aspect. &lt;/p&gt;

&lt;h3&gt;
  
  
  Consistency and real-time capabilities:
&lt;/h3&gt;

&lt;p&gt;The service is expected to give real-time recommendations as the developer writes code. However, the real-time suggestions sometimes produce no output at a given moment, while, surprisingly, pressing Alt+C at that same moment returns workable solutions without anything having changed.&lt;/p&gt;

&lt;h3&gt;
  
  
  End-user Customization:
&lt;/h3&gt;

&lt;p&gt;Under the hood, the recommendation engine draws on a huge library of code from many sources, written for different purposes. It would be reasonable to let users restrict which sources are acceptable for a given project. &lt;/p&gt;

&lt;p&gt;It might also be beneficial to tailor predictions to the project's theme. For example, machine learning development is completely different from mobile application development. &lt;/p&gt;

&lt;p&gt;As another example, one project might require multiple blocks of code to be designed and aggregated, while another might benefit from prioritizing line completion over block suggestions. &lt;/p&gt;

&lt;p&gt;The list of possible customizations is long and needs careful design.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solutions Ranking:
&lt;/h3&gt;

&lt;p&gt;Suggesting multiple solutions is a great feature. In practice, however, the ranking of these solutions is not optimal, and the user needs to navigate through all of them to find the right suggestion. This can be tedious and reduce the overall productivity gain.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Problem Customization:
&lt;/h3&gt;

&lt;p&gt;The engine effectively understands common problems found in its training corpus. However, it struggles to adapt to subtle modifications of those same problems. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;




&lt;p&gt;In summary, Amazon CodeWhisperer (and AI code companions in general) is not magic that can solve every problem. However, it is a great tool for enhancing developer productivity, letting developers focus on the right problems instead of tedious, repetitive tasks.&lt;/p&gt;

&lt;p&gt;To get the most out of Amazon CodeWhisperer (and AI code companions in general), the following practices might help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Concise comments&lt;/strong&gt;: the clearer and better defined the input task, the higher the probability of getting quality results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified projects&lt;/strong&gt;: the AI engine collects context from the entire document and enriches it continuously, so it is most beneficial on tasks that are connected to one another.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid advanced custom problems&lt;/strong&gt;: the less common the problem, the less likely the service is to return a helpful answer.
&lt;/li&gt;
&lt;/ul&gt;
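&lt;p&gt;To make the first point concrete: a prompt comment with explicit requirements, such as the hypothetical one below, gives the engine enough information to generate a correct, focused function. The implementation shown is my own hand-written illustration of the kind of output a precise prompt tends to produce, not the service's verbatim suggestion.&lt;/p&gt;

```python
# Prompt comment with clear, well-defined requirements:
# Write a function that removes duplicate items from a list
# while preserving the original order of the remaining items.
def remove_duplicates(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)      # remember items we have already kept
            result.append(item)
    return result

print(remove_duplicates([3, 1, 3, 2, 1]))  # [3, 1, 2]
```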

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LC_j0JZV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/04egizvhvjezsja4rp5g.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LC_j0JZV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/04egizvhvjezsja4rp5g.gif" alt="Binary Search" width="880" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
