<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ryan Nazareth</title>
    <description>The latest articles on Forem by Ryan Nazareth (@ryankarlos).</description>
    <link>https://forem.com/ryankarlos</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F918266%2F27533df8-701e-4623-9e1c-23135f4d37b6.jpeg</url>
      <title>Forem: Ryan Nazareth</title>
      <link>https://forem.com/ryankarlos</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ryankarlos"/>
    <language>en</language>
    <item>
      <title>AWS Sagemaker Notebook Jobs for Accelerating Data Science Experimentation Workflows with Mlflow and Optuna</title>
      <dc:creator>Ryan Nazareth</dc:creator>
      <pubDate>Mon, 05 Jan 2026 03:21:13 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-sagemaker-notebook-jobs-for-accelerating-data-science-experimentation-workflows-with-mlflow-and-546j</link>
      <guid>https://forem.com/aws-builders/aws-sagemaker-notebook-jobs-for-accelerating-data-science-experimentation-workflows-with-mlflow-and-546j</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Hyperparameter tuning across multiple models presents a common challenge for ML practitioners. Tracking experiment results, managing configurations, and ensuring reproducibility becomes increasingly difficult as the number of models grows. This post walks through a solution that combines &lt;a href="https://aws.amazon.com/sagemaker/" rel="noopener noreferrer"&gt;Amazon SageMaker&lt;/a&gt;, &lt;a href="https://mlflow.org/" rel="noopener noreferrer"&gt;MLflow&lt;/a&gt;, and &lt;a href="https://optuna.org/" rel="noopener noreferrer"&gt;Optuna&lt;/a&gt; to create an automated, scalable hyperparameter optimization pipeline.&lt;/p&gt;

&lt;p&gt;The use case that motivated this work involved training separate demand forecasting models for different product categories—smartphones, laptops, tablets, and accessories. Each category exhibits distinct patterns, making category-specific models more effective than a single unified model. The goal was to automate the hyperparameter search, centralize experiment tracking, and enable parallel training across all categories.&lt;/p&gt;

&lt;p&gt;Manual hyperparameter tuning workflows often suffer from several issues. Experiment results end up scattered across notebooks and spreadsheets. Configurations from previous runs get lost or forgotten. Comparing results across different models requires tedious manual aggregation. And scaling to additional models means duplicating effort. A viable solution needs to address these pain points while integrating smoothly with existing ML workflows and AWS infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;The architecture leverages several AWS services working together. &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html" rel="noopener noreferrer"&gt;SageMaker Studio&lt;/a&gt; provides the development environment for notebook-based experimentation. When ready for full optimization runs, &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html" rel="noopener noreferrer"&gt;SageMaker Pipelines&lt;/a&gt; orchestrates notebook jobs for each product category in parallel. Each job uses Optuna to search for optimal XGBoost hyperparameters, with all experiments logged to a &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow.html" rel="noopener noreferrer"&gt;managed MLflow server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Model artifacts, metrics, and visualizations are stored in Amazon S3. The entire infrastructure is defined in CloudFormation, enabling consistent deployments across accounts and regions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F93cigklorjqerna7mebk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F93cigklorjqerna7mebk.png" alt=" " width="800" height="764"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Deploy the stack by running the following commands in bash, setting your user (for the SageMaker domain), the bucket name for storing artifacts, and the region.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;infrastructure
./deploy.sh &lt;span class="nt"&gt;--user&lt;/span&gt; ryan &lt;span class="nt"&gt;--bucket&lt;/span&gt; sm-mlflow-optuna &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6cw8qs3coy5n0cfwbq1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6cw8qs3coy5n0cfwbq1.png" alt=" " width="800" height="811"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can monitor the deployment of the resources in the CloudFormation console under the stack name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; Deployment typically takes several minutes, with most of that time spent provisioning the MLflow server (alternatively, you could update the CloudFormation template to use the MLflow serverless option in Studio).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcrf96rhn47buahqk4y1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcrf96rhn47buahqk4y1.png" alt=" " width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After deployment, access your SageMaker Studio domain via the user created by the CloudFormation template. You will see the private space provisioned. The CloudFormation stack deploys this with the latest SageMaker distribution image and the default &lt;code&gt;ml.t3.medium&lt;/code&gt; instance size. A lifecycle configuration script is attached to install additional dependencies, and auto shutdown is enabled to stop the space after 60 minutes of idle time. Start the space and wait for it to reach the running state.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi55y9cc4kvhtg0g9h246.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi55y9cc4kvhtg0g9h246.png" alt=" " width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Navigate to the MLflow app; under tracking servers, you should see your tracking server provisioned, with the artifact location set to a prefix within the bucket deployed in CloudFormation. Make a note of the MLflow tracking server ARN, as you will need it to update the notebooks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg78zrcgp4skm9ooyyxl7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg78zrcgp4skm9ooyyxl7.png" alt=" " width="800" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the space is in the running state, click open. Navigate to the git branch icon in the sidebar and clone the repository &lt;code&gt;https://github.com/ryankarlos/sagemaker_mlflow_optuna.git&lt;/code&gt; over HTTPS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9lxv9v3y6u1ircxj5jv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9lxv9v3y6u1ircxj5jv.png" alt=" " width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The repository has two notebooks which will be used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/ryankarlos/sagemaker_mlflow_optuna/blob/master/fm_train.ipynb" rel="noopener noreferrer"&gt;fm_train.ipynb&lt;/a&gt;: Notebook that runs the execution of the data preparation, processing and model training, logging to mlflow server using optuna as backend for hyperparameter tuning. When running the notebook, the execution will run for the category and the parameters defined in the notebook cell. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open the notebook and update the cell containing the MLflow ARN you noted previously. We will briefly go over what each cell does in the next sections.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6yx81e76nqhlmrcf9ej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6yx81e76nqhlmrcf9ej.png" alt=" " width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/ryankarlos/sagemaker_mlflow_optuna/blob/master/fm_train.ipynb" rel="noopener noreferrer"&gt;nb-job-pipeline.ipynb&lt;/a&gt;: This is the main orchestration notebook, here we execute different configurations of &lt;code&gt;fm_train.ipynb&lt;/code&gt; notebook as Notebook Job Steps and stitch them together into a singular SageMaker Pipeline. This will run the training of models for  each of the 4 product categories in the dataset, so we will have 4 models. We will describe how we accomplish this in future sections as it will require a few settings in Sagemaker Studio from the user.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You will need to update the &lt;code&gt;bucket&lt;/code&gt; and &lt;code&gt;region&lt;/code&gt; variables in the cell in the screenshot below if you deployed the CloudFormation stack with different values.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge67yjngfd1a1u6d4uv8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge67yjngfd1a1u6d4uv8.png" alt=" " width="800" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Preparation
&lt;/h2&gt;

&lt;p&gt;This example uses synthetic electronics sales data, generated with Claude Opus 4.5 in Kiro, representing the number of daily units sold for laptops, smartphones, accessories, and tablets. The data generator creates features with realistic correlations to the target variable, including price sensitivity, promotional effects, seasonality, weekend effects, and competitive dynamics. One requirement in the generation prompt was to produce feature correlations in the 0.2-0.76 range against the target, providing Optuna with meaningful signal for optimization; weak or nonexistent correlations would limit the effectiveness of any hyperparameter search. The target variable &lt;code&gt;units_sold&lt;/code&gt; was generated from a combination of these features with some added noise. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5e5f5u9o0ovx8jjk5p3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5e5f5u9o0ovx8jjk5p3.png" alt=" " width="800" height="187"&gt;&lt;/a&gt;&lt;/p&gt;
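&lt;p&gt;The generation process can be sketched along these lines. This is an illustrative numpy/pandas recreation, not the actual Kiro-generated code; the feature names and coefficients are assumptions chosen only to produce correlations in a meaningful range:&lt;/p&gt;

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 730  # two years of daily observations

# Hypothetical features mirroring those described above
price = rng.normal(600, 100, n)                      # unit price
promo = rng.binomial(1, 0.2, n)                      # promotion flag
season = np.sin(2 * np.pi * np.arange(n) / 365.25)   # yearly seasonality
weekend = np.isin(np.arange(n) % 7, [5, 6]).astype(float)

# Target combines the features linearly with added noise, so each feature
# retains a non-trivial correlation with units_sold
units_sold = (
    200 - 0.1 * price + 40 * promo + 25 * season + 15 * weekend
    + rng.normal(0, 10, n)
)

df = pd.DataFrame({
    "price": price, "promotion": promo, "seasonality": season,
    "weekend": weekend, "units_sold": units_sold,
})
print(df.corr()["units_sold"].round(2))
```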

&lt;h2&gt;
  
  
  Hyperparameter Optimization with Optuna
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/optuna/optuna?tab=readme-ov-file" rel="noopener noreferrer"&gt;Optuna&lt;/a&gt; is an automatic hyperparameter optimization software framework, particularly designed for machine learning. it handles the hyperparameter search via Bayesuan Optimisation using a &lt;a href="https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.TPESampler.html" rel="noopener noreferrer"&gt;Tree-structured Parzen Estimator (TPE)&lt;/a&gt; sampler by default, although users have the option of choosing other sampler options. TPE models the relationship between hyperparameters and objective values, focusing exploration on promising regions of the search space.&lt;/p&gt;

&lt;p&gt;For this example, we use XGBoost to predict the number of units sold for each category. The XGBoost search space includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Booster type (gbtree, gblinear, dart)&lt;/li&gt;
&lt;li&gt;Regularization parameters (lambda, alpha) with log-uniform distributions&lt;/li&gt;
&lt;li&gt;Tree depth and learning rate&lt;/li&gt;
&lt;li&gt;Growth policy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Log-uniform distributions work well for regularization parameters where optimal values can span several orders of magnitude. The &lt;a href="https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/002_configurations.html" rel="noopener noreferrer"&gt;Optuna documentation on search spaces&lt;/a&gt; covers the available distribution options.&lt;/p&gt;

&lt;p&gt;Optuna uses the concepts of a study and a trial. A study is an optimization session based on an objective function; a trial is a single execution of that objective function.&lt;/p&gt;

&lt;p&gt;The objective function for this use case is defined below. The goal of a study is to find the optimal set of hyperparameter values through multiple trials (e.g., &lt;code&gt;n_trials=50&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Optuna objective function with MLflow child runs.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trial-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nested&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Suggest hyperparameters
&lt;/span&gt;        &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;objective&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reg:squarederror&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eval_metric&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rmse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;booster&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_categorical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;booster&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gbtree&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gblinear&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alpha&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alpha&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Conditional hyperparameters based on booster type
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;booster&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gbtree&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gamma&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gamma&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grow_policy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_categorical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grow_policy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;depthwise&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lossguide&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Train model
&lt;/span&gt;        &lt;span class="n"&gt;dtrain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DMatrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;dvalid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DMatrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;valid_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;valid_y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtrain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_boost_round&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;preds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dvalid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Calculate metrics
&lt;/span&gt;        &lt;span class="n"&gt;mse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;valid_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;preds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;rmse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Log to MLflow
&lt;/span&gt;        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_params&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rmse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rmse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;mse&lt;/span&gt;  &lt;span class="c1"&gt;# Optuna minimizes this value
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optuna integrates with MLflow, which allows every trial to be systematically recorded. MLflow supports nesting runs within an experiment: each Optuna trial can be treated as a child run that tracks the specific hyperparameters used and the resulting metrics, providing a consolidated view of the entire optimization process. All child runs are grouped under a parent run, which represents the entire optimization study for a particular product category, e.g. laptops. This structure keeps experiments organized in the MLflow UI, where the overall best result appears at the parent level, with individual trials available for detailed inspection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parameterizing Notebooks for Pipeline Execution
&lt;/h2&gt;

&lt;p&gt;SageMaker Pipelines executes notebooks as jobs, but proper parameterization is essential. The mechanism relies on cell tags: specifically, &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run-troubleshoot-override.html" rel="noopener noreferrer"&gt;tagging a cell with "parameters"&lt;/a&gt; using the JupyterLab metadata editor.&lt;/p&gt;

&lt;p&gt;In this example, you will need to tag the cell shown in the screenshot below with the parameters tag. This cell defines all the parameters and configuration that may change with each training run for a category, e.g. category name, model starting parameters, number of Optuna trials, and so on.&lt;br&gt;
Open the &lt;code&gt;fm_train.ipynb&lt;/code&gt; notebook, select the cell titled &lt;code&gt;Configuration&lt;/code&gt;, and expand the &lt;code&gt;common tools&lt;/code&gt; section in the right sidebar. You should see a parameters tag already in the tag box; click on it to apply the tag to the cell. It will appear with a check mark, as in the screenshot below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmufxvbrpkzi0l24olvzt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmufxvbrpkzi0l24olvzt.png" alt=" " width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a notebook job runs, the notebook job executor searches for a Jupyter cell tagged with the parameters tag and applies the new parameters or parameter overrides immediately after this cell. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note&lt;/em&gt; All parameter values must be strings, so any constants that are overridden are injected as strings. Hence, one of the cells in the notebook casts the variables back to int and float as required. The &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run.html" rel="noopener noreferrer"&gt;notebook jobs documentation&lt;/a&gt; provides complete details on parameterization.&lt;/p&gt;
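&lt;p&gt;For example, the tagged parameters cell and the casting cell might look like this. The variable names are illustrative; the actual names are defined in &lt;code&gt;fm_train.ipynb&lt;/code&gt;:&lt;/p&gt;

```python
# --- Parameters cell (tagged "parameters") ---
# Notebook job overrides are injected as strings immediately after this cell,
# so numeric values may arrive as e.g. "50" rather than 50.
category = "laptops"
n_trials = "50"
test_size = "0.2"

# --- Casting cell ---
# Cast the injected string values back to the types the training code expects
n_trials = int(n_trials)
test_size = float(test_size)
```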
&lt;h2&gt;
  
  
  SageMaker Pipeline with Notebook Job Steps for Running Training Across Multiple Categories
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4pb2hcwz98sp6qc3e0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4pb2hcwz98sp6qc3e0w.png" alt=" " width="800" height="947"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run.html" rel="noopener noreferrer"&gt;SageMaker Notebook Step Jobs&lt;/a&gt; enable automated execution of Jupyter notebooks as managed compute jobs. When integrated with SageMaker Pipelines, they provide a scalable mechanism for running parameterized training workflows across multiple model categories.&lt;/p&gt;

&lt;p&gt;The pipeline creates a &lt;code&gt;NotebookJobStep&lt;/code&gt; for each product category using the SageMaker Pipelines SDK. Each step is configured with category-specific parameters, compute resources, and execution policies. The &lt;a href="https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.notebook_job_step.NotebookJobStep" rel="noopener noreferrer"&gt;NotebookJobStep API reference&lt;/a&gt; details all available configuration options.&lt;br&gt;
The following snippet from the &lt;a href="https://github.com/ryankarlos/sagemaker_mlflow_optuna/blob/master/notebook_pipeline.ipynb" rel="noopener noreferrer"&gt;notebook_pipeline.ipynb&lt;/a&gt; notebook shows how the notebook job step is created using the &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/create-notebook-auto-run-sdk.html" rel="noopener noreferrer"&gt;Python SDK&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.workflow.notebook_job_step&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NotebookJobStep&lt;/span&gt;

&lt;span class="n"&gt;nb_step&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;NotebookJobStep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XGBoost training for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;notebook_job_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;image_uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;kernel_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;kernel_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_execution_role&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;s3_root_uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;notebook_artifacts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;additional_dependencies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/home/sagemaker-user/sagemaker_mlflow_optuna/scripts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;initialization_script&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nb_job_init.sh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;input_notebook&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_notebook&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;nb_job_params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_runtime_in_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_retry_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are a few more non-default settings that need to be included. The &lt;em&gt;parameters&lt;/em&gt; to override in the notebook are passed as a dictionary, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
"category": "smartphones",
"n_trials": 50,
"experiment_name": "electronics-smartphones",
"test_size": 0.25
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;em&gt;additional_dependencies&lt;/em&gt; option lets us include extra files or folders, passed as a list, to be made available alongside the notebook when the job runs on the SageMaker managed instance. By default, SageMaker copies only the main &lt;em&gt;input_notebook&lt;/em&gt; file when initialising the training job instance; here the &lt;code&gt;scripts&lt;/code&gt; folder path is included because the notebook imports functions from Python modules in that folder. The &lt;em&gt;initialization_script&lt;/em&gt; option allows installing any necessary libraries on the instance that are not present in the base image URI. While the notebook job is running, the directory structure remains unchanged, so relative imports continue to work.&lt;/p&gt;
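&lt;p&gt;The initialization script itself is not reproduced in this post; a minimal sketch of what &lt;code&gt;nb_job_init.sh&lt;/code&gt; might contain (package names assumed, not confirmed by the repository) is:&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical nb_job_init.sh: runs on the managed instance before the
# notebook executes, installing libraries missing from the base image.
set -euo pipefail
pip install --quiet mlflow optuna xgboost
```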

&lt;p&gt;Whilst we use the Python SDK to automate this, a Notebook Job can also be initiated via the console, as explained in the &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/create-notebook-auto-run-studio.html" rel="noopener noreferrer"&gt;docs&lt;/a&gt;. At the top of the notebook tab, click on the blue Notebook Jobs widget. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7be8lzwfmkuzyct12dvg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7be8lzwfmkuzyct12dvg.png" alt=" " width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the next configuration tab, we can input all the required options, e.g. adding parameters or including additional files/scripts folders. The scheduler tries to infer sensible defaults and automatically populates the form to help you get started quickly. If you are using Studio, at a minimum you can submit an on-demand job without setting any options, or submit a scheduled notebook job definition supplying just the time-specific schedule information; other fields can be customized if your scheduled job requires specialized settings. If you are running a local Jupyter notebook, the scheduler extension lets you specify your own defaults (for a subset of options) so you don't have to manually insert the same values every time.&lt;/p&gt;

&lt;p&gt;When you create a notebook job, you can include additional files such as datasets, images, and local scripts. To do so, choose Run job with input folder. The Notebook Job will then have access to all files under the input file's folder. While the notebook job is running, the file structure of the directory remains unchanged.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxgmrouf6j8ukmo0rynq0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxgmrouf6j8ukmo0rynq0.png" alt=" " width="800" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each notebook job runs on its own compute instance, enabling true parallel execution. The &lt;code&gt;ml.m5.xlarge&lt;/code&gt; instance type (4 vCPUs, 16 GB RAM) provides sufficient resources for XGBoost training with 50 Optuna trials. For larger workloads or GPU-accelerated training, you can specify different instance types. The &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-available-instance-types.html" rel="noopener noreferrer"&gt;SageMaker instance types documentation&lt;/a&gt; lists all available options for notebook jobs.&lt;/p&gt;

&lt;p&gt;By defining an iterable of Notebook Job Steps with a different parameter configuration for each category, we can then execute these as a SageMaker Pipeline.&lt;br&gt;
&lt;/p&gt;
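&lt;p&gt;The per-category parameter dictionaries for those steps can be generated in a simple loop. A sketch, with illustrative category names:&lt;/p&gt;

```python
# Build one parameter dictionary per product category. Each dictionary is
# passed to its NotebookJobStep via the `parameters` argument.
categories = ["smartphones", "laptops", "tablets", "headphones"]

nb_job_param_sets = {
    category: {
        "category": category,
        "n_trials": 50,
        "experiment_name": f"electronics-{category}",
        "test_size": 0.25,
    }
    for category in categories
}
```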

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PipelineSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_execution_role&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pipeline_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pipeline_steps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sagemaker_session&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role_arn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;execution&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pipeline: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pipeline_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Execution: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arn&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the pipeline starts, all four categories begin training simultaneously. Each runs 50 Optuna trials, logs results to MLflow, and saves the best model. You can monitor the pipeline execution from the Sagemaker Studio UI under the Pipelines section and check the logs for each step. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fil12u86yzg7izen87w7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fil12u86yzg7izen87w7m.png" alt=" " width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Notebook Job Logs and Executed Notebooks
&lt;/h3&gt;

&lt;p&gt;After execution, notebook job outputs for all the steps in the pipeline are stored in S3 at the specified &lt;code&gt;s3_root_uri&lt;/code&gt; under a prefix associated with the SageMaker pipeline execution ID, as shown in the screenshot below. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F90qb871ido91sjcje645.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F90qb871ido91sjcje645.png" alt=" " width="800" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Download the &lt;code&gt;output.tar.gz&lt;/code&gt; file and unzip it; the executed notebook is named after the step, and the SageMaker execution log file is also included. Open the notebook in SageMaker to view the cell executions or any execution errors. &lt;/p&gt;
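&lt;p&gt;After downloading the archive from S3, extraction can also be scripted. A small helper sketch:&lt;/p&gt;

```python
import tarfile
from pathlib import Path

def extract_notebook_outputs(archive_path, dest_dir):
    """Extract a notebook job output.tar.gz and list the extracted files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```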

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5vv7pgxcwwlb5tqvndhc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5vv7pgxcwwlb5tqvndhc.png" alt=" " width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Integration with MLflow and Experiment Tracking
&lt;/h3&gt;

&lt;p&gt;The MLflow UI displays all experiments organized by category. Each experiment shows optimization history—how the objective value improved across trials. The nested run structure (parent run per category, child runs per trial) provides clear organization. Runs can be compared, parameter distributions examined, and artifacts downloaded for offline analysis.&lt;/p&gt;

&lt;p&gt;Notebook jobs integrate seamlessly with MLflow because they run in isolated environments with the same MLflow tracking URI configured. Each job connects to the managed MLflow server independently, ensuring all experiments are logged centrally regardless of which compute instance executed the training.&lt;/p&gt;

&lt;p&gt;The nested run structure provides clarity. At a glance, the best result for each category is visible. Expanding a parent run reveals all child trials with their logged parameters and metrics.&lt;/p&gt;

&lt;p&gt;Optuna's &lt;a href="https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/005_visualization.html" rel="noopener noreferrer"&gt;built-in visualizations&lt;/a&gt;—parameter importance plots, parallel coordinate plots, optimization history—are logged as artifacts alongside the models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa7btfmsu4qpc1gscz670.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa7btfmsu4qpc1gscz670.png" alt="MlflowRunsPipeline" width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clicking on any child run (tuning trial) reveals the logged metrics associated with that trial.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gved4dyn3easfri1fkc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gved4dyn3easfri1fkc.png" alt=" " width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The parent run stores the best model metrics and the logged model artifacts, signatures, plots, code etc. The model can then be retrieved for inference or added to the model registry for versioning, promotion and deployment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gved4dyn3easfri1fkc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gved4dyn3easfri1fkc.png" alt=" " width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Logged plots, such as feature importance and residual plots, are visible directly in the MLflow console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmgjsapovqtgmep0m4cc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmgjsapovqtgmep0m4cc.png" alt=" " width="800" height="562"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzyam1iut6x8pnrnieq6b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzyam1iut6x8pnrnieq6b.png" alt=" " width="800" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By setting the environment variable &lt;code&gt;MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING&lt;/code&gt; to &lt;code&gt;true&lt;/code&gt; in the code, we can also log system metrics automatically for each run and child run, to help decide on optimal instance types for future runs.&lt;/p&gt;
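&lt;p&gt;Setting the variable before any runs are started is sufficient, for example:&lt;/p&gt;

```python
import os

# Must be set before MLflow runs start, so system metrics (CPU, memory,
# GPU where available) are sampled automatically for each run.
os.environ["MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING"] = "true"
```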

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdb9pxrcoykpn7wy2hc67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdb9pxrcoykpn7wy2hc67.png" alt=" " width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tearing Down Resources
&lt;/h2&gt;

&lt;p&gt;Once you are done experimenting, you can tear down the resources to save cost by navigating to the CloudFormation console and selecting delete stack. Before doing this, make sure you shut down any running SageMaker apps and empty the S3 bucket contents. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7nw2p25j2w00saxyjfd3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7nw2p25j2w00saxyjfd3.png" alt=" " width="779" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can monitor the state of the resources as they are deleted. Note that deletion of the MLflow tracking server may take over 20 minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6d0qroi0fg4okwtclxk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6d0qroi0fg4okwtclxk.png" alt=" " width="800" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>sagemaker</category>
      <category>mlflow</category>
      <category>optuna</category>
      <category>aws</category>
    </item>
    <item>
      <title>Building a Sports Marketing Video Generator using Nova Canvas and Nova Reel</title>
      <dc:creator>Ryan Nazareth</dc:creator>
      <pubDate>Mon, 19 May 2025 03:04:42 +0000</pubDate>
      <link>https://forem.com/aws-builders/transforming-static-images-into-dynamic-videos-with-amazon-nova-for-sports-marketing-1he3</link>
      <guid>https://forem.com/aws-builders/transforming-static-images-into-dynamic-videos-with-amazon-nova-for-sports-marketing-1he3</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In today's fast-paced digital marketing landscape, creating engaging sports content quickly is essential. Traditional video production is time-consuming and expensive, often requiring specialized skills and equipment. What if marketers could generate professional sports marketing videos from a single image with just a few clicks?&lt;/p&gt;

&lt;p&gt;In this blog post, we'll explore a solution that builds a Streamlit application hosted on ECS and leverages Amazon Nova's generative AI capabilities to transform sports images into dynamic marketing videos.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Business Challenge
&lt;/h2&gt;

&lt;p&gt;Marketing teams across industries face common challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limited resources for video production&lt;/li&gt;
&lt;li&gt;Need for rapid content creation across multiple channels&lt;/li&gt;
&lt;li&gt;Maintaining brand consistency across all visual assets&lt;/li&gt;
&lt;li&gt;Scaling content production without proportionally scaling costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This solution addresses these challenges by providing an intuitive web interface that allows non-technical users to generate professional-quality video content from static images with just a few clicks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The Canvas Video application is built on a modern, serverless architecture that leverages several AWS services. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;App: Dockerised Streamlit application deployed on AWS ECS (Fargate) as a service, with tasks running in multiple availability zones to ensure high availability.&lt;/li&gt;
&lt;li&gt;Authentication: Amazon Cognito user pool with username and password for authentication.&lt;/li&gt;
&lt;li&gt;Load Balancer: ALB intercepts requests, redirects unauthenticated users to Cognito for authentication, and then forwards authenticated users to the backend application with user claims, enabling secure access to the app running as ECS tasks. &lt;/li&gt;
&lt;li&gt;AI Services: AWS Bedrock (Nova Reel, Nova Canvas) and Amazon Rekognition&lt;/li&gt;
&lt;li&gt;Storage: S3 for storing the videos generated from Nova Reel. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmb7qda5dcyj69tuadpb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmb7qda5dcyj69tuadpb.png" alt=" " width="761" height="626"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The basic flow is as follows. A user navigates to a URL alias record in Route 53 and authenticates via Cognito. The user uploads a sports image, and Amazon Rekognition analyzes it to confirm it is sports-related. The user has the option of editing the image with Nova Canvas by providing inpainting/outpainting prompts. Nova Reel then generates a sports marketing video based on the image and prompt. The video is stored in S3, and a presigned URL is provided to the user in the Streamlit application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Sports Image Classification
&lt;/h3&gt;

&lt;p&gt;The app uses Amazon Rekognition to analyze uploaded images and determine if they're sports-related. It can identify specific sports like basketball, football, soccer, etc.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_sports_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Determine if the image is sports-related using Amazon Rekognition&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;rekognition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rekognition&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rekognition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;detect_labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bytes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Extract labels from the response
&lt;/span&gt;        &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Labels&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;

        &lt;span class="c1"&gt;# Determine the specific sport type from labels
&lt;/span&gt;        &lt;span class="n"&gt;sport_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;determine_sport_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Labels&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="c1"&gt;# Check if any sports keywords are in the labels
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sports_keywords&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sport_type&lt;/span&gt;

        &lt;span class="c1"&gt;# Check for confidence scores on sports-related activities
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Labels&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sports_keywords&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Confidence&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sport_type&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;General Sports&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error in sports image classification: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;General Sports&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
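&lt;p&gt;The &lt;code&gt;response['Labels']&lt;/code&gt; structure iterated above follows the shape of a Rekognition DetectLabels response. As a standalone illustration of the same keyword-and-confidence filtering (the sample labels and threshold below are made up):&lt;/p&gt;

```python
# Standalone sketch of filtering a Rekognition DetectLabels-style
# response; the sample labels and the 70% threshold are illustrative.
def filter_labels(response, keywords, threshold=70):
    """Return label names that match a keyword above the confidence threshold."""
    return [
        label["Name"]
        for label in response["Labels"]
        if label["Confidence"] > threshold
        and any(kw.lower() in label["Name"].lower() for kw in keywords)
    ]

sample_response = {
    "Labels": [
        {"Name": "Tennis Racket", "Confidence": 95.2},
        {"Name": "Furniture", "Confidence": 88.0},
        {"Name": "Ball Sports", "Confidence": 55.0},
    ]
}
print(filter_labels(sample_response, ["tennis", "ball"]))  # ['Tennis Racket']
```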



&lt;h3&gt;
  
  
  2. Image Editing with Nova Canvas
&lt;/h3&gt;

&lt;p&gt;Before generating a video, users can enhance their images using AWS Bedrock Nova Canvas. The app supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inpainting&lt;/strong&gt;: A technique used in image processing to fill in missing or damaged parts of an image in a way that blends seamlessly with the surrounding areas. This process can reconstruct removed elements or extend the background of an image while preserving the visual coherence of textures, colors, and patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outpainting&lt;/strong&gt;: A technique used to expand an image beyond its original borders by generating new visual content that seamlessly extends the existing scene. Unlike inpainting, which fills in missing areas within an image, outpainting creatively imagines what might lie beyond the current frame, effectively "uncropping" it to add context or detail.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The method implementing this logic is included in the snippet below.&lt;br&gt;
The mask prompt tells Canvas which parts of the image to edit or preserve. For inpainting, the mask is what you want to change,&lt;br&gt;
whilst for outpainting the mask is what you want to keep.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;negative_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;main_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mask_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;operation_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process image using Amazon Nova Canvas for inpainting or outpainting with sports focus&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Check if the image is sports-related
&lt;/span&gt;        &lt;span class="n"&gt;is_sports&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sports_classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_sports_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;is_sports&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NOT_SPORTS_IMAGE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Convert image bytes to base64
&lt;/span&gt;        &lt;span class="n"&gt;image_base64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Use default config if none provided
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_IMAGE_CONFIG&lt;/span&gt;

        &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;taskType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;operation_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;imageGenerationConfig&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Add the appropriate parameters based on operation type
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;operation_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INPAINTING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inPaintingParams&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;main_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maskPrompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;mask_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;negativeText&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;negative_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_base64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;operation_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OUTPAINTING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outPaintingParams&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;main_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maskPrompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;mask_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;negativeText&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;negative_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_base64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bedrock_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amazon.nova-canvas-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;accept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accept&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contentType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content_type&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;base64_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;base64_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64_image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ascii&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base64_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error in Nova Canvas processing: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Video Generation with Nova Reel
&lt;/h3&gt;

&lt;p&gt;The core feature is video generation using AWS Bedrock Nova Reel. This generates dynamic videos using the context provided in the image and a video prompt that is auto-generated in the backend. Users can select from different marketing styles to influence the prompt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic action&lt;/li&gt;
&lt;li&gt;Athlete showcase&lt;/li&gt;
&lt;li&gt;Team spirit&lt;/li&gt;
&lt;li&gt;Fan experience&lt;/li&gt;
&lt;li&gt;Product in action&lt;/li&gt;
&lt;li&gt;Inspirational&lt;/li&gt;
&lt;/ul&gt;
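&lt;p&gt;One way such style selections can be translated into prompt fragments is a simple lookup table. The mapping below is purely illustrative; the app's actual templates live in the backend and will differ:&lt;/p&gt;

```python
# Illustrative mapping from marketing style to a prompt fragment;
# the real templates are defined in the backend and will differ.
STYLE_PROMPTS = {
    "Dynamic action": "fast cuts of athletes mid-motion, dramatic lighting",
    "Athlete showcase": "slow-motion close-ups highlighting a single athlete",
    "Team spirit": "celebratory group shots, synchronized movement",
    "Fan experience": "crowd energy, stadium atmosphere, waving banners",
    "Product in action": "the product in use during play, sharp focus",
    "Inspirational": "uplifting tone, golden-hour light, triumphant finish",
}

def style_fragment(style):
    """Look up the prompt fragment for a selected style, with a fallback."""
    return STYLE_PROMPTS.get(style, STYLE_PROMPTS["Dynamic action"])

print(style_fragment("Team spirit"))  # celebratory group shots, synchronized movement
```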

&lt;p&gt;The video generation request to Amazon Nova Reel is invoked asynchronously, and the results are stored in Amazon S3 for later retrieval. The video object is accessible to the user in the frontend through an auto-generated presigned URL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Amazon Nova Reel currently only allows a video duration of 6 seconds when using a text prompt with an image.&lt;/p&gt;
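&lt;p&gt;A sketch of how such an asynchronous request can be assembled is shown below. The field names follow the Amazon Nova Reel request shape, but the helper name, image payload, and bucket path are illustrative:&lt;/p&gt;

```python
def build_reel_request(prompt, image_base64=None, duration_seconds=6, fps=24):
    """Assemble a Nova Reel TEXT_VIDEO request body (field names per the
    Amazon Nova documentation; values here are illustrative)."""
    params = {"text": prompt}
    if image_base64 is not None:
        # With an input image, Nova Reel is limited to 6-second videos
        params["images"] = [{"format": "png", "source": {"bytes": image_base64}}]
    return {
        "taskType": "TEXT_VIDEO",
        "textToVideoParams": params,
        "videoGenerationConfig": {
            "durationSeconds": duration_seconds,
            "fps": fps,
            "dimension": "1280x720",
        },
    }

body = build_reel_request("Dynamic action shot of a tennis rally", image_base64="...")
# Submitted asynchronously with a boto3 Bedrock runtime client, writing
# the result to S3, e.g.:
# bedrock_runtime.start_async_invoke(
#     modelId="amazon.nova-reel-v1:0",
#     modelInput=body,
#     outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://<your-bucket>/videos/"}},
# )
```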

&lt;h3&gt;
  
  
  4. Prompt Enhancement
&lt;/h3&gt;

&lt;p&gt;The app uses a base prompt template and enhances it with sport-specific details:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enhance_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;marketing_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;brand&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sport_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Enhance the marketing prompt with the base Nova Reel prompt&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOVA_REEL_BASE_PROMPT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;marketing_prompt&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;brand&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sport_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; in the context of &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sport_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
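&lt;p&gt;As a usage illustration, with a hypothetical stand-in for &lt;code&gt;NOVA_REEL_BASE_PROMPT&lt;/code&gt; (the real base prompt is defined in the app and will differ):&lt;/p&gt;

```python
# Self-contained copy of enhance_prompt for illustration; the base
# prompt below is a made-up stand-in for the app's NOVA_REEL_BASE_PROMPT.
NOVA_REEL_BASE_PROMPT = "High-energy cinematic sports marketing video:"

def enhance_prompt(marketing_prompt, brand=None, sport_type=None):
    enhanced_prompt = NOVA_REEL_BASE_PROMPT.strip() + " " + marketing_prompt
    if brand:
        enhanced_prompt += f" for {brand}"
    if sport_type:
        enhanced_prompt += f" in the context of {sport_type}"
    return enhanced_prompt

print(enhance_prompt("dynamic action shots", brand="Acme", sport_type="Tennis"))
# High-energy cinematic sports marketing video: dynamic action shots for Acme in the context of Tennis
```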



&lt;h2&gt;
  
  
  Deploying the application to AWS
&lt;/h2&gt;

&lt;p&gt;To deploy this solution in your AWS environment, first clone the repository and install the Python dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git clone https://github.com/ryankarlos/llm-use-cases.git
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;image_and_video

&lt;span class="nv"&gt;$ &lt;/span&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; venv/bin/activate

pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; src/image_and_video/requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we need to deploy the AWS resources shown in the architecture diagram using Terragrunt and Terraform. You can install them by following the instructions &lt;a href="https://terragrunt.gruntwork.io/docs/getting-started/install/" rel="noopener noreferrer"&gt;here&lt;/a&gt; and &lt;a href="https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli#install-terraform" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;inputs&lt;/code&gt; and &lt;code&gt;locals&lt;/code&gt; blocks in the terragrunt.hcl file in the terragrunt folder define default variables which will be passed to the Terraform scripts during deployment of the resources.&lt;br&gt;
You can update some of the variables listed below (the username and email must be set to your own, so that you can reset the temporary password created by Cognito).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;locals {
  region = "us-east-1"
  username = &amp;lt;replace with your own username for cognito&amp;gt;
  email = &amp;lt;replace with your own email&amp;gt;
  ecr_repo_name = "canvas-video"
}

inputs = {
  hosted_zone_name = "awscommunity.com"
  ....
  bucket_name = "image-llm-example"
  subdomain = "nova-video"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now run the following Terragrunt commands to generate a plan of the resources to be deployed and then apply the changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;terragrunt
&lt;span class="nv"&gt;$ &lt;/span&gt;terragrunt plan
&lt;span class="nv"&gt;$ &lt;/span&gt;terragrunt apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8d9knupa3z8gmtk4e9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8d9knupa3z8gmtk4e9z.png" alt=" " width="800" height="279"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the resources are deployed, we need to build the Docker image with the app code and push it to ECR, from where it will be deployed to the ECS service. Execute the &lt;code&gt;ecr-build-push.sh&lt;/code&gt; script, which handles the Docker build and push process. Note that it uses the defaults below for the region and ECR repository name; if you have chosen different values, update these defaults or set the corresponding environment variables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Configuration&lt;/span&gt;
&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;span class="nv"&gt;ECR_REPOSITORY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ECR_REPOSITORY&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="s2"&gt;"canvas-video"&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;span class="nv"&gt;IMAGE_TAG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;IMAGE_TAG&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="s2"&gt;"latest"&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
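&lt;p&gt;These defaults feed into the registry URI that the script tags and pushes to. A minimal sketch of how that URI is typically composed (the account ID below is a placeholder; the real script would resolve it, for example via &lt;code&gt;aws sts get-caller-identity&lt;/code&gt;):&lt;/p&gt;

```shell
AWS_REGION=${AWS_REGION:-"us-east-1"}
ECR_REPOSITORY=${ECR_REPOSITORY:-"canvas-video"}
IMAGE_TAG=${IMAGE_TAG:-"latest"}

# Placeholder account ID; a real script resolves it with:
#   aws sts get-caller-identity --query Account --output text
ACCOUNT_ID="123456789012"
ECR_URI="${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPOSITORY}"

# The image is then built, tagged and pushed along these lines:
#   docker build -t "${ECR_REPOSITORY}:${IMAGE_TAG}" .
#   docker tag "${ECR_REPOSITORY}:${IMAGE_TAG}" "${ECR_URI}:${IMAGE_TAG}"
#   docker push "${ECR_URI}:${IMAGE_TAG}"
echo "${ECR_URI}:${IMAGE_TAG}"
```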



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felwpilcl0dlunezu6lif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felwpilcl0dlunezu6lif.png" alt=" " width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the image is pushed, a manual step is needed to deploy the task in ECS. Navigate to the ECS console, open the service in the cluster that was deployed, click the &lt;code&gt;Update Service&lt;/code&gt; option and select &lt;code&gt;force new deployment&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gciptu8qmeu8a8hdrfa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gciptu8qmeu8a8hdrfa.png" alt=" " width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Wait a few minutes for the deployment. The tasks will transition from &lt;code&gt;PROVISIONING&lt;/code&gt; to &lt;code&gt;PENDING&lt;/code&gt; to &lt;code&gt;ACTIVATING&lt;/code&gt; and, if successful, finally show a &lt;code&gt;RUNNING&lt;/code&gt; status.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F496uh5p1szqttcbdlzes.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F496uh5p1szqttcbdlzes.png" alt=" " width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdhek7bz7n1qxdvh293m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdhek7bz7n1qxdvh293m.png" alt=" " width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Navigate to the Route 53 URL and you should see the Cognito login page. Enter your username and the temporary password sent to your email address; you will be prompted to reset the password. After resetting it, you should see the Streamlit application home page.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using the Application
&lt;/h2&gt;

&lt;p&gt;Upload the image using the upload image tab. This app uses Amazon Rekognition behind the scenes to label the image. If the image is not sports-related, it will throw an error.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4c2ltxbrfnyglg3asnuz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4c2ltxbrfnyglg3asnuz.png" alt=" " width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the image is sports-related, you will see the image uploaded and the tags. The app automatically identifies elements like "basketball," "soccer," or "athlete" in your image, helping tailor the marketing content to your specific sport. &lt;/p&gt;

&lt;p&gt;In the example below, an image of a tennis racquet with balls on a clay court has been uploaded.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxq5po94zplfnqypx8zik.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxq5po94zplfnqypx8zik.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Image Processing options
&lt;/h3&gt;

&lt;p&gt;In the left tab, you will see different options for performing inpainting, outpainting, or no processing on the image.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No Processing: Use your image as-is for video generation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Inpainting: Replace or modify specific areas within your image. Perfect for adding brand elements or removing unwanted objects. For example, in the image above, we can ask to replace the yellow balls (mask) with striped balls (processing prompt). We can set the negative prompt to &lt;code&gt;blur&lt;/code&gt; to prevent any blurring.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftlny6bomwibv1i7tnysr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftlny6bomwibv1i7tnysr.png" alt=" " width="800" height="255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftuk6kzku2nywl5ukxrph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftuk6kzku2nywl5ukxrph.png" alt=" " width="800" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Outpainting: Extend your image beyond its original boundaries. Great for creating more dynamic compositions or adding space for text. In the image above, we can set the mask as the racquet and balls, and set the processing prompt to garden to show an image of the racquet and balls in a garden instead of a clay court.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnb02flq9fxd3eckgfaqb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnb02flq9fxd3eckgfaqb.png" alt=" " width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmv8yzrnxdt7yaawgcph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmv8yzrnxdt7yaawgcph.png" alt=" " width="800" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you choose inpainting or outpainting, you'll need to provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A main prompt describing what you want to add or modify. &lt;/li&gt;
&lt;li&gt;A mask prompt indicating which area to modify&lt;/li&gt;
&lt;li&gt;An optional negative prompt to specify what to avoid&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Image Video Generation
&lt;/h3&gt;

&lt;p&gt;In the sidebar, you'll find several options to customize your marketing video under the Video Settings options. Choose a Marketing Video Style template from the dropdown menu, such as "Dynamic Action," "Team Spirit," or "Product in Action," to set the tone of your video. You can also add your brand name to personalize the marketing message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc5lm85h9f2ultycuu920.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc5lm85h9f2ultycuu920.png" alt=" " width="534" height="604"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The app uses these selections to craft a specialized prompt for the AI video generator, optimizing it for sports marketing content.&lt;br&gt;
The "Review Final Prompt" expander allows you to see and edit this final prompt. This is helpful if you want to fine-tune specific aspects of your marketing message. You can experiment with different combinations of image processing, marketing styles, and sport types to create the perfect sports marketing video for your brand.&lt;/p&gt;
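&lt;p&gt;A minimal sketch of how the sidebar selections could be combined into the final prompt; the helper and its wording are hypothetical, not the app's actual implementation:&lt;/p&gt;

```python
def build_marketing_prompt(style, sport, brand=None):
    # Hypothetical helper mirroring how the app might merge the
    # sidebar selections into a single video-generation prompt.
    prompt = f"{style} sports marketing video featuring {sport}"
    if brand:
        prompt += f", prominently showing the {brand} brand"
    return prompt

print(build_marketing_prompt("Dynamic Action", "football", brand="Acme"))
```

&lt;p&gt;Whatever this produces is what the "Review Final Prompt" expander lets you inspect and edit before generation.&lt;/p&gt;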

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9lht5f74r4ujy1h7hhh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9lht5f74r4ujy1h7hhh.png" alt=" " width="800" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you're satisfied with your settings, click the "Generate Sports Marketing Video" button. The system will create a specialized marketing prompt based on your selections and generate a dynamic sports marketing video using Amazon Bedrock Nova Reel. During generation, you'll see progress updates. This process typically takes 3-5 minutes as the AI creates your custom video. The video is stored securely in the AWS S3 bucket that you selected when deploying the Terraform infrastructure.&lt;/p&gt;
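&lt;p&gt;Under the hood, Nova Reel video generation runs asynchronously. A hedged sketch of the two request structures involved, assuming the request shape described in the Bedrock documentation (the bucket URI and config values are illustrative, not the app's actual settings):&lt;/p&gt;

```python
def build_reel_request(prompt, s3_output_uri):
    # Model input for a text-to-video task; field names follow the
    # Nova Reel docs but should be checked against the API reference.
    model_input = {
        "taskType": "TEXT_VIDEO",
        "textToVideoParams": {"text": prompt},
        "videoGenerationConfig": {
            "durationSeconds": 6,
            "fps": 24,
            "dimension": "1280x720",
        },
    }
    # Where Bedrock writes the finished video.
    output_config = {"s3OutputDataConfig": {"s3Uri": s3_output_uri}}
    return model_input, output_config

model_input, output_config = build_reel_request(
    "Dynamic Action sports marketing video", "s3://my-marketing-videos/"
)
```

&lt;p&gt;These two structures would be passed as modelInput and outputDataConfig to the Bedrock runtime's start_async_invoke call, and the job polled with get_async_invoke until the finished video appears in the S3 bucket.&lt;/p&gt;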

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftuurxltz7bmoo7hdu838.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftuurxltz7bmoo7hdu838.png" alt=" " width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is also accessible from the frontend through a presigned URL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Improvements
&lt;/h2&gt;

&lt;p&gt;There are several ways this application could be enhanced:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multi-image input&lt;/strong&gt; - Allow users to upload multiple images for more dynamic videos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom audio&lt;/strong&gt; - Add options for background music or voiceovers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video templates&lt;/strong&gt; - More specialized templates for different sports and marketing needs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics integration&lt;/strong&gt; - Track video performance metrics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch processing&lt;/strong&gt; - Generate multiple videos at once&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Sports Marketing Video Generator demonstrates how Amazon Nova's generative AI capabilities can transform sports marketing. By combining image classification, image editing, and video generation in a simple interface, marketers can create professional videos in minutes instead of days.&lt;/p&gt;

&lt;p&gt;This project showcases the power of combining multiple AWS services (Bedrock, Rekognition, Cognito, ECS, S3) with a user-friendly frontend (Streamlit) to solve real business problems. This approach not only democratizes video creation but also enables marketing teams to produce more content, faster, and at a lower cost—all while maintaining brand consistency and quality standards.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/exploring-creative-possibilities-a-visual-guide-to-amazon-nova-canvas/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/machine-learning/exploring-creative-possibilities-a-visual-guide-to-amazon-nova-canvas/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/aws/amazon-nova-reel-1-1-featuring-up-to-2-minutes-multi-shot-videos/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/aws/amazon-nova-reel-1-1-featuring-up-to-2-minutes-multi-shot-videos/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/aws-samples/deploy-streamlit-app" rel="noopener noreferrer"&gt;https://github.com/aws-samples/deploy-streamlit-app&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>streamlit</category>
      <category>ecs</category>
    </item>
    <item>
      <title>Data Transfer from S3 to Cloud Storage using GCP Storage Transfer Service</title>
      <dc:creator>Ryan Nazareth</dc:creator>
      <pubDate>Sat, 04 Jan 2025 03:57:02 +0000</pubDate>
      <link>https://forem.com/ryankarlos/data-transfer-from-s3-to-cloud-storage-using-gcp-storage-transfer-service-25m</link>
      <guid>https://forem.com/ryankarlos/data-transfer-from-s3-to-cloud-storage-using-gcp-storage-transfer-service-25m</guid>
      <description>&lt;p&gt;Storage Transfer Service automates the transfer of data to, from, and between object and file storage systems, including Google Cloud Storage, Amazon S3, Azure Storage, on-premises data, and more. It can be used to transfer large amounts of data quickly and reliably, without the need to write any code. Depending on your source type, you can easily create and run Google-managed transfers, or configure self-hosted transfers that give you full control over network routing and bandwidth usage. Storage transfer service only allows transfer into GCP and does not support bi-directional transfer e.g. from GCP to AWS.&lt;/p&gt;

&lt;p&gt;In this blog, we will demonstrate how to create a one-off storage transfer job to transfer data from an S3 bucket to GCP Cloud Storage. In addition, we will also demonstrate how to set up an event-driven transfer job that transfers objects by continuously listening to event notifications for objects added or modified in the source S3 bucket.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before you begin, make sure you have the following prerequisites:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A GCP account with the necessary permissions to create and manage storage buckets and transfer jobs.&lt;/li&gt;
&lt;li&gt;An AWS account with the necessary permissions to create and manage S3 buckets.&lt;/li&gt;
&lt;li&gt;The AWS CLI installed and configured on your local machine.&lt;/li&gt;
&lt;li&gt;The gcloud CLI installed and configured on your local machine.&lt;/li&gt;
&lt;li&gt;The necessary IAM roles and permissions set up in both AWS and GCP.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Create a source S3 bucket &lt;code&gt;demo-s3-transfer&lt;/code&gt; and destination cloud storage bucket &lt;code&gt;demo-storage-transfer&lt;/code&gt;. In the source S3 bucket, we will upload some parquet files in a prefix &lt;code&gt;2024/12&lt;/code&gt;. We will be transferring the parquet files in this prefix into the &lt;code&gt;demo-storage-transfer&lt;/code&gt; bucket. &lt;/p&gt;

&lt;h2&gt;
  
  
  Storage Transfer REST API
&lt;/h2&gt;

&lt;p&gt;Storage Transfer Service uses a Google-managed service account to move your data. This service account is automatically created the first time you create a transfer job, call &lt;code&gt;googleServiceAccounts.get&lt;/code&gt;, or visit the job creation page in the Google Cloud console. The service account's format is typically &lt;code&gt;project-PROJECT_NUMBER@storage-transfer-service.iam.gserviceaccount.com&lt;/code&gt;. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We can use the &lt;code&gt;googleServiceAccounts.get&lt;/code&gt; method to retrieve the managed Google service account that is used by Storage Transfer Service to access buckets in the project where transfers run or in other projects. Each Google service account is associated with one Google Cloud project. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Navigate to the googleServiceAccounts.get reference page &lt;a href="https://cloud.google.com/storage-transfer/docs/reference/rest/v1/googleServiceAccounts/get" rel="noopener noreferrer"&gt;here&lt;/a&gt;. &lt;br&gt;
On the right, a window will open where you can enter the project ID under the request parameters. Executing this will return the subjectId in the response, along with the storage transfer account email. Keep a note of the subject ID and the Google-managed service account email, as we will need them in later sections.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fte3q7o5dp2oxjly7w07l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fte3q7o5dp2oxjly7w07l.png" alt="Image description" width="800" height="621"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alternatively, we can do the same via the CLI, using a curl command and passing the bearer token in the header.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; GET &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud auth print-access-token&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"x-goog-user-project: &amp;lt;project-id&amp;gt;"&lt;/span&gt; https://storagetransfer.googleapis.com/v1/googleServiceAccounts/&amp;lt;project-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The x-goog-user-project header key is required to set the default project quota for the request; see the &lt;a href="https://cloud.google.com/docs/authentication/adc-troubleshooting/user-creds" rel="noopener noreferrer"&gt;troubleshooting guide&lt;/a&gt;. If excluded, you may get the following error: &lt;code&gt;The storagetransfer.googleapis.com API requires a quota project, which is not set by default&lt;/code&gt;&lt;/p&gt;
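&lt;p&gt;The same lookup can also be done from Python with the google-cloud-storage-transfer client. The call is shown as comments since it needs GCP credentials to execute, and the project ID is a placeholder:&lt;/p&gt;

```python
# Request body for the googleServiceAccounts.get equivalent in the
# Python client; "my-gcp-project" is a hypothetical project ID.
request = {"project_id": "my-gcp-project"}

# from google.cloud import storage_transfer
# client = storage_transfer.StorageTransferServiceClient()
# account = client.get_google_service_account(request=request)
# print(account.account_email, account.subject_id)
```

&lt;p&gt;The returned object carries both the service account email and the subjectId needed for the AWS trust policy in the next section.&lt;/p&gt;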

&lt;h2&gt;
  
  
  AWS IAM role permissions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In the AWS console, navigate to IAM and create a new role. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select Custom trust policy and paste the following trust policy.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"Federated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"accounts.google.com"&lt;/span&gt;&lt;span class="w"&gt;

            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRoleWithWebIdentity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"StringEquals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"accounts.google.com:sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;subject-id&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt; Replace the &lt;code&gt;&amp;lt;subject-id&amp;gt;&lt;/code&gt; value with the subjectId of the Google-managed service account that you retrieved in the previous section using the &lt;code&gt;googleServiceAccounts.get&lt;/code&gt; reference page. It should look like the screenshot below.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ggnahjgnp1kkesz82th.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ggnahjgnp1kkesz82th.png" alt="Image description" width="800" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Paste the following JSON policy to grant the role permissions to list the bucket and get objects from the S3 bucket.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"s3:GetObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"s3:ListBucket"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3katrx42n72jq4n49zcc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3katrx42n72jq4n49zcc.png" alt="Image description" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Once the role is created, note down the ARN value, which will be passed to Storage Transfer Service when initiating the transfer programmatically in Python.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Transfer permissions in GCP
&lt;/h2&gt;

&lt;p&gt;The GCP service account used to create the transfer job will need to be granted the Storage Transfer User role (&lt;code&gt;roles/storagetransfer.user&lt;/code&gt;) and &lt;code&gt;roles/iam.roleViewer&lt;/code&gt;. In addition, we need to give the Google-managed service account retrieved in the previous section access to the resources needed to complete transfers. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Navigate to the Cloud Storage Bucket &lt;code&gt;demo-storage-transfer&lt;/code&gt;. In the permissions tab, click grant access. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi1o3q89u14zaq8gh361f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi1o3q89u14zaq8gh361f.png" alt="Image description" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the new window, enter the principal as the managed gcp transfer service email. Assign the Storage Admin Role.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frtwnoihuxprbqsr2d6hu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frtwnoihuxprbqsr2d6hu.png" alt="Image description" width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Create one-off batch Storage Transfer Job
&lt;/h2&gt;

&lt;p&gt;We can interact with Storage Transfer Service programmatically with Python. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copy this &lt;a href="https://github.com/ryankarlos/GCP_patterns/tree/master/storage_transfer_service" rel="noopener noreferrer"&gt;folder&lt;/a&gt;, which contains the requirements.txt and the script for initiating the storage transfer job, checking its status and verifying completion.&lt;/li&gt;
&lt;li&gt;In a command-line terminal window, run &lt;code&gt;pip install -r requirements.txt&lt;/code&gt; to install the google-cloud-storage-transfer and cloud-storage libraries.&lt;/li&gt;
&lt;li&gt;If you use a service account JSON key, set the environment variable &lt;strong&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/strong&gt; to the path of this file. Otherwise, use one of the other &lt;a href="https://cloud.google.com/docs/authentication" rel="noopener noreferrer"&gt;GCP authentication options&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Now, run the following command to execute the storage_transfer_batch.py job script in a terminal of your choosing. This will transfer the data from the 2024/12 prefix in the S3 bucket to the GCP bucket under a Data prefix. We pass in the ARN of the role we created earlier, which will be assumed during the transfer to generate temporary credentials with the required permissions.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python python/storage_transfer.py &lt;span class="nt"&gt;--gcp_project_id&lt;/span&gt; &amp;lt;your-gcp-project-id&amp;gt; &lt;span class="nt"&gt;--gcp_bucket&lt;/span&gt; &amp;lt;your-gcp-bucket&amp;gt; &lt;span class="nt"&gt;--s3_bucket&lt;/span&gt; &amp;lt;your-s3-bucket&amp;gt; &lt;span class="nt"&gt;--s3_prefix&lt;/span&gt; &amp;lt;s3-prefix&amp;gt; &lt;span class="nt"&gt;--gcp_prefix&lt;/span&gt; &amp;lt;gcp-prefix&amp;gt; &lt;span class="nt"&gt;--role_arn&lt;/span&gt; &amp;lt;aws-role-arn&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
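&lt;p&gt;Internally, a script like this builds a transfer job specification and submits it through the google-cloud-storage-transfer client. A hedged sketch of what that job body might look like for our buckets; field names assume the library's documented TransferJob shape, the project ID and role ARN are placeholders, and the create call is shown as a comment since it needs credentials:&lt;/p&gt;

```python
transfer_job = {
    "project_id": "my-gcp-project",  # hypothetical project ID
    "status": "ENABLED",
    "transfer_spec": {
        "aws_s3_data_source": {
            "bucket_name": "demo-s3-transfer",
            "path": "2024/12/",
            # Role from the AWS IAM section; assumed during the transfer
            # to mint temporary credentials.
            "role_arn": "arn:aws:iam::123456789012:role/gcp-transfer-role",
        },
        "gcs_data_sink": {
            "bucket_name": "demo-storage-transfer",
            "path": "Data/",
        },
    },
}

# from google.cloud import storage_transfer
# client = storage_transfer.StorageTransferServiceClient()
# job = client.create_transfer_job({"transfer_job": transfer_job})
# client.run_transfer_job({"job_name": job.name,
#                          "project_id": transfer_job["project_id"]})
```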



&lt;ul&gt;
&lt;li&gt;You should see the logs as in the screenshot below. Wait for the job to show as completed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprhn6qj6g04wykjo8b4g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprhn6qj6g04wykjo8b4g.png" alt="Image description" width="800" height="98"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Navigate to the cloud storage bucket and you should see the data in the bucket under the Data prefix.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsh4ktgwy8gqmy18hbag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsh4ktgwy8gqmy18hbag.png" alt="Image description" width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can monitor and check your transfer jobs from the Google Cloud Console UI.  Open the Google Cloud Console and navigate to "Transfer Service". The jobs executed will be listed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy86bkc2elvy8gvc2tl9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy86bkc2elvy8gvc2tl9w.png" alt="Image description" width="800" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the monitoring tab, we can see plots for performance metrics (bytes transferred, objects processed, transfer rate etc).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr9h72ile6crz3nquxyrd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr9h72ile6crz3nquxyrd.png" alt="Image description" width="800" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the operations and configuration tabs, we can get more details regarding the transfer, e.g. run history, data transferred, and the other configuration details we set for the transfer job.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5bfe29uy9gh6kqlia3bz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5bfe29uy9gh6kqlia3bz.png" alt="Image description" width="800" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Create event driven transfer job
&lt;/h2&gt;

&lt;p&gt;Event-driven transfers listen to Amazon S3 Event Notifications sent to Amazon SQS to know when objects in the source bucket have been modified or added. &lt;/p&gt;

&lt;h3&gt;
  
  
  Create an SQS queue in AWS
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;In the AWS management console, go to the SQS service, click on "Create queue" and provide a name for the queue.&lt;/li&gt;
&lt;li&gt;In the Access policy section, select Advanced. A JSON object is displayed. Paste the policy below, replacing the values for &lt;code&gt;&amp;lt;SQS-RESOURCE-ARN&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;AWS-ACCOUNT-ID&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;S3_BUCKET_ARN&amp;gt;&lt;/code&gt;. This will only permit the SQS:SendMessage action on the SQS queue from the S3 bucket in the AWS account.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"example-ID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Sid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"example-statement-ID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"s3.amazonaws.com"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SQS:SendMessage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;SQS-RESOURCE-ARN&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"StringEquals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"aws:SourceAccount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;AWS-ACCOUNT-ID&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"ArnLike"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"aws:SourceArn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;S&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;_BUCKET_ARN&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw48w6s4jvh1txr2fhdf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw48w6s4jvh1txr2fhdf.png" alt="Image description" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we need to enable notifications in the S3 bucket, setting the SQS queue as destination.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to the S3 bucket and select the Properties tab. In the Event notifications section, click Create event notification.&lt;/li&gt;
&lt;li&gt;Specify a name for this event. In the Event types section, select "All object create events", as in the screenshot below.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qumxkqx15ppizkur23w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qumxkqx15ppizkur23w.png" alt="Image description" width="800" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As the Destination select SQS queue and select the queue you created previously.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcx5stfr8k67aq14173z1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcx5stfr8k67aq14173z1.png" alt="Image description" width="800" height="279"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Create an event driven Storage transfer job
&lt;/h2&gt;

&lt;p&gt;We will now use the GCP cloud console to create an event-driven transfer job. Navigate to the GCP Transfer Service page and click Create transfer job.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select Amazon S3 as the source type, and Cloud Storage as the destination.&lt;/li&gt;
&lt;li&gt;For the Scheduling mode select Event-driven and click Next.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxg23occ1tu5zwy5qxcn0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxg23occ1tu5zwy5qxcn0.png" alt="Image description" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enter the S3 bucket name. We will use the same bucket we used previously for the one-off transfer but you can use a different one if you wish.&lt;/li&gt;
&lt;li&gt;Enter the Amazon SQS queue ARN that you created earlier, as in the screenshot below.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flat0i8yz7fq3rthiqyb5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flat0i8yz7fq3rthiqyb5.png" alt="Image description" width="800" height="669"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the destination Cloud Storage bucket path (which can optionally include a prefix) as in the screenshot below.&lt;/li&gt;
&lt;li&gt;Leave the rest of the options as defaults and click Create.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzgwcjtvgt3aexi9hvvl5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzgwcjtvgt3aexi9hvvl5.png" alt="Image description" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The transfer job starts running, and an event listener waits for notifications on the SQS queue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6o97qg5mzbmc7qjaadu6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6o97qg5mzbmc7qjaadu6.png" alt="Image description" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;
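&lt;p&gt;If you prefer automation over the console, the same event-driven job can be created through the Storage Transfer Service API. The sketch below builds the request body for the &lt;code&gt;transferJobs.create&lt;/code&gt; method; the project id, bucket names, and queue ARN are placeholder values, and the field names follow my reading of the REST resource, so check them against the current API reference before use.&lt;/p&gt;

```python
import json

# Placeholder identifiers -- substitute your own values.
transfer_job = {
    "description": "Event-driven replication from S3 to GCS",
    "status": "ENABLED",
    "projectId": "my-gcp-project",
    "transferSpec": {
        "awsS3DataSource": {"bucketName": "my-source-s3-bucket"},
        "gcsDataSink": {"bucketName": "my-destination-gcs-bucket"},
    },
    # The presence of eventStream (pointing at the SQS queue ARN) is what
    # makes the job event-driven rather than scheduled.
    "eventStream": {
        "name": "arn:aws:sqs:us-east-1:123456789012:gcp-transfer-queue"
    },
}

print(json.dumps(transfer_job, indent=2))
```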

&lt;p&gt;We can test this by putting some data into the S3 bucket source location and observing the objects being replicated from the S3 bucket to the GCS bucket. You can also view monitoring details in the SQS queue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qcyiarrfgwjq2nnv3kq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qcyiarrfgwjq2nnv3kq.png" alt="Image description" width="800" height="606"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;GCP's Storage Transfer Service is a powerful tool for transferring data from S3 to GCS, offering a cost-effective, scalable, and secure solution for data migration with flexible scheduling and data filtering options. In this practical blog, we walked through the steps required to set up both one-off and event-driven transfers, so you can migrate your data from S3 to GCS with minimal effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/storage-transfer/docs/create-transfers/agentless/s3" rel="noopener noreferrer"&gt;https://cloud.google.com/storage-transfer/docs/create-transfers/agentless/s3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/storage-transfer/docs/libraries" rel="noopener noreferrer"&gt;https://cloud.google.com/storage-transfer/docs/libraries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/storage-transfer/docs/source-amazon-s3" rel="noopener noreferrer"&gt;https://cloud.google.com/storage-transfer/docs/source-amazon-s3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/storage-transfer/docs/iam-cloud" rel="noopener noreferrer"&gt;https://cloud.google.com/storage-transfer/docs/iam-cloud&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>googlecloud</category>
      <category>aws</category>
      <category>cloudstorage</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>AWS Neptune for analysing event ticket sales between users - Part 2</title>
      <dc:creator>Ryan Nazareth</dc:creator>
      <pubDate>Mon, 29 May 2023 21:41:05 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-neptune-for-analysing-event-ticket-sales-between-users-part-2-3i5g</link>
      <guid>https://forem.com/aws-builders/aws-neptune-for-analysing-event-ticket-sales-between-users-part-2-3i5g</guid>
      <description>&lt;p&gt;This blog follows on from the setup of Neptune DB with Worldwide Events data in the &lt;a href="https://dev.to/aws-builders/aws-neptune-for-analysing-event-ticket-sales-between-users-part-1-4ag"&gt;first part&lt;/a&gt;. Here we will run some queries and investigate node relationships in the Neptune Notebook. &lt;br&gt;
We can use some of the magic commands available in the &lt;a href="https://github.com/aws/graph-notebook" rel="noopener noreferrer"&gt;graph notebook&lt;/a&gt; project, open-sourced by AWS and available in the Neptune Notebook instance configured in the &lt;a href="https://dev.to/aws-builders/aws-neptune-for-analysing-event-ticket-sales-between-users-part-1-4ag"&gt;first part&lt;/a&gt; of this blog. We can inspect the visualisation options used when a query is executed with the &lt;code&gt;%graph_notebook_vis_options&lt;/code&gt; command. This will output a JSON document containing the default configuration options for rendering the graphs. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbib7arux0ht2dy7hebje.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbib7arux0ht2dy7hebje.png" alt="Image description" width="800" height="556"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To modify the executing notebook's &lt;a href="https://visjs.github.io/vis-network/docs/network/#options" rel="noopener noreferrer"&gt;vis.js options&lt;/a&gt;, we can use &lt;code&gt;%%graph_notebook_vis_options&lt;/code&gt; with the modified JSON payload provided in the cell body. For example, in the screenshot below I have switched the &lt;a href="https://visjs.github.io/vis-network/docs/network/physics.html" rel="noopener noreferrer"&gt;physics solver&lt;/a&gt; from &lt;strong&gt;barnesHut&lt;/strong&gt; to  &lt;strong&gt;forceAtlas2Based&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0d0cbakku2vd7zxg13e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0d0cbakku2vd7zxg13e.png" alt="Image description" width="800" height="967"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will also create a mapping between node label and property used to label the node in the visualisation. Run the following in the notebook cell.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;display_var = '{"user":"name","event":"name"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
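&lt;p&gt;The value assigned here is just a JSON string mapping each node label to the property used as its caption, so it is worth checking that it parses cleanly (a quick local check, not Neptune-specific):&lt;/p&gt;

```python
import json

# Same mapping as in the notebook cell above: caption the user and event
# nodes by their "name" property.
display_var = '{"user": "name", "event": "name"}'

mapping = json.loads(display_var)  # raises ValueError if not valid JSON
print(mapping)
```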



&lt;p&gt;We will reference the value of this variable as &lt;code&gt;$display_var&lt;/code&gt; when running the first query. The &lt;code&gt;%%oc&lt;/code&gt; magic command indicates that we want to execute an openCypher query. The &lt;code&gt;-d&lt;/code&gt; hint enables the display mappings defined above, so we pass it &lt;code&gt;$display_var&lt;/code&gt;. The &lt;code&gt;-l&lt;/code&gt; hint sets the maximum length of text that can be displayed in a node, here 20 characters. The cypher query will return all the connected nodes (along with the edges) in the database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="o"&gt;%%&lt;/span&gt;&lt;span class="n"&gt;oc&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;$display_var&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;l20&lt;/span&gt;
&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;((&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="ss"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default, we see the results in text form in the console tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdqqlph9d8xuyy4d3mv6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdqqlph9d8xuyy4d3mv6.png" alt="Image description" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we switch to the graph tab, we can see the rendered visualisation. We can zoom in and out using the +/- icons, or move the nodes by clicking and dragging. Zooming into a cluster of nodes and relationships should also show the associated labels. Hovering over any node should also show its label.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F58uqqn2qsl043a3p57bz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F58uqqn2qsl043a3p57bz.png" alt="Image description" width="800" height="583"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Users from New York
&lt;/h4&gt;

&lt;p&gt;Now let's find all users (buyers and sellers) who are from New York. We will use the &lt;code&gt;user&lt;/code&gt; label to match only the user nodes, and then use the &lt;code&gt;WHERE&lt;/code&gt; clause to filter for nodes whose &lt;code&gt;city&lt;/code&gt; property is New York.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="o"&gt;%%&lt;/span&gt;&lt;span class="n"&gt;oc&lt;/span&gt; 
&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;t:&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;t.city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"New York"&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Finxsdjviapy3gl5zwhgx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Finxsdjviapy3gl5zwhgx.png" alt="Image description" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Users buying and selling for events in Toronto
&lt;/h4&gt;

&lt;p&gt;To find paths containing users who listed and bought tickets for events in Toronto, we can use the following query to match the path &lt;code&gt;(n:user)-[]-(e:event)-[]-(u:user)&lt;/code&gt; and then filter the &lt;code&gt;city&lt;/code&gt; property of the event node to &lt;code&gt;Toronto&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="o"&gt;%%&lt;/span&gt;&lt;span class="n"&gt;oc&lt;/span&gt;  &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;$display_var&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;l20&lt;/span&gt;
&lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;n:&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;e:&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;u:&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;e.city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Toronto"&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy0ev7bpvd99dsv7diu70.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy0ev7bpvd99dsv7diu70.png" alt="Image description" width="800" height="636"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Match event property directly in path
&lt;/h4&gt;

&lt;p&gt;Let us now match all sellers and buyers of tickets to the &lt;code&gt;The Police&lt;/code&gt; event. Instead of using the &lt;code&gt;WHERE&lt;/code&gt; clause after &lt;code&gt;MATCH&lt;/code&gt;, we can filter the required paths directly by specifying the property &lt;code&gt;name&lt;/code&gt; as &lt;code&gt;The Police&lt;/code&gt; on the event node in the &lt;code&gt;MATCH&lt;/code&gt; pattern.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="o"&gt;%%&lt;/span&gt;&lt;span class="n"&gt;oc&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;$display_var&lt;/span&gt;
&lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;seller:&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;--&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;name:&lt;/span&gt; &lt;span class="s1"&gt;'The Police'&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;&lt;span class="o"&gt;--&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;buyer:&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It looks like we have two events for which tickets listed by a user were purchased by another user. However, we may need more granular information about the transactions and whether all the tickets listed by the seller were bought.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8gd3yuozhiyt5rqe1qpz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8gd3yuozhiyt5rqe1qpz.png" alt="Image description" width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can then return the associated properties of the relationships as separate columns in a table. The previous query can be modified to return the properties and the relationship type instead of the path, aliasing the names (which become the table column names). We also do not need to match the full path (explicitly, the directions between users and events) as in the previous query, since we are interested in all relationship types connected to the &lt;code&gt;The Police&lt;/code&gt; event node(s). This output will not give the option of displaying a graph in the widget, as we have not returned a path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="o"&gt;%%&lt;/span&gt;&lt;span class="n"&gt;oc&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;$display_var&lt;/span&gt;
&lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="ss"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;name:&lt;/span&gt; &lt;span class="s1"&gt;'The Police'&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;e.date&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;e.quantity&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;number_of_tickets&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;e.price&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79v40wt7oz63hfpoeewy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79v40wt7oz63hfpoeewy.png" alt="Image description" width="800" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Path length and hops
&lt;/h4&gt;

&lt;p&gt;Here we will try to find a user who has listed a ticket for the event &lt;code&gt;Mary Poppins&lt;/code&gt; and is at least 11 hops away from any other user node. In the cypher query below, we match a user node with a directed relationship to an event node with the property name &lt;code&gt;Mary Poppins&lt;/code&gt;. Since this already accounts for the first hop, we need to match the remaining minimum of 10 hops from the event node to any other user node. We can achieve this using &lt;a href="https://neo4j.com/docs/cypher-manual/current/syntax/patterns/#cypher-pattern-varlength" rel="noopener noreferrer"&gt;variable length pattern matching&lt;/a&gt; in cypher, which allows us to specify a range of lengths in the relationship description of a pattern. Here we use a lower bound for the range followed by an ellipsis (&lt;code&gt;*10..&lt;/code&gt;) to signify no upper bound. Finally, we return the username.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;u:&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;--&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;name:&lt;/span&gt; &lt;span class="s1"&gt;'Mary Poppins'&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mf"&gt;10.&lt;/span&gt;&lt;span class="n"&gt;.&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:user&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;u.name&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns user &lt;code&gt;QRG30DIY&lt;/code&gt;. Now let's return the path so we can visualise who this person is connected to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0qj0jdnrjz3n4qwom42.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0qj0jdnrjz3n4qwom42.png" alt="Image description" width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can modify the query to match the user &lt;code&gt;QRG30DIY&lt;/code&gt; who listed the ticket for &lt;code&gt;Mary Poppins&lt;/code&gt; event and then return all subsequent relationships and nodes connected any number of hops away from the &lt;code&gt;Mary Poppins&lt;/code&gt; event node (using the &lt;code&gt;*&lt;/code&gt; notation).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;u:&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;name:&lt;/span&gt;&lt;span class="s1"&gt;'QRG30DIY'&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;&lt;span class="o"&gt;--&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;name:&lt;/span&gt; &lt;span class="s1"&gt;'Mary Poppins'&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user node &lt;code&gt;QRG30DIY&lt;/code&gt; is highlighted in the visual below. If we count the number of connections from this node, there are two paths with at least 11 hops, each ending at a user node with a single connection to the &lt;code&gt;Macbeth&lt;/code&gt; event node. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ze98dhoypehwgzkd3wn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ze98dhoypehwgzkd3wn.png" alt="Image description" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Deleting Resources
&lt;/h3&gt;

&lt;p&gt;Once you have finished with the queries and analysis, you will need to delete the Neptune DB instance and Redshift Serverless namespace (and associated workgroup). The Neptune DB instance can be deleted with or without final snapshot by following the instructions in the &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/manage-console-instances-delete.html" rel="noopener noreferrer"&gt;docs&lt;/a&gt;. Then delete the Redshift Serverless workgroup by following the steps &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless_delete-workgroup.html" rel="noopener noreferrer"&gt;here&lt;/a&gt; followed by the &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-console-namespace-delete.html" rel="noopener noreferrer"&gt;namespace&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>networks</category>
      <category>neptune</category>
      <category>visualisation</category>
      <category>cypher</category>
    </item>
    <item>
      <title>AWS Neptune for analysing event ticket sales between users - Part 1</title>
      <dc:creator>Ryan Nazareth</dc:creator>
      <pubDate>Mon, 29 May 2023 21:39:57 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-neptune-for-analysing-event-ticket-sales-between-users-part-1-4ag</link>
      <guid>https://forem.com/aws-builders/aws-neptune-for-analysing-event-ticket-sales-between-users-part-1-4ag</guid>
<description>&lt;p&gt;This is the first of a two-part blog series, where we will walk through the setup for using AWS Neptune to analyse a property graph modelled from the &lt;a href="https://aws.amazon.com/marketplace/pp/prodview-4ozlpl4r3k7cg" rel="noopener noreferrer"&gt;Worldwide Event Attendance&lt;/a&gt; dataset from AWS Marketplace Data Exchange, which is free to subscribe to. This contains data for user ticket purchases and sales for fictional daily events (operas, plays, pop concerts etc.) across 2008 in the USA. This data is accessible from Redshift, so part of this setup will involve loading the data in the required format from Redshift to an S3 bucket, and then loading it into a Neptune DB instance for running queries and generating visualisations in &lt;a href="https://dev.to/aws-builders/aws-neptune-for-analysing-event-ticket-sales-between-users-part-2-3i5g"&gt;Part 2&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up the Neptune Cluster and Notebook
&lt;/h2&gt;

&lt;p&gt;First we will need to create the Neptune cluster and database instance. I have configured this from the AWS console, following the steps in the &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/manage-console-launch-console.html" rel="noopener noreferrer"&gt;docs&lt;/a&gt;, but this could also be automated via one of the CloudFormation templates &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/get-started-cfn-create.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the &lt;strong&gt;Engine&lt;/strong&gt; options, select provisioned mode and the latest version of Neptune&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Settings&lt;/strong&gt;, select the &lt;strong&gt;Development and testing&lt;/strong&gt; option rather than Production, as this will give us the option to select the cheaper burstable class (db.t3.medium).&lt;/li&gt;
&lt;li&gt;We will not create any Neptune replicas in different availability zones so click &lt;strong&gt;No&lt;/strong&gt; for &lt;strong&gt;Multi-AZ Deployment&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhz8tlrye2lzytr2mpi0c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhz8tlrye2lzytr2mpi0c.png" alt="Image description" width="800" height="1273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the &lt;strong&gt;Connectivity&lt;/strong&gt; option, I have selected my default VPC, for which I already have a security group configured with an inbound rule allowing access on any port, with the existing security group id as the source. Alternatively, you could add another custom rule to only allow inbound traffic to the specific default port for Neptune (8182). &lt;/li&gt;
&lt;li&gt;You can also choose to create a new VPC and new security group if you do not want to use the existing ones.
&lt;/li&gt;
&lt;li&gt;We will configure the notebook separately after creating the cluster, so skip the &lt;strong&gt;Notebook configuration&lt;/strong&gt; option.&lt;/li&gt;
&lt;li&gt;You can either skip the &lt;strong&gt;Additional configuration&lt;/strong&gt; option and accept the defaults, which enable deletion protection, encryption at rest and auto minor version upgrades or disable the options you do not want.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foz5ygdktps0tihd6yzs0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foz5ygdktps0tihd6yzs0.png" alt="Image description" width="800" height="1255"&gt;&lt;/a&gt;&lt;/p&gt;
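&lt;p&gt;If you opt for the narrower rule, the inbound permission for Neptune's default port can be sketched as the &lt;code&gt;IpPermissions&lt;/code&gt; structure accepted by &lt;code&gt;aws ec2 authorize-security-group-ingress&lt;/code&gt;. The CIDR range and group id below are placeholders; restrict them to your own network.&lt;/p&gt;

```python
import json

# Allow TCP traffic to Neptune's default port (8182) only.
ingress_rule = {
    "IpProtocol": "tcp",
    "FromPort": 8182,
    "ToPort": 8182,
    "IpRanges": [{"CidrIp": "10.0.0.0/16", "Description": "Neptune access"}],
}

# Save the list to ingress.json and apply it with (group id is a placeholder):
#   aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0
#       --ip-permissions file://ingress.json
print(json.dumps([ingress_rule], indent=2))
```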

&lt;p&gt;We will now configure a &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/graph-notebooks.html" rel="noopener noreferrer"&gt;Neptune graph notebook&lt;/a&gt; to access the cluster, so we can run queries and generate interactive visualisations. &lt;strong&gt;Neptune Workbench&lt;/strong&gt; provides a fully managed Jupyter notebook environment in Sagemaker, running the latest release of the open source &lt;a href="https://github.com/aws/graph-notebook" rel="noopener noreferrer"&gt;graph-notebook project&lt;/a&gt;. This has the benefit of offering built-in capabilities like &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/notebooks-visualization.html" rel="noopener noreferrer"&gt;visualisation of queries&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click &lt;strong&gt;Notebooks&lt;/strong&gt; from the navigation pane on the left and select &lt;strong&gt;Create notebook&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;In the Cluster list, choose your Neptune DB cluster. If you don't yet have a DB cluster, choose &lt;strong&gt;Create cluster&lt;/strong&gt; to create one.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Notebook instance type&lt;/strong&gt;, select &lt;strong&gt;ml.t3.medium&lt;/strong&gt; which should be sufficient for this example.&lt;/li&gt;
&lt;li&gt;Under &lt;strong&gt;IAM role name&lt;/strong&gt;, select &lt;strong&gt;create an IAM role&lt;/strong&gt; for the notebook, and enter a name for the new role. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesl936ewufcb8n4ylry4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesl936ewufcb8n4ylry4.png" alt="Image description" width="800" height="817"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, we need to create an IAM role that Neptune can assume in order to load data from S3. Also, since the Neptune DB instance is within a VPC, we need to create an S3 gateway endpoint to allow access to S3. Both can be achieved by following the steps in the &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-IAM.html#bulk-load-tutorial-vpc" rel="noopener noreferrer"&gt;IAM prerequisites for the Neptune Bulk Loader&lt;/a&gt;. &lt;/p&gt;
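&lt;p&gt;As a sketch of what those prerequisites involve, the role that Neptune assumes to read from S3 trusts the &lt;code&gt;rds.amazonaws.com&lt;/code&gt; service principal (Neptune clusters are managed under the RDS umbrella). Building the trust policy document programmatically might look like this:&lt;/p&gt;

```python
import json

# Trust policy from the Neptune bulk-loader IAM prerequisites: lets the
# Neptune (RDS) service principal assume the S3 read role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "rds.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Serialized form, e.g. for iam.create_role(AssumeRolePolicyDocument=...)
trust_policy_json = json.dumps(trust_policy, indent=2)
```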

&lt;h2&gt;
  
  
  Redshift Serverless Data Query and Unload
&lt;/h2&gt;

&lt;p&gt;In this &lt;a href="https://dev.to/aws-builders/sagemaker-501e-temp-slug-9236610?preview=850aebb1bd0ecf9213710c0b676e1f93a3f2ce4ce9b852476448ed854ca96c1ca0b803afcd1ca165e02198f35013dd28e398083bf31486ac98bc5e64"&gt;previous blog&lt;/a&gt;, I described how to configure AWS Redshift Serverless with access to the AWS Marketplace Worldwide Events Dataset. Follow the steps there to configure a datashare for accessing this database from the Redshift cluster. &lt;/p&gt;

&lt;p&gt;We will model the users and events as nodes, and the relationship between each user and event as an edge. For example, a seller (node) would list (relationship) a ticket for a given event (node), for which one or many buyers (nodes) would purchase (relationships) tickets (or, unluckily, no one may purchase from the seller).&lt;/p&gt;

&lt;p&gt;Open the query editor from the navigation pane in the Redshift Serverless console.  We will first create a view that filters the &lt;code&gt;all_users&lt;/code&gt; view in the worldwide events datashare to only contain users who like theatre, concerts and opera. The additional constraint is that we only keep rows with no NULLs in any of the selected boolean columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;user_sample_vw&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;userid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;liketheatre&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;likeconcerts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;likeopera&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; 
&lt;span class="nv"&gt;"worldwide_event_data_exchange"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"all_users"&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;liketheatre&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt;  &lt;span class="n"&gt;likeconcerts&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt;  &lt;span class="n"&gt;likeopera&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;no&lt;/span&gt; &lt;span class="k"&gt;schema&lt;/span&gt; &lt;span class="n"&gt;binding&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1q98y0sgpbm99kaycpdw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1q98y0sgpbm99kaycpdw.png" alt="Image description" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's also create another view containing a snapshot of events and related transactions between selected buyers and sellers in our &lt;strong&gt;user_sample_vw&lt;/strong&gt; for the month of January. We also need to pull in additional columns corresponding to venue, event and ticket purchase and listing details (e.g. number of tickets and price), so we need to join to the respective tables.&lt;br&gt;
&lt;strong&gt;Note:&lt;/strong&gt; we only want records where neither the buyer nor the seller is NULL, and all users must be from the subset we sampled in &lt;strong&gt;user_sample_vw&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;network_vw&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;  &lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;saletime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sellerid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listtime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;buyerid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eventid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eventname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;venuename&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;C&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;catname&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;venuecity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;venuestate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="n"&gt;pricepaid&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;qtysold&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;caldate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;priceperticket&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;listprice&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;numtickets&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;listtickets&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"worldwide_event_data_exchange"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"date"&lt;/span&gt; &lt;span class="n"&gt;D&lt;/span&gt; 
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"worldwide_event_data_exchange"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"sales"&lt;/span&gt; &lt;span class="n"&gt;S&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dateid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dateid&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"worldwide_event_data_exchange"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"listing"&lt;/span&gt; &lt;span class="n"&gt;L&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listid&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"worldwide_event_data_exchange"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"event"&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eventid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eventid&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt;  &lt;span class="nv"&gt;"worldwide_event_data_exchange"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"category"&lt;/span&gt; &lt;span class="k"&gt;C&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;catid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;C&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;catid&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"worldwide_event_data_exchange"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"venue"&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;venueid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;venueid&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"dev"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"user_sample_vw"&lt;/span&gt; &lt;span class="n"&gt;U&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;buyerid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;U&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;userid&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qtr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"dev"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"user_sample_vw"&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sellerid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;userid&lt;/span&gt; 
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;no&lt;/span&gt; &lt;span class="k"&gt;schema&lt;/span&gt; &lt;span class="n"&gt;binding&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the &lt;strong&gt;network_vw&lt;/strong&gt; view if you refresh the dev database and expand the views dropdown in the tree. A sample of the rows and columns of the view is shown below. We will use this later to simplify the creation of the edge records for the csv we export to S3. We will also use &lt;strong&gt;eventid&lt;/strong&gt; and its related properties to create the nodes csv.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83rt1wi1ql4kcy0lu7ah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83rt1wi1ql4kcy0lu7ah.png" alt="Image description" width="800" height="230"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need to generate two csv files (one containing all the node records, the other containing all the relationship records) in the S3 bucket. This is a requirement for subsequently using the Neptune Bulk Loader to load the data into Neptune in the openCypher-specific csv format (since we will be using openCypher to query the graph data). In addition, the openCypher load format requires system column headers in the node and relationship files, as detailed in the &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-format-opencypher.html" rel="noopener noreferrer"&gt;docs&lt;/a&gt;. Any column that holds the values for a particular property needs to use a property column header of the form &lt;strong&gt;propertyname:type&lt;/strong&gt;. &lt;/p&gt;
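&lt;p&gt;A minimal illustration of that csv layout, with the system headers (&lt;strong&gt;:ID&lt;/strong&gt;, &lt;strong&gt;:LABEL&lt;/strong&gt;, &lt;strong&gt;:START_ID&lt;/strong&gt;, &lt;strong&gt;:END_ID&lt;/strong&gt;, &lt;strong&gt;:TYPE&lt;/strong&gt;) alongside typed property columns. The row values here are made up for the sketch:&lt;/p&gt;

```python
import csv
import io

# Nodes file: :ID and :LABEL are system columns, the rest are
# propertyname:type property columns.
nodes = io.StringIO()
writer = csv.writer(nodes)
writer.writerow([":ID", "name:String", "city:String", ":LABEL"])
writer.writerow(["u101", "JSG99FHE", "New York", "user"])   # a user node
writer.writerow(["e2", "Macbeth", "Chicago", "event"])      # an event node

# Edges file: :START_ID/:END_ID reference node :IDs, :TYPE is the
# relationship label.
edges = io.StringIO()
writer = csv.writer(edges)
writer.writerow([":ID", ":START_ID", ":END_ID", ":TYPE", "price:Double"])
writer.writerow(["1", "e2", "u101", "TICKET_PURCHASE", "145.00"])

node_header = nodes.getvalue().splitlines()[0]
```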

&lt;p&gt;We also need to create a role to associate with the Redshift Serverless endpoint so it can unload data into S3. &lt;br&gt;
In the Redshift Serverless console, go to Namespace configuration and select the namespace. Then go to the Security and encryption tab and click Manage IAM roles under the Permissions section. Click the &lt;strong&gt;Create IAM role&lt;/strong&gt; option in the &lt;strong&gt;Manage IAM roles&lt;/strong&gt; dropdown. This creates a default IAM role with the AWS managed policy &lt;strong&gt;AmazonRedshiftAllCommandsFullAccess&lt;/strong&gt; attached, which includes permissions to run SQL commands to COPY, UNLOAD, and query data with Amazon Redshift Serverless.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2n5j5mre505c3y03xbpz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2n5j5mre505c3y03xbpz.png" alt="Image description" width="800" height="752"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select the option &lt;strong&gt;Specific S3 buckets&lt;/strong&gt; and select the S3 bucket created for unloading the nodes and relationship data to. Then click &lt;strong&gt;Create IAM role as default&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This default role also grants permissions to run SELECT statements against other services besides S3, including Sagemaker, Glue etc. The policy attached to the new role would need to be updated in IAM if you want to limit permissions to fewer services.&lt;/p&gt;

&lt;p&gt;If you navigate back to the Namespace, you should see the IAM role and the associated arn (highlighted in yellow) which you will need to specify when running commands to unload data to S3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwfnd4otcmbo03yrc5jh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwfnd4otcmbo03yrc5jh.png" alt="Image description" width="800" height="559"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will use the &lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html" rel="noopener noreferrer"&gt;UNLOAD&lt;/a&gt; command to unload the results of the queries above to S3 in csv format, with the following options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CSV DELIMITER AS&lt;/strong&gt;: use csv format with ',' as the delimiter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HEADER&lt;/strong&gt;: write the column names as the first row&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLEANPATH&lt;/strong&gt;: remove any existing S3 files before unloading the new query results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PARALLEL OFF&lt;/strong&gt;: turn off parallel writes, as we want a single CSV file rather than multiple partitions.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;unload ('&amp;lt;query&amp;gt;')
to &amp;lt;s3://object-path/name-prefix&amp;gt;
iam_role &amp;lt;your role-arn&amp;gt;
CSV DELIMITER AS ','
HEADER
cleanpath
parallel off;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
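&lt;p&gt;Since we will run the same UNLOAD options for both the nodes and edges exports, a small helper (hypothetical, for illustration) can fill in the template above so the options stay consistent. The role ARN used in the example is a placeholder:&lt;/p&gt;

```python
def build_unload(query: str, s3_path: str, iam_role_arn: str) -> str:
    """Fill in the UNLOAD template with a query, S3 target and role ARN,
    applying the CSV/HEADER/CLEANPATH/PARALLEL OFF options."""
    return (
        f"unload ('{query}')\n"
        f"to '{s3_path}'\n"
        f"iam_role '{iam_role_arn}'\n"
        "CSV DELIMITER AS ','\n"
        "HEADER\n"
        "cleanpath\n"
        "parallel off;"
    )

# Placeholder ARN; substitute the role ARN shown in your namespace.
stmt = build_unload(
    "SELECT * FROM user_sample_vw",
    "s3://redshift-worldwide-events/nodes",
    "arn:aws:iam::123456789012:role/redshift-unload-role",
)
```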


&lt;p&gt;The query below unloads the results for all the user and event node records to the S3 bucket &lt;strong&gt;s3://redshift-worldwide-events&lt;/strong&gt; with the object name prefix &lt;strong&gt;nodes&lt;/strong&gt;. Replace the iam_role arn with your role arn. The first line forces the column names to keep the same case as used in the query (by default, all column names are overridden to lowercase).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;enable_case_sensitive_identifier&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;unload&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="s1"&gt;'SELECT DISTINCT *
FROM
(
    SELECT CONCAT(&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;u&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;, A.buyerid) AS ":ID", B.username AS "name:String", 
    B.liketheatre AS "liketheatre:Bool", B.likeconcerts AS "likeconcerts:Bool", B.likeopera AS "likeopera:Bool", 
    NULL AS "venue:String", NULL AS "category:String",  B.city AS "city:String",  B.state AS "state:String",  &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt; AS ":LABEL" 
    FROM "dev"."public"."network_vw" A 
    JOIN user_sample_vw B
    ON A.buyerid = B.userid
)
UNION 
(
    SELECT CONCAT(&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;u&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;, A.sellerid) AS ":ID", B.username AS "name:String", 
    B.liketheatre AS "liketheatre:Bool", B.likeconcerts AS "likeconcerts:Bool", B.likeopera AS "likeopera:Bool", 
    NULL AS "venue:String", NULL AS "category:String",  B.city AS "city:String",  B.state AS "state:String",  &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt; AS ":LABEL" 
    FROM "dev"."public"."network_vw" A 
    JOIN user_sample_vw B
    ON A.sellerid = B.userid
)
UNION
(
    SELECT CONCAT(&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;e&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;, eventid) AS ":ID",  eventname  AS "name:String", 
    NULL AS "liketheatre:Bool", NULL AS "likeconcerts:Bool", NULL AS "likeopera:Bool",
    venuename AS "venue:String", catname AS "category:String", venuecity AS "city:String", venuestate AS "state:String", &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;event&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt; AS ":LABEL"
    FROM "dev"."public"."network_vw" B
)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="s1"&gt;'s3://redshift-worldwide-events/nodes'&lt;/span&gt; 
&lt;span class="n"&gt;iam_role&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;your-iam-role&amp;gt;'&lt;/span&gt;
&lt;span class="n"&gt;CSV&lt;/span&gt; &lt;span class="k"&gt;DELIMITER&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="s1"&gt;','&lt;/span&gt;
&lt;span class="n"&gt;HEADER&lt;/span&gt;
&lt;span class="n"&gt;cleanpath&lt;/span&gt;
&lt;span class="n"&gt;parallel&lt;/span&gt; &lt;span class="k"&gt;off&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it ran successfully, we should see a message saying that 239 rows unloaded successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66bmc07qwidyurh0e4t2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66bmc07qwidyurh0e4t2.png" alt="Image description" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's break down the query and see what it is doing. The first and second subqueries create records for the buyer and seller nodes respectively, aliasing the column names to the openCypher format and setting the event property columns to NULL. We need to join &lt;strong&gt;network_vw&lt;/strong&gt; (which contains the list of seller and buyer pairs) with &lt;strong&gt;user_sample_vw&lt;/strong&gt; (which contains the properties of all users) to select additional information per user, like username, city and whether they like concerts, theatre and/or opera. The final subquery creates the records for the event nodes from &lt;strong&gt;network_vw&lt;/strong&gt;, similarly aliasing the column names to the required format and setting the columns corresponding to the user nodes to NULL. We then &lt;strong&gt;UNION&lt;/strong&gt; the separate subqueries to combine them into the same result set. &lt;/p&gt;
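&lt;p&gt;A toy illustration of why the node query needs both a buyer and a seller subquery plus DISTINCT: the same user can appear as a buyer in some rows and a seller in others, but must yield a single node record. The ids below are made up:&lt;/p&gt;

```python
# Simplified rows from a view like network_vw: user 101 is both a
# buyer and a seller across different transactions.
rows = [
    {"buyerid": 101, "sellerid": 202},
    {"buyerid": 303, "sellerid": 101},
]

# Mirror the CONCAT(''u'', ...) prefixing of the SQL, then take the set
# union, which deduplicates u101 just as UNION + DISTINCT does.
buyers = {f"u{r['buyerid']}" for r in rows}
sellers = {f"u{r['sellerid']}" for r in rows}
node_ids = sorted(buyers | sellers)
```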

&lt;p&gt;We can similarly run a query to unload the edge records result set. Here the S3 location option is slightly modified to use the object name prefix 'edges'.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;enable_case_sensitive_identifier&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;unload&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="s1"&gt;'SELECT ROW_NUMBER() OVER() AS ":ID",":START_ID",":END_ID", ":TYPE", "price:Double", "quantity:Int", 
"date:DateTime"
FROM 
    (
        ( 
        SELECT  CONCAT(&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;u&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;, sellerid) AS ":START_ID", 
        CONCAT(&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;e&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;, eventid) AS ":END_ID",&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;TICKETS_LISTED_FOR&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt; AS ":TYPE",
        pricepaid AS "price:Double" ,qtysold AS "quantity:Int", caldate AS "date:DateTime"
        FROM "dev"."public"."network_vw"
        )
    UNION 
        (
            SELECT CONCAT(&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;e&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;, eventid) AS ":START_ID", 
            CONCAT(&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;u&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;, buyerid) AS ":END_ID",&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;TICKET_PURCHASE&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt; AS ":TYPE", 
            pricepaid AS "price:Double" ,qtysold AS "quantity:Int" , caldate AS "date:DateTime"
            FROM "dev"."public"."network_vw"
        )
    )'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="s1"&gt;'s3://redshift-worldwide-events/edges'&lt;/span&gt; 
&lt;span class="n"&gt;iam_role&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;your-iam-role&amp;gt;'&lt;/span&gt;
&lt;span class="n"&gt;CSV&lt;/span&gt; &lt;span class="k"&gt;DELIMITER&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="s1"&gt;','&lt;/span&gt; 
&lt;span class="n"&gt;HEADER&lt;/span&gt; 
&lt;span class="n"&gt;cleanpath&lt;/span&gt; 
&lt;span class="n"&gt;parallel&lt;/span&gt; &lt;span class="k"&gt;off&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that we have also used a window function to rank the edge records for the same node ids by date, so we can take only the latest transaction between the same pair of users.&lt;br&gt;
The screenshot below shows edge records where there are multiple transactions between the same buyer and seller on different dates. We will only keep the latest record.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqde0e5oj9ohimuf1dzl2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqde0e5oj9ohimuf1dzl2.png" alt="Image description" width="800" height="125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the query ran successfully, check that the two objects are visible in the S3 bucket.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmdixupxp6qc9ote8qb3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmdixupxp6qc9ote8qb3.png" alt="Image description" width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Loading S3 Data into Neptune
&lt;/h2&gt;

&lt;p&gt;Now we will load the data from the S3 bucket to the Neptune cluster. To do this, we will open the notebook we configured in Sagemaker to access the Neptune cluster. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to the Sagemaker console and select &lt;strong&gt;Notebook instances&lt;/strong&gt; under the Notebook tab. &lt;/li&gt;
&lt;li&gt;You should see the notebook instance with status &lt;strong&gt;InService&lt;/strong&gt; if the create notebook task ran successfully. &lt;/li&gt;
&lt;li&gt;Under &lt;strong&gt;Actions&lt;/strong&gt;, click on Open Jupyter or Jupyter lab. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gdrbj2az4wa8omx2j9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gdrbj2az4wa8omx2j9w.png" alt="Image description" width="800" height="78"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You should see a number of subfolders containing sample notebooks on various topics, one level below the Neptune parent folder. Either open one of the existing notebooks or start a blank new one. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuoc6widul9kbpblp04t9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuoc6widul9kbpblp04t9.png" alt="Image description" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First we will check that the notebook configuration is as we expect. Graph notebook offers a number of &lt;a href="https://github.com/aws/graph-notebook#features" rel="noopener noreferrer"&gt;magic extensions&lt;/a&gt; in the ipython3 kernel for running specific tasks in a cell, such as running a query in a specific language (openCypher, Gremlin), checking the status of a load job or query, changing configuration settings, setting visualisation options, etc. &lt;/p&gt;

&lt;p&gt;In a new cell, execute the magic command &lt;code&gt;%graph_notebook_config&lt;/code&gt;. This should return a JSON payload containing connection information for the Neptune host instance the notebook is connected to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo259v04so3g84i993ib7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo259v04so3g84i993ib7.png" alt="Image description" width="800" height="572"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we want to override any of these settings (for example, if we have set a port different to the default 8182), we can copy the JSON output from the previous cell and modify the required value. Run the cell with the cell magic &lt;code&gt;%%graph_notebook_config&lt;/code&gt; to apply the new configuration.&lt;/p&gt;
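&lt;p&gt;For reference, the configuration payload has roughly the following shape (the exact fields vary with the graph-notebook version, and the host value here is a placeholder):&lt;/p&gt;

```json
{
  "host": "your-neptune-endpoint.cluster-xxxxxxxxxxxx.us-east-1.neptune.amazonaws.com",
  "port": 8182,
  "auth_mode": "DEFAULT",
  "load_from_s3_arn": "",
  "ssl": true,
  "aws_region": "us-east-1"
}
```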

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxe4hyq7iehyis2wcvi0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxe4hyq7iehyis2wcvi0.png" alt="Image description" width="800" height="1195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check that the status of the Neptune cluster endpoint is showing as &lt;strong&gt;healthy&lt;/strong&gt; using the &lt;code&gt;%status&lt;/code&gt; magic extension.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wblx76q3g7c91chgsmf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wblx76q3g7c91chgsmf.png" alt="Image description" width="762" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can use the Neptune loader command to send a POST request to the Neptune endpoint, as described &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/load-api-reference-load.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;. We will use the following request parameters: &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;source&lt;/strong&gt; : "s3://redshift-worldwide-events/",&lt;br&gt;
&lt;strong&gt;format&lt;/strong&gt; : "opencypher",&lt;br&gt;
&lt;strong&gt;iamRoleArn&lt;/strong&gt; : &lt;br&gt;
&lt;strong&gt;region&lt;/strong&gt; : "us-east-1",&lt;br&gt;
&lt;strong&gt;failOnError&lt;/strong&gt; : "FALSE",&lt;br&gt;
&lt;strong&gt;parallelism&lt;/strong&gt; : "MEDIUM",&lt;br&gt;
&lt;strong&gt;updateSingleCardinalityProperties&lt;/strong&gt; : "FALSE",&lt;br&gt;
&lt;strong&gt;queueRequest&lt;/strong&gt; : "FALSE"&lt;/p&gt;
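&lt;p&gt;As an illustration, the same load request can be sketched in Python using only the standard library. The endpoint and IAM role ARN below are placeholders (the post leaves the role ARN unspecified) and must be replaced with your own values:&lt;/p&gt;

```python
import json
import urllib.request

# Placeholder -- replace with your Neptune cluster endpoint and port.
NEPTUNE_ENDPOINT = "https://your-neptune-endpoint:8182"

# Request parameters from this post. iamRoleArn is a placeholder for the role
# that grants Neptune read access to the S3 bucket.
payload = {
    "source": "s3://redshift-worldwide-events/",
    "format": "opencypher",
    "iamRoleArn": "arn:aws:iam::123456789012:role/YourNeptuneLoadRole",  # placeholder
    "region": "us-east-1",
    "failOnError": "FALSE",
    "parallelism": "MEDIUM",
    "updateSingleCardinalityProperties": "FALSE",
    "queueRequest": "FALSE",
}

def start_load(endpoint: str = NEPTUNE_ENDPOINT) -> dict:
    """POST the load request to the Neptune loader endpoint and return the JSON response."""
    req = urllib.request.Request(
        f"{endpoint}/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

&lt;p&gt;Calling &lt;code&gt;start_load()&lt;/code&gt; from a network location that can reach the cluster should return the same payload as the curl request, including the &lt;code&gt;loadId&lt;/code&gt;.&lt;/p&gt;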

&lt;p&gt;This will output a &lt;code&gt;loadid&lt;/code&gt; in the payload.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffxd17mhts0x0f22i9s0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffxd17mhts0x0f22i9s0w.png" alt="Image description" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we can check the load status by using the &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/load-api-reference-status-requests.html" rel="noopener noreferrer"&gt;loader get-status request&lt;/a&gt;, replacing your &lt;code&gt;neptune endpoint&lt;/code&gt;, &lt;code&gt;port&lt;/code&gt; and &lt;code&gt;loadId&lt;/code&gt; in the command: &lt;code&gt;curl -G https://your-neptune-endpoint:port/loader/loadId&lt;/code&gt;&lt;br&gt;
If successful, you should see an output similar to the payload below. This returns one or more &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/loader-message.html" rel="noopener noreferrer"&gt;loader feed codes&lt;/a&gt;. If the load was successful, you should see only a &lt;strong&gt;LOAD_COMPLETED&lt;/strong&gt; code.&lt;/p&gt;
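&lt;p&gt;The same status check can be sketched in Python. This is a minimal illustration, assuming a placeholder endpoint; in the documented response, the overall feed code sits under &lt;code&gt;payload.overallStatus.status&lt;/code&gt;:&lt;/p&gt;

```python
import json
import urllib.request

def load_status_url(endpoint: str, load_id: str) -> str:
    """Build the loader get-status URL for a given load job."""
    return f"{endpoint}/loader/{load_id}"

def get_feed_code(endpoint: str, load_id: str) -> str:
    """Fetch the load status and return the overall feed code, e.g. LOAD_COMPLETED."""
    with urllib.request.urlopen(load_status_url(endpoint, load_id)) as resp:
        body = json.load(resp)
    return body["payload"]["overallStatus"]["status"]
```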

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fteouipp7fbh0grjqhl22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fteouipp7fbh0grjqhl22.png" alt="Image description" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If there is an issue with one or both CSV files, you may see a &lt;strong&gt;LOAD_FAILED&lt;/strong&gt; code or one of the other codes listed &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/loader-message.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;. In the next section, we will look at some options for diagnosing the errors. If one of the loads is still in progress, you will see a &lt;strong&gt;LOAD_IN_PROGRESS&lt;/strong&gt; key whose value is the number of S3 object loads still in progress. Running the curl command to check the load status again should update the code to &lt;strong&gt;LOAD_COMPLETED&lt;/strong&gt;, or to one of the error codes if there was an error.&lt;/p&gt;

&lt;p&gt;Check that you can access some data by submitting an openCypher query to the openCypher HTTPS endpoint using curl, as explained in the &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-opencypher-queries.html" rel="noopener noreferrer"&gt;docs&lt;/a&gt;. In this case, we will just return a single pair of connected nodes from the database by passing the query &lt;code&gt;MATCH (n)-[r]-(p) RETURN n,r,p LIMIT 1&lt;/code&gt; as the value of the query attribute, as in the screenshot below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; that the endpoint has the format &lt;code&gt;HTTPS://(cluster endpoint):(port number)/openCypher&lt;/code&gt;. Your cluster endpoint will be different from mine in the screenshot below, so you will need to copy it from the Neptune dashboard for your database cluster identifier.&lt;/p&gt;
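&lt;p&gt;The curl query can be mirrored with a short Python sketch (the endpoint is a placeholder; the openCypher HTTPS endpoint accepts the query as a form field named &lt;code&gt;query&lt;/code&gt;):&lt;/p&gt;

```python
import json
import urllib.parse
import urllib.request

QUERY = "MATCH (n)-[r]-(p) RETURN n,r,p LIMIT 1"

def run_opencypher(endpoint: str, query: str = QUERY) -> dict:
    """POST an openCypher query to the Neptune HTTPS endpoint and return the JSON result."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    req = urllib.request.Request(f"{endpoint}/openCypher", data=data, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```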

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllrrc59tuyodyja8mztm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllrrc59tuyodyja8mztm.png" alt="Image description" width="800" height="615"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Debugging Neptune data load errors
&lt;/h2&gt;

&lt;p&gt;Running the loader status check can sometimes return errors. To diagnose these further, we can run this &lt;a href="https://docs.aws.amazon.com/neptune/latest/userguide/load-api-reference-error-logs-examples.html" rel="noopener noreferrer"&gt;curl command&lt;/a&gt; with additional query parameters, replacing neptune-endpoint, port and loadid with your values. This gives a more detailed response with an errorLogs object listing the errors encountered, as shown in the screenshot below. Here, the load failed because some of the node ids referenced by edge records in the relationship CSV file were missing from the node CSV file.&lt;/p&gt;
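&lt;p&gt;Given the detailed status response, the individual error entries can be pulled out programmatically. The sketch below assumes the documented &lt;code&gt;details&lt;/code&gt;/&lt;code&gt;errors&lt;/code&gt; query parameters and a simplified response shape:&lt;/p&gt;

```python
def load_status_details_url(endpoint: str, load_id: str) -> str:
    """Build the get-status URL with the parameters that include error logs."""
    return f"{endpoint}/loader/{load_id}?details=true&errors=true"

def summarise_errors(status_response: dict) -> list:
    """Return the error messages from the errorLogs section of a detailed status response."""
    logs = status_response.get("payload", {}).get("errors", {}).get("errorLogs", [])
    return [entry.get("errorMessage", "") for entry in logs]
```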

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi8auoc0mh7yav16bb657.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi8auoc0mh7yav16bb657.png" alt="Image description" width="800" height="671"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next screenshot below shows a cardinality violation error because some of the edge record ids in the original data are duplicated.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2xbdiywlxawfhnvmg6q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2xbdiywlxawfhnvmg6q.png" alt="Image description" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can also reset the database and remove any existing data by using the magic command &lt;code&gt;%db_reset&lt;/code&gt;. This will prompt you to&lt;br&gt;
tick an acknowledgement option and click Delete. A status check will then run; wait for it to complete, and you should get a &lt;code&gt;database has been reset&lt;/code&gt; message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8r195c6ryvi5ywabedqu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8r195c6ryvi5ywabedqu.png" alt="Image description" width="800" height="279"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We are now set up for running more complex queries to generate insights from our data. &lt;a href="https://dev.to/aws-builders/aws-neptune-for-analysing-event-ticket-sales-between-users-part-2-3i5g"&gt;Part 2&lt;/a&gt; of this blog will run a number of openCypher queries to explore the property graph modelling the worldwide events network.&lt;/p&gt;

</description>
      <category>neptune</category>
      <category>cypher</category>
      <category>serverless</category>
      <category>graphs</category>
    </item>
    <item>
      <title>Data Analysis with Redshift Serverless and Quicksight - Part 2</title>
      <dc:creator>Ryan Nazareth</dc:creator>
      <pubDate>Sat, 13 May 2023 23:18:09 +0000</pubDate>
      <link>https://forem.com/aws-builders/data-analysis-with-redshift-serverless-and-quicksight-part-2-5c9o</link>
      <guid>https://forem.com/aws-builders/data-analysis-with-redshift-serverless-and-quicksight-part-2-5c9o</guid>
<description>&lt;p&gt;In the &lt;a href="https://dev.to/aws-builders/data-analysis-with-redshift-serverless-and-quicksight-part-1-1lg8"&gt;first part&lt;/a&gt; of this blog, we introduced the Redshift Serverless offering and set up a workgroup and namespace configured with datasharing to allow access to AWS Marketplace Data Exchange. In this second part, we will focus on accessing the data from Amazon Quicksight to generate interactive visualisations, and explore some of the other features it has to offer.&lt;/p&gt;

&lt;p&gt;Amazon Quicksight is a fully managed business intelligence (BI) service which allows users to publish dashboards and share them amongst team members. Since it is a serverless offering, it scales to tens of thousands of users without you having to manage the underlying infrastructure. It can connect to a wide variety of data sources in the cloud (S3, RDS, Redshift and Athena, to name a few) and on-premises.&lt;/p&gt;

&lt;p&gt;It also provides some more advanced features such as integrating machine learning insights with dashboards in the form of forecasting, anomaly detection and natural language querying. We will explore some of these in this blog.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up a Quicksight subscription
&lt;/h2&gt;

&lt;p&gt;If this is the first time using Quicksight, you will need to set up an account. Sign in to your IAM user account and navigate to the Quicksight service. Follow the setup instructions in the Quicksight documentation and choose the options to set up an enterprise account, with federated users and a QuickSight-managed role as the method of authentication, and grant access to Redshift. You will get a 30-day free trial for a Standard or Enterprise subscription. If your free trial has expired, you can sign up for one of the cheaper pay-as-you-go Reader subscriptions, which is only charged for active sessions and can be stopped after this tutorial is complete.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbhesqbo1mowu1b9kmins.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbhesqbo1mowu1b9kmins.png" alt="Image description" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have set up the subscription, you should be able to &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/signing-in.html" rel="noopener noreferrer"&gt;sign into Quicksight as an IAM user&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/using-quicksight-menu-and-landing-page.html" rel="noopener noreferrer"&gt;manage your account&lt;/a&gt; by choosing the user icon at the upper right of the page and selecting &lt;code&gt;Manage Quicksight&lt;/code&gt;. You can now check your active subscription or, as an admin user, invite users to your account if required and manage permissions accordingly.&lt;/p&gt;

&lt;p&gt;Quicksight also uses &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/managing-spice-capacity.html" rel="noopener noreferrer"&gt;SPICE&lt;/a&gt; to run fast in-memory computations on data for visual analytics. For Enterprise subscriptions, data is also encrypted at rest by default. By default, we get a total of 11 GB of SPICE capacity per region per subscription account, which can be shared amongst the Quicksight users added to the account. We will be loading the data from Redshift into SPICE for this tutorial, and this capacity is more than enough.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gr63z1ywh7aaye2zxup.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gr63z1ywh7aaye2zxup.png" alt="Image description" width="800" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting to Redshift
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;On the Amazon QuickSight start page choose Datasets from the options on the left and on the Datasets page, choose the New data set option on the top right (screenshot below).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbl3u5ld1yqbxf5pnjblg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbl3u5ld1yqbxf5pnjblg.png" alt="Image description" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the new window, choose the Redshift Manual connect icon. 
A new window will pop up requiring the connection information for the data source to be filled in.&lt;/li&gt;
&lt;li&gt;For Data source name, enter a name for the data source.&lt;/li&gt;
&lt;li&gt;For Database server, you will need to retrieve the endpoint 
of the cluster. You can get the endpoint value from the Endpoint field in the general information section when clicking on the cluster workgroup in the Redshift Serverless dashboard. The server address is the first part of the endpoint, before the colon, as highlighted in yellow below.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpw40w7wjmhzt0vycshe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpw40w7wjmhzt0vycshe.png" alt="Image description" width="800" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The port will be the default port for Redshift (5439), unless it was set differently during setup, in which case confirm it from the endpoint address (the number following the first colon).&lt;/li&gt;
&lt;li&gt;Enter the name of the database (after the second colon in the endpoint). In my case, it is dev.&lt;/li&gt;
&lt;li&gt;For Username and Password, enter the user name and password you configured in part 1 of this blog when setting up the Redshift cluster.&lt;/li&gt;
&lt;li&gt;Click on &lt;code&gt;Validate Connection&lt;/code&gt;. If successful, you should see a green tick saying validated. If validation failed, check the following: 

&lt;ul&gt;
&lt;li&gt;Check that the security group attached to the Redshift cluster allows inbound traffic from the IP address range associated with the region Quicksight was set up in, as explained in the previous blog.&lt;/li&gt;
&lt;li&gt;Check that you have made the Redshift cluster in its VPC publicly accessible.&lt;/li&gt;
&lt;li&gt;Check that you are using the correct username and/or password (these can be reset from the Redshift dashboard).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Assuming everything worked, click Create DataSource. &lt;/li&gt;

&lt;/ul&gt;
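&lt;p&gt;The endpoint splitting described above can be captured in a small helper (the example endpoint is hypothetical; the real value comes from your workgroup's general information panel):&lt;/p&gt;

```python
def parse_redshift_endpoint(endpoint: str) -> dict:
    """Split a Redshift Serverless endpoint of the form host:port/database
    into the fields that Quicksight asks for."""
    host, _, rest = endpoint.partition(":")
    port, _, database = rest.partition("/")
    return {"server": host, "port": int(port), "database": database}

# Hypothetical endpoint value, for illustration only.
example = "default-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com:5439/dev"
fields = parse_redshift_endpoint(example)
```

&lt;p&gt;Here &lt;code&gt;fields["server"]&lt;/code&gt; is the database server, &lt;code&gt;fields["port"]&lt;/code&gt; is 5439 and &lt;code&gt;fields["database"]&lt;/code&gt; is dev, matching the three fields in the connection form.&lt;/p&gt;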

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhghm48xxam2481mud6k1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhghm48xxam2481mud6k1.png" alt="Image description" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You will be presented with the schema and the set of tables to connect to. The view &lt;code&gt;worldwide_events_vw&lt;/code&gt; created in the previous blog should be visible. Select it and click next.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6ytwkuzs5cf7br701ca.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6ytwkuzs5cf7br701ca.png" alt="Image description" width="800" height="663"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the next pop-up, we need to select whether to query the dataset directly from the source or import the table data as-is into SPICE. The latter is the recommended method, as it improves performance and speeds up analytics, provided you have enough SPICE capacity. Select the &lt;code&gt;Import to SPICE&lt;/code&gt; option.&lt;/li&gt;
&lt;li&gt;If you do not want to be emailed when a refresh fails, untick the box. Then choose &lt;code&gt;Visualize&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxe59n5vmnxvxyxh6mp8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxe59n5vmnxvxyxh6mp8x.png" alt="Image description" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Accept the default settings for creating a new sheet and you should now be presented with the dashboard for creating the charts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Insights
&lt;/h2&gt;

&lt;p&gt;Quicksight offers a number of &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/working-with-visual-types.html" rel="noopener noreferrer"&gt;visual types&lt;/a&gt; which can be selected from the visual types pane using the representative visual icon. The AWS docs on &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/creating-a-visual.html" rel="noopener noreferrer"&gt;creating quicksight visuals&lt;/a&gt; go through the steps for adding a visual to the dashboard. First, we will create a line chart from the visual types to plot the fields caldate and totalprice from the fields list.&lt;/p&gt;

&lt;p&gt;Quicksight allows non-technical users to generate forecasts using the built-in &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/randomcutforest.html" rel="noopener noreferrer"&gt;Random Cut Forest algorithm&lt;/a&gt;, which analyses historical data and generates a forecast for a specified period, with a prediction interval at the required confidence level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F749083j9i3kya24ufeye.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F749083j9i3kya24ufeye.png" alt="Image description" width="800" height="917"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For forecast length, set the periods forward to 14.&lt;/li&gt;
&lt;li&gt;Set the prediction interval to 90. &lt;/li&gt;
&lt;li&gt;Set the seasonality to &lt;code&gt;auto&lt;/code&gt; and leave the other settings as the default values. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We get a wide confidence interval, which suggests that the forecast could lie anywhere within that range. A smaller prediction interval will generate a narrower band, but gives less confidence that the actual values will fall within it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq04o4c0hklifmrirpa13.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq04o4c0hklifmrirpa13.png" alt="Image description" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can also generate a forecast for a period in history and compare it to actual data. To do this, edit the forecast and for the forecast length setting, set the periods forward option to 0 and the periods backward setting to 100. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foejdvg5xp1mbhb7vk1qa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foejdvg5xp1mbhb7vk1qa.png" alt="Image description" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Amazon Quicksight also provides users with ML-powered anomaly insights by analysing a number of combinations of metrics and trends in the data. The &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/anomaly-detection-outliers-and-key-drivers.html" rel="noopener noreferrer"&gt;concepts for detecting outliers&lt;/a&gt; are based on whether an extreme data point occurs by random chance or is a significant event. Quicksight notifies users when there are any anomalies in the visuals and whether they are worth investigating. Click on the bulb icon in the top right-hand corner of the chart to see the largest anomaly detected in the time series via ML insights. Click on the more options menu and then view more details. On the left-hand panel, you should see a list of anomalies with additional statistics on the percentage change from the average expected total price. Click 'Add anomaly to sheet'.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4mle7i5ee8us1mbklzn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4mle7i5ee8us1mbklzn.png" alt="Image description" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will open an insight widget in the same sheet. Click get started in the widget. You are then taken to a &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/anomaly-detection-adding-anomaly-insights.html" rel="noopener noreferrer"&gt;configuration screen with a preview&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8rlw45dsp572tlq9ak4t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8rlw45dsp572tlq9ak4t.png" alt="Image description" width="800" height="776"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Amazon Quicksight provides &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/anomaly-detection-adding-key-drivers.html" rel="noopener noreferrer"&gt;contribution analysis&lt;/a&gt;, identifying the key drivers that contribute to the anomalous outcomes. Expand the top contributors option and tick up to 4 features to use as key drivers for running the contribution analysis. The screenshot below shows the results for day, eventname, month and venuecity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcda0peczrfwnhvsqbf05.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcda0peczrfwnhvsqbf05.png" alt="Image description" width="800" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Choose Save to confirm your choices. You are taken back to the&lt;br&gt;
insight widget, where you can select Run now to run the anomaly detection and view your insight. This will take a few minutes to complete. Once complete, you should see an update in the widget with the latest anomaly detected and an option to explore anomalies, which you can click.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3iqkvdtue19n4o5io1bm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3iqkvdtue19n4o5io1bm.png" alt="Image description" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will open the anomalies screen, as in the screenshot below. Select &lt;code&gt;SHOW ANOMALIES BY DATE&lt;/code&gt; to display the Number of anomalies chart, which shows the outliers detected over time. We can see two outliers detected, at the end of May and the end of June. On the left pane, we can re-run the contribution analysis if required with a different set of key drivers. In the screenshot below, I have run this between May 26, 2008 and May 27, 2008 (corresponding to the first anomaly) and selected &lt;code&gt;eventname&lt;/code&gt; and &lt;code&gt;eventcity&lt;/code&gt;. We can also explore &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/anomaly-exploring.html" rel="noopener noreferrer"&gt;anomalies per category or dimension&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57masa44xlo5hrnhdlqi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57masa44xlo5hrnhdlqi.png" alt="Image description" width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dashboard below uses the &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/bar-charts.html" rel="noopener noreferrer"&gt;vertical bar chart&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/histogram-charts.html" rel="noopener noreferrer"&gt;histogram&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/box-plots.html" rel="noopener noreferrer"&gt;boxplot&lt;/a&gt; visual types. The bar chart shows the total number of tickets for each quarter of the year, split by the day of the week the event fell on. We can see that the first quarter (Jan-March) has the fewest tickets sold, and quarter 3 has a larger variation in tickets sold across the week, with Sunday being the most popular day for events. In the last 3 months of the year, more tickets are sold between Friday and Monday than during the rest of the week.&lt;/p&gt;

&lt;p&gt;Jan is the month in the first quarter where the range and median of total transactions for an event were the lowest, possibly due to fewer tickets sold. We can see a strong right skew in February, with a long upper whisker. For the rest of the months, the median remains consistent, between £12k and £15k. November showed a slight left skew, and maximum transaction values for a given event of just over £32k were seen for Dec, Feb and May.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmj4z2k5wgad876ao2k1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmj4z2k5wgad876ao2k1.png" alt="Image description" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This sheet contains a &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/tree-map.html" rel="noopener noreferrer"&gt;tree map&lt;/a&gt; with the &lt;code&gt;venuename&lt;/code&gt; dimension arranged by &lt;code&gt;total_tickets&lt;/code&gt; (rectangle size) and color encoded by &lt;code&gt;venueseats&lt;/code&gt;. The larger the venue, the darker the shade of green (e.g. FedEx Field, New York Giants Stadium, Arrowhead Stadium), whilst smaller venues appear in brighter shades of yellow (e.g. Shoreline Amphitheatre). We can see that some events at smaller venues, with between 20k and 50k seats, sold a larger number of tickets (possibly because more events were held at these venues during this timeframe). The &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/pie-chart.html" rel="noopener noreferrer"&gt;pie chart&lt;/a&gt; shows the proportion of total transactions for the top &lt;code&gt;eventname&lt;/code&gt; values. Here the top 6 events are represented and the rest grouped into an 'others' category. The Greg Kihn and Yaz (Yazoo) bands accounted for more than 65% of total transaction sales. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnf960tba7y3t027xmdgp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnf960tba7y3t027xmdgp.png" alt="Image description" width="800" height="730"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Deleting Resources
&lt;/h3&gt;

&lt;p&gt;Finally, remember to delete all the Redshift resources and cancel the Quicksight subscription created across the two parts of this blog, to avoid being charged further. &lt;strong&gt;Note&lt;/strong&gt; that for Redshift Serverless, although you do not pay for compute capacity when you do not run any queries, you still pay for storage (more details can be found &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-billing.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;). &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To delete the Quicksight Enterprise subscription follow the instructions &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/managing-subscriptions.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;. You can also &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/export-dashboard-to-pdf.html" rel="noopener noreferrer"&gt;export&lt;/a&gt; the dashboard to pdf and &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/deleting-a-dashboard.html" rel="noopener noreferrer"&gt;delete&lt;/a&gt; the dashboard if required.&lt;/li&gt;
&lt;li&gt;The Redshift Serverless workgroup and associated namespace can be deleted by following these &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-console-namespace-delete.html" rel="noopener noreferrer"&gt;instructions&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
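&lt;p&gt;For anyone who prefers scripting the cleanup, a minimal sketch of the equivalent API calls is below. It only builds the request parameters for the redshift-serverless DeleteWorkgroup and DeleteNamespace operations in boto3; the workgroup and namespace names are placeholders for whatever was chosen at setup, and the workgroup must be deleted before its namespace.&lt;/p&gt;

```python
# Sketch: scripted cleanup of Redshift Serverless resources.
# Pass each params dict to boto3.client("redshift-serverless") as
# client.delete_workgroup(**params) / client.delete_namespace(**params).
# Names here are placeholders for whatever you chose at setup.

def cleanup_calls(workgroup: str, namespace: str) -> list:
    """Ordered (api_name, params) pairs: the workgroup goes first."""
    return [
        ("delete_workgroup", {"workgroupName": workgroup}),
        ("delete_namespace", {"namespaceName": namespace}),
    ]

calls = cleanup_calls("default", "default")
```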

</description>
      <category>serverless</category>
      <category>visualisation</category>
      <category>quicksight</category>
      <category>data</category>
    </item>
    <item>
      <title>Data Analysis with Redshift Serverless and Quicksight - Part 1</title>
      <dc:creator>Ryan Nazareth</dc:creator>
      <pubDate>Sat, 13 May 2023 23:17:28 +0000</pubDate>
      <link>https://forem.com/aws-builders/data-analysis-with-redshift-serverless-and-quicksight-part-1-1lg8</link>
      <guid>https://forem.com/aws-builders/data-analysis-with-redshift-serverless-and-quicksight-part-1-1lg8</guid>
<description>&lt;p&gt;In the first part of this blog, we will focus mainly on setting up a Redshift Serverless cluster and configuring access to the external Worldwide Event Attendance data exchange via the Redshift data sharing feature, so that it can be accessed from the database in the provisioned cluster. We will then run some queries and unload data to S3. Other features provided by Redshift, such as cluster performance monitoring, data recovery and guarding against surprise bills, will also be touched upon. In the &lt;a href="https://dev.to/aws-builders/data-analysis-with-redshift-serverless-and-quicksight-part-2-5c9o"&gt;second part&lt;/a&gt;, we will set up Quicksight and connect it to our Redshift cluster to access the data and build dashboards that generate some interesting insights. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; that the queries could cost between $20-$30, as we are using the entire dataset with minimal filtering. Serverless will try to optimise the computation by scaling to more RPUs, which will increase cost. In addition, you will also be charged for data storage. This is still well within the free trial. You can adjust the RPU base capacity or set usage limits, which is explained further in the &lt;code&gt;Data Recovery, Monitoring and Cost Management&lt;/code&gt; section.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/marketplace/pp/prodview-4ozlpl4r3k7cg#overview" rel="noopener noreferrer"&gt;Worldwide Event Attendance&lt;/a&gt; is a free product available in AWS Marketplace, which allows subscribers to query, analyse and build applications quickly. Instructions on how to subscribe to this product can be found &lt;a href="https://docs.aws.amazon.com/data-exchange/latest/userguide/subscriber-tutorial-RS-product.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;br&gt;
Once subscribed to, we need to create a datashare in the Redshift cluster to access the data immediately. &lt;/p&gt;

&lt;p&gt;We will use Redshift Serverless, the serverless offering of Redshift, which removes the need to set up and manage the underlying cluster specs and scaling. All new users get $300 in free credits for a trial period of 3 months. Sign in to the Redshift console and select &lt;a href="https://aws.amazon.com/redshift/free-trial/" rel="noopener noreferrer"&gt;Serverless free trial&lt;/a&gt;. You are only billed according to the capacity used in a given duration (RPU hours), which scales automatically to optimise running the query. There is also a charge for Redshift Managed Storage (RMS). &lt;/p&gt;
&lt;h2&gt;
  
  
  Redshift Serverless Setup
&lt;/h2&gt;

&lt;p&gt;If this is the first time using Redshift Serverless, you will need to create a default workgroup. A &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-workgroup-namespace.html" rel="noopener noreferrer"&gt;workgroup&lt;/a&gt; is a collection of compute resources (VPC subnet groups, security groups, RPUs) and can be associated with one namespace, a collection of database objects and users comprising tables, schemas, KMS keys etc. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sign in to the Redshift Serverless console and choose Create workgroup.&lt;/li&gt;
&lt;li&gt;Specify a value for Workgroup name: e.g. default &lt;/li&gt;
&lt;li&gt;In the Network and security section, choose the VPC and security groups. I have chosen the default VPC and associated security group. In addition, I also created another security group to allow access from Quicksight, which we will later need to connect to our database to access the data and create dashboards. Setting up the security group for this is described &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/enabling-access-redshift.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The security group you choose should have an inbound rule to allow traffic to port 5439 (default Redshift port) from CIDR address range where Quicksight was created. This can be looked up &lt;a href="https://docs.aws.amazon.com/quicksight/latest/user/regions.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;. For example, if Quicksight is configured in &lt;code&gt;us-east-1&lt;/code&gt;, the associated IP address range for data source connectivity is &lt;code&gt;52.23.63.224/27&lt;/code&gt;&lt;/p&gt;
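&lt;p&gt;As a rough sketch, the inbound rule described above can also be expressed programmatically. The dict below is shaped for the EC2 AuthorizeSecurityGroupIngress API in boto3 (passed as &lt;code&gt;ec2.authorize_security_group_ingress(**payload)&lt;/code&gt;); the security group id is a placeholder.&lt;/p&gt;

```python
# Sketch: build the ingress-rule payload that allows QuickSight (us-east-1,
# CIDR 52.23.63.224/27) to reach Redshift on its default port 5439.
# The security-group id below is a placeholder.

QUICKSIGHT_CIDR = "52.23.63.224/27"  # us-east-1, from the QuickSight regions table
REDSHIFT_PORT = 5439                 # default Redshift port

def quicksight_ingress_rule(security_group_id: str) -> dict:
    """Parameters for ec2.authorize_security_group_ingress."""
    return {
        "GroupId": security_group_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": REDSHIFT_PORT,
            "ToPort": REDSHIFT_PORT,
            "IpRanges": [{
                "CidrIp": QUICKSIGHT_CIDR,
                "Description": "QuickSight us-east-1 data source range",
            }],
        }],
    }

rule = quicksight_ingress_rule("sg-0123456789abcdef0")
```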

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpsbww7xmfkptf7lhe4dd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpsbww7xmfkptf7lhe4dd.png" alt="Image description" width="800" height="943"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select Create a New namespace and specify the name&lt;/li&gt;
&lt;li&gt;Under admin user credentials, setup a username and password to connect to the database in the cluster. We will use this later when manually connecting to the redshift endpoint from Quicksight. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F430617wxc433cnggxj4x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F430617wxc433cnggxj4x.png" alt="Image description" width="800" height="765"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will now need to create a role to associate with redshift serverless endpoint so it can unload data into S3.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the Permissions section, click the Create IAM role option in the Manage IAM roles dropdown in the Associate IAM roles subsection.&lt;/li&gt;
&lt;li&gt;Select the option Specific S3 buckets, select the S3 bucket created for unloading the data to, and then click Create IAM role as default. This will create an IAM role as the default, with the AWS managed policy AmazonRedshiftAllCommandsFullAccess attached. This includes permissions to run SQL commands to COPY, UNLOAD, and query data with Amazon Redshift Serverless.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwe6vcxtw7toguh7bpn2s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwe6vcxtw7toguh7bpn2s.png" alt="Image description" width="800" height="752"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Now back in the Associate IAM roles section, select Associate IAM role and tick the role just created.&lt;/li&gt;
&lt;li&gt;By default, KMS encryption is provided with AWS owned key. The Encryption and security section can be skipped unless you want to provide your own KMS key for encryption and enable database logging.&lt;/li&gt;
&lt;li&gt;Click Next&lt;/li&gt;
&lt;li&gt;In the Review Step, check that all the options and configuration are set correctly and select Create.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1aa7kdjnec5f8l1e66s5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1aa7kdjnec5f8l1e66s5.png" alt="Image description" width="800" height="1062"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgoe1kvqv2yb08okajv7t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgoe1kvqv2yb08okajv7t.png" alt="Image description" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now go back to the Serverless Dashboard and check the workgroup list to see the workgroup and namespace created, with status showing as 'Available'. &lt;/p&gt;

&lt;p&gt;We will also need to connect Quicksight to our Redshift endpoint. Hence we will need to make the database publicly accessible to allow access from outside the VPC. The VPC security group we configured earlier should have an inbound rule that only allows access from Quicksight. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the navigation panel on the left, click on Workgroup configuration and select the Workgroup created&lt;/li&gt;
&lt;li&gt;In the Network and security panel under the Data access tab, the Publicly accessible option is turned off by default. Click the edit button (highlighted in yellow in the screenshot) and turn on the Publicly accessible option.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; if our Redshift cluster was in a private subnet, we would need to create a private connection from Quicksight to the VPC in which the cluster is located, as described &lt;a href="https://repost.aws/knowledge-center/quicksight-redshift-private-connection" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcvveunh9ta62qfkmkts.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcvveunh9ta62qfkmkts.png" alt="Image description" width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Accessing the DataShare
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/redshift/features/data-sharing/" rel="noopener noreferrer"&gt;Redshift data sharing&lt;/a&gt; allows you to share live data across clusters with different accounts and regions with relative ease. This also decouples storage and compute and ensures access to live data and consistency, without the need to copy or move data.&lt;/p&gt;

&lt;p&gt;We now want to create a datashare to access the Worldwide Event Attendance data exchange from our cluster. Carry out the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to the Redshift Serverless  dashboard in the Amazon Redshift console and select default namespace.&lt;/li&gt;
&lt;li&gt;Navigate to the Datashares tab and scroll down to subscriptions to AWS Data Exchange datashares. &lt;/li&gt;
&lt;li&gt;Click on the datashare &lt;code&gt;worldwide_event_test_data&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Choose Create database from datashare.&lt;/li&gt;
&lt;li&gt;In the Create database from datashare pop-up, specify &lt;code&gt;worldwide_event_test_data&lt;/code&gt; as the Database name.&lt;/li&gt;
&lt;li&gt;Choose Create. You will see a message confirming successful database creation. You are now ready to run read-only queries on this database.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbn1l3ag97vzyeirdhj6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbn1l3ag97vzyeirdhj6.png" alt="Image description" width="800" height="848"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Running Example Queries in the Editor
&lt;/h3&gt;

&lt;p&gt;In order to successfully query the datashare database, you must first connect to the Redshift cluster using the cluster's native database, and then use the cross-database query notation &lt;code&gt;&amp;lt;shareddatabase&amp;gt;.&amp;lt;schema&amp;gt;.&amp;lt;object&amp;gt;&lt;/code&gt; to query the data in the shared database.&lt;/p&gt;
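&lt;p&gt;As a small illustration of this notation, the hypothetical helper below assembles the fully qualified name; the quoted form matches the example query later in this post.&lt;/p&gt;

```python
# Sketch: build a cross-database reference shareddatabase.schema.object,
# quoting each part as in the example query further down.

def shared_object(database: str, schema: str, obj: str) -> str:
    return f'"{database}"."{schema}"."{obj}"'

ref = shared_object("worldwide_event_data_exchange", "public", "event")
```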

&lt;ul&gt;
&lt;li&gt;Navigate to the Redshift query editor v2 page. Select your Serverless workgroup (default), and you will be presented with a window to select authentication as a Federated User (which will generate temporary credentials) or to provide the Database Username and Password set up during cluster creation.&lt;/li&gt;
&lt;li&gt;If selecting Federated User, set the database name to the database in your cluster, 'dev'. Then click Save.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbk7lvudt5dt0usftfg8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbk7lvudt5dt0usftfg8.png" alt="Image description" width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For Database Username and Password authentication, set the database name to 'dev' and input your username and password. Then click Save.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zjmm272ksars4l7larc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zjmm272ksars4l7larc.png" alt="Image description" width="800" height="751"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will also save the credentials in AWS Secrets Manager. In the Secrets Manager console, choose Secrets, and then choose the secret. Scroll down to the Secret value section of the secret's detail page and click &lt;code&gt;Retrieve Secret Value&lt;/code&gt; on the right hand side. You will see the username and password saved as key-value pairs alongside the associated &lt;code&gt;engine&lt;/code&gt;, &lt;code&gt;dbname&lt;/code&gt;, &lt;code&gt;port&lt;/code&gt; and &lt;code&gt;dbClusterIdentifier&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqgji8d21dz6pwrykuru.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqgji8d21dz6pwrykuru.png" alt="Image description" width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You may wonder why we do not just connect directly to the &lt;code&gt;worldwide_event_data_exchange&lt;/code&gt; datashare. This is because Amazon Redshift data sharing has the following considerations, as detailed in the &lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/considerations.html" rel="noopener noreferrer"&gt;docs&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connecting directly to a datashare database is not possible. &lt;/li&gt;
&lt;li&gt;As a datashare user, you can still only connect to your local cluster database. &lt;/li&gt;
&lt;li&gt;Creating databases from a datashare does not allow you to connect to them, but you can read from them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you try to connect directly, as a federated user for example, you will get the following error. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqp2pmfd94t1mc46wj0oe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqp2pmfd94t1mc46wj0oe.png" alt="Image description" width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We are now ready to run queries in the editor. Before moving on, as &lt;strong&gt;mentioned previously&lt;/strong&gt;, running all the queries once to join all the tables with most of the data will incur a cost (probably less than $20), still within the free credits. Should you wish to control this, you can reduce the &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-capacity.html" rel="noopener noreferrer"&gt;RPU base capacity&lt;/a&gt;, which defaults to 128 RPUs when creating the cluster (where 1 RPU provides 16 GB of memory). This is discussed in the &lt;code&gt;Data Recovery, Monitoring and Cost Management&lt;/code&gt; section.&lt;/p&gt;
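&lt;p&gt;For reference, the base capacity can also be changed outside the console. The sketch below only builds parameters for the redshift-serverless UpdateWorkgroup API (&lt;code&gt;update_workgroup&lt;/code&gt; in boto3); the workgroup name is a placeholder, and the multiple-of-8 check reflects my understanding of the allowed RPU values.&lt;/p&gt;

```python
# Sketch: lower the RPU base capacity from the default 128 to cap cost.
# Pass the dict to boto3.client("redshift-serverless").update_workgroup(**params).
# Workgroup name is a placeholder for whatever you chose at setup.

def base_capacity_update(workgroup: str, rpus: int) -> dict:
    if rpus % 8 != 0:
        raise ValueError("base capacity must be a multiple of 8 RPUs")
    return {"workgroupName": workgroup, "baseCapacity": rpus}

params = base_capacity_update("default", 32)
```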

&lt;p&gt;In the query editor, run the SQL block below. This will join the event, sales, venue, category and date tables and create a view of the results for the aggregated ticket price, commissions and total number of tickets sold for a given event at a venue on a calendar date.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;worldwide_events_vw&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;caldate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;day&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qtr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;holiday&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eventname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;catname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="n"&gt;venuename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;venuecity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;venueseats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pricepaid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;commission&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_commission&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qtysold&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_tickets&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"worldwide_event_data_exchange"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"event"&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"worldwide_event_data_exchange"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"sales"&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; 
&lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eventid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eventid&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"worldwide_event_data_exchange"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"venue"&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;venueseats&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;venue&lt;/span&gt; 
&lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;venueid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;venue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;venueid&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"worldwide_event_data_exchange"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"category"&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cat&lt;/span&gt;
&lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;catid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;catid&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"worldwide_event_data_exchange"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"date"&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;datetable&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;datetable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dateid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dateid&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;caldate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;day&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qtr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;holiday&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eventname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;catname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;venuename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;venuecity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;venueseats&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;no&lt;/span&gt; &lt;span class="k"&gt;schema&lt;/span&gt; &lt;span class="n"&gt;binding&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
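&lt;p&gt;The same statement could also be submitted without the console via the Redshift Data API (redshift-data ExecuteStatement, which accepts a workgroup name for Serverless). The sketch below only assembles the request parameters; the SQL is abbreviated and the workgroup and database names match this walkthrough.&lt;/p&gt;

```python
# Sketch: parameters for redshift_data.execute_statement to run the
# view-creation query against the Serverless workgroup. The SQL here is
# abbreviated; paste the full statement from the editor above.

CREATE_VIEW_SQL = (
    "CREATE VIEW worldwide_events_vw AS "
    "SELECT ... "  # abbreviated: full SELECT with joins as shown above
    "WITH NO SCHEMA BINDING;"
)

def execute_statement_params(workgroup: str, database: str, sql: str) -> dict:
    return {"WorkgroupName": workgroup, "Database": database, "Sql": sql}

params = execute_statement_params("default", "dev", CREATE_VIEW_SQL)
```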



&lt;p&gt;By default, creating views from external tables is not supported and will throw an error as shown below. The &lt;code&gt;with no schema binding&lt;/code&gt; clause at the end of the query allows us to create the view successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdjobufp9v5ks6gicntzf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdjobufp9v5ks6gicntzf.png" alt="Image description" width="800" height="105"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the query ran successfully, the view should be visible under the Views dropdown after a refresh (highlighted in blue in the screenshot below). Now let's check the data in the view by running &lt;code&gt;SELECT * FROM "public"."worldwide_events_vw"&lt;/code&gt;. The results should be similar to the screenshot below. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvp21sf34w8z7847x19i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvp21sf34w8z7847x19i.png" alt="Image description" width="800" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Unloading data to S3
&lt;/h2&gt;

&lt;p&gt;Create an S3 bucket or use an existing one if you wish. I have created a new S3 bucket &lt;code&gt;redshift-worldwide-events&lt;/code&gt; with an events folder. The &lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html" rel="noopener noreferrer"&gt;UNLOAD&lt;/a&gt; SQL command allows users to unload the results of a query to an S3 bucket. This requires the query, the S3 path and an IAM role ARN granting permission to write to the S3 bucket. In addition, we will add the following extra options to the command, to create a single CSV file that includes a header row.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CSV DELIMITER AS&lt;/code&gt;: use CSV format with ',' as the delimiter&lt;/li&gt;
&lt;li&gt;&lt;code&gt;HEADER&lt;/code&gt;: write the first row as a header row&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CLEANPATH&lt;/code&gt;: remove any existing files on the S3 path before unloading the new query results&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PARALLEL OFF&lt;/code&gt;: turn off parallel writes, as we want a single CSV file rather than multiple partitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the query will look like below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;unload&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'SELECT * FROM worldwide_events_vw'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="s1"&gt;'s3://redshift-worldwide-events/events/worldwide_events'&lt;/span&gt; 
&lt;span class="n"&gt;iam_role&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;your-iam-role-arn&amp;gt;'&lt;/span&gt;
&lt;span class="n"&gt;CSV&lt;/span&gt; &lt;span class="k"&gt;DELIMITER&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="s1"&gt;','&lt;/span&gt; 
&lt;span class="n"&gt;HEADER&lt;/span&gt; 
&lt;span class="n"&gt;cleanpath&lt;/span&gt; 
&lt;span class="n"&gt;parallel&lt;/span&gt; &lt;span class="k"&gt;off&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The S3 bucket path needs to be in the format &lt;code&gt;&amp;lt;s3://object-path/name-prefix&amp;gt;&lt;/code&gt;. So if the bucket created is &lt;code&gt;redshift-worldwide-events&lt;/code&gt; with an events folder, the object path becomes &lt;code&gt;redshift-worldwide-events/events&lt;/code&gt;. The name-prefix is the object name prefix, which gets concatenated with a slice number (000 for a single file). So if the name-prefix is set to &lt;code&gt;worldwide_events&lt;/code&gt;, the object stored will be named &lt;code&gt;worldwide_events000&lt;/code&gt;. The IAM role ARN is that of the role we associated with the Redshift cluster during setup. You can find it in the dashboard by navigating to the namespace configuration (highlighted in yellow in the screenshot below).&lt;/p&gt;
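&lt;p&gt;The naming rule above can be sketched with a couple of hypothetical helpers:&lt;/p&gt;

```python
# Sketch of the UNLOAD naming convention: the target is
# s3://object-path/name-prefix, and with PARALLEL OFF the single output
# object gets slice number 000 appended to the prefix.

def unload_target(bucket: str, folder: str, prefix: str) -> str:
    return f"s3://{bucket}/{folder}/{prefix}"

def expected_object_key(folder: str, prefix: str, slice_no: int = 0) -> str:
    return f"{folder}/{prefix}{slice_no:03d}"

target = unload_target("redshift-worldwide-events", "events", "worldwide_events")
key = expected_object_key("events", "worldwide_events")
```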

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9k3aaj906zvinkgf9lxu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9k3aaj906zvinkgf9lxu.png" alt="Image description" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;
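&lt;p&gt;The key naming rule above can be sketched in a few lines of Python. The helper below is illustrative only (it is not part of any AWS SDK); it simply joins the object path with the name prefix and a zero-padded slice number, the way UNLOAD does:&lt;/p&gt;

```python
def unload_object_key(object_path, name_prefix, slice_num=0):
    """Build the S3 key an UNLOAD produces: object path, then the
    name prefix concatenated with a zero-padded slice number."""
    return f"{object_path}/{name_prefix}{slice_num:03d}"

# With PARALLEL OFF the output is a single file, slice number 000.
key = unload_object_key("redshift-worldwide-events/events", "worldwide_events")
print(key)  # redshift-worldwide-events/events/worldwide_events000
```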

&lt;p&gt;If the query is successful, you should see a success message in the editor as below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F493q4sq19sjhyckib1f6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F493q4sq19sjhyckib1f6.png" alt="Image description" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Navigate to the S3 bucket and check that the file is visible. It can now be downloaded or accessed via other AWS services for further analysis as required. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fngi2jy8l2wf1dmexw60j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fngi2jy8l2wf1dmexw60j.png" alt="Image description" width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;
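&lt;p&gt;The same check can also be scripted. Below is a minimal sketch of the filtering logic, run against a mocked &lt;code&gt;list_objects_v2&lt;/code&gt;-style response so it works without AWS credentials; with boto3, &lt;code&gt;contents&lt;/code&gt; would come from &lt;code&gt;s3.list_objects_v2(Bucket="redshift-worldwide-events")["Contents"]&lt;/code&gt;:&lt;/p&gt;

```python
def keys_with_prefix(contents, prefix):
    """Filter a list_objects_v2-style 'Contents' list down to keys under a prefix."""
    return [obj["Key"] for obj in contents if obj["Key"].startswith(prefix)]

# Mocked response entries; in practice these would come from
# boto3.client("s3").list_objects_v2(Bucket="redshift-worldwide-events")["Contents"].
contents = [
    {"Key": "events/worldwide_events000", "Size": 1024},
    {"Key": "logs/query.log", "Size": 10},
]
print(keys_with_prefix(contents, "events/worldwide_events"))  # ['events/worldwide_events000']
```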

&lt;h2&gt;
  
  
  Data Recovery, Monitoring and Cost Management
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-snapshots-recovery.html" rel="noopener noreferrer"&gt;Recovery points&lt;/a&gt; in Amazon Redshift Serverless are created approximately every 30 minutes and saved for 24 hours.&lt;br&gt;
In the Redshift Serverless console, the data backup tab shows the list of recovery points, which can be restored if there is a failure. Since recovery points are only retained for 24 hours, we can also create a snapshot from one to keep it for later use. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzwc7lh264v2cnp0vm4l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzwc7lh264v2cnp0vm4l.png" alt="Image description" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;
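&lt;p&gt;The 24-hour retention window is easy to model. This plain-Python sketch illustrates the retention rule only (the timestamps are hypothetical; restoring an actual recovery point is done via the console or API):&lt;/p&gt;

```python
from datetime import datetime, timedelta

def restorable_recovery_points(points, now, retention_hours=24):
    """Keep only recovery points still inside the retention window."""
    cutoff = now - timedelta(hours=retention_hours)
    return [p for p in points if p >= cutoff]

now = datetime(2024, 1, 2, 12, 0)
# Recovery points are taken roughly every 30 minutes; four example ages here.
points = [now - timedelta(hours=h) for h in (1, 12, 23, 30)]
print(len(restorable_recovery_points(points, now)))  # 3 (the 30-hour-old point has expired)
```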

&lt;p&gt;We can monitor &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/metrics.html" rel="noopener noreferrer"&gt;cluster performance&lt;/a&gt; via CloudWatch metrics such as CPU utilization and latency, surfaced in both the Redshift console and CloudWatch. In addition, we can monitor database query and load events directly in the console at a 1-minute resolution. The screenshot below shows the RPU capacity used for some of the queries executed. As the number of queries increases, Redshift Serverless automatically scales to optimise performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6b4hm30624ap12ipqth.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6b4hm30624ap12ipqth.png" alt="Image description" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Based on usage patterns, we can monitor usage and control billing by updating the &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-capacity.html" rel="noopener noreferrer"&gt;Base Capacity&lt;/a&gt; and the maximum RPU-hours per day, week or month in the respective sections of the workgroup configuration dashboard, as shown in the screenshot below. The default base capacity is 128 RPUs, which can be reduced to a minimum of 8 for simpler queries on smaller data. Under &lt;code&gt;Manage usage limits&lt;/code&gt;, we can set a maximum RPU-hours limit per frequency and decide what action to take if it is breached (e.g. turn off user queries, send an alert). How to configure this is described in more detail in the &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-billing.html" rel="noopener noreferrer"&gt;docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq35sggwj93563kzkxnxz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq35sggwj93563kzkxnxz.png" alt="Image description" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;
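&lt;p&gt;To reason about the cost impact of these settings, RPU-hours can be estimated as capacity multiplied by runtime. A small sketch follows; the price per RPU-hour used below is an assumed figure for illustration, so check your region's actual rate:&lt;/p&gt;

```python
def rpu_hours(rpus, seconds):
    """Billed RPU-hours for a query running at a given capacity."""
    return rpus * seconds / 3600

def estimated_cost(rpus, seconds, price_per_rpu_hour):
    return rpu_hours(rpus, seconds) * price_per_rpu_hour

# A 90-second query at the default base capacity of 128 RPUs,
# with an assumed price of $0.375 per RPU-hour.
print(rpu_hours(128, 90))                        # 3.2
print(round(estimated_cost(128, 90, 0.375), 2))  # 1.2
```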

&lt;p&gt;CloudWatch alarms can also be set up from the Redshift Serverless dashboard by choosing Alarms from the navigation menu and selecting Create alarm. We can then set an alarm on a metric to track in either the namespace or the workgroup. In the screenshot below, I have set an alarm to monitor compute capacity and trigger when it exceeds a threshold of 80 RPUs for 10 periods of 1 minute each. However, you can lower the threshold and/or the number of consecutive periods if required. The minimum duration of a period is 1 minute, and it can be increased in increments from the dropdown as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkdpxjtfun6ohsonwkmq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkdpxjtfun6ohsonwkmq.png" alt="Image description" width="800" height="703"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the serverless dashboard, you can now see the status of the alarm. It starts in the &lt;code&gt;INSUFFICIENT_DATA&lt;/code&gt; state until enough data has been gathered to evaluate it, at which point it changes to either &lt;code&gt;OK&lt;/code&gt; or &lt;code&gt;ALARM&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnnh7sfhem0p1rk4uozf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnnh7sfhem0p1rk4uozf.png" alt="Image description" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;
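&lt;p&gt;The alarm lifecycle can be sketched with a simplified model of CloudWatch's evaluation: too few datapoints gives &lt;code&gt;INSUFFICIENT_DATA&lt;/code&gt;, and a run of consecutive breaching datapoints flips the state to &lt;code&gt;ALARM&lt;/code&gt;. Real alarms also handle missing data and evaluation ranges, which this sketch ignores:&lt;/p&gt;

```python
def alarm_state(datapoints, threshold=80, periods=10):
    """Simplified CloudWatch-style evaluation: ALARM when `periods`
    consecutive datapoints breach the threshold."""
    if len(datapoints) < periods:
        return "INSUFFICIENT_DATA"
    run = 0
    for value in datapoints:
        run = run + 1 if value > threshold else 0
        if run >= periods:
            return "ALARM"
    return "OK"

print(alarm_state([8.0] * 5))                 # INSUFFICIENT_DATA
print(alarm_state([8.0] * 20))                # OK
print(alarm_state([8.0] * 5 + [128.0] * 10))  # ALARM
```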

&lt;p&gt;Clicking on the &lt;code&gt;View in CloudWatch&lt;/code&gt; widget allows users to view more details in the CloudWatch dashboard like the alarm status over the most recent time period and history of the alarm state changes as shown in the screenshot below. For a more detailed explanation on Alarm Periods and Evaluation Periods, please refer to the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html" rel="noopener noreferrer"&gt;AWS documentation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgpr0dv7ccs3buyos8qh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgpr0dv7ccs3buyos8qh.png" alt="Image description" width="800" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clicking on the date of a state change in the history tab brings up a summary of the change in JSON format. Here we can diagnose why an alarm changed state, for example from OK to ALARM in the screenshot below: compute capacity exceeded 80 RPUs for 10 consecutive datapoints (each evaluated over a 1-minute period), which matches the configuration we set. In this case it was probably a longer-running query that required a higher compute capacity (128 RPUs). &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wlq25lxw0bdgrplwuyd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wlq25lxw0bdgrplwuyd.png" alt="Image description" width="800" height="607"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That concludes the first part of this tutorial. Continue to &lt;a href="https://dev.to/aws-builders/data-analysis-with-redshift-serverless-and-quicksight-part-2-5c9o"&gt;part 2&lt;/a&gt; for creating visualisations in Quicksight.&lt;/p&gt;

</description>
      <category>quicksight</category>
      <category>redshift</category>
      <category>serverless</category>
      <category>data</category>
    </item>
    <item>
      <title>Building an entirely Serverless Workflow to Analyse Music Data using Step Functions, Glue and Athena</title>
      <dc:creator>Ryan Nazareth</dc:creator>
      <pubDate>Sun, 26 Feb 2023 18:52:37 +0000</pubDate>
      <link>https://forem.com/aws-builders/building-an-entirely-serverless-workflow-to-analyse-music-data-using-step-functions-glue-and-athena-4j2l</link>
      <guid>https://forem.com/aws-builders/building-an-entirely-serverless-workflow-to-analyse-music-data-using-step-functions-glue-and-athena-4j2l</guid>
      <description>&lt;p&gt;This blog will demonstrate how to create and run an entirely serverless ETL workflow using step functions to execute a glue job to read csv data from S3, carry out transformations in pyspark and writing the results to S3 destination key in parquet format. This will then trigger a glue crawler to create or update tables with the metadata from the parquet files. A successful job run, should then send an SNS notification to a user by email.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qhhktn0pbw9zlnpr76z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qhhktn0pbw9zlnpr76z.png" alt="Image description" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will use the &lt;a href="http://ocelma.net/MusicRecommendationDataset/lastfm-1K.html" rel="noopener noreferrer"&gt;LastFM dataset&lt;/a&gt; which represents listening habits for nearly 1,000 users. These are split into two &lt;code&gt;tsv&lt;/code&gt; files, one containing user profiles (gender, age, location, registration date) and the other containing details of music tracks each user has listened to, with associated timestamp.&lt;br&gt;
Using AWS Glue, we can carry out data transformations in PySpark to generate insights about the users, such as the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Number of distinct songs each user has played.&lt;/li&gt;
&lt;li&gt;100 most popular songs (artist and title) in the dataset, with the number of times each was played.&lt;/li&gt;
&lt;li&gt;Top 10 longest sessions (by elapsed time), with the associated information about the userid, timestamp of first and last songs in the session, and the list of songs played in the session (in order of play). A user's “session” will be assumed to be comprised of one or more songs played by that user, where each song is started within 20 minutes of the previous song’s start time.&lt;/li&gt;
&lt;/ul&gt;
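&lt;p&gt;The session rule in the last bullet can be illustrated in plain Python before we build it in PySpark. This standalone sketch (with made-up timestamps) groups one user's plays into sessions using the 20-minute cutoff:&lt;/p&gt;

```python
from datetime import datetime, timedelta

def split_into_sessions(start_times, cutoff_mins=20):
    """Group one user's play start times into sessions: a new session
    begins whenever the gap since the previous start exceeds the cutoff."""
    sessions = []
    for ts in sorted(start_times):
        if sessions and ts - sessions[-1][-1] <= timedelta(minutes=cutoff_mins):
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions

t0 = datetime(2009, 5, 1, 9, 0)
plays = [t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=50)]
print(len(split_into_sessions(plays)))  # 2: the 45-minute gap starts a new session
```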
&lt;h3&gt;
  
  
  Glue Notebook and Spark Transformations
&lt;/h3&gt;

&lt;p&gt;We will create a glue job by uploading this &lt;a href="https://github.com/ryankarlos/bigdataeng/blob/master/notebooks/AWS_Glue_Notebook.ipynb" rel="noopener noreferrer"&gt;notebook&lt;/a&gt; to Amazon Glue Studio Notebooks. Before setting up any resources, let's first go through the various code snippets and functions in the notebook to describe the different transformation steps to answer the questions listed above. &lt;/p&gt;

&lt;p&gt;The first cell imports and initializes a GlueContext object, which is used to create a SparkSession to be used inside the AWS Glue job.&lt;br&gt;
Spark provides a number of classes (&lt;code&gt;StructType&lt;/code&gt;, &lt;code&gt;StructField&lt;/code&gt;) to specify the structure of the spark dataframe. &lt;code&gt;StructType&lt;/code&gt; is a collection of &lt;code&gt;StructField&lt;/code&gt; which is used to define the column name, data type and a flag for nullable or not.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql.types import (
    StringType,
    StructField,
    StructType,
    TimestampType,
)
from pyspark.sql import DataFrame, Window
from pyspark.sql.functions import (
    col,
    collect_list,
    count,
    desc,
    expr,
    lag,
    max,
    min,
    round,
    sum,
)
from awsglue.dynamicframe import DynamicFrame
import boto3

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
client = boto3.client('s3')


SESSION_SCHEMA = StructType(
    [
        StructField("userid", StringType(), False),
        StructField("timestamp", TimestampType(), True),
        StructField("artistid", StringType(), True),
        StructField("artistname", StringType(), True),
        StructField("trackid", StringType(), True),
        StructField("trackname", StringType(), True),
    ]
)

S3_PATH="s3://lastfm-dataset/user-session-track.tsv"
BUCKET="lastfm-dataset"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function reads the LastFM dataset (tab-delimited, via Spark's csv reader) into a Spark dataframe from the S3 bucket &lt;code&gt;lastfm-dataset&lt;/code&gt;, using the S3_PATH and schema definition defined above. We drop the columns we do not need. The schema is printed below by calling the &lt;code&gt;printSchema()&lt;/code&gt; method of the Spark dataframe.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def read_session_data(spark):
    data = (
        spark.read.format("csv")
        .option("header", "false")
        .option("delimiter", "\t")
        .schema(SESSION_SCHEMA)
        .load(S3_PATH)
    )
    cols_to_drop = ("artistid", "trackid")
    return data.drop(*cols_to_drop).cache()

df = read_session_data(spark)
df.printSchema()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikqm91tk6iovnunpf2db.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikqm91tk6iovnunpf2db.png" alt="Image description" width="592" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The function &lt;code&gt;create_users_and_distinct_songs_count&lt;/code&gt; computes the number of distinct songs played per user, by selecting the columns &lt;code&gt;userid&lt;/code&gt;, &lt;code&gt;artistname&lt;/code&gt; and &lt;code&gt;trackname&lt;/code&gt;, dropping duplicate rows and performing a groupBy count for each &lt;code&gt;userid&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_users_and_distinct_songs_count(df: DataFrame) -&amp;gt; DataFrame:
    df1 = df.select("userid", "artistname", "trackname").dropDuplicates()
    df2 = (
        df1.groupBy("userid")
        .agg(count("*").alias("DistinctTrackCount"))
        .orderBy(desc("DistinctTrackCount"))
    )
    return df2

songs_per_user = create_users_and_distinct_songs_count(df)
songs_per_user.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vnvet7fobf4afuzrqo4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vnvet7fobf4afuzrqo4.png" alt="Image description" width="364" height="621"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;create_popular_songs&lt;/code&gt; function performs a groupBy count over the &lt;code&gt;artistname&lt;/code&gt; and &lt;code&gt;trackname&lt;/code&gt; columns, orders the result in descending order of counts, and applies a limit to get the 100 most popular songs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_popular_songs(df: DataFrame, limit=100) -&amp;gt; DataFrame:
    df1 = (
        df.groupBy("artistname", "trackname")
        .agg(count("*").alias("CountPlayed"))
        .orderBy(desc("CountPlayed"))
        .limit(limit)
    )
    return df1

popular_songs = create_popular_songs(df)
popular_songs.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi9sjcpuw2vzjzudgts0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi9sjcpuw2vzjzudgts0.png" alt="Image description" width="800" height="645"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next snippet lags the previous timestamp within each user partition (using a window function) and computes the difference between the current and previous timestamps per user. We then create a binary session flag per user, set when the time between successively played tracks exceeds the session cutoff (20 minutes). A &lt;code&gt;SessionID&lt;/code&gt; column then computes a cumulative sum over the sessionflag column for each user. &lt;/p&gt;

&lt;p&gt;We then group the Spark DataFrame by &lt;code&gt;userid&lt;/code&gt; and &lt;code&gt;SessionID&lt;/code&gt; and compute min and max timestamp as session start and end columns. Then create a session_length (hrs) column which computes the difference between session end and start for each row and convert to hours. Order the DataFrame from max to min session length and limit to top 10 sessions as required.&lt;/p&gt;

&lt;p&gt;To get the list of tracks for each session, join to the original raw dataframe read in and group by &lt;code&gt;userid&lt;/code&gt;, &lt;code&gt;sessionID&lt;/code&gt; and &lt;code&gt;session_length&lt;/code&gt; in hours. Now apply the &lt;code&gt;pyspark.sql&lt;/code&gt; function &lt;code&gt;collect_list&lt;/code&gt; to each group to create a list of tracks for each session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_session_ids_for_all_users(
    df: DataFrame, session_cutoff: int
) -&amp;gt; DataFrame:
    w1 = Window.partitionBy("userid").orderBy("timestamp")
    df1 = (
        df.withColumn("pretimestamp", lag("timestamp").over(w1))
        .withColumn(
            "delta_mins",
            round(
                (
                    col("timestamp").cast("long")
                    - col("pretimestamp").cast("long")
                )
                / 60
            ),
        )
        .withColumn(
            "sessionflag",
            expr(
                f"CASE WHEN delta_mins &amp;gt; {session_cutoff} OR delta_mins IS NULL THEN 1 ELSE 0 END"
            ),
        )
        .withColumn("sessionID", sum("sessionflag").over(w1))
    )
    return df1


def compute_top_n_longest_sessions(df: DataFrame, limit: int) -&amp;gt; DataFrame:
    df1 = (
        df.groupBy("userid", "sessionID")
        .agg(
            min("timestamp").alias("session_start_ts"),
            max("timestamp").alias("session_end_ts"),
        )
        .withColumn(
            "session_length(hrs)",
            round(
                (
                    col("session_end_ts").cast("long")
                    - col("session_start_ts").cast("long")
                )
                / 3600
            ),
        )
        .orderBy(desc("session_length(hrs)"))
        .limit(limit)
    )
    return df1


def longest_sessions_with_tracklist(
    df: DataFrame, session_cutoff: int = 20, limit: int = 10
) -&amp;gt; DataFrame:
    df1 = create_session_ids_for_all_users(df, session_cutoff)
    df2 = compute_top_n_longest_sessions(df1, limit)
    df3 = (
        df1.join(df2, ["userid", "sessionID"])
        .select("userid", "sessionID", "trackname", "session_length(hrs)")
        .groupBy("userid", "sessionID", "session_length(hrs)")
        .agg(collect_list("trackname").alias("tracklist"))
        .orderBy(desc("session_length(hrs)"))
    )
    return df3

df_sessions = longest_sessions_with_tracklist(df)
df_sessions.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7bdvgcmti9qvi29afrk7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7bdvgcmti9qvi29afrk7.png" alt="Image description" width="733" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, the snippet below converts the PySpark dataframe to a Glue dynamic frame and writes it to the S3 bucket in Parquet format, using the &lt;code&gt;write_dynamic_frame()&lt;/code&gt; method. By default, this method saves the output files with an auto-generated name beginning with a &lt;code&gt;run-&lt;/code&gt; prefix. It would be better to rename this to something simpler. To do this, we can use the &lt;code&gt;copy_object()&lt;/code&gt; method of the boto3 S3 client to copy the existing object to a new location within the bucket (using a custom name as suffix, e.g. &lt;code&gt;popular_songs.parquet&lt;/code&gt;). The original object can then be deleted using the &lt;code&gt;delete_object()&lt;/code&gt; method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def rename_s3_results_key(source_key_prefix, dest_key):
    response = client.list_objects_v2(Bucket=BUCKET)
    body = response["Contents"]
    key =  [obj['Key'] for obj in body if source_key_prefix in obj['Key']]
    client.copy_object(Bucket=BUCKET, CopySource={'Bucket': BUCKET, 'Key': key[0]}, Key=dest_key)
    client.delete_object(Bucket=BUCKET, Key=key[0])

def write_ddf_to_s3(df: DataFrame, name: str):
    dyf = DynamicFrame.fromDF(df.repartition(1), glueContext, name)
    sink = glueContext.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3a",
        format="glueparquet",
        connection_options={"path": f"s3a://{BUCKET}/results/{name}/", "partitionKeys": []},
        transformation_ctx=f"{name}_sink",
    )
    source_key_prefix = f"results/{name}/run-"
    dest_key = f"results/{name}/{name}.parquet"
    rename_s3_results_key(source_key_prefix, dest_key)
    return sink

write_ddf_to_s3(popular_songs, "popular_songs")
write_ddf_to_s3(df_sessions, "df_sessions")
write_ddf_to_s3(songs_per_user, "distinct_songs")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the next sections we will set up all the resources defined in the architecture diagram and execute the state machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data upload to S3
&lt;/h2&gt;

&lt;p&gt;First we will create a standard bucket &lt;code&gt;lastfm-dataset&lt;/code&gt; from the AWS console to store the source files in, and enable &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration.html" rel="noopener noreferrer"&gt;transfer acceleration&lt;/a&gt; in the bucket properties to optimise transfer speed. This generates an accelerated endpoint, &lt;code&gt;s3-accelerate.amazonaws.com&lt;/code&gt;, which can be used for uploads via the CLI. Since some of these files are large, it is easier to use the &lt;code&gt;aws s3&lt;/code&gt; commands (such as &lt;code&gt;aws s3 cp&lt;/code&gt;) for uploading to the S3 bucket, as these automatically use the multipart upload feature when the file size exceeds 100MB. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1bpksh35paja91nn2t3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1bpksh35paja91nn2t3.png" alt="Image description" width="800" height="76"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bybclhtiy4lxy3e5063.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bybclhtiy4lxy3e5063.png" alt="Image description" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;
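&lt;p&gt;As a rough illustration of how a multipart upload splits a file, the part count is just the file size divided by the part size, rounded up. The 8 MiB default part size below reflects the AWS CLI's documented default and is treated as an assumption here (it is configurable):&lt;/p&gt;

```python
import math

def multipart_part_count(file_size_bytes, part_size_bytes=8 * 1024 * 1024):
    """Number of parts a multipart upload splits a file into."""
    return math.ceil(file_size_bytes / part_size_bytes)

# A ~2.4 GB listening-history file with an 8 MiB part size:
print(multipart_part_count(2_400_000_000))  # 287
```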

&lt;h3&gt;
  
  
  AWS Glue Job and Crawler
&lt;/h3&gt;

&lt;p&gt;We will then create a Glue job by uploading this &lt;a href="https://github.com/ryankarlos/bigdataeng/blob/master/notebooks/AWS_Glue_Notebook.ipynb" rel="noopener noreferrer"&gt;notebook&lt;/a&gt; to Amazon Glue Studio Notebooks. We first need to create a role for Glue to assume, with permissions to access S3 as below. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjgtia2am5fcdb3aoz1v8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjgtia2am5fcdb3aoz1v8.png" alt="Glue Job role" width="800" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the Amazon Glue Studio console, choose Jobs from the navigation menu. &lt;/li&gt;
&lt;li&gt;In the Create Job options section, select Upload and then select the &lt;code&gt;AWS_Glue_Notebook.ipynb&lt;/code&gt; file to upload. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnu2z8klyunf5fe0or5wq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnu2z8klyunf5fe0or5wq.png" alt="Image description" width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On the next screen, name the job &lt;code&gt;LastFM_Analysis&lt;/code&gt;. Select the Glue role created previously in the IAM Role dropdown list. Choose the Spark kernel and &lt;code&gt;Start Notebook&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ngzcx0c71bi1pb7gfbb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ngzcx0c71bi1pb7gfbb.png" alt="Image description" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We should see the notebook in the next screen. Click 'Save'.
If we navigate back to the AWS Glue Studio Jobs tab, we should see the new job &lt;code&gt;LastFM_Analysis&lt;/code&gt; created. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx848gmus53zyysom90pc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx848gmus53zyysom90pc.png" alt="Image description" width="800" height="156"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can now set up the &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/console-crawlers.html" rel="noopener noreferrer"&gt;glue crawler&lt;/a&gt; from the AWS Glue console, with the settings in the screenshot below. This will collect metadata from the Glue output Parquet files in S3, and update the Glue catalog tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2para0tusgaxv1keb4c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2para0tusgaxv1keb4c.png" alt="Image description" width="800" height="670"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  SNS topic
&lt;/h3&gt;

&lt;p&gt;We will also need to set up a subscription to an SNS topic so that notifications are sent to an email address, by following the &lt;a href="https://docs.aws.amazon.com/sns/latest/dg/sns-email-notifications.html" rel="noopener noreferrer"&gt;AWS docs&lt;/a&gt;. We will set up a separate task at the end of the Step Function workflow to publish to the SNS topic. Alternatively, one could configure S3 event notifications for specific S3 keys, so that any Parquet output from the Glue job publishes to the SNS topic. &lt;/p&gt;

&lt;p&gt;Once you have set up the subscription from the console, you should get an email asking you to confirm the subscription, as below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4iavj3q6p1glx62a69ri.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4iavj3q6p1glx62a69ri.png" alt="Image description" width="800" height="264"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step Function setup and execution
&lt;/h3&gt;

&lt;p&gt;Now we need to create a state machine. This is a Step Functions workflow consisting of a set of &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/concepts-states.html" rel="noopener noreferrer"&gt;states&lt;/a&gt;,&lt;br&gt;
each of which represents a single unit of work. The state machine is defined in &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html" rel="noopener noreferrer"&gt;Amazon States Language&lt;/a&gt;, a JSON-based notation. In this example, the Amazon States Language specification is as below. We will use this when creating the state machine in the console.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Comment": "Glue ETL flights pipeline execution",
  "StartAt": "Glue StartJobRun",
  "States": {
    "Glue StartJobRun": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun",
      "Parameters": {
        "JobName": "LastFM_Analysis",
        "MaxCapacity": 2
      },
      "ResultPath": "$.gluejobresults",
      "Next": "Wait"
    },
    "Wait": {
      "Type": "Wait",
      "Seconds": 30,
      "Next": "Get Glue Job status"
    },
    "Get Glue Job status": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:glue:getJobRun",
      "Parameters": {
        "JobName.$": "$.gluejobresults.JobName",
        "RunId.$": "$.gluejobresults.JobRunId"
      },
      "Next": "Check Glue Job status",
      "ResultPath": "$.gluejobresults.status"
    },
    "Check Glue Job status": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.gluejobresults.status.JobRun.JobRunState",
          "StringEquals": "SUCCEEDED",
          "Next": "StartCrawler"
        }
      ],
      "Default": "Wait"
    },
    "StartCrawler": {
      "Type": "Task",
      "Parameters": {
        "Name": "LastFM-crawler"
      },
      "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
      "Next": "Wait for crawler to complete"
    },
    "Wait for crawler to complete": {
      "Type": "Wait",
      "Seconds": 70,
      "Next": "SNS Publish Success"
    },
    "SNS Publish Success": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:*:Default",
        "Message.$": "$"
      },
      "Next": "Success"
    },
    "Success": {
      "Type": "Succeed"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before creating the state machine, we will also need to create a role for Step Functions to assume, with permissions to call the various services, e.g. Glue, Athena, SNS, and CloudWatch (if logging is enabled when creating the state machine), using AWS managed policies as below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw06fkss2cdgqvxumg6u6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw06fkss2cdgqvxumg6u6.png" alt="Image description" width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the Step Functions console, in the State Machine tab:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select Create State Machine &lt;/li&gt;
&lt;li&gt;Select "Write your workflow in code" with Type "Standard"&lt;/li&gt;
&lt;li&gt;Paste in the Amazon States Language specification. If the definition is valid, this will generate a visual representation of the state machine as below.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhl1bgyptnn8dm6lw1aku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhl1bgyptnn8dm6lw1aku.png" alt="Image description" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select Next, then in the "Specify Details" section fill in the state machine name, select the execution role created previously from the dropdown, and turn on logging to CloudWatch. Then click "Create State Machine".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fys9wh9yj8z5yorakni41.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fys9wh9yj8z5yorakni41.png" alt="Image description" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let us go through what each state does. The first task state &lt;code&gt;Glue StartJobRun&lt;/code&gt; starts the Glue job &lt;code&gt;LastFM_Analysis&lt;/code&gt; with a capacity of 2 data processing units (DPUs), as specified in the parameters block. The output of this state is then included under the &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/input-output-resultpath.html" rel="noopener noreferrer"&gt;ResultPath&lt;/a&gt; &lt;code&gt;$.gluejobresults&lt;/code&gt;, along with the original input. This gives subsequent states access to Glue job metadata such as the job name, run id, and status, to use as parameters. &lt;/p&gt;
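To make the ResultPath behaviour concrete, the minimal sketch below mimics how &lt;code&gt;"ResultPath": "$.gluejobresults"&lt;/code&gt; grafts a task's output onto the state input (a simplification of the full ASL semantics, handling only top-level paths; the run id shown is illustrative):

```python
import copy

def apply_result_path(state_input, task_output, result_path):
    """Insert task_output into state_input at result_path.

    Only simple top-level paths of the form '$.<key>' are handled here.
    """
    key = result_path.split(".", 1)[1]
    merged = copy.deepcopy(state_input)
    merged[key] = task_output
    return merged

state = apply_result_path(
    {"Comment": "original input"},
    {"JobName": "LastFM_Analysis", "JobRunId": "jr_abc123"},  # illustrative run id
    "$.gluejobresults",
)
# Subsequent states can now reference $.gluejobresults.JobName
# and $.gluejobresults.JobRunId, while the original input is preserved.
```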

&lt;p&gt;The next state is a &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-wait-state.html" rel="noopener noreferrer"&gt;Wait state&lt;/a&gt;, which pauses execution of the state machine for 30 seconds before proceeding to the task that checks the Glue job status. Using a &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-choice-state.html" rel="noopener noreferrer"&gt;Choice state&lt;/a&gt;, we add a condition to proceed to the next task (&lt;code&gt;StartCrawler&lt;/code&gt;) if the Glue job status is &lt;code&gt;SUCCEEDED&lt;/code&gt;; otherwise it loops back to the Wait state for another 30 seconds before checking again. This ensures we only start crawling the data in S3 once the Glue job has completed, so the output parquet files are available and ready to be crawled. &lt;/p&gt;

&lt;p&gt;Similarly, after the &lt;code&gt;StartCrawler&lt;/code&gt; task, we add a wait state that pauses the state machine for 70 seconds (we expect the crawler to complete within a minute), so that the notification is published to the SNS topic &lt;code&gt;Default&lt;/code&gt; only after the crawler has finished.&lt;/p&gt;
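A fixed 70-second wait is a pragmatic shortcut; a more robust alternative would poll the crawler until it returns to the READY state. A rough sketch (the boto3 call needs real credentials, and the timeout values are arbitrary choices):

```python
import time

def is_finished(crawler_state):
    # A Glue crawler reports RUNNING, then STOPPING, then READY once done
    return crawler_state == "READY"

def wait_for_crawler(name, poll_seconds=15, timeout=600):
    import boto3  # deferred import; requires AWS credentials
    glue = boto3.client("glue")
    waited = 0
    while waited < timeout:
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if is_finished(state):
            return True
        time.sleep(poll_seconds)
        waited += poll_seconds
    return False  # crawler did not finish within the timeout
```

In the Step Functions workflow itself, the same effect could be achieved by looping a Wait state into a `glue:getCrawler` task and a Choice state, mirroring the Glue job polling pattern above.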

&lt;p&gt;Now the state machine can be executed. If the Step Functions execution completes successfully, we should see output similar to the below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgaouqgw1n64vh20lza44.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgaouqgw1n64vh20lza44.png" alt="Image description" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;
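Executions can also be started programmatically instead of from the console; a minimal sketch, with a placeholder state machine ARN:

```python
import json

def execution_input(comment):
    # The input becomes $ in the first state; keep it JSON-serialisable
    return json.dumps({"Comment": comment})

def start_execution(state_machine_arn, payload):
    import boto3  # deferred import; needs credentials and the real ARN
    return boto3.client("stepfunctions").start_execution(
        stateMachineArn=state_machine_arn, input=payload
    )

payload = execution_input("triggered manually")
# start_execution("arn:aws:states:us-east-1:123456789012:stateMachine:GlueETL", payload)
```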

&lt;p&gt;If the Glue job succeeds, we should see the parquet files in dedicated subfolders under the results folder of the S3 bucket. You should also get a notification at the email address subscribed to the SNS topic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2g58uzqy37ukcuwa473t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2g58uzqy37ukcuwa473t.png" alt="Image description" width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcdpqgr9cjosysmz0gl9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcdpqgr9cjosysmz0gl9.png" alt="Image description" width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The catalog tables should be created after the crawler completes successfully. We can now query the tables in Athena, as below. The tables could also be accessed from Amazon QuickSight or Tableau to build visualisation dashboards for further insights.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F25528ictnv2fbhjvzaga.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F25528ictnv2fbhjvzaga.png" alt="Image description" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvz3f0ki45mb47uho9g3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvz3f0ki45mb47uho9g3.png" alt="Image description" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>blockchain</category>
      <category>web3</category>
      <category>crypto</category>
      <category>cryptocurrency</category>
    </item>
    <item>
      <title>What movie to watch next ? Amazon Personalize to the rescue - Part 2</title>
      <dc:creator>Ryan Nazareth</dc:creator>
      <pubDate>Tue, 04 Oct 2022 10:24:56 +0000</pubDate>
      <link>https://forem.com/aws-builders/what-movie-to-watch-next-amazon-personalize-to-the-rescue-part-2-2pj9</link>
      <guid>https://forem.com/aws-builders/what-movie-to-watch-next-amazon-personalize-to-the-rescue-part-2-2pj9</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/aws-builders/recommending-movies-with-amazon-personalize-part-1-4o7a"&gt;first part&lt;/a&gt; of this blog, we used AWS Step Functions to orchestrate a workflow to run a Glue Job on data from S3, trigger an import job in Personalize and train a model (recipe). In this section, we will focus on deploying the model and getting batch and realtime recommendations for movies.The designed architecture is as shown in the screenshot below. Again, all scripts referenced in code snippets in this blog can be found in my &lt;a href="https://github.com/ryankarlos/AWS-ML-services/tree/master/projects/personalize" rel="noopener noreferrer"&gt;github&lt;/a&gt; repository.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4n3ujqm4juvgegbrwkf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4n3ujqm4juvgegbrwkf.png" alt="personalize-recommendation-workflow" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-new-item-USER_PERSONALIZATION.html" rel="noopener noreferrer"&gt;User-Personalization recipe&lt;/a&gt;, Amazon Personalize generates scores for items based on a user's interaction data and metadata. These scores represent the relative certainty that Amazon Personalize has in whether the user will interact with the item next. Higher scores represent greater certainty, as described in the &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/recommendations.html" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;. Amazon Personalize scores all the items in your catalog relative to each other on a scale from 0 to 1 (both inclusive), so that the total of all scores equals 1. For example, if you're getting movie recommendations for a user and there are three movies in the Items dataset, their scores might be 0.6, 0.3, and 0.1. Similarly, if you have 1,000 movies in your inventory, the highest-scoring movies might have very small scores (the average score would be .001), but, because scoring is relative, the recommendations are still valid. &lt;/p&gt;

&lt;h3&gt;
  
  
  Personalize Batch Inference Job
&lt;/h3&gt;

&lt;p&gt;The CloudFormation stack created in part 1 should have deployed the necessary resources to run the batch job, i.e. a Lambda function which is triggered when input data is added to S3 and creates the batch job in Personalize. The Lambda function will automatically trigger either a batch segment job or a batch inference job, depending on the filename. A &lt;code&gt;users.json&lt;/code&gt; file is assumed to contain the user ids for which we require item recommendations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4638"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"663"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"94"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3384"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1030"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"162540"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"15000"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"13"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"50"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"80000"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"20000"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"110000"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"5000"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"9000"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"34567"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will trigger a batch inference job using the solution version ARN specified as a Lambda environment variable (defined through the CloudFormation stack parameters). We will be using the solution version trained with the &lt;strong&gt;USER_PERSONALIZATION&lt;/strong&gt; recipe. An &lt;code&gt;items.json&lt;/code&gt; file, on the other hand, will trigger a batch segment job and should have the format below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1240"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"33794"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"89745"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"89747"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"89753"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1732"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"8807"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"7153"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"44"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"165"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"307"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"306"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"457"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"586"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"588"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"589"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"596"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will return, for each item, a list of the users with the highest probabilities of interacting with it. &lt;strong&gt;Note&lt;/strong&gt; that a batch segment job requires the solution to be trained with a &lt;strong&gt;USER_SEGMENTATION&lt;/strong&gt; recipe and will throw an error if another recipe is used. Training a new solution with this recipe is beyond the scope of this tutorial. The Lambda config should look as below, with the event trigger set as S3.&lt;/p&gt;
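The filename-based routing inside the first Lambda could look roughly like this (a sketch of the convention described above, not the actual handler code from the repository):

```python
def job_type_for_key(key):
    """Map an S3 object key to the Personalize batch job type to create."""
    filename = key.rsplit("/", 1)[-1]
    if filename == "users.json":
        return "batch_inference"   # item recommendations per user
    if filename == "items.json":
        return "batch_segment"     # user segments per item
    raise ValueError(f"unexpected input file: {filename}")
```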

&lt;p&gt;A second Lambda function runs a transform operation when the results from the batch job are written to S3 by Personalize. If successful, a notification is sent to an SNS topic configured with an email endpoint, to alert when the workflow completes. The batch job output from Personalize is JSON in the following format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"recommendedItems"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="err"&gt;....&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="nl"&gt;"scores"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="err"&gt;....&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"recommendedItems"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="err"&gt;....&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="nl"&gt;"scores"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;......&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;.....&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the transformation, we intend to return a structured dataset serialised in parquet format (with snappy compression), with the following schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;userID&lt;/strong&gt;: integer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommendations&lt;/strong&gt;: string&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The movie id is mapped to the title, associated genre, and release year for each user, as below. Each recommendation is separated by a &lt;code&gt;|&lt;/code&gt; delimiter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   userId                           Recommendations
0    1       Movie Title (year) (genre) | Movie Title (year) (genre) | ....
1    2       Movie Title (year) (genre) | Movie Title (year) (genre) | ....
......
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Lambda also uses the AWS-managed Data Wrangler Lambda layer, so the pandas and numpy libraries are available. The configuration should look like below, with the Lambda layer attached and SNS as the destination.&lt;/p&gt;
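Stripped of the pandas specifics, the core of the transform maps each recommended movie id to its display string and joins them with the delimiter. A simplified, self-contained sketch with a hypothetical two-movie catalog:

```python
def format_recommendations(record, catalog):
    """record: one parsed line of the Personalize batch output.
    catalog: movieId -> 'Title (year) (genre)' lookup (hypothetical here)."""
    user_id = int(record["input"]["userId"])
    titles = [catalog[i] for i in record["output"]["recommendedItems"] if i in catalog]
    return {"userId": user_id, "Recommendations": " | ".join(titles)}

# Illustrative catalog; the real Lambda builds this from the movies metadata
catalog = {
    "1": "Toy Story (1995) (Adventure)",
    "2": "Jumanji (1995) (Adventure)",
}
row = format_recommendations(
    {
        "input": {"userId": "1"},
        "output": {"recommendedItems": ["2", "1"], "scores": [0.6, 0.4]},
        "error": None,
    },
    catalog,
)
```

The actual Lambda assembles such rows into a DataFrame and writes it out as snappy-compressed parquet via the Data Wrangler layer.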

&lt;p&gt;To trigger the batch inference job workflow, copy the sample &lt;code&gt;users.json&lt;/code&gt; batch data to the S3 path below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;datasets/personalize/ml-25m/batch/input/users.json s3://recommendation-sample-data/movie-lens/batch/input/users.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a batch inference job whose name has the Unix timestamp appended. We should receive an email notification when the entire workflow completes. The outputs of the batch job and the subsequent transformation should be visible in the bucket under the keys &lt;code&gt;movie-lens/batch/results/inference/users.json.out&lt;/code&gt; and &lt;br&gt;
&lt;code&gt;movie-lens/batch/results/inference/transformed.parquet&lt;/code&gt; respectively. These have also been copied and stored &lt;a href="https://github.com/ryankarlos/AWS-ML-services/tree/master/datasets/personalize/ml-25m/batch/results" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    userId                                    Recommendations
0    15000  Kiss the Girls (1997) (Crime) | Scream (1996) ...
1   162540  Ice Age 2: The Meltdown (2006) (Adventure) | I...
2     5000  Godfather, The (1972) (Crime) | Star Wars: Epi...
3       94  Jumanji (1995) (Adventure) | Nell (1994) (Dram...
4     4638  Inglourious Basterds (2009) (Action) | Watchme...
5     9000  Die Hard 2 (1990) (Action) | Lethal Weapon 2 (...
6      663  Crow, The (1994) (Action) | Nightmare Before C...
7     1030  Sister Act (1992) (Comedy) | Lethal Weapon 4 (...
8     3384  Ocean's Eleven (2001) (Crime) | Matrix, The (1...
9    34567  Lord of the Rings: The Fellowship of the Ring,...
10      50  Grand Budapest Hotel, The (2014) (Comedy) | He...
11   80000  Godfather: Part II, The (1974) (Crime) | One F...
12  110000  Manhattan (1979) (Comedy) | Raging Bull (1980)...
13      13  Knocked Up (2007) (Comedy) | Other Guys, The (...
14   20000  Sleepless in Seattle (1993) (Comedy) | Four We...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating a Campaign for realtime recommendations
&lt;/h3&gt;

&lt;p&gt;A campaign is a deployed solution version (trained model) with provisioned dedicated transaction capacity for creating real-time recommendations for your application users. After preparing and importing data and creating a solution, you are ready to deploy your solution version by creating an &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html" rel="noopener noreferrer"&gt;AWS Personalize Campaign&lt;/a&gt;. If you only need batch recommendations, you don't need to create a campaign.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;python projects/personalize/deploy_solution.py &lt;span class="nt"&gt;--campaign_name&lt;/span&gt; MoviesCampaign &lt;span class="nt"&gt;--sol_version_arn&lt;/span&gt; &amp;lt;solution_version_arn&amp;gt; &lt;span class="nt"&gt;--mode&lt;/span&gt; create

2022-07-09 21:12:08,412 - deploy - INFO - Name: MoviesCampaign
2022-07-09 21:12:08,412 - deploy - INFO - ARN: arn:aws:personalize:........:campaign/MoviesCampaign
2022-07-09 21:12:08,412 - deploy - INFO - Status: CREATE PENDING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An additional arg &lt;code&gt;--config&lt;/code&gt; can be passed to set the &lt;strong&gt;explorationWeight&lt;/strong&gt; and &lt;strong&gt;explorationItemAgeCutOff&lt;/strong&gt; parameters for the &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-new-item-USER_PERSONALIZATION.html#bandit-hyperparameters" rel="noopener noreferrer"&gt;User-Personalization recipe&lt;/a&gt;. These parameters default to 0.3 and 30.0 respectively if not passed (as in the previous example).&lt;br&gt;
To set the explorationWeight and explorationItemAgeCutOff to 0.6 and 100 respectively, run the script as below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;python projects/personalize/deploy_solution.py &lt;span class="nt"&gt;--campaign_name&lt;/span&gt; MoviesCampaign &lt;span class="nt"&gt;--sol_version_arn&lt;/span&gt; &amp;lt;solution_version_arn&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--config&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;itemExplorationConfig&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;explorationWeight&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;0.6&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;explorationItemAgeCutOff&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;100&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}}"&lt;/span&gt; &lt;span class="nt"&gt;--mode&lt;/span&gt; create

2022-07-09 21:12:08,412 - deploy - INFO - Name: MoviesCampaign
2022-07-09 21:12:08,412 - deploy - INFO - ARN: arn:aws:personalize:........:campaign/MoviesCampaign
2022-07-09 21:12:08,412 - deploy - INFO - Status: CREATE PENDING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
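Since the escaped JSON passed to `--config` is easy to get wrong on the command line, here is a minimal helper (hypothetical, not part of the repo) that builds the same payload programmatically:

```python
import json

def build_campaign_config(exploration_weight="0.3", item_age_cutoff="30.0"):
    # Values are passed as strings, matching the shape shown above.
    return json.dumps({
        "itemExplorationConfig": {
            "explorationWeight": exploration_weight,
            "explorationItemAgeCutOff": item_age_cutoff,
        }
    })

# The resulting string can be passed directly to --config:
print(build_campaign_config("0.6", "100"))
```

The output can be pasted straight after `--config` without any manual quote escaping.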



&lt;h3&gt;
  
  
  Setting up API Gateway with Lambda Proxy Integration
&lt;/h3&gt;

&lt;p&gt;You can also get &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/getting-real-time-recommendations.html" rel="noopener noreferrer"&gt;real-time recommendations&lt;/a&gt; from Amazon Personalize, using the campaign created earlier to serve movie recommendations. To increase recommendation relevance, include contextual metadata for a user, such as their device type or the time of day, when you get recommendations or a personalized ranking. The API Gateway integration with the Lambda backend should already be configured if CloudFormation ran successfully. We have configured the method request to accept a querystring parameter &lt;em&gt;user_id&lt;/em&gt; and defined a model schema. An API method can be integrated with Lambda using one of two &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-integrations.html" rel="noopener noreferrer"&gt;integration methods&lt;/a&gt;: &lt;em&gt;Lambda proxy integration&lt;/em&gt; or &lt;em&gt;Lambda non-proxy (custom) integration&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;By default, we use &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-proxy-integrations.html" rel="noopener noreferrer"&gt;Lambda Proxy Integration&lt;/a&gt; when creating the resource in CloudFormation, which allows the client to call a single lambda function in the backend. When a client submits a request, API Gateway sends the raw request to lambda without necessarily preserving the order of the parameters. This request data includes the request headers, query string parameters, URL path variables, payload, and API configuration data as detailed &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-proxy-integrations.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We could also use &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-custom-integrations.html" rel="noopener noreferrer"&gt;Lambda non-proxy integration&lt;/a&gt; by setting the template parameter &lt;em&gt;APIGatewayIntegrationType&lt;/em&gt; to AWS. The difference from the proxy integration method is that we also need to configure a mapping template to map the incoming request data to the integration request, as required by the backend Lambda function. In the CloudFormation template &lt;em&gt;personalize_predict.yaml&lt;/em&gt;, this is already predefined in the &lt;em&gt;RequestTemplates&lt;/em&gt; property of the &lt;em&gt;ApiGatewayRootMethod&lt;/em&gt; resource, which translates the user_id query string parameter to the user_id property of the JSON payload. This is necessary because input to a Lambda function must be expressed in the request body. However, as the default integration type is set to &lt;em&gt;AWS_PROXY&lt;/em&gt;, the mapping template is ignored as it is not required.&lt;/p&gt;
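To make the difference concrete, here is a sketch of how a backend Lambda handler might read &lt;em&gt;user_id&lt;/em&gt; under each integration type. The handler shape is an assumption for illustration, not the repo's actual function:

```python
import json

def lambda_handler(event, context):
    # Proxy integration: API Gateway passes the raw request through, so
    # the query string arrives under 'queryStringParameters'.
    if event.get("queryStringParameters"):
        user_id = event["queryStringParameters"]["user_id"]
    else:
        # Non-proxy integration: the mapping template has already moved
        # user_id into the body, so it appears at the top of the event.
        user_id = event["user_id"]
    # Proxy integration also requires this exact response shape.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"user_id": user_id}),
    }
```

With proxy integration the function must do its own parsing but needs no mapping template; with custom integration the template does the parsing up front.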

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4vs6ohlmu63ypbjfyjq2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4vs6ohlmu63ypbjfyjq2.png" alt="api-gateway-get-method-execution" width="800" height="175"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The API endpoint URL to be invoked should be visible from the console, under the stage tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqwbuqsqfkmqhczvpkik.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqwbuqsqfkmqhczvpkik.png" alt="api-gateway-dev-stage-console" width="800" height="617"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Invocation and Monitoring with AWS X-Ray
&lt;/h3&gt;

&lt;p&gt;The API can be tested by typing the URL, along with the querystring parameters, into a browser address bar. For example, &lt;em&gt;&lt;a href="https://knmel67a1g.execute-api.us-east-1.amazonaws.com/dev?user_id=5" rel="noopener noreferrer"&gt;https://knmel67a1g.execute-api.us-east-1.amazonaws.com/dev?user_id=5&lt;/a&gt;&lt;/em&gt; will generate recommendations for the user with id 5.&lt;br&gt;
For monitoring, we have also configured API Gateway to send traces to X-Ray and logs to CloudWatch. Since the API is integrated with a single Lambda function, you will see nodes in the service map containing information about the overall time spent and other performance metrics for the API Gateway service, the Lambda service, and the Lambda function. The timeline shows the hierarchy of segments and subsegments. Further details on request/response times and faults/errors can be found by clicking on each segment/subsegment in the timeline. For more information, refer to the &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-using-xray-maps.html" rel="noopener noreferrer"&gt;AWS documentation&lt;/a&gt; on using AWS X-Ray service maps and trace views with API Gateway.&lt;/p&gt;
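The same request can also be made programmatically; a minimal sketch using only the Python standard library, with a placeholder stage URL:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder - substitute the invoke URL from your API Gateway stage tab.
API_URL = "https://<api-id>.execute-api.us-east-1.amazonaws.com/dev"

def build_url(base: str, user_id: int) -> str:
    # user_id is passed as a query string parameter, as configured in the
    # method request.
    return f"{base}?{urlencode({'user_id': user_id})}"

def get_recommendations(user_id: int) -> dict:
    with urlopen(build_url(API_URL, user_id), timeout=10) as resp:
        return json.loads(resp.read())
```

Calling `get_recommendations(5)` is equivalent to the browser request above.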

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsu9jsvlz130f37w2lhtz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsu9jsvlz130f37w2lhtz.png" alt="Xrayconsole-APIGateway-lambda" width="800" height="165"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvt0je7yunwq0nt9oenc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvt0je7yunwq0nt9oenc.png" alt="Xrayconsole-trace-timeline" width="800" height="209"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>api</category>
      <category>aws</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>What movie to watch next ? Amazon Personalize to the rescue - Part 1</title>
      <dc:creator>Ryan Nazareth</dc:creator>
      <pubDate>Sun, 02 Oct 2022 20:58:30 +0000</pubDate>
      <link>https://forem.com/aws-builders/recommending-movies-with-amazon-personalize-part-1-4o7a</link>
      <guid>https://forem.com/aws-builders/recommending-movies-with-amazon-personalize-part-1-4o7a</guid>
      <description>&lt;p&gt;Amazon Personalize allows developers with no prior machine learning experience to easily build sophisticated personalization capabilities into their applications. With Personalize, you provide an activity stream from your application, as well as an inventory of the items you want to recommend, and Personalize will process the data to train a personalization model that is customized for your data. &lt;/p&gt;

&lt;p&gt;In this tutorial, we will be using the &lt;a href="https://movielens.org/" rel="noopener noreferrer"&gt;MovieLens dataset&lt;/a&gt; which is a popular dataset used for recommendation research. We will be using the &lt;a href="https://grouplens.org/datasets/movielens/" rel="noopener noreferrer"&gt;MovieLens 25M Dataset&lt;/a&gt; under the &lt;strong&gt;Recommended for new research&lt;/strong&gt; section. This contains 25 million ratings and one million tag applications applied to 62,000 movies by 162,000 users. The scripts and code referenced in this tutorial can be found in my &lt;a href="https://github.com/ryankarlos/AWS-ML-services/tree/master/projects/personalize" rel="noopener noreferrer"&gt;github&lt;/a&gt; repository.&lt;/p&gt;

&lt;p&gt;Download the respective zip file, navigate to the folder where it is stored, and run the unzip command. You may need to install the unzip package if it is not already installed, as described in this &lt;a href="https://www.tecmint.com/install-zip-and-unzip-in-linux/" rel="noopener noreferrer"&gt;link&lt;/a&gt;. For example, on Ubuntu: &lt;code&gt;sudo apt-get install -y unzip&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;datasets/personalize
&lt;span class="nv"&gt;$ &lt;/span&gt;unzip ml-25m.zip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Important Note on Pricing
&lt;/h2&gt;

&lt;p&gt;Depending on the personalize recipe used and the size of the dataset, this can result in a large bill when training a personalize solution. I learnt this the hard way by not reading the AWS Personalize billing docs properly which resulted in this exercise &lt;strong&gt;costing me over $100&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;So I thought I would share the ways in which one could mitigate this and what to look out for when configuring the training solution.&lt;/p&gt;

&lt;p&gt;For the purpose of this tutorial, I have used the MovieLens 25M Dataset. However, one could sample a smaller dataset from this or use the MovieLens Latest Datasets recommended for education and development, which is a lot smaller (100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users). &lt;/p&gt;
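As a sketch of that sampling approach, assuming pandas and the &lt;code&gt;userId&lt;/code&gt;/&lt;code&gt;movieId&lt;/code&gt; column names from the MovieLens &lt;code&gt;ratings.csv&lt;/code&gt; (this helper is not part of the repo):

```python
import pandas as pd

def sample_users(df: pd.DataFrame, frac: float = 0.1, seed: int = 42) -> pd.DataFrame:
    # Sample whole users rather than individual rows, so every kept user
    # retains their full interaction history (Personalize trains on
    # per-user interaction sequences).
    users = df["userId"].drop_duplicates().sample(frac=frac, random_state=seed)
    return df[df["userId"].isin(users)]

# Usage (paths illustrative):
# ratings = pd.read_csv("datasets/personalize/ml-25m/ratings.csv")
# sample_users(ratings, frac=0.01).to_csv("ratings_sample.csv", index=False)
```

Sampling by user rather than by row keeps the interaction histories intact, which matters more for recommendation quality than raw row count.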

&lt;p&gt;Secondly, it should be noted that if the model complexity requires it, AWS Personalize &lt;strong&gt;will automatically scale up&lt;/strong&gt; to a suitable instance. This means more compute resources will be used to complete your jobs faster, and hence result in a larger bill.&lt;/p&gt;

&lt;p&gt;The training hours can be broken down to the following components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A training hour represents 1 hour of compute capacity using 4 vCPUs and 8 GiB of memory&lt;/li&gt;
&lt;li&gt;The number of training jobs created for the solution if HPO is enabled&lt;/li&gt;
&lt;/ol&gt;
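To see how quickly these components add up, here is a back-of-envelope estimate. The per-training-hour price below is an assumption for illustration, so check the current AWS Personalize pricing page for your region:

```python
# Illustrative price per training hour - an assumption, not a quote;
# check the AWS Personalize pricing page for the current rate.
PRICE_PER_TRAINING_HOUR = 0.24  # USD

def estimate_training_cost(training_hours: float) -> float:
    return training_hours * PRICE_PER_TRAINING_HOUR

# e.g. a run billed at 560 training hours:
print(f"${estimate_training_cost(560):.2f}")
```

At that assumed rate, 560 training hours already lands well above the $100 mark, consistent with the bill described earlier.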

&lt;p&gt;If one has enabled hyperparameter optimization (HPO) or tuning, the optimal hyperparameters are determined by running many training jobs using different values from the specified ranges of possibilities as described in the &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/customizing-solution-config-hpo.html#hpo-tuning" rel="noopener noreferrer"&gt;docs&lt;/a&gt;. In this tutorial, I have used HPO tuning with the following configuration for the training solution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"hpoResourceConfig"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"maxNumberOfTrainingJobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"16"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"maxParallelTrainingJobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"8"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;maxNumberOfTrainingJobs&lt;/code&gt; setting indicates that the maximum number of training jobs is set to 16. Each job needs its own resources. In other words, the 560 training hours were the result of 16 training jobs as well as larger compute resources.&lt;/p&gt;

&lt;p&gt;I was wondering how to reduce the cost for future solutions, so I contacted AWS Technical Support. They recommended the following:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;Disable HPO; if you want to tune your model, build something that works first, then optimise later. You can check whether HPO is enabled or disabled by running &lt;code&gt;aws personalize describe-solution &amp;lt;arn&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Go over our &lt;a href="https://github.com/aws-samples/amazon-personalize-samples/blob/master/PersonalizeCheatSheet2.0.md" rel="noopener noreferrer"&gt;cheat sheet&lt;/a&gt; provided by the Service Team.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is also no way to force AWS Personalize to use a specific instance type when it scales to complete the training job faster. The best cost optimisation in this scenario is to turn off HPO. Once the model is trained, there is no extra cost for keeping it active. The &lt;code&gt;ACTIVE&lt;/code&gt; status is only shown once training is complete; it does not mean training is still in progress, as described in the &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/creating-a-solution-version.html" rel="noopener noreferrer"&gt;docs&lt;/a&gt;.&lt;/p&gt;
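The HPO check can also be scripted. A small sketch that inspects the shape of the DescribeSolution response (the helper function is illustrative, not part of the repo):

```python
def hpo_enabled(describe_response: dict) -> bool:
    # The DescribeSolution response carries a performHPO flag on the
    # solution object; treat a missing flag as disabled.
    return bool(describe_response.get("solution", {}).get("performHPO"))

# Usage with boto3 (assumes credentials and a solution ARN):
# import boto3
# resp = boto3.client("personalize").describe_solution(solutionArn=solution_arn)
# print(hpo_enabled(resp))
```

This is handy to run before kicking off a solution version, given how much HPO multiplies the training hours.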

&lt;h2&gt;
  
  
  Loading data into S3
&lt;/h2&gt;

&lt;p&gt;Create an S3 bucket named &lt;code&gt;recommendation-sample-data&lt;/code&gt; and run the following command in the CLI to enable &lt;strong&gt;Transfer Acceleration&lt;/strong&gt; for the bucket. All Amazon S3 requests made by the s3 and s3api AWS CLI commands can then be directed to the accelerate endpoint: &lt;code&gt;s3-accelerate.amazonaws.com&lt;/code&gt;. We also need to set the configuration value &lt;code&gt;use_accelerate_endpoint&lt;/code&gt; to &lt;code&gt;true&lt;/code&gt; in a profile in the AWS config file. For further details, please consult the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration-examples.html" rel="noopener noreferrer"&gt;AWS docs&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;aws s3api put-bucket-accelerate-configuration &lt;span class="nt"&gt;--bucket&lt;/span&gt; recommendation-sample-data &lt;span class="nt"&gt;--accelerate-configuration&lt;/span&gt; &lt;span class="nv"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Enabled
&lt;span class="nv"&gt;$ &lt;/span&gt;aws configure &lt;span class="nb"&gt;set &lt;/span&gt;default.s3.use_accelerate_endpoint &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In addition to &lt;strong&gt;Transfer Acceleration&lt;/strong&gt;, this &lt;a href="https://aws.amazon.com/premiumsupport/knowledge-center/s3-upload-large-files/" rel="noopener noreferrer"&gt;AWS article&lt;/a&gt; recommends using the CLI for uploads of large files, as it automatically performs &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html" rel="noopener noreferrer"&gt;multipart uploading&lt;/a&gt; when the file size is large. We can also set the maximum number of concurrent requests to 20 to use more of the host's bandwidth and resources during the upload; by default, the AWS CLI uses a maximum of 10 concurrent requests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;aws configure &lt;span class="nb"&gt;set &lt;/span&gt;default.s3.max_concurrent_requests 20
&lt;span class="nv"&gt;$ &lt;/span&gt;aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;datasets/personalize/ml-25m/ s3://recommendation-sample-data/movie-lens/raw_data/ &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="nt"&gt;--recursive&lt;/span&gt; &lt;span class="nt"&gt;--endpoint-url&lt;/span&gt; https://recommendation-sample-data.s3-accelerate.amazonaws.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ua4srlsioha2og3pvu0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ua4srlsioha2og3pvu0.png" alt="s3-data-upload" width="800" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, we need to add the Glue script and Lambda function to the S3 bucket as well. This assumes the Lambda function is zipped as in &lt;code&gt;lambdas/data_import_personalize.zip&lt;/code&gt; and you have a bucket with key &lt;code&gt;aws-glue-assets-376337229415-us-east-1/scripts&lt;/code&gt;. If not, adapt the commands accordingly. Run the following commands from the root of the repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;step_functions/personalize-definition.json s3://recommendation-sample-data/movie-lens/personalize-definition.json
&lt;span class="nv"&gt;$ &lt;/span&gt;aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;lambdas/trigger_glue_personalize.zip s3://recommendation-sample-data/movie-lens/lambda/trigger_glue_personalize.zip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have not configured Transfer Acceleration for the default Glue assets bucket, set &lt;code&gt;use_accelerate_endpoint&lt;/code&gt; back to false before running the &lt;code&gt;cp&lt;/code&gt; command, as below. Otherwise, you will get the following error: &lt;/p&gt;

&lt;p&gt;&lt;em&gt;An error occurred (InvalidRequest) when calling the PutObject operation: S3 Transfer Acceleration is not configured on this bucket&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;aws configure &lt;span class="nb"&gt;set &lt;/span&gt;default.s3.use_accelerate_endpoint &lt;span class="nb"&gt;false&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;projects/personalize/glue/Personalize_Glue_Script.py s3://aws-glue-assets-376337229415-us-east-1/scripts/Personalize_Glue_Script.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CloudFormation Templates
&lt;/h2&gt;

&lt;p&gt;The CloudFormation template for creating the resources for this example is located in this &lt;a href="https://github.com/ryankarlos/AWS-ML-services/tree/master/cloudformation" rel="noopener noreferrer"&gt;folder&lt;/a&gt;. The CloudFormation template &lt;code&gt;personalize.yaml&lt;/code&gt; creates the following resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Glue Job &lt;/li&gt;
&lt;li&gt;Personalize resources (Dataset, DatasetGroup, Schema) and associated Role &lt;/li&gt;
&lt;li&gt;Step Function for orchestrating the Glue and Personalize DatasetImport Jobs and creating a Personalize Solution&lt;/li&gt;
&lt;li&gt;Lambda function and associated Role, for triggering step function execution with S3 event notification.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can use the following CLI command to create the stack, with the path to the template passed to the &lt;code&gt;--template-body&lt;/code&gt; argument. Adapt this depending on where your template is stored. We also need to pass the &lt;code&gt;CAPABILITY_NAMED_IAM&lt;/code&gt; value to the &lt;code&gt;--capabilities&lt;/code&gt; argument, as the template includes IAM resources (e.g. an IAM role) with custom names such as a &lt;code&gt;RoleName&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt; &lt;span class="nv"&gt;$ &lt;/span&gt;aws cloudformation create-stack &lt;span class="nt"&gt;--stack-name&lt;/span&gt; PersonalizeGlue &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--template-body&lt;/span&gt; file://cloudformation/personalize.yaml &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--capabilities&lt;/span&gt; CAPABILITY_NAMED_IAM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
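Rather than refreshing the console, stack creation can also be awaited with a CloudFormation waiter. A sketch with the client passed in (the helper is illustrative, not part of the repo):

```python
def wait_for_stack(cf_client, stack_name: str = "PersonalizeGlue") -> str:
    # Block until CloudFormation finishes creating the stack, then return
    # its final status (e.g. CREATE_COMPLETE).
    cf_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    stacks = cf_client.describe_stacks(StackName=stack_name)["Stacks"]
    return stacks[0]["StackStatus"]

# Usage (assumes credentials are configured):
# import boto3
# print(wait_for_stack(boto3.client("cloudformation")))
```

Injecting the client keeps the helper easy to test and to point at a different region or profile.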



&lt;p&gt;If successful, we should see the following resources created in the resources tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4az38c5145lwdagmhec6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4az38c5145lwdagmhec6.png" alt="cloud_formation_parameters_tab" width="800" height="574"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we run the command as above, just using the default parameters, we should see the key-value pairs listed in the parameters tab, as in the screenshot below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmw656fu2141b9wzs5j98.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmw656fu2141b9wzs5j98.png" alt="cloud_formation_resources_tab" width="800" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All the services should now be created. For example, navigate to the Step Functions console and click on the state machine name &lt;code&gt;GlueETLPersonalizeTraining&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc5um2b302procacchzh8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc5um2b302procacchzh8.png" alt="step_function_diagram" width="778" height="1094"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F048wrqdgtzb8ieahichp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F048wrqdgtzb8ieahichp.png" alt="definition_file" width="800" height="1114"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  S3 event notifications
&lt;/h2&gt;

&lt;p&gt;We need to configure S3 event notifications for the training and prediction workflows. For the training workflow, we need an S3-to-Lambda notification for when raw data is loaded into S3, to trigger the step function execution. For the prediction workflow (batch and realtime), the following configurations are required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3-to-Lambda notification for triggering the Personalize batch job when a batch sample data object is put into the S3 bucket prefix&lt;/li&gt;
&lt;li&gt;S3-to-Lambda notification for triggering a lambda to transform the output of the batch job added to S3. &lt;/li&gt;
&lt;li&gt;S3 notification to an SNS topic when the output of the lambda transform lands in the S3 bucket. We have configured an email subscriber to the SNS topic via CloudFormation, using the email protocol. SNS will then send an email to the subscriber address when an event message is received from S3.&lt;/li&gt;
&lt;/ul&gt;
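For reference, a sketch of the payload shape such a script would pass to S3's &lt;code&gt;put_bucket_notification_configuration&lt;/code&gt; API; the ARNs, prefixes, and the csv suffix filter here are illustrative assumptions, not the script's actual defaults:

```python
def notification_config(lambda_arn: str, topic_arn: str,
                        lambda_prefix: str, sns_prefix: str) -> dict:
    # Both configurations fire on object creation, filtered to csv
    # objects under the given prefix.
    def _filter(prefix):
        return {"Key": {"FilterRules": [
            {"Name": "prefix", "Value": prefix},
            {"Name": "suffix", "Value": ".csv"},
        ]}}
    return {
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": lambda_arn,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": _filter(lambda_prefix),
        }],
        "TopicConfigurations": [{
            "TopicArn": topic_arn,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": _filter(sns_prefix),
        }],
    }
```

The dict would be passed as the `NotificationConfiguration` argument of `put_bucket_notification_configuration` on a boto3 S3 client.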

&lt;p&gt;To add the bucket event notification for starting the training workflow via Step Functions, run the custom script, passing the arg &lt;code&gt;--workflow&lt;/code&gt; with value &lt;code&gt;train&lt;/code&gt;. By default, this will send an S3 event when a csv file is uploaded into the &lt;code&gt;movie-lens/batch/results/&lt;/code&gt; prefix in the bucket.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;python projects/personalize/put_notification_s3.py &lt;span class="nt"&gt;--workflow&lt;/span&gt; train

INFO:botocore.credentials:Found credentials &lt;span class="k"&gt;in &lt;/span&gt;shared credentials file: ~/.aws/credentials
INFO:__main__:Lambda arn arn:aws:lambda:........:function:LambdaSFNTrigger &lt;span class="k"&gt;for function &lt;/span&gt;LambdaSFNTrigger
INFO:__main__:HTTPStatusCode: 200
INFO:__main__:RequestId: X6X9E99JE13YV6RH
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To add bucket event notifications for batch/realtime predictions, run the script and pass &lt;code&gt;--workflow&lt;/code&gt; with value &lt;code&gt;predict&lt;/code&gt;. The default prefixes set for the object event triggers for the S3-to-Lambda and S3-to-SNS notifications can be found in the &lt;a href="https://github.com/ryankarlos/AWS-MLservices/blob/master/projects/personalize/put_notification_s3.py" rel="noopener noreferrer"&gt;source code&lt;/a&gt;. These can be overridden by passing the respective argument names (see the click options in the &lt;a href="https://github.com/ryankarlos/AWS-ML%20services/blob/master/projects/personalize/put_notification_s3.py" rel="noopener noreferrer"&gt;source code&lt;/a&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;python projects/personalize/put_notification_s3.py &lt;span class="nt"&gt;--workflow&lt;/span&gt; predict

INFO:botocore.credentials:Found credentials &lt;span class="k"&gt;in &lt;/span&gt;shared credentials file: ~/.aws/credentials
INFO:__main__:Lambda arn arn:aws:lambda:us-east-1:376337229415:function:LambdaBatchTrigger &lt;span class="k"&gt;for function &lt;/span&gt;LambdaBatchTrigger
INFO:__main__:Lambda arn arn:aws:lambda:us-east-1:376337229415:function:LambdaBatchTransform &lt;span class="k"&gt;for function &lt;/span&gt;LambdaBatchTransform
INFO:__main__:Topic arn arn:aws:sns:us-east-1:376337229415:PersonalizeBatch &lt;span class="k"&gt;for &lt;/span&gt;PersonalizeBatch
INFO:__main__:HTTPStatusCode: 200
INFO:__main__:RequestId: Q0BCATSW52X1V299
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: There is currently no support for notifications to FIFO type SNS topics. &lt;/p&gt;

&lt;h2&gt;
  
  
  Trigger Workflow for Training Solution
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26hxvd628lnqtfy324tw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26hxvd628lnqtfy324tw.png" alt="personalize_train" width="800" height="620"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Lambda function and step function resources in the workflow should already be created via CloudFormation. We will trigger the workflow by uploading the raw dataset into the S3 path for which the S3 event notification is configured, which triggers the Lambda function. This will execute the state machine, which will run all the steps defined in the definition file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmu24fcuff0bfne49yk1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmu24fcuff0bfne49yk1.png" alt="execution-input" width="800" height="265"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98bp1gukmwmom9sdb2gs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98bp1gukmwmom9sdb2gs.png" alt="sfn-diagram" width="732" height="814"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewymricvco2jhde9erm1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewymricvco2jhde9erm1.png" alt="step-output" width="800" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Firstly, it will run the Glue job to transform the raw data into the schema and format required for &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/interactions-dataset-requirements.html" rel="noopener noreferrer"&gt;importing the interactions dataset&lt;/a&gt; into Personalize. The outputs from the Glue job are stored in a different S3 folder to the raw data. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flas3e27n2j9dzt6chji7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flas3e27n2j9dzt6chji7.png" alt="s3_glue_output" width="800" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It then imports the interactions dataset into Personalize. The custom dataset group and interactions dataset resources were already created and defined when the CloudFormation stack was deployed.&lt;/p&gt;
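&lt;p&gt;For reference, an import job of this kind can be started with a single boto3 call. The sketch below only assembles the request; the job name, ARNs, bucket path and role name are illustrative placeholders, not values from this stack.&lt;/p&gt;

```python
def build_import_job_request(job_name, dataset_arn, s3_path, role_arn):
    # Shape of the request expected by personalize.create_dataset_import_job
    return {
        "jobName": job_name,
        "datasetArn": dataset_arn,
        "dataSource": {"dataLocation": s3_path},
        "roleArn": role_arn,
    }

# All values below are placeholders for illustration
request = build_import_job_request(
    "interactions-import",
    "arn:aws:personalize:us-east-1:123456789012:dataset/example/INTERACTIONS",
    "s3://example-bucket/glue-output/interactions.csv",
    "arn:aws:iam::123456789012:role/PersonalizeS3AccessRole",
)
# boto3.client("personalize").create_dataset_import_job(**request)
```

The role must grant Personalize read access to the S3 bucket holding the Glue output.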

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhp9qbqz15ick20ae20k0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhp9qbqz15ick20ae20k0.png" alt="interactions-dataset-console" width="800" height="973"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Wait for the solution version to reach &lt;strong&gt;ACTIVE&lt;/strong&gt; status. Training can take a while, depending on the dataset size and the number of user-item interactions, and longer still if using AutoML. The training time (hrs) value is based on 1 hr of compute capacity (the default is 4 CPUs and 8 GiB of memory). However, as discussed in the &lt;strong&gt;Pricing section&lt;/strong&gt; of this blog, Amazon Personalize automatically chooses a more efficient instance type to complete the job more quickly. In that case, the computed training-hours metric is adjusted upwards, resulting in a larger bill.&lt;/p&gt;
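&lt;p&gt;Rather than refreshing the console, the status can be polled with boto3. This is a minimal sketch; the polling interval is arbitrary, and the terminal status strings (&lt;code&gt;ACTIVE&lt;/code&gt; and &lt;code&gt;CREATE FAILED&lt;/code&gt;) are taken from the Personalize solution version lifecycle.&lt;/p&gt;

```python
import time

# Statuses after which polling should stop
TERMINAL_STATES = {"ACTIVE", "CREATE FAILED"}


def is_terminal(status):
    # True once training has finished, successfully or not
    return status in TERMINAL_STATES


def wait_for_solution_version(solution_version_arn, poll_seconds=60):
    import boto3  # imported lazily so the helper above works without boto3 installed

    personalize = boto3.client("personalize")
    while True:
        response = personalize.describe_solution_version(
            solutionVersionArn=solution_version_arn
        )
        status = response["solutionVersion"]["status"]
        print(f"Solution version status: {status}")
        if is_terminal(status):
            return status
        time.sleep(poll_seconds)
```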

&lt;h3&gt;
  
  
   Analysing Traces and Debugging with AWS X-Ray
&lt;/h3&gt;

&lt;p&gt;To diagnose any faults in execution, we can look at the X-Ray traces and logs. You can now view the service map within the Amazon CloudWatch console: open the CloudWatch console and choose Service map under X-Ray traces in the left navigation pane. The service map indicates the health of each node by colouring it based on the ratio of successful calls to errors and faults. Each AWS resource that sends data to X-Ray appears as a service in the graph, and edges connect the services that work together to serve requests, as detailed &lt;a href="https://docs.aws.amazon.com/xray/latest/devguide/xray-concepts.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;. In the center of each node, the console shows the average response time and the number of traces sent per minute during the chosen time range. A trace collects all the segments generated by a single request.&lt;/p&gt;

&lt;p&gt;Choose a service node to view requests for that node, or an edge between two nodes to view requests that traversed that connection. The service map splits the workflow into two trace IDs per request, with the following groups of segments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda service and function segments&lt;/li&gt;
&lt;li&gt;Step Functions, Glue, and Personalize segments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdh0ncve87fnd4in2dzg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdh0ncve87fnd4in2dzg.png" alt="train_app_service_map_xray" width="800" height="746"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxea53a1xk0ungbp95vw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxea53a1xk0ungbp95vw.png" alt="latency-metric" width="800" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yly9pwy19ckemgl7crh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yly9pwy19ckemgl7crh.png" alt="request-metrics" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F534s1if0mqwi1qoecnh0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F534s1if0mqwi1qoecnh0.png" alt="Image description" width="800" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also choose a trace ID to view the trace map and timeline for a trace.  The Timeline view shows a hierarchy of segments and subsegments. The first entry in the list is the segment, which represents all data recorded by the service for a single request. Below the segment are subsegments. This example shows subsegments recorded by Lambda segments. Lambda records a segment for the Lambda service that handles the invocation request, and one for the work done by the function as described &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/services-xray.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;. The function segment comes with subsegments for the following phases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Initialization phase&lt;/strong&gt;: the Lambda execution environment is initialised.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invocation phase&lt;/strong&gt;: the function handler is invoked.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overhead phase&lt;/strong&gt;: the dwell time between sending the response and the signal for the next invocation.&lt;/li&gt;
&lt;/ul&gt;
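&lt;p&gt;Each subsegment in a trace's segment document carries &lt;code&gt;start_time&lt;/code&gt; and &lt;code&gt;end_time&lt;/code&gt; epoch timestamps, so per-phase durations can be computed directly. The segment document below is a hypothetical, heavily simplified example for illustration only.&lt;/p&gt;

```python
def phase_durations(function_segment):
    # Duration in seconds of each subsegment (Initialization, Invocation, Overhead),
    # computed from the start_time/end_time epoch timestamps X-Ray records
    return {
        sub["name"]: round(sub["end_time"] - sub["start_time"], 3)
        for sub in function_segment.get("subsegments", [])
    }


# Hypothetical, simplified function segment document
segment = {
    "name": "my-function",
    "subsegments": [
        {"name": "Initialization", "start_time": 1657.0, "end_time": 1657.4},
        {"name": "Invocation", "start_time": 1657.4, "end_time": 1659.1},
        {"name": "Overhead", "start_time": 1659.1, "end_time": 1659.2},
    ],
}
print(phase_durations(segment))
```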

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxxdif7vvb9gbj643pkp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxxdif7vvb9gbj643pkp.png" alt="xray_traces_sfn_1" width="800" height="235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3sd2vcab466o7bqsic60.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3sd2vcab466o7bqsic60.png" alt="xray-sfn-duration" width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For Step Functions, we can see the various subsegments corresponding to the different states in the state machine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5dl7o7te8d74hm1uvx3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5dl7o7te8d74hm1uvx3.png" alt="xray_traces_sfn_2" width="800" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluating solution metrics
&lt;/h2&gt;

&lt;p&gt;You can use &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html" rel="noopener noreferrer"&gt;offline metrics&lt;/a&gt; to evaluate the performance of the trained model before you create a campaign and provide recommendations. Offline metrics let you view the effects of modifying a solution's hyperparameters, or compare results from models trained on the same data. To compute performance metrics, Amazon Personalize splits the input interactions data into a training set and a testing set. The split depends on the type of recipe you choose. For &lt;strong&gt;USER_SEGMENTATION&lt;/strong&gt; recipes, the training set consists of 80% of each user's interactions data and the testing set of the remaining 20%. For all other recipe types, the training set consists of 90% of your users and their interactions data, and the testing set of the remaining 10% of users and their interactions data.&lt;/p&gt;

&lt;p&gt;Amazon Personalize then creates the solution version using the training set. After training completes, Amazon Personalize gives the new solution version the oldest 90% of each user’s data from the testing set as input. Amazon Personalize then calculates metrics by comparing the recommendations the solution version generates to the actual interactions in the newest 10% of each user’s data from the testing set, as described &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
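&lt;p&gt;The holdout scheme can be illustrated with a small sketch. This is not Personalize's internal implementation, just the oldest-90%/newest-10% split described above applied to one test-set user's time-ordered interactions:&lt;/p&gt;

```python
def split_user_interactions(interactions, holdout_fraction=0.1):
    # interactions must be sorted oldest to newest; the newest fraction is held out
    cut = int(len(interactions) * (1 - holdout_fraction))
    return interactions[:cut], interactions[cut:]


# Ten interactions for one test-set user: the oldest 90% become model input,
# and recommendations are scored against the newest 10% held out as ground truth
history, holdout = split_user_interactions(list(range(10)))
print(history, holdout)
```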

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcc34l4q09ppk4prrrxku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcc34l4q09ppk4prrrxku.png" alt="personalize_solution_user-personalization_recipe_with_HPO" width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can retrieve the metrics for the trained solution version above by running the following script, which calls the GetSolutionMetrics operation with the &lt;code&gt;solutionVersionArn&lt;/code&gt; parameter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python projects/personalize/evaluate_solution.py &lt;span class="nt"&gt;--solution_version_arn&lt;/span&gt; &amp;lt;solution-version-arn&amp;gt;

2022-07-09 20:31:24,671 - evaluate - INFO - Solution version status: ACTIVE
2022-07-09 20:31:24,787 - evaluate - INFO - Metrics:

 &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'coverage'&lt;/span&gt;: 0.1233, &lt;span class="s1"&gt;'mean_reciprocal_rank_at_25'&lt;/span&gt;: 0.1208, &lt;span class="s1"&gt;'normalized_discounted_cumulative_gain_at_10'&lt;/span&gt;: 0.1396, &lt;span class="s1"&gt;'normalized_discounted_cumulative_gain_at_25'&lt;/span&gt;: 0.1996, &lt;span class="s1"&gt;'normalized_discounted_cumulative_gain_at_5'&lt;/span&gt;: 0.1063, &lt;span class="s1"&gt;'precision_at_10'&lt;/span&gt;: 0.0367, &lt;span class="s1"&gt;'precision_at_25'&lt;/span&gt;: 0.0296, &lt;span class="s1"&gt;'precision_at_5'&lt;/span&gt;: 0.0423&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The metrics above are summarised from the &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html" rel="noopener noreferrer"&gt;AWS docs&lt;/a&gt; as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;coverage&lt;/strong&gt;: An evaluation metric that tells you the proportion of unique items that Amazon Personalize might recommend using your model, out of the total number of unique items in the Interactions and Items datasets. To make sure Amazon Personalize recommends more of your items, use a model with a higher coverage score. Recipes that feature item exploration, such as &lt;strong&gt;User-Personalization&lt;/strong&gt;, have higher coverage than those that do not, such as popularity-count.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;mean reciprocal rank at 25&lt;/strong&gt;: An evaluation metric that assesses the relevance of a model’s highest ranked recommendation. Amazon Personalize calculates this metric using the average accuracy of the model when ranking the most relevant recommendation out of the top 25 recommendations over all requests for recommendations. This metric is useful if you're interested in the single highest ranked recommendation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;normalized discounted cumulative gain (NDCG) at K&lt;/strong&gt;: An evaluation metric that tells you about the relevance of your model’s highly ranked recommendations, where K is a sample size of 5, 10, or 25 recommendations. Amazon Personalize calculates this by assigning weight to recommendations based on their position in a ranked list, where each recommendation is discounted (given a lower weight) by a factor dependent on its position. The normalized discounted cumulative gain at K assumes that recommendations that are lower on a list are less relevant than recommendations higher on the list. Amazon Personalize uses a weighting factor of &lt;em&gt;1/log(1 + position)&lt;/em&gt;, where the top of the list is position 1. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;precision at K&lt;/strong&gt;: An evaluation metric that tells you how relevant your model’s recommendations are based on a sample size of K (5, 10, or 25) recommendations. Amazon Personalize calculates this metric as the number of relevant recommendations out of the top K recommendations, divided by K. This metric rewards precise recommendation of the relevant items, as described &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;
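&lt;p&gt;For intuition, precision at K and NDCG at K can be computed by hand from a binary relevance list (1 where a recommendation matched the user's held-out interactions, 0 otherwise). This is an illustrative sketch, not Amazon Personalize's implementation; note that the log base cancels once DCG is normalised, so using log2 below gives the same NDCG as the &lt;em&gt;1/log(1 + position)&lt;/em&gt; weighting above.&lt;/p&gt;

```python
import math


def precision_at_k(relevance, k):
    # Share of the top-k recommendations that are relevant (1) rather than not (0)
    return sum(relevance[:k]) / k


def ndcg_at_k(relevance, k):
    # DCG of the recommended ordering divided by the DCG of the ideal ordering;
    # each position's weight is 1/log2(1 + position), with position 1 at the top
    dcg = sum(
        rel / math.log2(1 + pos) for pos, rel in enumerate(relevance[:k], start=1)
    )
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(rel / math.log2(1 + pos) for pos, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0


print(round(precision_at_k([1, 0, 1], 3), 4))  # 0.6667
print(round(ndcg_at_k([1, 0, 1], 3), 4))       # 0.9197
```

A relevant item slipping from position 2 to position 3 leaves precision unchanged but lowers NDCG, which is why the two metrics are reported together.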

&lt;p&gt;In the &lt;a href="https://dev.to/aws-builders/what-movie-to-watch-next-amazon-personalize-to-the-rescue-part-2-2pj9"&gt;second part&lt;/a&gt; of this blog, we will create a campaign with the deployed solution version and set up API Gateway with Lambda for generating real-time recommendations. &lt;/p&gt;

</description>
      <category>aws</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>Setting up AWS Code Pipeline to automate deployment of tweets streaming application</title>
      <dc:creator>Ryan Nazareth</dc:creator>
      <pubDate>Sat, 17 Sep 2022 16:07:03 +0000</pubDate>
      <link>https://forem.com/aws-builders/setting-up-aws-code-pipeline-to-automate-deployment-of-tweets-streaming-application-m7o</link>
      <guid>https://forem.com/aws-builders/setting-up-aws-code-pipeline-to-automate-deployment-of-tweets-streaming-application-m7o</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;In this tutorial, we will configure &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/concepts.html" rel="noopener noreferrer"&gt;AWS CodePipeline&lt;/a&gt; to build an ECR image and deploy the latest version to a Lambda container. The application code will stream tweets using &lt;a href="https://www.tweepy.org/" rel="noopener noreferrer"&gt;Tweepy&lt;/a&gt;, a Python library for accessing the Twitter API. First we need to set up CodePipeline and the various stages to deploy the application code to the Lambda image, which will stream tweets when invoked. The intended DevOps architecture is shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ca951z24elp5yihgkeu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ca951z24elp5yihgkeu.png" alt="architecture_tweets_deploy_lambda-container" width="800" height="623"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Typically, a CodePipeline job contains the following stages: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Source&lt;/strong&gt;: In this step the latest version of our source code is fetched from our repository and uploaded to an S3 bucket. The application source code is maintained in a repository configured as a GitHub source action in the pipeline. When developers push commits to the repository, CodePipeline detects the change and a pipeline execution starts from the Source stage. When the GitHub source action completes successfully, the latest changes have been downloaded and stored in the artifact bucket unique to that execution. The output artifacts produced by the GitHub source action, which are the application files from the repository, are then used as the input artifacts for the actions in the next stage. This is described in more detail in the &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/welcome-introducing-artifacts.html" rel="noopener noreferrer"&gt;AWS docs&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build&lt;/strong&gt;: During this step we will use this uploaded source code and automate our manual packaging step using a &lt;a href="https://docs.aws.amazon.com/codebuild/latest/userguide/concepts.html" rel="noopener noreferrer"&gt;CodeBuild&lt;/a&gt; project. The build task pulls a build environment image and builds the application in a virtual container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unit Test&lt;/strong&gt;: The next action can be a unit test project created in CodeBuild and configured as a test action in the pipeline. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deploy to Dev/Test Environment&lt;/strong&gt;: This deploys the application to a dev/test env environment using &lt;a href="https://docs.aws.amazon.com/codedeploy/latest/userguide/welcome.html" rel="noopener noreferrer"&gt;CodeDeploy&lt;/a&gt; or another action provider such as CloudFormation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integration Test&lt;/strong&gt;: This runs an end-to-end integration testing project created in CodeBuild and configured as a test action in the pipeline.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deploy to Production Environment&lt;/strong&gt;: This deploys the application to a production environment. The pipeline could be configured so that this stage requires &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/approvals-action-add.html" rel="noopener noreferrer"&gt;manual approval&lt;/a&gt; to execute.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
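&lt;p&gt;As an illustration of the &lt;strong&gt;Build&lt;/strong&gt; stage, a CodeBuild buildspec for building and pushing the container image might look like the sketch below. The repository name, account ID and region are placeholders, not values from this project.&lt;/p&gt;

```yaml
version: 0.2

phases:
  pre_build:
    commands:
      # Authenticate Docker to ECR (account ID and region are placeholders)
      - aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com
  build:
    commands:
      # Build the image from the Dockerfile in the source artifact
      - docker build -t tweets-stream:latest .
      - docker tag tweets-stream:latest 123456789012.dkr.ecr.eu-west-1.amazonaws.com/tweets-stream:latest
  post_build:
    commands:
      # Push the new image so the deploy stage can point Lambda at it
      - docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/tweets-stream:latest
```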

&lt;p&gt;The stages described above are just an example and could be fewer or more depending on the application. For example, we could have more environments for testing before deploying to production. For the rest of the blog, we will describe how some of these stages can be configured for deploying the twitter streaming application. All source code referenced in the rest of the article can be accessed &lt;a href="https://github.com/ryankarlos/AWS-CICD" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Application code for Streaming Tweets
&lt;/h3&gt;

&lt;p&gt;First, you will need to sign up for a Twitter Developer account and create a project and associated developer application. Then generate the following credentials: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Key and Secret&lt;/strong&gt;:  Username and password for the App&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access Token and Secret&lt;/strong&gt;: Represent the user that owns the app and will be used for authenticating requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These steps are explained in more detail &lt;a href="https://developer.twitter.com/en/docs/twitter-api/getting-started/getting-access-to-the-twitter-api" rel="noopener noreferrer"&gt;here&lt;/a&gt;. We can then define a Python function which requires these credentials as parameters to make requests to the API with OAuth 1.0a authentication. This requires the latest version of &lt;strong&gt;tweepy&lt;/strong&gt; to be installed (&lt;code&gt;pip install tweepy&lt;/code&gt;). The &lt;code&gt;event&lt;/code&gt; parameter will be the payload with keys &lt;code&gt;keyword&lt;/code&gt; and &lt;code&gt;duration&lt;/code&gt;, which determine the keyword to search for and how long the stream should last. For example, to stream tweets containing the keyword &lt;code&gt;machine learning&lt;/code&gt; for 30 seconds, the payload would be &lt;code&gt;{"keyword": "machine learning", "duration": 30}&lt;/code&gt;. The code also excludes any retweets to reduce noise.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tweepy&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tweepy_search_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;consumer_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;consumer_secret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;access_secret&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tweepy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OAuth1UserHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;consumer_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;consumer_secret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;access_secret&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;time_limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tweepy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;API&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_on_rate_limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tweepy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Cursor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;search_tweets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extended&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;time_limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;time_limit&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; seconds time limit reached, so disconnecting stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tweets streamed ! &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full_text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
                &lt;span class="n"&gt;dt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;
                &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;month&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%H:%M:%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;handle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;screen_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;favourite_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;favourites_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retweet_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retweet_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retweeted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retweeted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;followers_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;followers_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;friends_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;friends_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lang&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since we will be using AWS for this, we can store the credentials in &lt;strong&gt;AWS Secrets Manager&lt;/strong&gt; rather than defining them in the code or passing them as environment variables, which is more secure. We can use the boto3 SDK to create a client session for &lt;strong&gt;Secrets Manager&lt;/strong&gt; and access the secrets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_secrets&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secretsmanager&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
   &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_secrets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;twitterkeys&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}])&lt;/span&gt;
   &lt;span class="n"&gt;arn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SecretList&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ARN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
   &lt;span class="n"&gt;get_secret_value_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; 
   &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_secret_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SecretId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_secret_value_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SecretString&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
   &lt;span class="n"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Successfully retrieved secrets !&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
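The handler shown further below unpacks the first four values of the returned secret positionally with `itertools.islice`. The sketch below illustrates that extraction with a hypothetical `SecretString` payload; the actual key names stored in the `twitterkeys` secret may differ.

```python
import itertools
import json

# Hypothetical example of the SecretString JSON stored in Secrets Manager;
# the real key names in the "twitterkeys" secret may be different.
secret_string = json.dumps({
    "consumer_key": "xxxx",
    "consumer_secret": "xxxx",
    "access_token": "xxxx",
    "access_token_secret": "xxxx",
})

# get_secrets() returns json.loads(SecretString), i.e. a dict of credentials
secret = json.loads(secret_string)

# The lambda handler later takes the first four values positionally:
api_keys = list(itertools.islice(secret.values(), 4))
print(api_keys)
```

Note that `itertools.islice(secret.values(), 4)` relies on dictionary insertion order (guaranteed since Python 3.7), so the keys must be stored in the secret in the order that `tweepy_search_api` expects its positional credential arguments.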



&lt;p&gt;We will also be invoking a Lambda function to call the Twitter API to stream tweets, so we will need to define another script that calls the &lt;code&gt;tweepy_search_api&lt;/code&gt; and &lt;code&gt;get_secrets&lt;/code&gt; functions defined above from a Lambda handler, as shown below. This assumes the &lt;code&gt;tweepy_search_api&lt;/code&gt; and &lt;code&gt;get_secrets&lt;/code&gt; functions are defined in the modules &lt;code&gt;tweets_api.py&lt;/code&gt; and &lt;code&gt;secrets.py&lt;/code&gt; respectively, in the same directory as the Lambda function module. The snippet below parses the user parameters from the payload depending on whether the Lambda function is invoked via a CodePipeline action or invoked directly from the Lambda console or a local machine (for testing purposes). If invoked via a CodePipeline action, the &lt;strong&gt;UserParameters&lt;/strong&gt; key contains the parameters stored as a string value, which needs to be converted to a dictionary programmatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tweets_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tweepy_search_api&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;secrets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_secrets&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;itertools&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Event payload type: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Event:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CodePipeline.job&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cloud&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mode: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cloud&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CodePipeline.job&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actionConfiguration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configuration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UserParameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;job_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CodePipeline.job&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Params:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, JobID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Params:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;code_pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;codepipeline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_secrets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;api_keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;itertools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;islice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Searching and delivering Tweets with Tweepy API: &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;tweepy_search_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;api_keys&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cloud&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;code_pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_job_success_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Exception:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cloud&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;code_pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_job_failure_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;jobId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;failureDetails&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JobFailed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
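The branching at the top of the handler can be seen more clearly by comparing the two event shapes side by side. The following sketch isolates just the parameter-extraction logic; the parameter names (`query`, `max_results`) are illustrative only and not taken from the repository.

```python
import json

# Minimal sketch (not the full handler) of how the two invocation payloads
# differ. The parameter names below are hypothetical.
def extract_params(event):
    if event.get("CodePipeline.job") is not None:
        # CodePipeline Invoke action: UserParameters is a JSON *string*
        data = event["CodePipeline.job"]["data"]["actionConfiguration"][
            "configuration"
        ]["UserParameters"]
        return json.loads(data)
    # Direct invocation (console/local test): event is already a dict
    return event.copy()

# Event shape when invoked by a CodePipeline Lambda Invoke action
pipeline_event = {
    "CodePipeline.job": {
        "id": "11111111-abcd-1111-abcd-111111abcdef",
        "data": {
            "actionConfiguration": {
                "configuration": {
                    "UserParameters": '{"query": "aws", "max_results": 10}'
                }
            }
        },
    }
}

# Event shape when invoked directly for testing
local_event = {"query": "aws", "max_results": 10}

# Both paths yield the same params dictionary
assert extract_params(pipeline_event) == extract_params(local_event)
```

Either way the handler ends up with a plain dictionary of parameters; only the CodePipeline path additionally needs the job id so it can report success or failure back to the pipeline.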



&lt;p&gt;The following &lt;a href="https://github.com/ryankarlos/AWS-CICD/tree/master/projects/deploy-lambda-image" rel="noopener noreferrer"&gt;folder&lt;/a&gt; contains all the code described above, in addition to a few more configuration files: a &lt;code&gt;buildspec.yml&lt;/code&gt; and a &lt;code&gt;Dockerfile&lt;/code&gt; for building the Docker image (containing the application code) and pushing it to ECR in the CodeBuild stage. These will be explained in more detail in the next sections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating the resources with CloudFormation
&lt;/h3&gt;

&lt;p&gt;First, we will need to create the following resources with CloudFormation, which will be referenced when configuring the pipeline. &lt;strong&gt;Note:&lt;/strong&gt; This assumes you already have an ECR repository named &lt;code&gt;tweepy-stream-deploy&lt;/code&gt;, which is referenced as a parameter in the CloudFormation template, although it can be overridden.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/ryankarlos/AWS-CICD/blob/master/cf-templates/CodeDeployLambdaTweepy.yaml" rel="noopener noreferrer"&gt;Lambda Function&lt;/a&gt; with URI reference to ECR repository&lt;/li&gt;
&lt;li&gt;Roles for &lt;a href="https://github.com/ryankarlos/AWS-CICD/blob/master/cf-templates/roles/RoleLambdaImageStaging.yaml" rel="noopener noreferrer"&gt;Lambda&lt;/a&gt;, &lt;a href="https://github.com/ryankarlos/AWS-CICD/blob/master/cf-templates/roles/CodepipelineRole.yaml" rel="noopener noreferrer"&gt;CodePipeline&lt;/a&gt; and &lt;a href="https://github.com/ryankarlos/AWS-CICD/blob/master/cf-templates/roles/CloudFormationRole.yaml" rel="noopener noreferrer"&gt;CloudFormation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ECRRepoName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tweepy-stream-deploy"&lt;/span&gt;
    &lt;span class="na"&gt;Description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ECR repository for tweets application&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;String&lt;/span&gt;
&lt;span class="na"&gt;Resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;LambdaImageStaging&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AWS::Lambda::Function'&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;PackageType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Image&lt;/span&gt;
      &lt;span class="na"&gt;FunctionName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;codedeploy-staging"&lt;/span&gt;
      &lt;span class="na"&gt;Code&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;ImageUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${ECRRepoName}:latest'&lt;/span&gt;
      &lt;span class="na"&gt;Role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Fn::ImportValue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RoleLambdaImage-TwitterArn&lt;/span&gt;
      &lt;span class="na"&gt;Timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;302&lt;/span&gt;
      &lt;span class="na"&gt;MemorySize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1024&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;aws cloudformation create-stack &lt;span class="nt"&gt;--stack-name&lt;/span&gt; CodeDeployLambdaTweets &lt;span class="nt"&gt;--template-body&lt;/span&gt; file://cf-templates/CodeDeployLambdaTweepy.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create all the role resources using the templates &lt;a href="https://github.com/ryankarlos/AWS-CICD/tree/master/cf-templates/roles" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws cloudformation create-stack &lt;span class="nt"&gt;--stack-name&lt;/span&gt; RoleCloudFormationforCodeDeploy &lt;span class="nt"&gt;--template-body&lt;/span&gt; file://cf-templates/roles/CloudFormationRole.yaml

aws cloudformation create-stack &lt;span class="nt"&gt;--stack-name&lt;/span&gt; RoleCodePipeline &lt;span class="nt"&gt;--template-body&lt;/span&gt; file://cf-templates/roles/CodepipelineRole.yaml

aws cloudformation create-stack &lt;span class="nt"&gt;--stack-name&lt;/span&gt; RoleLambdaImage &lt;span class="nt"&gt;--template-body&lt;/span&gt; file://cf-templates/roles/RoleLambdaImageStaging.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can validate the CloudFormation templates before deploying by using &lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-validate-template.html" rel="noopener noreferrer"&gt;validate-template&lt;/a&gt; to check the template file for syntax errors. During validation, AWS CloudFormation first checks whether the template is valid JSON. If it isn't, CloudFormation then checks whether it is valid YAML. If both checks fail, CloudFormation returns a template &lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-validate-template.html" rel="noopener noreferrer"&gt;validation error&lt;/a&gt;. Note that the &lt;code&gt;aws cloudformation validate-template&lt;/code&gt; command checks only the syntax of your template. It does not ensure that the property values you have specified for a resource are valid for that resource, nor does it determine the number of resources that will exist when the stack is created.&lt;/p&gt;
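The JSON-first check order can be illustrated locally with a minimal sketch. This only mimics CloudFormation's first validation step and is in no way a substitute for running `validate-template` itself:

```python
import json

def is_valid_json(template_body: str) -> bool:
    """Mimic CloudFormation's first validation step: try parsing as JSON."""
    try:
        json.loads(template_body)
        return True
    except json.JSONDecodeError:
        return False

# A YAML-style template body fails the JSON check, so CloudFormation
# would fall through to its YAML parser next.
yaml_template = "Resources:\n  MyBucket:\n    Type: AWS::S3::Bucket\n"
json_template = '{"Resources": {"MyBucket": {"Type": "AWS::S3::Bucket"}}}'

print(is_valid_json(yaml_template))   # falls through to the YAML check
print(is_valid_json(json_template))
```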

&lt;h3&gt;
  
  
  Creating the CodePipeline Stages
&lt;/h3&gt;

&lt;p&gt;This will use the CodePipeline &lt;a href="https://github.com/ryankarlos/AWS-CICD/blob/master/cp-definitions/deploy-lambda-image.json" rel="noopener noreferrer"&gt;definition file&lt;/a&gt; to create the &lt;strong&gt;Source&lt;/strong&gt;, &lt;strong&gt;Build&lt;/strong&gt; and &lt;strong&gt;Deploy&lt;/strong&gt; stages via the AWS CLI. Alternatively, one could also do this with CloudFormation, but since I had initially created the pipeline via the AWS console, I found it easier to generate the structure of the pipeline using &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/codepipeline/get-pipeline.html" rel="noopener noreferrer"&gt;get-pipeline&lt;/a&gt; from the CLI and reuse the definition file to create the pipeline again in the future, which will be described in the next steps.&lt;/p&gt;

&lt;p&gt;CodePipeline also supports a number of actions, as listed &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/actions.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;. An action is part of a sequence within a stage and is a task performed on an artifact. CodePipeline can integrate with a number of action providers such as CodeCommit, S3, GitHub, CodeBuild, Jenkins, CodeDeploy, CloudFormation and ECS across the Source, Build, Test and Deploy stages. The full list of providers can be found in the &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/integrations-action-type.html" rel="noopener noreferrer"&gt;docs&lt;/a&gt;. We will be using the CodeCommit &lt;strong&gt;Source&lt;/strong&gt; action, the CodeBuild &lt;strong&gt;Build&lt;/strong&gt; action, the CloudFormation &lt;strong&gt;Deploy&lt;/strong&gt; action and the Lambda &lt;strong&gt;Invoke&lt;/strong&gt; action.&lt;/p&gt;

&lt;p&gt;Next, we will zip the &lt;a href="https://github.com/ryankarlos/AWS-CICD/tree/master/cf-templates" rel="noopener noreferrer"&gt;cf templates folder&lt;/a&gt;. This is required for the Deploy stage in CodePipeline, which will use CloudFormation actions to update the roles for CloudFormation, CodePipeline and Lambda if the templates have changed, and to update the existing Lambda resource with the latest image tag to deploy the updated application source code. These templates will need to be committed to &lt;strong&gt;CodeCommit&lt;/strong&gt; in the &lt;strong&gt;Source&lt;/strong&gt; stage and output as artifacts. We will copy this zipped folder to S3 and configure CodePipeline in the &lt;a href="https://github.com/ryankarlos/AWS-CICD/blob/master/cp-definitions/deploy-lambda-image.json" rel="noopener noreferrer"&gt;definition file&lt;/a&gt; so that the action in the Source stage reads the template file from its S3 location.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;cf-templates 
&lt;span class="nv"&gt;$ &lt;/span&gt;zip template-source-artifacts.zip CodeDeployLambdaTweepy.yaml roles/&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;template-source-artifacts.zip s3://codepipeline-us-east-1-49345350114/lambda-image-deploy/template-source-artifacts.zip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
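The packaging step above can also be scripted with the standard-library `zipfile` module if you prefer Python over shell. The sketch below recreates the folder layout in a scratch directory for illustration; the actual S3 upload via boto3 is left as a comment since it requires AWS credentials, and the bucket/key names simply mirror the shell snippet above.

```python
import pathlib
import tempfile
import zipfile

# Recreate the cf-templates layout in a scratch directory for illustration
root = pathlib.Path(tempfile.mkdtemp())
(root / "roles").mkdir()
(root / "CodeDeployLambdaTweepy.yaml").write_text("Resources: {}\n")
(root / "roles" / "CodepipelineRole.yaml").write_text("Resources: {}\n")

# Zip the main template plus everything under roles/, as in `zip ... roles/*`
archive = root / "template-source-artifacts.zip"
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write(root / "CodeDeployLambdaTweepy.yaml", "CodeDeployLambdaTweepy.yaml")
    for path in (root / "roles").glob("*"):
        zf.write(path, f"roles/{path.name}")

# Inspect the archive contents
with zipfile.ZipFile(archive) as zf:
    names = sorted(zf.namelist())
print(names)

# To upload, one could then use boto3 (bucket name as in the shell snippet):
# import boto3
# boto3.client("s3").upload_file(
#     str(archive), "codepipeline-us-east-1-49345350114",
#     "lambda-image-deploy/template-source-artifacts.zip")
```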



&lt;p&gt;The &lt;a href="(https://github.com/ryankarlos/AWS-CICD/blob/master/cp-definitions/deploy-lambda-image.json)"&gt;definition json&lt;/a&gt; assumes CodePipeline role is created as described above. It is worth having a look at the file contents to understand the settings before we create the pipeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CodeCommitSource"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"actionTypeId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Source"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AWS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CodeCommit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"runOrder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"configuration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"BranchName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"master"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"OutputArtifactFormat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CODE_ZIP"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"PollForSourceChanges"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"false"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"RepositoryName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deploy-lambda-image"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"outputArtifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CodeCommitSourceArtifact"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"inputArtifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SourceVariables"&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This configures a CodeCommit source action for the repository named &lt;code&gt;deploy-lambda-image&lt;/code&gt; and defines the output artifact &lt;code&gt;CodeCommitSourceArtifact&lt;/code&gt;. The artifact is a ZIP file containing the contents of the configured repository and branch at the commit specified as the source revision for the pipeline execution; we will later pass it to the build stage. The next action in the source stage loads the CloudFormation templates ZIP file that we previously uploaded to the S3 bucket. This is an &lt;strong&gt;S3 Source Action&lt;/strong&gt;, which is triggered via a CloudWatch Events rule when a new object is uploaded to the source bucket. More details can be found in the AWS docs &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/action-reference-S3.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CFTemplatesArtifact"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"actionTypeId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                      &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Source"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                      &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AWS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"S3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                      &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"runOrder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"configuration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                     &lt;/span&gt;&lt;span class="nl"&gt;"PollForSourceChanges"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"false"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                      &lt;/span&gt;&lt;span class="nl"&gt;"S3Bucket"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"codepipeline-us-east-1-49345350114"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                      &lt;/span&gt;&lt;span class="nl"&gt;"S3ObjectKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lambda-image-deploy/template-source-artifacts.zip"&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"outputArtifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CFTemplatesArtifact"&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
                       &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"inputArtifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
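&lt;p&gt;As a quick (hypothetical) sketch, the artifact wiring of the two source actions can be inspected programmatically; the dicts below mirror the trimmed action structure shown above, with a made-up name for the CodeCommit action.&lt;/p&gt;

```python
# Hypothetical helper: map each Source action to the artifacts it produces.
# The dicts mirror the (trimmed) action structure shown above.
def source_output_artifacts(actions):
    return {
        action["name"]: [a["name"] for a in action.get("outputArtifacts", [])]
        for action in actions
        if action["actionTypeId"]["category"] == "Source"
    }

actions = [
    {
        "name": "Source",  # hypothetical name for the CodeCommit action
        "actionTypeId": {"category": "Source", "owner": "AWS",
                         "provider": "CodeCommit", "version": "1"},
        "outputArtifacts": [{"name": "CodeCommitSourceArtifact"}],
    },
    {
        "name": "CFTemplatesArtifact",
        "actionTypeId": {"category": "Source", "owner": "AWS",
                         "provider": "S3", "version": "1"},
        "outputArtifacts": [{"name": "CFTemplatesArtifact"}],
    },
]

print(source_output_artifacts(actions))
```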



&lt;p&gt;This action also creates an output artifact, &lt;code&gt;CFTemplatesArtifact&lt;/code&gt;, which we can pass to the downstream deploy stage. The build stage defines how to run a build: where to get the source code, which build environment to use, which build commands to run, and where to store the build output. It uses the following &lt;a href="https://docs.aws.amazon.com/codebuild/latest/userguide/getting-started-cli-create-build-spec.html" rel="noopener noreferrer"&gt;buildspec.yml&lt;/a&gt; file, which will be included when we copy the source code in the next section. This action takes the &lt;code&gt;CodeCommitSourceArtifact&lt;/code&gt;, containing the application code to be built, as its input.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Build"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Build-Tweepy-Stream"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"actionTypeId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Build"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AWS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CodeBuild"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"runOrder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"configuration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"BatchEnabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"false"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"EnvironmentVariables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;IMAGE_REPO_NAME&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;value&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;tweepy-stream-deploy&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;type&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;PLAINTEXT&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;},{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;IMAGE_TAG&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;value&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;latest&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;type&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span 
class="s2"&gt;PLAINTEXT&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;},{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;AWS_DEFAULT_REGION&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;value&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;us-east-1&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;type&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;PLAINTEXT&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;},{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;AWS_ACCOUNT_ID&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;value&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;[ACCT_ID]&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;type&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;PLAINTEXT&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span 
class="s2"&gt;}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"ProjectName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Build-Twitter-Stream"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"outputArtifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"inputArtifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CodeCommitSourceArtifact"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will build a &lt;a href="https://docs.aws.amazon.com/codebuild/latest/userguide/sample-docker.html" rel="noopener noreferrer"&gt;Docker image&lt;/a&gt; and push it to ECR. We set the following environment variables, which are referenced in the &lt;code&gt;buildspec.yml&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS_DEFAULT_REGION&lt;/strong&gt;: us-east-1&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS_ACCOUNT_ID&lt;/strong&gt;: the AWS account ID&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IMAGE_TAG&lt;/strong&gt;: latest&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IMAGE_REPO_NAME&lt;/strong&gt;: tweepy-stream-deploy&lt;/li&gt;
&lt;/ul&gt;
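&lt;p&gt;The escaped &lt;code&gt;EnvironmentVariables&lt;/code&gt; value in the build action configuration is simply a JSON array serialized into a single string. A minimal sketch of building it programmatically, which avoids hand-escaping the quotes (the account ID placeholder is kept as-is):&lt;/p&gt;

```python
import json

# Build the EnvironmentVariables value for the CodeBuild action configuration.
# [ACCT_ID] is kept as a placeholder for the real AWS account ID.
env_vars = [
    {"name": "IMAGE_REPO_NAME", "value": "tweepy-stream-deploy", "type": "PLAINTEXT"},
    {"name": "IMAGE_TAG", "value": "latest", "type": "PLAINTEXT"},
    {"name": "AWS_DEFAULT_REGION", "value": "us-east-1", "type": "PLAINTEXT"},
    {"name": "AWS_ACCOUNT_ID", "value": "[ACCT_ID]", "type": "PLAINTEXT"},
]

configuration = {
    "BatchEnabled": "false",
    "EnvironmentVariables": json.dumps(env_vars, separators=(",", ":")),
    "ProjectName": "Build-Twitter-Stream",
}

# The serialized string round-trips back to the original list
assert json.loads(configuration["EnvironmentVariables"]) == env_vars
```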

&lt;p&gt;The &lt;code&gt;buildspec.yml&lt;/code&gt; file is similar to the example in the &lt;a href="https://docs.aws.amazon.com/codebuild/latest/userguide/sample-docker.html" rel="noopener noreferrer"&gt;AWS docs&lt;/a&gt; for pushing a Docker image to ECR. In the &lt;strong&gt;pre-build&lt;/strong&gt; phase, we use the &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/ecr/get-login-password.html" rel="noopener noreferrer"&gt;get-login-password&lt;/a&gt; CLI command, which calls the &lt;code&gt;GetAuthorizationToken&lt;/code&gt; API to retrieve an authentication token. The token is piped to the Docker CLI's login command to authenticate to the ECR registry, allowing Docker to push and pull images from the registry until the token expires after 12 hours. The &lt;strong&gt;build&lt;/strong&gt; phase runs the steps in the &lt;code&gt;Dockerfile&lt;/code&gt; shown in the snippet below to build the image.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM public.ecr.aws/lambda/python:3.9.2022.03.23.16

# Copy function code
COPY main_twitter.py ${LAMBDA_TASK_ROOT}
COPY secrets.py ${LAMBDA_TASK_ROOT}
COPY tweets_api.py ${LAMBDA_TASK_ROOT}

COPY requirements.txt  .
RUN  pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"

# Set the CMD to your handler
CMD [ "main_twitter.handler" ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The steps in the Dockerfile include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pulling the Python 3.9 Lambda base image from the public ECR gallery&lt;/li&gt;
&lt;li&gt;Copying the module &lt;code&gt;main_twitter.py&lt;/code&gt;, which contains the lambda handler, along with the other modules it imports&lt;/li&gt;
&lt;li&gt;Copying the requirements.txt file and installing the Python dependencies&lt;/li&gt;
&lt;li&gt;Finally, setting the container entrypoint to the lambda handler&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the image is successfully built in the &lt;strong&gt;build&lt;/strong&gt; phase, it is tagged with the &lt;code&gt;latest&lt;/code&gt; tag. Finally, the &lt;strong&gt;post-build&lt;/strong&gt; phase pushes the tagged image to the private ECR repository URI.&lt;/p&gt;
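&lt;p&gt;The tag and push commands in the &lt;strong&gt;post-build&lt;/strong&gt; phase assemble the repository URI from the environment variables set earlier. A minimal sketch of the private ECR URI format, using a made-up account ID:&lt;/p&gt;

```python
# Build the private ECR repository URI the image is tagged with and pushed to.
# The account ID below is a placeholder.
def ecr_image_uri(account_id, region, repo_name, tag):
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{tag}"

uri = ecr_image_uri("123456789012", "us-east-1", "tweepy-stream-deploy", "latest")
print(uri)  # 123456789012.dkr.ecr.us-east-1.amazonaws.com/tweepy-stream-deploy:latest
```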

&lt;p&gt;The next stage is the &lt;strong&gt;Deploy&lt;/strong&gt; stage, named &lt;code&gt;DeployLambda&lt;/code&gt;, which uses CloudFormation as the action provider to perform a number of actions: updating the resource roles, deleting the existing lambda resource, and deploying the latest image to lambda. All of these actions use the &lt;code&gt;CFTemplatesArtifact&lt;/code&gt; from the source stage to reference the path to the CloudFormation template (in the &lt;code&gt;TemplatePath&lt;/code&gt; property) relative to the root of the artifact. The input artifact must therefore be the CloudFormation templates artifact output from the &lt;strong&gt;Source&lt;/strong&gt; stage. We provide the stack name and CloudFormation role in the configuration. The &lt;strong&gt;ActionMode&lt;/strong&gt; depends on whether we need to create, update or delete the stack.&lt;/p&gt;

&lt;p&gt;The first three actions update the roles for CloudFormation, CodePipeline and Lambda if the CloudFormation templates have changed. The &lt;strong&gt;runOrder&lt;/strong&gt; property is set to 1 for these actions so that they run in parallel. The next action deletes any lambda image that may already exist. Its &lt;strong&gt;runOrder&lt;/strong&gt; value is incremented to 2 so that it runs after the roles are updated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeleteExistingLambdaImage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"actionTypeId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Deploy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AWS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CloudFormation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"runOrder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"configuration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"ActionMode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DELETE_ONLY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"RoleArn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::376337229415:role/CloudFormationRole"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"StackName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CodeDeployLambdaTweets"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"outputArtifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"inputArtifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CFTemplatesArtifact"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
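&lt;p&gt;The scheduling implied by &lt;strong&gt;runOrder&lt;/strong&gt; can be sketched as grouping a stage's actions into sequential batches, where actions sharing a &lt;strong&gt;runOrder&lt;/strong&gt; value run in parallel. The three role-update action names below are hypothetical:&lt;/p&gt;

```python
from itertools import groupby

# Group a stage's actions into batches: equal runOrder values run in
# parallel, and each batch waits for all lower-numbered batches to finish.
def execution_batches(actions):
    ordered = sorted(actions, key=lambda a: a["runOrder"])
    return [
        [a["name"] for a in batch]
        for _, batch in groupby(ordered, key=lambda a: a["runOrder"])
    ]

deploy_actions = [
    {"name": "UpdateCloudFormationRole", "runOrder": 1},  # hypothetical names
    {"name": "UpdateCodePipelineRole", "runOrder": 1},
    {"name": "UpdateLambdaRole", "runOrder": 1},
    {"name": "DeleteExistingLambdaImage", "runOrder": 2},
    {"name": "DeployLambdaImage", "runOrder": 3},
]

print(execution_batches(deploy_actions))
# [['UpdateCloudFormationRole', 'UpdateCodePipelineRole', 'UpdateLambdaRole'],
#  ['DeleteExistingLambdaImage'], ['DeployLambdaImage']]
```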



&lt;p&gt;We then deploy the lambda image using a &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/action-reference-CloudFormation.html" rel="noopener noreferrer"&gt;CloudFormation action&lt;/a&gt; called &lt;code&gt;DeployLambdaImage&lt;/code&gt; in this stage. In the configuration, we specify the &lt;strong&gt;OutputFileName&lt;/strong&gt; and the &lt;strong&gt;outputArtifacts&lt;/strong&gt; name, which we will pass to the next stage. The &lt;strong&gt;TemplatePath&lt;/strong&gt; references the required template YAML within &lt;strong&gt;CFTemplatesArtifact&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeployLambdaImage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"actionTypeId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Deploy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AWS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CloudFormation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"runOrder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"configuration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"ActionMode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CREATE_UPDATE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"OutputFileName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lambda-codedeploy-output"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"RoleArn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::376337229415:role/CloudFormationRole"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"StackName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CodeDeployLambdaTweets"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"TemplatePath"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CFTemplatesArtifact::CodeDeployLambdaTweepy.yaml"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"outputArtifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LambdaDeployArtifact"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"inputArtifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CFTemplatesArtifact"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
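&lt;p&gt;The &lt;strong&gt;TemplatePath&lt;/strong&gt; value above follows the &lt;code&gt;ArtifactName::path&lt;/code&gt; convention: the artifact that holds the file, followed by the file path relative to the artifact root. A tiny sketch of splitting it:&lt;/p&gt;

```python
# Split a CodePipeline TemplatePath of the form "ArtifactName::path".
def split_template_path(template_path):
    artifact, _, path = template_path.partition("::")
    return artifact, path

artifact, path = split_template_path("CFTemplatesArtifact::CodeDeployLambdaTweepy.yaml")
print(artifact, path)  # CFTemplatesArtifact CodeDeployLambdaTweepy.yaml
```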



&lt;p&gt;In the final stage of the pipeline, we do a test invocation of the deployed lambda image. This uses a Lambda &lt;strong&gt;Invoke&lt;/strong&gt; action for &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/actions-invoke-lambda-function.html" rel="noopener noreferrer"&gt;invoking lambda&lt;/a&gt;, with the artifact from the previous stage as input. The configuration sets the function name and the parameter values for invoking the lambda function, i.e. we will stream tweets containing the keyword &lt;code&gt;Machine Learning&lt;/code&gt; for a duration of 10 seconds.&lt;br&gt;
&lt;/p&gt;
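&lt;p&gt;Inside the invoked function, the &lt;code&gt;UserParameters&lt;/code&gt; string arrives as part of the CodePipeline job event. A minimal sketch of parsing it, assuming the documented event shape (a real handler would also report the outcome back via the PutJobSuccessResult or PutJobFailureResult API):&lt;/p&gt;

```python
import json

# Parse the UserParameters JSON string from a CodePipeline job event.
def parse_user_parameters(event):
    config = event["CodePipeline.job"]["data"]["actionConfiguration"]["configuration"]
    return json.loads(config["UserParameters"])

# Trimmed-down example of the event shape CodePipeline sends to the function
event = {
    "CodePipeline.job": {
        "id": "example-job-id",
        "data": {
            "actionConfiguration": {
                "configuration": {
                    "UserParameters": '{"keyword": "Machine Learning", "duration": 10}'
                }
            }
        },
    }
}

params = parse_user_parameters(event)
print(params["keyword"], params["duration"])
```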

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LambdaInvocationTest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LambdaStagingInvocation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"actionTypeId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Invoke"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AWS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Lambda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"runOrder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"configuration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"FunctionName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"codedeploy-staging"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"UserParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;keyword&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Machine Learning&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:10}"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"outputArtifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LambdaInvocationArtifact"&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"inputArtifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LambdaDeployArtifact"&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
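The invoked Lambda function receives the `UserParameters` string above inside the CodePipeline job event, as a JSON-encoded string that must be parsed by the handler. As a minimal sketch (the job `id` below is an illustrative placeholder, not from the original post):

```python
import json

def get_user_params(event):
    """Extract and parse the UserParameters JSON string from a
    CodePipeline Lambda-invoke job event."""
    job = event["CodePipeline.job"]
    raw = job["data"]["actionConfiguration"]["configuration"]["UserParameters"]
    return json.loads(raw)

# Abbreviated event shape, limited to the fields used above
event = {
    "CodePipeline.job": {
        "id": "11111111-abcd-1111-abcd-111111abcdef",  # placeholder job id
        "data": {
            "actionConfiguration": {
                "configuration": {
                    "FunctionName": "codedeploy-staging",
                    "UserParameters": '{"keyword": "Machine Learning", "duration":10}',
                }
            }
        },
    }
}

params = get_user_params(event)
```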



&lt;p&gt;Now we can create the CodePipeline resource with the &lt;strong&gt;Source&lt;/strong&gt;, &lt;strong&gt;Build&lt;/strong&gt; and &lt;strong&gt;Deploy&lt;/strong&gt; stages using the following &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/pipelines-create.html" rel="noopener noreferrer"&gt;command&lt;/a&gt; from the CLI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;aws codepipeline create-pipeline &lt;span class="nt"&gt;--cli-input-json&lt;/span&gt; file://cp-definitions/deploy-lambda-image.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should create the pipeline, which should now be visible in the console or via the &lt;code&gt;list-pipelines&lt;/code&gt; CLI command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws codepipeline list-pipelines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
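The same check can be scripted. The helper below operates on the response shape `list-pipelines` returns, so it can be exercised offline; the pipeline name is an assumed example, and the commented boto3 call requires configured AWS credentials:

```python
def pipeline_exists(response, name):
    """Return True if `name` appears in a CodePipeline list_pipelines response."""
    return any(p["name"] == name for p in response.get("pipelines", []))

# Offline example using the documented response shape:
sample = {"pipelines": [{"name": "TweetsLambdaDeploy", "version": 1}]}
print(pipeline_exists(sample, "TweetsLambdaDeploy"))

# Against a live account (requires credentials):
# import boto3
# client = boto3.client("codepipeline", region_name="us-east-1")
# print(pipeline_exists(client.list_pipelines(), "TweetsLambdaDeploy"))
```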



&lt;p&gt;The next section configures our setup so that we can pull from and push to the CodeCommit repository from our local machine. The CodeCommit repository we have just created in CodePipeline is empty, so we also need to copy the &lt;a href="https://github.com/ryankarlos/AWS%20CICD/tree/master/projects/deploy-lambda-image" rel="noopener noreferrer"&gt;application code&lt;/a&gt; into the repository before running CodePipeline end to end. &lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up a local repository
&lt;/h3&gt;

&lt;p&gt;In this step, you set up a local repository to connect to your remote CodeCommit repository. This assumes you have SSH keys installed on your machine; if not, generate them with &lt;code&gt;ssh-keygen&lt;/code&gt; as described in the &lt;a href="https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up-ssh-unixes.html" rel="noopener noreferrer"&gt;AWS docs&lt;/a&gt;. Upload your SSH public key to your IAM user and copy the SSH Key ID. Then edit the SSH configuration file named "config" in your local ~/.ssh directory, adding the following lines, where the value for User is the SSH Key ID.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Host git-codecommit.*.amazonaws.com
User Your-IAM-SSH-Key-ID-Here
IdentityFile ~/.ssh/Your-Private-Key-File-Name-Here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
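The stanza above can also be generated programmatically, which is handy when provisioning several machines. A minimal sketch (the key ID below follows AWS's documented `APKA…EXAMPLE` placeholder format and is not a real key):

```python
def codecommit_ssh_config(key_id, key_file):
    """Render the ~/.ssh/config stanza CodeCommit expects, where `key_id`
    is the SSH Key ID shown against the uploaded key in the IAM console."""
    return (
        "Host git-codecommit.*.amazonaws.com\n"
        f"User {key_id}\n"
        f"IdentityFile ~/.ssh/{key_file}\n"
    )

stanza = codecommit_ssh_config("APKAEIBAERJR2EXAMPLE", "id_rsa")
print(stanza)
```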



&lt;p&gt;Once you have saved the file, make sure it has the right permissions by running the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~/.ssh
&lt;span class="nb"&gt;chmod &lt;/span&gt;600 config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clone the CodeCommit repository to your local computer so you can start working on the code. You can get the SSH URI from the console under &lt;strong&gt;Clone URL&lt;/strong&gt; for the CodeCommit repository. Navigate to a local directory (e.g. &lt;code&gt;/tmp&lt;/code&gt;) where you'd like your local repository to be stored and run the following &lt;a href="https://docs.aws.amazon.com/codecommit/latest/userguide/how-to-connect.html" rel="noopener noreferrer"&gt;command&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git clone ssh://git-codecommit.us-east-1.amazonaws.com/v1/repos/deploy-lambda-image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we will copy all the files from the following &lt;a href="https://github.com/ryankarlos/AWS-CICD/tree/master/projects/deploy-lambda-image" rel="noopener noreferrer"&gt;folder&lt;/a&gt;&lt;br&gt;
into the local repository you cloned earlier (for example, /tmp/deploy-lambda-image). Be sure to place the files directly into your local repository. The directory and file hierarchy should look like this, assuming you have cloned a repository named &lt;code&gt;deploy-lambda-image&lt;/code&gt; into the &lt;code&gt;/tmp&lt;/code&gt; directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/tmp
   └-- deploy-lambda-image
        ├── README.md
        ├── __init__.py
        ├── appspec.yaml
        ├── buildspec.yml
        ├── dockerfile
        ├── local_run.py
        ├── main_twitter.py
        ├── requirements.txt
        ├── secrets.py
        └── tweets_api.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
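Before committing, it can be worth checking that the files the pipeline depends on actually landed in the repository root. A small sketch (the list of files to require is an assumption drawn from the tree above):

```python
from pathlib import Path
import tempfile

# Files the Build and Deploy stages rely on (assumed from the tree above)
REQUIRED = ["appspec.yaml", "buildspec.yml", "dockerfile", "requirements.txt"]

def missing_files(repo_dir):
    """Return the required pipeline files missing from the local repo root."""
    repo = Path(repo_dir)
    return [f for f in REQUIRED if not (repo / f).exists()]

# Quick sanity check against a throwaway directory with one file missing:
with tempfile.TemporaryDirectory() as tmp:
    for f in REQUIRED[:-1]:
        (Path(tmp) / f).touch()
    result = missing_files(tmp)

print(result)
```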



&lt;p&gt;Run the following commands to stage all of the files, commit them with a commit message, and push them to the CodeCommit repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add &lt;span class="nb"&gt;.&lt;/span&gt;
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Add sample application files"&lt;/span&gt;
git push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The files you downloaded and added to your local repository have now been pushed to the main branch of the CodeCommit &lt;code&gt;deploy-lambda-image&lt;/code&gt; repository and are ready to be included in a pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Pipeline Execution
&lt;/h3&gt;

&lt;p&gt;CodePipeline can be configured to trigger on every push to CodeCommit via EventBridge, by defining an event rule that starts the pipeline when there are changes to the CodeCommit repository associated with it. This can be done from the console as detailed &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/pipelines-trigger-source-repo-changes-console.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;, substituting the Arns of your own CodeCommit and CodePipeline resources. &lt;/p&gt;
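The same rule can be created programmatically. The event pattern below follows the documented CodeCommit repository state-change shape; all account IDs, Arns, rule and role names are placeholders to substitute with your own:

```python
import json

# Placeholder Arn -- substitute your own repository
REPO_ARN = "arn:aws:codecommit:us-east-1:123456789012:deploy-lambda-image"

# Documented event pattern for pushes to a branch in a CodeCommit repo
event_pattern = {
    "source": ["aws.codecommit"],
    "detail-type": ["CodeCommit Repository State Change"],
    "resources": [REPO_ARN],
    "detail": {
        "event": ["referenceCreated", "referenceUpdated"],
        "referenceType": ["branch"],
        "referenceName": ["main"],
    },
}

rule_json = json.dumps(event_pattern)

# With credentials configured, the rule and its CodePipeline target could be
# created roughly like this (pipeline and role Arns are placeholders):
# import boto3
# events = boto3.client("events", region_name="us-east-1")
# events.put_rule(Name="codecommit-to-pipeline", EventPattern=rule_json)
# events.put_targets(Rule="codecommit-to-pipeline", Targets=[{
#     "Id": "codepipeline",
#     "Arn": "arn:aws:codepipeline:us-east-1:123456789012:TweetsLambdaDeploy",
#     "RoleArn": "arn:aws:iam::123456789012:role/start-pipeline-role",
# }])
```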

&lt;p&gt;We can now push the code in our local repository to CodeCommit. This creates a CodeCommit event, which EventBridge processes according to the configured rule, which in turn triggers CodePipeline to execute its stages as shown in the screenshots below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fog28nexxmvv5kr36rysg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fog28nexxmvv5kr36rysg.png" alt="TweetsLambdaDeploy-pipelineviz-1" width="800" height="641"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fng92016o4ojm9wuo9c7y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fng92016o4ojm9wuo9c7y.png" alt="TweetsLambdaDeploy-pipelineviz-2" width="800" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For manual triggering, choose &lt;strong&gt;Release change&lt;/strong&gt; on the pipeline details page on the console. This runs the most recent revision available in each source location specified in a CodeCommit Source action through the pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6i4hcgde1pw8ghyiyalr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6i4hcgde1pw8ghyiyalr.png" alt="codepipeline_executionhistory" width="800" height="323"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the pipeline has finished, we can check &lt;strong&gt;CloudWatch&lt;/strong&gt; for the invocation logs in the corresponding log stream. The &lt;code&gt;main_twitter.handler&lt;/code&gt; calls the &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/APIReference/API_PutJobSuccessResult.html" rel="noopener noreferrer"&gt;PutJobSuccessResult&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/APIReference/API_PutJobFailureResult.html" rel="noopener noreferrer"&gt;PutJobFailureResult&lt;/a&gt; actions to report the success or failure of the Lambda execution back to the pipeline, which terminates the &lt;strong&gt;LambdaInvocationTest&lt;/strong&gt; stage accordingly.&lt;/p&gt;
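The success/failure reporting pattern can be sketched as below. This is not the actual `main_twitter.handler` code, just the general shape: the boto3 CodePipeline client's `put_job_success_result` and `put_job_failure_result` methods exist as shown, and the client is injected so the logic can be exercised with a stub:

```python
def report_job_result(client, job_id, run_test):
    """Run `run_test` and report the outcome back to CodePipeline.
    `client` is a boto3 codepipeline client (or any object exposing the
    same two methods); `job_id` comes from event["CodePipeline.job"]["id"]."""
    try:
        run_test()
        client.put_job_success_result(jobId=job_id)
        return "success"
    except Exception as exc:
        client.put_job_failure_result(
            jobId=job_id,
            failureDetails={"type": "JobFailed", "message": str(exc)},
        )
        return "failure"

# Exercise the logic with a stub client (no AWS calls made):
class StubClient:
    def __init__(self):
        self.calls = []
    def put_job_success_result(self, **kw):
        self.calls.append(("success", kw))
    def put_job_failure_result(self, **kw):
        self.calls.append(("failure", kw))

stub = StubClient()
outcome = report_job_result(stub, "job-1", lambda: None)
outcome2 = report_job_result(stub, "job-2", lambda: 1 / 0)
```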

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foullyomjtg3q5ph1clt4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foullyomjtg3q5ph1clt4.png" alt="lambda_invocation_logs" width="800" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[1] &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/concepts-devops-example.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/codepipeline/latest/userguide/concepts-devops-example.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[2] &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/concepts.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/codepipeline/latest/userguide/concepts.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[3] &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/approvals-action-add.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/codepipeline/latest/userguide/approvals-action-add.html&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;[4] &lt;a href="https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-pull-ecr-image.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-pull-ecr-image.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[5] &lt;a href="https://docs.aws.amazon.com/AmazonECR/latest/userguide/registry_auth.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonECR/latest/userguide/registry_auth.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[6] &lt;a href="https://docs.aws.amazon.com/codebuild/latest/userguide/getting-started-create-build-project-console.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/codebuild/latest/userguide/getting-started-create-build-project-console.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[7] &lt;a href="https://docs.aws.amazon.com/codebuild/latest/userguide/getting-started-run-build-console.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/codebuild/latest/userguide/getting-started-run-build-console.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[8] &lt;a href="https://docs.aws.amazon.com/codebuild/latest/userguide/sample-docker.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/codebuild/latest/userguide/sample-docker.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[9] &lt;a href="https://docs.aws.amazon.com/codebuild/latest/userguide/getting-started-cli-create-build-spec.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/codebuild/latest/userguide/getting-started-cli-create-build-spec.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[10] &lt;a href="https://docs.aws.amazon.com/codecommit/latest/userguide/how-to-create-repository.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/codecommit/latest/userguide/how-to-create-repository.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[11] &lt;a href="https://docs.aws.amazon.com/codedeploy/latest/userguide/applications-create.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/codedeploy/latest/userguide/applications-create.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[12] &lt;a href="https://docs.aws.amazon.com/codedeploy/latest/userguide/deployment-groups-create.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/codedeploy/latest/userguide/deployment-groups-create.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[13] &lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-validate-template.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-validate-template.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[14] &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/pipelines-create.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/codepipeline/latest/userguide/pipelines-create.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[15] &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/lambda/create-function.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/cli/latest/reference/lambda/create-function.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[16] &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/lambda/invoke.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/cli/latest/reference/lambda/invoke.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[17] &lt;a href="https://bobbyhadz.com/blog/aws-cli-invalid-base64-lambda-error" rel="noopener noreferrer"&gt;https://bobbyhadz.com/blog/aws-cli-invalid-base64-lambda-error&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>git</category>
      <category>testing</category>
    </item>
    <item>
      <title>AWS Fraud Detector for classifying fraudulent online registered accounts - Part 2</title>
      <dc:creator>Ryan Nazareth</dc:creator>
      <pubDate>Wed, 14 Sep 2022 20:21:06 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-fraud-detector-for-classifying-fraudulent-online-registered-accounts-part-2-4ngo</link>
      <guid>https://forem.com/aws-builders/aws-fraud-detector-for-classifying-fraudulent-online-registered-accounts-part-2-4ngo</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/aws-builders/aws-fraud-detector-for-classifying-fraudulent-online-registered-accounts-part-1-1j4p"&gt;first part&lt;/a&gt; of this blog, we trained and evaluated the fraud detector model performance. Now we will need to make it active by deploying it and then make predictions. We will also setup a &lt;strong&gt;REST API&lt;/strong&gt; with &lt;strong&gt;API gateway (Lambda Integration)&lt;/strong&gt; for making realtime predictions. The architecture diagram below shows the workflow. All the references to code snippets are in &lt;a href="https://github.com/ryankarlos/AWS-ML-services/tree/master/projects/fraud" rel="noopener noreferrer"&gt;Github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmrinc4jlz4ixzk4zibd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmrinc4jlz4ixzk4zibd.png" alt="Fraud Predict Architecture" width="800" height="638"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the AWS Fraud Detector console, choose the model we have trained from the &lt;strong&gt;Models&lt;/strong&gt; page. Scroll to the top of the &lt;strong&gt;Version details&lt;/strong&gt; page and choose &lt;strong&gt;Actions&lt;/strong&gt; and &lt;strong&gt;Deploy model version&lt;/strong&gt;. On the &lt;strong&gt;Deploy model version&lt;/strong&gt; prompt that appears, choose &lt;strong&gt;Deploy version&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwovbhpq6jp7545m5mt8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwovbhpq6jp7545m5mt8.png" alt="variables" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Version details&lt;/strong&gt; page shows a &lt;strong&gt;Status&lt;/strong&gt; of 'Deploying'. When the model is ready, the status changes to &lt;strong&gt;Active&lt;/strong&gt;. Once the model has finished deploying and its status is active, we need to associate it with the fraud detector for predictions. However, we also need to update the rule expressions, as the default detector version 1 created from CloudFormation uses the variable &lt;strong&gt;amt&lt;/strong&gt; in the rule expression, as seen in the screenshot below. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feifk9pzhoqsagpylcx8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feifk9pzhoqsagpylcx8t.png" alt="detector-version-1-default-rules" width="800" height="168"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need to change this to the model insight score, a new variable created once model training has completed. This variable is not available when the CloudFormation stack is created, since the model has not been trained yet, so we needed an existing variable as a placeholder to keep the rule expression valid and avoid the stack throwing an error. We can run the following custom &lt;a href="https://github.com/ryankarlos/AWS-ML-services/blob/master/projects/fraud/deploy.py" rel="noopener noreferrer"&gt;script&lt;/a&gt; from the command line to update the detector rules and associate the new model with the detector.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python projects/fraud/deploy.py &lt;span class="nt"&gt;--update_rule&lt;/span&gt; 1 &lt;span class="nt"&gt;--model_version&lt;/span&gt; 1.0 &lt;span class="nt"&gt;--rules_version&lt;/span&gt; 2 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will carry out two steps: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update the existing rule version with the correct expression, based on the number passed to the &lt;code&gt;--update_rule&lt;/code&gt; argument. This creates a new rule version, incremented from the original version number. &lt;/li&gt;
&lt;li&gt;Then it creates a new detector version, associating the model version (&lt;code&gt;--model_version&lt;/code&gt; argument) and the rule version (&lt;code&gt;--rules_version&lt;/code&gt; argument), which should be set to the increment of the existing rule version. This automatically increments the detector version to &lt;strong&gt;2.0&lt;/strong&gt;, as the existing version is &lt;strong&gt;1.0&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;
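The core of the rule update is swapping the placeholder variable for the `fraud_model_insightscore` variable in each rule expression. A sketch of how those expressions might be built; the threshold values here are illustrative assumptions, not taken from the deploy script:

```python
def insight_rule_expressions(model_name="fraud_model",
                             threshold_high=900, threshold_low=700):
    """Build the three rule expressions (investigate/review/approve) around
    the <model>_insightscore variable created after training.
    Thresholds are illustrative placeholders."""
    score = f"${model_name}_insightscore"
    return {
        "investigate": f"{score} > {threshold_high}",
        "review": f"{score} <= {threshold_high} and {score} > {threshold_low}",
        "approve": f"{score} <= {threshold_low}",
    }

rules = insight_rule_expressions()
print(rules["investigate"])
```

These strings could then be passed as the `expression` argument to the boto3 `frauddetector` client's `update_rule_version` call, as the deploy script's log output above suggests it does for each rule in turn.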

&lt;p&gt;If the script runs successfully, we should see the following output streamed to stdout.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;26-06-2022 04:50:34 : INFO : deploy : main : 121 : Updating rule version 1
26-06-2022 04:50:34 : INFO : deploy : update_detector_rules : 71 : Updating Investigate rule ....
{'detectorId': 'fraud_detector_demo', 'ruleId': 'investigate', 'ruleVersion': '2'}

26-06-2022 04:50:35 : INFO : deploy : update_detector_rules : 80 : Updating review rule ....
{'detectorId': 'fraud_detector_demo', 'ruleId': 'review', 'ruleVersion': '2'}

26-06-2022 04:50:35 : INFO : deploy : update_detector_rules : 89 : Updating approve rule ....
{'detectorId': 'fraud_detector_demo', 'ruleId': 'approve', 'ruleVersion': '2'}

26-06-2022 04:50:35 : INFO : deploy : main : 123 : Deploying trained model version 1.0 to new detector version 
{'detectorId': 'fraud_detector_demo', 'detectorVersionId': '2', 'status': 'DRAFT', 'ResponseMetadata': {'RequestId': 'da37d973-2c43-4c56-93e5-f9b9bd132bb3', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sun, 26 Jun 2022 03:50:36 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '77', 'connection': 'keep-alive', 'x-amzn-requestid': 'da37d973-2c43-4c56-93e5-f9b9bd132bb3'}, 'RetryAttempts': 0}}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The images below show the model associated with the new detector version and the corrected rule expressions, which use the &lt;strong&gt;fraud_model_insightscore&lt;/strong&gt; variable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3x4yu1mvf73kwre9x7ox.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3x4yu1mvf73kwre9x7ox.png" alt="fraud-detector-update-rules-version2" width="800" height="181"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8toxp0ntn9qmj6qydhcn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8toxp0ntn9qmj6qydhcn.png" alt="fraud-detector-associate-model" width="800" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the next section, we will set up API Gateway to create a REST API endpoint that serves HTTP requests, with Lambda in the backend.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up API gateway
&lt;/h3&gt;

&lt;p&gt;In this section, we will walk through the steps to create the REST API: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open the &lt;strong&gt;API Gateway console&lt;/strong&gt;, select &lt;strong&gt;Create API&lt;/strong&gt;, and choose &lt;strong&gt;REST API&lt;/strong&gt; as the type. &lt;/li&gt;
&lt;li&gt;To create an empty API, select &lt;strong&gt;Create New API&lt;/strong&gt; and then &lt;strong&gt;New API&lt;/strong&gt;. In &lt;strong&gt;Settings&lt;/strong&gt;, choose an &lt;strong&gt;API name&lt;/strong&gt; such as &lt;code&gt;FraudLambdaProxy&lt;/code&gt; and an optional &lt;strong&gt;Description&lt;/strong&gt;. Then choose &lt;code&gt;Create API&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Choose the root resource (&lt;strong&gt;/&lt;/strong&gt;) just created and select &lt;strong&gt;Create Method&lt;/strong&gt; from the &lt;strong&gt;Actions&lt;/strong&gt; menu. Then select &lt;code&gt;GET&lt;/code&gt; from the dropdown menu.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Integration Type&lt;/strong&gt;, select &lt;code&gt;Lambda Function&lt;/code&gt; and choose &lt;code&gt;Use Lambda Proxy integration&lt;/code&gt;. The Lambda function should already have been created via CloudFormation in &lt;a href="https://dev.to/aws-builders/aws-fraud-detector-for-classifying-fraudulent-online-registered-accounts-part-1-1j4p"&gt;part 1&lt;/a&gt;. Select the &lt;strong&gt;Lambda Region&lt;/strong&gt; where the function was created (&lt;code&gt;us-east-1&lt;/code&gt;), select &lt;code&gt;PredictFraudModel&lt;/code&gt; from the &lt;strong&gt;Lambda Function&lt;/strong&gt; dropdown, and then click &lt;strong&gt;Save&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;Method Execution&lt;/strong&gt; pane, choose &lt;strong&gt;Method Request&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;In settings, set &lt;strong&gt;Request Validator&lt;/strong&gt; to &lt;code&gt;Validate query string parameters and headers&lt;/code&gt;. Leave &lt;strong&gt;Authorization&lt;/strong&gt; as &lt;code&gt;None&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Expand the &lt;strong&gt;URL Query String Parameters&lt;/strong&gt; dropdown, then choose &lt;strong&gt;Add query string&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Enter the following variables, each as a separate name field. Mark all of them as required except for the &lt;code&gt;flow_definition&lt;/code&gt; variable:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; amt, category, cc_num, city, city_pop, event_timestamp, first, flow_definition, gender, job, last, merchant, state, street, trans_num, zip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
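Once deployed, the endpoint can be called with these variables as query string parameters. A minimal sketch of building such a request; the invoke URL is a placeholder (use the one API Gateway displays after deployment), the parameter values are made-up illustrative data, and the dict is abbreviated (a real request must include every required parameter listed above):

```python
from urllib.parse import urlencode

# Placeholder invoke URL -- substitute the one shown after deploying the API
BASE_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/test"

# Made-up illustrative values, abbreviated to a subset of the parameters
params = {
    "trans_num": "2e9c3e3f99",
    "amt": "693.23",
    "zip": "28611",
    "city": "Collettsville",
    "gender": "M",
    "city_pop": "885",
}

url = f"{BASE_URL}?{urlencode(params)}"
print(url)

# A GET request could then be issued with, e.g.:
# import requests
# response = requests.get(url)
```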



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybztcjttoatrr681k3dd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybztcjttoatrr681k3dd.png" alt="API-gateway-method-request-console" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go back to the &lt;strong&gt;Method Execution&lt;/strong&gt; pane. It should look like the screenshot below.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8h4xo4dnpxyvv4vv9ww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8h4xo4dnpxyvv4vv9ww.png" alt="API-gateway-method-execution-console" width="800" height="173"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose &lt;strong&gt;Integration Request&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Choose the &lt;strong&gt;Mapping Templates&lt;/strong&gt; dropdown and then choose &lt;strong&gt;Add mapping template&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;For the &lt;strong&gt;Content-Type&lt;/strong&gt; field, enter &lt;code&gt;application/json&lt;/code&gt; and then choose the check mark icon.&lt;/li&gt;
&lt;li&gt;In the pop-up that appears, choose &lt;strong&gt;Yes&lt;/strong&gt; to secure this integration.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Request body passthrough&lt;/strong&gt;, choose &lt;code&gt;When there are no templates defined (recommended)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;mapping template&lt;/strong&gt; editor, copy and replace the existing script with the following code:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;#if(&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('flow_definition')"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;#set(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;$my_default_value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$input.params('flow_definition')"&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;#else&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;#set&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;($my_default_value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ignore"&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;#end&lt;/span&gt;&lt;span class="w"&gt;


&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"variables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"trans_num"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('trans_num')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"amt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('amt')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"zip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('zip')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('city')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"first"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('first')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"job"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('job')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"street"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('street')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('category')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"city_pop"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('city_pop')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"gender"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('gender')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"cc_num"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('cc_num')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"last"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('last')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"state"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('state')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"merchant"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('merchant')"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"EVENT_TIMESTAMP"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$input.params('event_timestamp')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"flow_definition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"$my_default_value"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
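&lt;p&gt;The template above wraps the query-string parameters under a &lt;code&gt;variables&lt;/code&gt; key and defaults &lt;code&gt;flow_definition&lt;/code&gt; to &lt;code&gt;ignore&lt;/code&gt; when the parameter is absent. The equivalent logic, sketched in Python for clarity (the function itself is illustrative; the keys match the template):&lt;/p&gt;

```python
def build_payload(params):
    """Mimic the VTL mapping template: wrap the query-string parameters
    under 'variables' and default flow_definition to 'ignore' when it
    is not supplied in the request."""
    keys = ["trans_num", "amt", "zip", "city", "first", "job", "street",
            "category", "city_pop", "gender", "cc_num", "last", "state",
            "merchant"]
    return {
        "variables": {k: params.get(k, "") for k in keys},
        "EVENT_TIMESTAMP": params.get("event_timestamp", ""),
        # same fallback as the #if/#else block in the template
        "flow_definition": params.get("flow_definition") or "ignore",
    }
```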



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwhbz6iwa4vd57i7qzyqw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwhbz6iwa4vd57i7qzyqw.png" alt="API-gateway-integration-request-console" width="800" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose &lt;strong&gt;Save&lt;/strong&gt;, and go back to the &lt;strong&gt;Method Execution&lt;/strong&gt; pane. Click the &lt;strong&gt;Test&lt;/strong&gt; button on the left.&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;Query Strings&lt;/strong&gt; box, paste the following:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;trans_num=6cee353a9d618adfbb12ecad9d427244&amp;amp;amt=245.97&amp;amp;zip=97383&amp;amp;city=Stayton&amp;amp;first=Erica&amp;amp;job=Engineer, biomedical&amp;amp;street=213 Girll Expressway&amp;amp;category=shopping_pos&amp;amp;city_pop=116001&amp;amp;gender=F&amp;amp;cc_num=180046165512893&amp;amp;last=Walker&amp;amp;state=OR&amp;amp;merchant=fraud_Macejkovic-Lesch&amp;amp;event_timestamp=2020-10-13T09:21:53.000Z&amp;amp;flow_definition=arn:aws:sagemaker:us-east-1:376337229415:flow-definition/fraud-detector-a2i-1656277295743
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If successful, you should see the response and logs as in the screenshots below. You can also navigate to the CloudWatch log group for the Lambda invocation and confirm it ran successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuli6lsn7v562o8rs5fxe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuli6lsn7v562o8rs5fxe.png" alt="API-gateway-get-method-test1-console" width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpuyzmyh67uy2swxh2fr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpuyzmyh67uy2swxh2fr.png" alt="API-gateway-get-method-test2logs-console" width="800" height="609"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can then deploy the API. Go back to &lt;strong&gt;Resources&lt;/strong&gt;, choose &lt;strong&gt;Actions&lt;/strong&gt;, and then &lt;strong&gt;Deploy API&lt;/strong&gt;. Select &lt;code&gt;New Stage&lt;/code&gt; as the &lt;strong&gt;Deployment Stage&lt;/strong&gt; and set the &lt;strong&gt;name&lt;/strong&gt; to &lt;code&gt;dev&lt;/code&gt;. You should see the API endpoint to invoke in the console. Finally, make sure logging is set up to allow debugging errors in the REST API, by following the instructions &lt;a href="https://aws.amazon.com/premiumsupport/knowledge-center/api-gateway-cloudwatch-logs/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. The setup should look like the screenshot below. Note that when you add the IAM role in the API Gateway console, it should automatically create the log group in the format &lt;code&gt;API-Gateway-Execution-Logs_apiId/stageName&lt;/code&gt;. The ARN for the log group ends with &lt;code&gt;dev:*&lt;/code&gt;, but you need to include the ARN only up to the stage name &lt;code&gt;dev&lt;/code&gt;, as shown in the image below; otherwise the validation checks will fail.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhea2ri13ze15gkmsx2ih.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhea2ri13ze15gkmsx2ih.png" alt="api-rest-stage-editor-logs-config" width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To test the API's new endpoint, we can use &lt;a href="https://learning.postman.com/docs/getting-started/sending-the-first-request/" rel="noopener noreferrer"&gt;Postman&lt;/a&gt; to send an API request. Create a Postman account and select &lt;strong&gt;GET&lt;/strong&gt; from the list of request types. Since the &lt;strong&gt;GET&lt;/strong&gt; method is configured on the &lt;code&gt;/&lt;/code&gt; root resource, we can invoke the API endpoint &lt;code&gt;https://d9d16i7hbc.execute-api.us-east-1.amazonaws.com/dev&lt;/code&gt; with the query string parameters appended at the end (&lt;code&gt;key=value&lt;/code&gt; pairs separated by &lt;code&gt;&amp;amp;&lt;/code&gt;). Paste the following URL into the box, as in the screenshot below; the parameters and values are automatically parsed and populated in the &lt;code&gt;KEY/VALUE&lt;/code&gt; rows underneath. Click &lt;strong&gt;Send&lt;/strong&gt; and you should see the response body at the bottom.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://d9d16i7hbc.execute-api.us-east-1.amazonaws.com/dev?trans_num=6cee353a9d618adfbb12ecad9d427244&amp;amp;amt=245.97&amp;amp;zip=97383&amp;amp;city=Stayton&amp;amp;first=Erica&amp;amp;job='Engineer, biomedical'&amp;amp;street='213 Girll Expressway'&amp;amp;category=shopping_pos&amp;amp;city_pop=116001&amp;amp;gender=F&amp;amp;cc_num=180046165512893&amp;amp;last=Walker&amp;amp;state=OR&amp;amp;merchant=fraud_Macejkovic-Lesch&amp;amp;event_timestamp=2020-10-13T09:21:53.000Z
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
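&lt;p&gt;Rather than hand-quoting values like &lt;code&gt;job&lt;/code&gt; and &lt;code&gt;street&lt;/code&gt;, the query string can be built programmatically. A minimal sketch using Python's standard library (values copied from the example request; only a subset of the event fields is shown):&lt;/p&gt;

```python
from urllib.parse import urlencode

# Sample values taken from the test request above; the remaining
# event fields can be supplied the same way.
params = {
    "trans_num": "6cee353a9d618adfbb12ecad9d427244",
    "amt": "245.97",
    "zip": "97383",
    "city": "Stayton",
    "job": "Engineer, biomedical",
    "event_timestamp": "2020-10-13T09:21:53.000Z",
}

# urlencode percent-encodes spaces and commas, so no manual quoting is needed
url = ("https://d9d16i7hbc.execute-api.us-east-1.amazonaws.com/dev?"
       + urlencode(params))
print(url)
```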



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujtijy4e82raznlp4e87.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujtijy4e82raznlp4e87.png" alt="postman-api-gateway" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can check the log streams associated with the latest invocation in the CloudWatch log group for API Gateway. These will show messages with the execution or access details of your request.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3bbpmggbv06vsfn51q0l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3bbpmggbv06vsfn51q0l.png" alt="API-gateway-cloudwatch-log-group" width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: If any changes are made to the API configuration or parameters, the API needs to be redeployed for the changes to take effect.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generate Predictions
&lt;/h3&gt;

&lt;p&gt;You can use a batch predictions job in Amazon Fraud Detector to get predictions for a set of events that do not require real-time scoring. For example, you may want to generate fraud predictions for a batch of events, such as payment fraud, account takeover or compromise, and free-tier misuse, while performing an offline proof of concept. &lt;/p&gt;

&lt;p&gt;You can also use batch predictions to evaluate the risk of events on an hourly, daily, or weekly basis, depending on your business needs. If you want to analyse fraudulent transactions after the fact, you can perform batch fraud predictions with Amazon Fraud Detector and store the prediction results in an Amazon S3 bucket. Although beyond the scope of this example, additional services such as Amazon Athena could be used to analyse the fraud prediction results (once delivered to S3) and Amazon QuickSight to visualise them on a dashboard. Copy the batch sample file delivered to the &lt;code&gt;glue_transformed&lt;/code&gt; folder (following a successful Glue job run) into the &lt;code&gt;batch_predict&lt;/code&gt; folder. This triggers a notification to the SQS queue, which has a Lambda function as its target; the Lambda then starts the batch prediction job in Fraud Detector.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;s3://fraud-sample-data/glue_transformed/test/fraudTest.csv s3://fraud-sample-data/batch_predict/fraudTest.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
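&lt;p&gt;Inside the Lambda, the uploaded object's location has to be recovered from the SQS message, whose body is the JSON-encoded S3 notification. A minimal sketch of that step (the event shape follows the standard S3-to-SQS notification format; the function name is illustrative):&lt;/p&gt;

```python
import json

def object_uri_from_sqs_event(event):
    """Extract the S3 URI of the uploaded file from an S3 notification
    delivered to Lambda via an SQS queue. Each SQS record's body is a
    JSON-encoded S3 event with its own Records list."""
    body = json.loads(event["Records"][0]["body"])
    s3 = body["Records"][0]["s3"]
    # note: S3 URL-encodes object keys in notifications; apply
    # urllib.parse.unquote_plus if keys may contain spaces
    return "s3://{}/{}".format(s3["bucket"]["name"], s3["object"]["key"])
```

The returned URI can then be passed as the input path when the Lambda starts the batch prediction job.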



&lt;p&gt;We can monitor the batch prediction jobs in Fraud Detector. Once complete, we should see the output in S3. An example of a batch output is available &lt;a href="https://github.com/ryankarlos/AWS-ML-services/blob/master/datasets/fraud-sample-data/dataset1/results/DetectorBatchResults.csv" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpe0pe8kyk1b73kb6bg8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpe0pe8kyk1b73kb6bg8.png" alt="batch_prediction_jobs" width="800" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In realtime mode, we make use of the API Gateway endpoint created and integrated with the Lambda function, which makes the &lt;code&gt;get_event_prediction&lt;/code&gt; API call to Fraud Detector. In this example we use the same Lambda for both batch and realtime predictions. The Lambda code inspects the event payload for keys that are expected in a request from API Gateway (i.e. after the request has been transformed by the mapping template). Since the mapping template creates a &lt;code&gt;variables&lt;/code&gt; key, the presence of &lt;code&gt;variables&lt;/code&gt; in the payload triggers a realtime prediction. If the payload has a &lt;code&gt;Records&lt;/code&gt; key instead, the event came from SQS and a batch prediction job is started. &lt;/p&gt;

&lt;p&gt;Ideally, separate Lambdas would be used for realtime and batch prediction, to make them easier to manage. To run a realtime prediction, the API Gateway REST API has been configured to accept query string parameters and forward the request to Lambda, as explained in the previous section.&lt;/p&gt;
&lt;h3&gt;
  
  
  Teardown resources
&lt;/h3&gt;

&lt;p&gt;The custom bash script below can be executed to tear down all the Fraud Detector resources, including the trained fraud model, the detector (including rules), and the event type (including outcomes, variables, and labels).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nv"&gt;variables&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt; &lt;span class="s2"&gt;"trans_num"&lt;/span&gt; &lt;span class="s2"&gt;"amt"&lt;/span&gt; &lt;span class="s2"&gt;"city_pop"&lt;/span&gt; &lt;span class="s2"&gt;"street"&lt;/span&gt; &lt;span class="s2"&gt;"job"&lt;/span&gt; &lt;span class="s2"&gt;"cc_num"&lt;/span&gt; &lt;span class="s2"&gt;"gender"&lt;/span&gt; &lt;span class="s2"&gt;"merchant"&lt;/span&gt; &lt;span class="s2"&gt;"last"&lt;/span&gt; &lt;span class="s2"&gt;"category"&lt;/span&gt; &lt;span class="s2"&gt;"zip"&lt;/span&gt; &lt;span class="s2"&gt;"city"&lt;/span&gt; &lt;span class="s2"&gt;"state"&lt;/span&gt; &lt;span class="s2"&gt;"first"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;labels&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s2"&gt;"legit"&lt;/span&gt; &lt;span class="s2"&gt;"fraud"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;rules&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s2"&gt;"investigate"&lt;/span&gt; &lt;span class="s2"&gt;"review"&lt;/span&gt; &lt;span class="s2"&gt;"approve"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;outcomes&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s2"&gt;"high_risk"&lt;/span&gt; &lt;span class="s2"&gt;"low_risk"&lt;/span&gt; &lt;span class="s2"&gt;"medium_risk"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;event_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"credit_card_transaction"&lt;/span&gt;
&lt;span class="nv"&gt;entity_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"customer"&lt;/span&gt;
&lt;span class="nv"&gt;detector_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"fraud_detector_demo"&lt;/span&gt;
&lt;span class="nv"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;fraud_model

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Delete model versions"&lt;/span&gt;    
aws frauddetector  delete-model-version &lt;span class="nt"&gt;--model-id&lt;/span&gt; &lt;span class="nv"&gt;$model_name&lt;/span&gt; &lt;span class="nt"&gt;--model-type&lt;/span&gt; ONLINE_FRAUD_INSIGHTS &lt;span class="nt"&gt;--model-version-number&lt;/span&gt; 1.0
aws frauddetector  delete-model-version &lt;span class="nt"&gt;--model-id&lt;/span&gt; &lt;span class="nv"&gt;$model_name&lt;/span&gt; &lt;span class="nt"&gt;--model-type&lt;/span&gt; ONLINE_FRAUD_INSIGHTS &lt;span class="nt"&gt;--model-version-number&lt;/span&gt; 2.0

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Delete model"&lt;/span&gt;
aws frauddetector  delete-model &lt;span class="nt"&gt;--model-id&lt;/span&gt; &lt;span class="nv"&gt;$model_name&lt;/span&gt; &lt;span class="nt"&gt;--model-type&lt;/span&gt; ONLINE_FRAUD_INSIGHTS


&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting detector version id 1"&lt;/span&gt;
aws frauddetector delete-detector-version &lt;span class="nt"&gt;--detector-id&lt;/span&gt; &lt;span class="nv"&gt;$detector_name&lt;/span&gt; &lt;span class="nt"&gt;--detector-version-id&lt;/span&gt; 1

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;var &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;do
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting rule &lt;/span&gt;&lt;span class="nv"&gt;$var&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        aws frauddetector  delete-rule &lt;span class="nt"&gt;--rule&lt;/span&gt; &lt;span class="nv"&gt;detectorId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$detector_name&lt;/span&gt;,ruleId&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$var&lt;/span&gt;,ruleVersion&lt;span class="o"&gt;=&lt;/span&gt;1
    &lt;span class="k"&gt;done&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting detector id &lt;/span&gt;&lt;span class="nv"&gt;$detector_name&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
aws frauddetector delete-detector &lt;span class="nt"&gt;--detector-id&lt;/span&gt; &lt;span class="nv"&gt;$detector_name&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting event-type &lt;/span&gt;&lt;span class="nv"&gt;$event_type&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
aws frauddetector delete-event-type &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$event_type&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting entity-type &lt;/span&gt;&lt;span class="nv"&gt;$entity_type&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
aws frauddetector delete-entity-type &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$entity_type&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;var &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;variables&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;do
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting variable &lt;/span&gt;&lt;span class="nv"&gt;$var&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        aws frauddetector  delete-variable &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$var&lt;/span&gt;
    &lt;span class="k"&gt;done&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;var &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;do
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting label &lt;/span&gt;&lt;span class="nv"&gt;$var&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        aws frauddetector  delete-label &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$var&lt;/span&gt;
    &lt;span class="k"&gt;done&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;var &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;outcomes&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;do
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting outcome &lt;/span&gt;&lt;span class="nv"&gt;$var&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        aws frauddetector  delete-outcome &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$var&lt;/span&gt;
    &lt;span class="k"&gt;done&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting cloud formation stack"&lt;/span&gt;
aws cloudformation delete-stack &lt;span class="nt"&gt;--stack-name&lt;/span&gt; FraudDetectorGlue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, from the S3 console, empty the bucket contents and then delete the bucket.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdih03km0z0jual0nlte3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdih03km0z0jual0nlte3.png" alt="bucket-final-folder-structure" width="800" height="190"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>api</category>
      <category>aws</category>
      <category>machinelearning</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
