<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Cristian Carballo</title>
    <description>The latest articles on Forem by Cristian Carballo (@criscarba).</description>
    <link>https://forem.com/criscarba</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F899050%2Fc90b9d9b-b5a1-4b32-a977-04192e0f21e5.png</url>
      <title>Forem: Cristian Carballo</title>
      <link>https://forem.com/criscarba</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/criscarba"/>
    <language>en</language>
    <item>
      <title>An Alternative to Bedrock Knowledge Base</title>
      <dc:creator>Cristian Carballo</dc:creator>
      <pubDate>Mon, 10 Feb 2025 14:11:53 +0000</pubDate>
      <link>https://forem.com/aws-builders/alternativa-a-bedrock-knowledge-base-2pf</link>
      <guid>https://forem.com/aws-builders/alternativa-a-bedrock-knowledge-base-2pf</guid>
      <description>&lt;p&gt;In this post I will share a solution architecture I designed to build a knowledge base for a RAG model.&lt;/p&gt;

&lt;p&gt;First of all, we may ask ourselves: &lt;strong&gt;what is a RAG model?&lt;/strong&gt; In short, a RAG (Retrieval-Augmented Generation) model is an artificial-intelligence architecture that combines information retrieval with text generation to improve the accuracy and relevance of the answers produced by language models.&lt;/p&gt;

&lt;p&gt;This approach is mainly used in generative AI systems that need to answer questions or generate content based on up-to-date, contextual information, rather than relying solely on the model's pre-trained knowledge.&lt;/p&gt;
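&lt;p&gt;The retrieve-then-generate loop can be sketched in a few lines. This is only an illustration: the tiny corpus, the bag-of-words scoring, and the stubbed generator are hypothetical stand-ins for a real embedding model and LLM.&lt;/p&gt;

```python
# Minimal RAG sketch: retrieve the most relevant document, then hand it
# to a (stubbed) generator together with the question.
def score(question, document):
    # Naive word-overlap score stands in for a real embedding model.
    q_words = set(question.lower().split())
    d_words = set(document.lower().split())
    return len(q_words.intersection(d_words))

def retrieve(question, corpus):
    # Retrieval step: pick the document with the highest overlap score.
    return max(corpus, key=lambda doc: score(question, doc))

def answer(question, corpus):
    # Generation step (stubbed): a real system would call an LLM with
    # the retrieved context prepended to the prompt.
    context = retrieve(question, corpus)
    return "Context: " + context + " | Question: " + question

corpus = [
    "replication copies objects between s3 buckets",
    "embeddings enable semantic search in opensearch",
]
```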

&lt;p&gt;AWS Bedrock currently offers an excellent feature that lets us create Knowledge Bases very easily. For more information about it, see the following &lt;a href="https://aws.amazon.com/bedrock/knowledge-bases/" rel="noopener noreferrer"&gt;link&lt;/a&gt;. However, we can also design scalable solutions for the same purpose within AWS. &lt;/p&gt;

&lt;p&gt;Below I share a solution architecture designed to generate embeddings as new files land in Amazon S3. It represents a Knowledge Base on AWS that stores and processes information, producing embeddings that enable semantic search with Amazon OpenSearch. Each component and its role in the data flow are described next. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fes53adakcmc19a2pw7ro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fes53adakcmc19a2pw7ro.png" alt="Image description" width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;Main Components&lt;/strong&gt;&lt;/u&gt;&lt;br&gt;
&lt;strong&gt;🔹 Infrastructure and Storage&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS CloudFormation: Automates infrastructure creation, deploying the Knowledge Base stack.&lt;/li&gt;
&lt;li&gt;S3 Knowledge Base: Stores the documents and files to be processed.&lt;/li&gt;
&lt;li&gt;Files Metadata Table: Database that keeps the metadata of processed files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🔹 Data Processing&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon EventBridge: Detects events when files are uploaded to S3 and triggers the processing flow.&lt;/li&gt;
&lt;li&gt;Extract Metadata (Lambda): Extracts metadata from the files and writes it to the Files Metadata Table.&lt;/li&gt;
&lt;li&gt;File Processing Queue (SQS): Processing queue that manages the requests to generate embeddings.&lt;/li&gt;
&lt;li&gt;File Processing Dead Letter Queue (SQS): Stores failed messages that could not be processed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🔹 Embedding Generation and Vectorization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate Embeddings (Lambda): Generates embeddings for the files using a machine-learning model.&lt;/li&gt;
&lt;li&gt;ECR (Elastic Container Registry): Holds the Docker images used to run the metadata-extraction and embedding-generation functions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🔹 Indexing and Semantic Search&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon OpenSearch Service: Stores embeddings and serves embedding-based queries, enabling semantic search across the Knowledge Base.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Process Narrative: &lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;File upload → A user uploads a file to the S3 Knowledge Base.&lt;/li&gt;
&lt;li&gt;EventBridge event → Detects the upload and triggers the Extract Metadata Lambda.&lt;/li&gt;
&lt;li&gt;Metadata extraction → The Lambda extracts metadata and stores it in the Files Metadata Table.&lt;/li&gt;
&lt;li&gt;Processing queue (SQS) → A message is sent to the File Processing Queue to start embedding generation.&lt;/li&gt;
&lt;li&gt;Embedding generation → The Generate Embeddings Lambda converts the file into a numeric vector.&lt;/li&gt;
&lt;li&gt;Indexing in OpenSearch → The embeddings are stored in Amazon OpenSearch for semantic search.&lt;/li&gt;
&lt;li&gt;Status update → The metadata table is updated with the file's final status.&lt;/li&gt;
&lt;/ol&gt;
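&lt;p&gt;The numbered flow can be sketched as a Lambda-style handler for the embedding step. Everything here is illustrative and hedged: the SQS message shape, the &lt;code&gt;embed&lt;/code&gt; stub, and the indexed document fields are assumptions, not the exact implementation behind the diagram.&lt;/p&gt;

```python
import json

def embed(text):
    # Stub: a real pipeline would call an embedding model here.
    return [float(len(word)) for word in text.split()]

def build_opensearch_doc(bucket, key, text):
    # Shape of the document indexed into OpenSearch (illustrative).
    return {
        "s3_uri": "s3://" + bucket + "/" + key,
        "vector": embed(text),
        "status": "INDEXED",
    }

def handler(event, get_object_text):
    # Processes SQS messages carrying S3 object references (step 5),
    # producing the documents to index into OpenSearch (step 6).
    docs = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        bucket, key = body["bucket"], body["key"]
        docs.append(build_opensearch_doc(bucket, key, get_object_text(bucket, key)))
    return docs
```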

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Benefits of this Architecture&lt;/u&gt;&lt;/strong&gt;&lt;br&gt;
✅ Full automation with AWS CloudFormation and EventBridge.&lt;br&gt;
✅ Scalable processing with AWS Lambda and SQS.&lt;br&gt;
✅ Advanced semantic search through Amazon OpenSearch and embeddings.&lt;br&gt;
✅ High availability and fault tolerance with S3, SQS, and the Dead Letter Queue.&lt;/p&gt;

&lt;p&gt;I hope this reference architecture is useful to you. If you have any questions, don't hesitate to contact me! &lt;a href="https://www.linkedin.com/in/cristianrcarballo/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/cristianrcarballo/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>data</category>
      <category>cloud</category>
      <category>aws</category>
      <category>rag</category>
    </item>
    <item>
      <title>Don't Fear AWS LakeFormation</title>
      <dc:creator>Cristian Carballo</dc:creator>
      <pubDate>Mon, 10 Feb 2025 13:54:56 +0000</pubDate>
      <link>https://forem.com/aws-builders/no-le-temas-a-aws-lakeformation-e67</link>
      <guid>https://forem.com/aws-builders/no-le-temas-a-aws-lakeformation-e67</guid>
      <description>&lt;p&gt;When designing a solution on AWS, it is extremely important to pay attention to security, even more so when the solution involves access to data. &lt;/p&gt;

&lt;p&gt;In this post I will cover an option the AWS LakeFormation service offers to safeguard access to data.&lt;/p&gt;

&lt;p&gt;While several alternatives exist, here we will focus on access control managed through LF-Tags. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fohkm5psi7vpk71hw1ex0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fohkm5psi7vpk71hw1ex0.png" alt="Image description" width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The components and how they work are broken down below:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Key Services Used&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;u&gt;AWS Lake Formation&lt;/u&gt;: Manages database- and table-level permissions using LF-Tags.&lt;/li&gt;
&lt;li&gt;
&lt;u&gt;AWS Glue Data Catalog&lt;/u&gt;: Stores metadata for the data in S3.&lt;/li&gt;
&lt;li&gt;
&lt;u&gt;Databases in AWS Glue&lt;/u&gt;:
-- Glue DB#1 (SALES): Contains tables with sales data.
-- Glue DB#2 (MKT): Contains tables with marketing data.
-- Glue DB#3 (PRIVATE): Contains private/sensitive data.&lt;/li&gt;
&lt;li&gt;
&lt;u&gt;Athena&lt;/u&gt;: Enables SQL queries over the data.&lt;/li&gt;
&lt;li&gt;
&lt;u&gt;Amazon QuickSight&lt;/u&gt;: Used for data analysis and visualization.&lt;/li&gt;
&lt;li&gt;
&lt;u&gt;Amazon SageMaker&lt;/u&gt;: Used for machine learning and predictive analytics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Access Control with LF-Tags&lt;/strong&gt;&lt;br&gt;
The architecture uses LF-Tags to define permissions at the database, table, and column level. As shown in the diagram, each business unit has been assigned specific LF-Tags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Green tags → Sales data (SALES)&lt;/li&gt;
&lt;li&gt;Blue tags → Marketing data (MKT)&lt;/li&gt;
&lt;li&gt;Red tags → Private/restricted data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By assigning tags to the different databases we can allow or deny access to the data in a more practical way, so that: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users granted only the “Sales” LF-Tags can query data in Glue DB#1, but cannot see Marketing or Private data.&lt;/li&gt;
&lt;li&gt;Users granted the “Marketing” LF-Tags can only query Glue DB#2.&lt;/li&gt;
&lt;li&gt;Private data (PRIVATE) is restricted and requires special permissions.&lt;/li&gt;
&lt;/ul&gt;
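&lt;p&gt;The tag-matching rule behind these bullets can be simulated locally in a few lines. This is only an illustration of the decision logic; real enforcement happens inside Lake Formation, and the tag names and values below are hypothetical.&lt;/p&gt;

```python
def can_access(granted_expression, resource_tags):
    # granted_expression: LF-Tag expression granted to a principal,
    # mapping a tag key to the list of values it may access.
    # resource_tags: the LF-Tags attached to a database/table.
    # Access is allowed only if every resource tag is covered.
    return all(
        key in granted_expression and value in granted_expression[key]
        for key, value in resource_tags.items()
    )
```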

&lt;p&gt;&lt;strong&gt;3. Data Flow&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The data is registered in AWS Lake Formation.&lt;/li&gt;
&lt;li&gt;The AWS Glue Data Catalog stores the database and table metadata.&lt;/li&gt;
&lt;li&gt;LF-Tags are assigned to databases and tables.&lt;/li&gt;
&lt;li&gt;IAM roles + LF-Tags control access for users, services, or groups.&lt;/li&gt;
&lt;li&gt;Athena, QuickSight, and SageMaker access the data securely, honoring the LF-Tag restrictions.&lt;/li&gt;
&lt;/ul&gt;
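&lt;p&gt;The "IAM roles + LF-Tags" step is typically wired up with Lake Formation's &lt;code&gt;grant_permissions&lt;/code&gt; API. The sketch below only builds the request payload; the role ARN, tag key, and values are placeholders, and the actual boto3 call is left commented out since it requires AWS credentials.&lt;/p&gt;

```python
def build_lftag_grant(principal_arn, tag_key, tag_values):
    # Payload for lakeformation.grant_permissions granting
    # DESCRIBE/SELECT on every table matching the LF-Tag expression.
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": tag_values}],
            }
        },
        "Permissions": ["DESCRIBE", "SELECT"],
    }

# import boto3
# boto3.client("lakeformation").grant_permissions(
#     **build_lftag_grant("arn:aws:iam::111122223333:role/sales-analyst",
#                         "domain", ["sales"]))
```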

&lt;p&gt;&lt;strong&gt;4. Key Benefits&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ Data security with LF-Tags → Ensures that only authorized users access specific datasets.&lt;br&gt;
✅ Segmentation by business unit → Sales, Marketing, and Private data are separated through tag-based control.&lt;br&gt;
✅ Integration with AWS analytics services → Athena, QuickSight, and SageMaker access the data in a controlled way.&lt;br&gt;
✅ A centralized IAM role → Simplifies permission management at the service level.&lt;/p&gt;

</description>
      <category>data</category>
      <category>bigdata</category>
      <category>aws</category>
      <category>security</category>
    </item>
    <item>
      <title>Ways to Replicate Data with S3</title>
      <dc:creator>Cristian Carballo</dc:creator>
      <pubDate>Sat, 30 Dec 2023 13:31:04 +0000</pubDate>
      <link>https://forem.com/criscarba/formas-de-replicar-datos-con-s3-2ld9</link>
      <guid>https://forem.com/criscarba/formas-de-replicar-datos-con-s3-2ld9</guid>
      <description>&lt;p&gt;When it comes to replicating data between S3 buckets, we commonly run into two scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The buckets (source &amp;amp; target) are in the same account.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gqgn7o2yl3zmg3nhpgg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gqgn7o2yl3zmg3nhpgg.png" alt="Image description" width="624" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The buckets (source &amp;amp; target) are in different accounts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0re8wi6ptzsa1qnmedp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0re8wi6ptzsa1qnmedp.png" alt="Image description" width="624" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is worth noting that there are multiple methods for replicating data between buckets, but it is essential to understand the use case at hand. In this post we focus on cases where objects are strictly replicated between S3 buckets, so we can rely on the standard replication feature offered by S3 itself. We will implement these solutions using &lt;strong&gt;CloudFormation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Although both scenarios are very similar, they differ slightly in how they must be implemented. When &lt;strong&gt;the buckets are in the same account&lt;/strong&gt;, no &lt;strong&gt;bucket policy&lt;/strong&gt; is needed on the target bucket; however, &lt;strong&gt;versioning&lt;/strong&gt; must be enabled, since it is a prerequisite for replication to work and it also allows restoring objects to a previous version.&lt;/p&gt;

&lt;p&gt;To deploy &lt;strong&gt;Solution #1 (buckets in the same account)&lt;/strong&gt;, we need to create the following files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy script (deploy.sh)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;###
STACK_NAME="s3-replication-same-account-template"
TEMPLATE_FILE_NAME="s3-replication-same-account-template"
###

PROFILE="default"
ENV="dev"   # environment suffix used in the packaged-template file name
ARTIFACTORY_BUCKET="**Name of an existing bucket used to upload the packaged template**"

#1) Create Package
aws cloudformation package --template ./$TEMPLATE_FILE_NAME.yaml \
                           --s3-bucket $ARTIFACTORY_BUCKET \
                           --output json &amp;gt; $TEMPLATE_FILE_NAME-packaged-$ENV.yaml \
                           --profile $PROFILE

#2) Create Stack (From Package)
aws cloudformation create-stack --stack-name $STACK_NAME \
                                --parameters file://./parameters.json \
                                --template-body file://./$TEMPLATE_FILE_NAME-packaged-$ENV.yaml \
                                --profile $PROFILE \
                                --region us-east-1 \
                                --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Parameters file (parameters.json)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
    {
        "ParameterKey": "pSampleType",
        "ParameterValue": "same-account-replication"
    }
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;CloudFormation template (s3-replication-same-account-template.yaml)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31
Description: "Same Account - S3 Replication"

Parameters:
  pSampleType:
    Description: S3 Replication Sample Type
    Type: String 


Resources:

  #####################################################
  #################### S3 BUCKET ######################
  #####################################################
  rSourceBucket:
    #DependsOn: rReplicationRole  
    Type: AWS::S3::Bucket
    Properties:       
      BucketName: !Sub "${pSampleType}-source-bucket-${AWS::AccountId}"            
      VersioningConfiguration: 
        Status: Enabled
      ReplicationConfiguration:
        Role: !GetAtt rReplicationRole.Arn
        Rules:
          - Id: !Sub "${pSampleType}-sample"
            Status: Enabled
            Prefix: datalake
            Destination:
              Bucket: !GetAtt rDestinationBucket.Arn
              StorageClass: STANDARD  
      Tags: 
        - Key: "S3-BucketName"        
          Value: !Sub "${pSampleType}-source-bucket-${AWS::AccountId}"
        - Key: "CostCenter"
          Value: "00000"  

  rDestinationBucket:  
    Type: AWS::S3::Bucket
    Properties:       
      BucketName: !Sub "${pSampleType}-destination-bucket-${AWS::AccountId}"            
      VersioningConfiguration: 
        Status: Enabled      
      Tags: 
        - Key: "S3-BucketName"        
          Value: !Sub "${pSampleType}-destination-bucket-${AWS::AccountId}"
        - Key: "CostCenter"
          Value: "00000"  



  #####################################################
  ##################### IAM ROLE ######################
  #####################################################

  rReplicationRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName: !Sub "${pSampleType}-role"      
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Action:
              - "sts:AssumeRole"
            Effect: "Allow"
            Principal:
              Service:
                - "s3.amazonaws.com"
      Policies:
        - PolicyName: S3ReplicationPolicy
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - "s3:GetObjectVersionForReplication"
                  - "s3:GetObjectVersionAcl"
                  - "s3:GetObjectVersionTagging"
                Resource: !Sub "arn:aws:s3:::${pSampleType}-source-bucket-${AWS::AccountId}/*"
              - Effect: Allow
                Action:
                  - "s3:ListBucket"
                  - "s3:GetReplicationConfiguration"
                Resource: !Sub "arn:aws:s3:::${pSampleType}-source-bucket-${AWS::AccountId}"
              - Effect: Allow
                Action:
                  - "s3:ReplicateObject"
                  - "s3:ReplicateDelete"
                  - "s3:ReplicateTags"
                Resource: !Sub "arn:aws:s3:::${pSampleType}-destination-bucket-${AWS::AccountId}/*"



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To deploy the solution, simply run the deploy script as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bash deploy.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Moving on to &lt;strong&gt;Solution #2 (buckets in different accounts)&lt;/strong&gt;, we need to create the following files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Source deploy script (deploy_source.sh)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;###
STACK_NAME="s3-replication-different-account-source-template"
TEMPLATE_FILE_NAME="s3-replication-different-account-source-template"
###

PROFILE="default"
ENV="dev"   # environment suffix used in the packaged-template file name
ARTIFACTORY_BUCKET="**Name of an existing bucket used to upload the packaged template**"

#1) Create Package
aws cloudformation package --template ./$TEMPLATE_FILE_NAME.yaml \
                           --s3-bucket $ARTIFACTORY_BUCKET \
                           --output json &amp;gt; $TEMPLATE_FILE_NAME-packaged-$ENV.yaml \
                           --profile $PROFILE

#2) Create Stack (From Package)
aws cloudformation create-stack --stack-name $STACK_NAME \
                                --parameters file://./parameters-source.json \
                                --template-body file://./$TEMPLATE_FILE_NAME-packaged-$ENV.yaml \
                                --profile $PROFILE \
                                --region us-east-1 \
                                --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Target deploy script (deploy_destination.sh)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;###
STACK_NAME="s3-replication-different-account-destination-template"
TEMPLATE_FILE_NAME="s3-replication-different-account-destination-template"
###

PROFILE="default"
ENV="dev"   # environment suffix used in the packaged-template file name
ARTIFACTORY_BUCKET="**Name of an existing bucket used to upload the packaged template**"

#1) Create Package
aws cloudformation package --template ./$TEMPLATE_FILE_NAME.yaml \
                           --s3-bucket $ARTIFACTORY_BUCKET \
                           --output json &amp;gt; $TEMPLATE_FILE_NAME-packaged-$ENV.yaml \
                           --profile $PROFILE

#2) Create Stack (From Package)
aws cloudformation create-stack --stack-name $STACK_NAME \
                                --parameters file://./parameters-destination.json \
                                --template-body file://./$TEMPLATE_FILE_NAME-packaged-$ENV.yaml \
                                --profile $PROFILE \
                                --region us-east-1 \
                                --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Source parameters file (parameters-source.json)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
    {
        "ParameterKey": "pSampleType",
        "ParameterValue": "different-account-replication"
    },
    {
        "ParameterKey": "pDestinationBucketName",
        "ParameterValue": "different-account-replication-destination-bucket-aws-account-id"
    }
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Target parameters file (parameters-destination.json)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
    {
        "ParameterKey": "pSampleType",
        "ParameterValue": "different-account-replication"
    },
    {
        "ParameterKey": "pReplicationRoleArn",
        "ParameterValue": "arn:aws:iam::aws-account-id:role/different-account-replication-role"
    }
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;CloudFormation template (s3-replication-different-account-source-template.yaml)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31
Description: "Different Account - S3 Replication (Source)"

Parameters:
  pSampleType:
    Description: S3 Replication Sample Type
    Type: String 
  pDestinationBucketName:
    Description: S3 Destination Bucket
    Type: String 

Resources:

  #####################################################
  #################### S3 BUCKET ######################
  #####################################################
  rSourceBucket:
    #DependsOn: rReplicationRole  
    Type: AWS::S3::Bucket
    Properties:       
      BucketName: !Sub "${pSampleType}-source-bucket-${AWS::AccountId}"            
      VersioningConfiguration: 
        Status: Enabled
      ReplicationConfiguration:
        Role: !GetAtt rReplicationRole.Arn
        Rules:
          - Id: !Sub "${pSampleType}-sample"
            Status: Enabled
            Prefix: datalake
            Destination:
              Bucket: !Sub "arn:aws:s3:::${pDestinationBucketName}"
              StorageClass: STANDARD  
      Tags: 
        - Key: "S3-BucketName"        
          Value: !Sub "${pSampleType}-source-bucket-${AWS::AccountId}"
        - Key: "CostCenter"
          Value: "00000"  

  #####################################################
  ##################### IAM ROLE ######################
  #####################################################

  rReplicationRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName: !Sub "${pSampleType}-role"      
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Action:
              - "sts:AssumeRole"
            Effect: "Allow"
            Principal:
              Service:
                - "s3.amazonaws.com"
      Policies:
        - PolicyName: S3ReplicationPolicy
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - "s3:GetObjectVersionForReplication"
                  - "s3:GetObjectVersionAcl"
                  - "s3:GetObjectVersionTagging"
                Resource: !Sub "arn:aws:s3:::${pSampleType}-source-bucket-${AWS::AccountId}/*"
              - Effect: Allow
                Action:
                  - "s3:ListBucket"
                  - "s3:GetReplicationConfiguration"
                Resource: !Sub "arn:aws:s3:::${pSampleType}-source-bucket-${AWS::AccountId}"
              - Effect: Allow
                Action:
                  - "s3:ReplicateObject"
                  - "s3:ReplicateDelete"
                  - "s3:ReplicateTags"
                Resource: !Sub "arn:aws:s3:::${pDestinationBucketName}/*"





&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;CloudFormation template (s3-replication-different-account-destination-template.yaml)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31
Description: "Different Account - S3 Replication (Destination)"

Parameters:
  pSampleType:
    Description: S3 Replication Sample Type
    Type: String

  pReplicationRoleArn:
    Description: Role in the source account for the replication
    Type: String


Resources:

  #####################################################
  #################### S3 BUCKET ######################
  #####################################################

  rDestinationBucket:  
    Type: AWS::S3::Bucket
    Properties:       
      BucketName: !Sub "${pSampleType}-destination-bucket-${AWS::AccountId}"            
      VersioningConfiguration: 
        Status: Enabled      
      Tags: 
        - Key: "S3-BucketName"        
          Value: !Sub "${pSampleType}-destination-bucket-${AWS::AccountId}"
        - Key: "CostCenter"
          Value: "00000"  



  # #####################################################
  # ################## BUCKET POLICY ####################
  # #####################################################

  rDestinationBucketsPolicy:
    Type: AWS::S3::BucketPolicy
    Properties: 
      Bucket: !Ref rDestinationBucket
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              AWS: !Ref pReplicationRoleArn
            Action: 
              - "s3:ReplicateObject"
              - "s3:ReplicateDelete"
            Resource: !Sub "${rDestinationBucket.Arn}/*"
          - Effect: "Allow"
            Principal: 
              AWS: !Ref pReplicationRoleArn
            Action:
              - "s3:List*"
              - "s3:GetBucketVersioning"
              - "s3:PutBucketVersioning"
            Resource: !GetAtt rDestinationBucket.Arn



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To deploy the solution, simply run the deploy scripts, but in both AWS accounts (source &amp;amp; target), the same way as in scenario #1. &lt;/p&gt;

&lt;p&gt;Finally, keep in mind that object replication can take anywhere from a few seconds to several minutes, since timing depends on AWS. However, AWS offers &lt;strong&gt;RTC&lt;/strong&gt; (Replication Time Control), essentially an SLA guaranteeing that objects are replicated from source to target within 15 minutes at most. This feature is key for critical use cases where having the objects on time is a must. &lt;/p&gt;
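&lt;p&gt;For reference, RTC is enabled per replication rule in the templates above. A hedged sketch of the extra properties (the V2 rule schema also requires Filter, Priority, and DeleteMarkerReplication; the rule Id below is a placeholder):&lt;/p&gt;

```yaml
        Rules:
          - Id: rtc-sample
            Status: Enabled
            Priority: 1
            Filter:
              Prefix: datalake
            DeleteMarkerReplication:
              Status: Disabled
            Destination:
              Bucket: !GetAtt rDestinationBucket.Arn
              StorageClass: STANDARD
              # RTC: 15-minute replication SLA plus replication metrics
              ReplicationTime:
                Status: Enabled
                Time:
                  Minutes: 15
              Metrics:
                Status: Enabled
                EventThreshold:
                  Minutes: 15
```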

&lt;p&gt;Thank you very much!&lt;br&gt;
Cristian R. Carballo &lt;br&gt;
&lt;a href="https://www.linkedin.com/in/cristianrcarballo/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/cristianrcarballo/&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Ingesting in Near Real Time with Kinesis Data Firehose</title>
      <dc:creator>Cristian Carballo</dc:creator>
      <pubDate>Thu, 28 Dec 2023 12:24:18 +0000</pubDate>
      <link>https://forem.com/criscarba/ingestando-en-near-real-time-con-kinesis-data-firehose-4fji</link>
      <guid>https://forem.com/criscarba/ingestando-en-near-real-time-con-kinesis-data-firehose-4fji</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t2b5az0wre8ul5tzy4k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t2b5az0wre8ul5tzy4k.png" alt="Image description" width="772" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are multiple data-ingestion options on AWS, thanks to the many services available in the console; however, it is often challenging to decide which one to use for a given need or use case. &lt;/p&gt;

&lt;p&gt;In this post we focus on scenarios where our data "producer" constantly generates multiple streams. Such producers could be, for example: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IoT devices&lt;/li&gt;
&lt;li&gt;Application servers generating logs&lt;/li&gt;
&lt;li&gt;VPC Flow Logs&lt;/li&gt;
&lt;li&gt;Vehicle telemetry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For these use cases we will use &lt;strong&gt;Amazon Kinesis Data Firehose&lt;/strong&gt;, an ETL service that captures, transforms, and ingests streaming data into data lakes and/or other natively integrated services, such as OpenSearch or Redshift.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqikbo2tgehzbky9et96.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqikbo2tgehzbky9et96.png" alt="Image description" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the main characteristics of Amazon Kinesis Data Firehose is that it ingests data in near real time. This means that the streams received by the Delivery Stream are not delivered to its output until one of the following two conditions is met:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Time-based buffer (&lt;u&gt;Minimum&lt;/u&gt;: 60 seconds | &lt;u&gt;Maximum&lt;/u&gt;: 900 seconds)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Size-based buffer (&lt;u&gt;Minimum&lt;/u&gt;: 1 MB | &lt;u&gt;Maximum&lt;/u&gt;: 128 MB)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whichever of the two occurs first triggers the dump of the data held by the Delivery Stream into the configured output. &lt;/p&gt;
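&lt;p&gt;The "whichever comes first" rule can be sketched as a small decision function. This is an illustrative model of the buffering behavior, not Firehose's actual implementation; the default thresholds below are the minimum values mentioned above.&lt;/p&gt;

```python
# Illustrative model of Firehose buffering (not the real implementation):
# the buffer is flushed as soon as EITHER the time threshold or the size
# threshold is reached, whichever happens first.
def should_flush(elapsed_seconds, buffered_mb, interval_s=60, size_mb=1):
    """Return the reason the buffer would be flushed, or None."""
    if elapsed_seconds >= interval_s:
        return "time"
    if buffered_mb >= size_mb:
        return "size"
    return None
```

&lt;p&gt;For example, &lt;code&gt;should_flush(30, 1.5)&lt;/code&gt; returns "size": the buffer filled up before the 60-second window elapsed.&lt;/p&gt;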

&lt;p&gt;As an example, we will implement the following architecture in the AWS console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkojpj8eyem6dpixje08.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkojpj8eyem6dpixje08.png" alt="Image description" width="718" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will simulate an IoT device generating streams that are sent to the Delivery Stream. After 1 minute, the data will be written to S3. &lt;/p&gt;

&lt;p&gt;The steps to follow are listed below:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the AWS Console&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7em2yvgbwe82rr5mp3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7em2yvgbwe82rr5mp3u.png" alt="Image description" width="800" height="218"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;In the service search box, type &lt;strong&gt;S3&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fusij40oaahwskzj84cpd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fusij40oaahwskzj84cpd.png" alt="Image description" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Create a bucket that will be the target of our Delivery Stream.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/u&gt;: Bucket names must be globally unique, so do not try to create the same one shown in the image.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8limyw8k1vgpestejn7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8limyw8k1vgpestejn7.png" alt="Image description" width="800" height="630"&gt;&lt;/a&gt;&lt;/p&gt;
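&lt;p&gt;Since bucket names must be globally unique, one option is to generate the name programmatically with a random suffix. The helper and prefix below are hypothetical examples, not the bucket name used in this demo.&lt;/p&gt;

```python
import uuid

# Hypothetical helper: append a random suffix so the bucket name is
# unlikely to collide with an existing one (the prefix is an example).
def unique_bucket_name(prefix="demo-kdf-target"):
    name = "{}-{}".format(prefix, uuid.uuid4().hex[:8]).lower()
    if len(name) > 63:  # S3 allows 3-63 characters
        raise ValueError("bucket name too long")
    return name
```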

&lt;ol start="4"&gt;
&lt;li&gt;In the service search box, type &lt;strong&gt;Kinesis Data Firehose&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fidlvwt436cntpfdwc7qf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fidlvwt436cntpfdwc7qf.png" alt="Image description" width="800" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;Select &lt;strong&gt;Create Delivery Stream&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5n7gnop1wp0w2vb3t5i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5n7gnop1wp0w2vb3t5i.png" alt="Image description" width="800" height="59"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx58u1vbyxeur1tjvoynr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx58u1vbyxeur1tjvoynr.png" alt="Image description" width="800" height="647"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select the target bucket and set the buffer to its minimum values&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95owwalczy459igbtqke.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95owwalczy459igbtqke.png" alt="Image description" width="800" height="1043"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Delivery Stream will take a few minutes to create.&lt;/p&gt;

&lt;ol start="6"&gt;
&lt;li&gt;To simulate stream ingestion, I wrote the following script, which loads random data with a fixed structure into the Delivery Stream:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import names   # third-party package: pip install names
from random import randint
import boto3
import time
import json

DeliveryStreamName = 'demo-kdf'

# Create a Firehose client from the local AWS profile
session_dev = boto3.session.Session(profile_name='default')
firehose = session_dev.client('firehose', region_name='us-east-1')

# Send 10 records with a fixed structure and random values
cnt_streams = 10
for i in range(cnt_streams):
    record = {
      'id': i,
      'name': names.get_first_name(),
      'surname': names.get_last_name(),
      'age': randint(18, 80)
    }

    print(record)

    response = firehose.put_record(DeliveryStreamName=DeliveryStreamName,
                                   Record={'Data': json.dumps(record)})

    # brief pause between records
    time.sleep(0.1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the script runs, the records sent to the delivery stream are printed to the terminal&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5nt7t8f37y4rzozn50y6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5nt7t8f37y4rzozn50y6.png" alt="Image description" width="590" height="209"&gt;&lt;/a&gt;&lt;/p&gt;
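&lt;p&gt;As a side note, for larger volumes the loop above could send records in groups using Firehose's &lt;code&gt;put_record_batch&lt;/code&gt; operation, which accepts up to 500 records per request. Below is a minimal sketch of the batching logic; the AWS call is shown commented out so the snippet stays self-contained.&lt;/p&gt;

```python
import json

def chunk(records, size=500):
    """Split records into batches of at most `size` items
    (500 is the PutRecordBatch limit per request)."""
    return [records[i:i + size] for i in range(0, len(records), size)]

# for batch in chunk(all_records):
#     firehose.put_record_batch(
#         DeliveryStreamName='demo-kdf',
#         Records=[{'Data': json.dumps(r)} for r in batch])
```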

&lt;ol start="7"&gt;
&lt;li&gt;After a few minutes, checking the S3 bucket (output) we will find the data dump&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmsj9o03tt21d1a3hlo4m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmsj9o03tt21d1a3hlo4m.png" alt="Image description" width="800" height="198"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The content can be inspected with S3 Select&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulrvdlv0i4drjreb0t91.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulrvdlv0i4drjreb0t91.png" alt="Image description" width="800" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thank you very much!!&lt;br&gt;
Cristian R. Carballo &lt;a href="https://www.linkedin.com/in/cristianrcarballo/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/cristianrcarballo/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/cristianrcarballo/&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>¿Cómo migrar tu base de datos on premise a la nube de AWS?</title>
      <dc:creator>Cristian Carballo</dc:creator>
      <pubDate>Thu, 11 Aug 2022 19:18:53 +0000</pubDate>
      <link>https://forem.com/criscarba/como-migrar-tu-base-de-datos-on-premise-a-la-nube-de-aws-2956</link>
      <guid>https://forem.com/criscarba/como-migrar-tu-base-de-datos-on-premise-a-la-nube-de-aws-2956</guid>
      <description>&lt;p&gt;En esta publicación voy a estar hablando sobre el servicio de AWS llamado &lt;strong&gt;Database Migration Service (DMS)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;Caracteristicas Principales&lt;/strong&gt;&lt;/u&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AWS Database Migration Service (DMS) es un servicio de AWS que nos permite la migración de bases de datos a la nube.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Permite realizar la migración de bases de datos On Premise (y cloud) hacia AWS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DMS es un servicio resiliente a potenciales fallos (highly resilient &amp;amp; self–healing)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A lo largo de la migración de datos la base de datos de origen (Source) se mantiene activa sin interrupciones. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DMS Soporta migraciones de bases de datos de manera:&lt;br&gt;
 &lt;strong&gt;Homogéneas&lt;/strong&gt;: Ej. Caso de uso: PostgreSQL ⇒ PostgreSQL&lt;br&gt;
 &lt;strong&gt;Heterogéneas&lt;/strong&gt;: Ej. Caso de uso: MS Sql Server ⇒ Aurora (Debe utilizarse el SCT (Schema Conversion Tool))&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Los tipos de migraciones pueden ser:&lt;br&gt;
 &lt;strong&gt;Full/Snapshot&lt;/strong&gt;&lt;br&gt;
 &lt;strong&gt;Change Data Capture (CDC)&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;El Schema Conversion Tool (SCT) permite convertir el schema de una BDD de un motor a otro.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Puede ser utilizado entre bases de datos OLTP / OLAP&lt;br&gt;
 Ej. OLTP: Oracle ⇒ PostgreSQL&lt;br&gt;
 Ej. OLAP: Teradata ⇒ Redshift&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3cb7i09yjmmguyjphksp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3cb7i09yjmmguyjphksp.png" alt="DMS1" width="800" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;Endpoints&lt;/strong&gt;&lt;/u&gt;&lt;br&gt;
To use DMS you must define a Source and a Target endpoint. The ones currently available are shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhv3sbjvvrowfvkegtp9g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhv3sbjvvrowfvkegtp9g.png" alt="DMS2" width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;DEMO&lt;/strong&gt;&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Scenario description&lt;/u&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The company “Sin Nombre SRL” is taking its first steps in the cloud world. &lt;/li&gt;
&lt;li&gt;After analyzing the solutions and technologies available on the market, they chose to migrate their database infrastructure to AWS using the RDS service. &lt;/li&gt;
&lt;li&gt;The current database is PostgreSQL, running on a dedicated server that requires maintenance (patches/security/OS/etc.), which must be reduced. &lt;/li&gt;
&lt;li&gt;100% of the current data must be migrated to the cloud, except for a few data entities. &lt;/li&gt;
&lt;li&gt;The changes occurring at the source must be replicated daily.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Architecture:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy2jwrar5alk8vpyquac0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy2jwrar5alk8vpyquac0.png" alt="DMS3" width="765" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;NOTE: We must have a Cloud9 instance available&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Steps to follow:&lt;/u&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create the 2 instances (Source and Target) with a CloudFormation script.&lt;/li&gt;
&lt;li&gt;Create an inbound rule in each database's security group to open the PostgreSQL port.&lt;/li&gt;
&lt;li&gt;Create a replication instance in DMS.&lt;/li&gt;
&lt;li&gt;Install the Postgres tools (psql)&lt;/li&gt;
&lt;li&gt;Connect to the Source DB&lt;/li&gt;
&lt;li&gt;Run the table-creation script on the Source DB&lt;/li&gt;
&lt;li&gt;Create the Source and Target endpoints in DMS.&lt;/li&gt;
&lt;li&gt;Test the connection to the Target DB and verify it contains no data&lt;/li&gt;
&lt;li&gt;Create a replication task (FULL LOAD) and run it. &lt;/li&gt;
&lt;li&gt;Verify that the data was replicated to the target.&lt;/li&gt;
&lt;li&gt;Create a replication task (CDC) and run it. &lt;/li&gt;
&lt;li&gt;Insert new records into the Source DB table.&lt;/li&gt;
&lt;li&gt;Verify that the data was replicated to the target.&lt;/li&gt;
&lt;li&gt;Delete the resources.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step # 1: Create the 2 instances (Source and Target) with a CloudFormation script.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run the following command to change into the directory containing the bash script used for the deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;cd /home/ec2-user/environment/Datapath/DMS&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fim64562xhagn9430ai0p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fim64562xhagn9430ai0p.png" alt="DMS6" width="562" height="33"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run the “deploy.sh” file (creates the CloudFormation stack)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;bash deploy.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ydyxaodkgqmv6c4cpre.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ydyxaodkgqmv6c4cpre.png" alt="DMS7" width="800" height="93"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verify that the stack is running in CloudFormation
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fagfwifgk2fpsjx60vt4b.png" alt="DSMS99" width="800" height="369"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step # 2: Create an inbound rule in each database's security group to open the PostgreSQL port.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to the RDS service and modify the security groups to open port 5432 (PostgreSQL)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0j5j8522g1inj5ey3hcd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0j5j8522g1inj5ey3hcd.png" alt="DMS024" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The security group of our DB instance is shown at the bottom right. Click the security group link to go to its settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fab29buhs1kf2r73m7g2u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fab29buhs1kf2r73m7g2u.png" alt="DMS124" width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click “&lt;strong&gt;Edit inbound rules&lt;/strong&gt;” and add the rule shown in the images below to open the Postgres port.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g1f2bk8kxflot68xd3l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g1f2bk8kxflot68xd3l.png" alt="DMS1235" width="800" height="264"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step # 3: Create a replication instance in DMS.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to the &lt;strong&gt;DMS&lt;/strong&gt; service and create a new replication instance by clicking the “&lt;strong&gt;Create replication instance&lt;/strong&gt;” button (orange button)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzj2ecuyoyjlsg0h1zubo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzj2ecuyoyjlsg0h1zubo.png" alt="DMS55" width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Enter a name (do not use the one in the image); it must be unique. Adding your AWS account ID as a suffix is recommended&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep the default values and set:&lt;br&gt;
Allocated storage: 5 GB&lt;br&gt;
&lt;u&gt;&lt;em&gt;VPC&lt;/em&gt;&lt;/u&gt;: keep the default&lt;br&gt;
&lt;u&gt;&lt;em&gt;Type&lt;/em&gt;&lt;/u&gt;: Single AZ&lt;br&gt;
&lt;u&gt;&lt;em&gt;Publicly Accessible&lt;/em&gt;&lt;/u&gt;: unchecked&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5q0jrgwwc7vg2apeprw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5q0jrgwwc7vg2apeprw.png" alt="DMS268" width="800" height="769"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step # 4: Install the Postgres tools (psql)&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inside the Cloud9 instance, run the following command:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;sudo yum install postgresql -y&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fat9v0owe2cczthddhmf2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fat9v0owe2cczthddhmf2.png" alt="DMS5477" width="677" height="136"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step #5: Connect to the Source DB&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inside the Cloud9 instance, run the following command to connect to the Source DB:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Make sure to replace the &lt;code&gt;[[++SOURCE_ENDPOINT++]]&lt;/code&gt; placeholder with the endpoint of the source database:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;psql -h [[++SOURCE_ENDPOINT++]] \&lt;br&gt;
     -U postgres \&lt;br&gt;
     -p 5432 \&lt;br&gt;
     -d postgres&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The password is: source_p4assw0rd&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1uxwe5yfbcse70jnurh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1uxwe5yfbcse70jnurh.png" alt="DMSSQL" width="611" height="260"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step #6: Run the table-creation script on the Source DB&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create table if not exists personas
(
    id            bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    nombre        varchar(100),
    apellido      varchar(100),
    telefono      varchar(100)
);

insert into personas (nombre, apellido, telefono) values ('cristian','carballo','1234');
insert into personas (nombre, apellido, telefono) values ('Juan','Lopez','1234');
insert into personas (nombre, apellido, telefono) values ('Miguel','Garcia','1234');
insert into personas (nombre, apellido, telefono) values ('Roman','Riquelme','1234');
insert into personas (nombre, apellido, telefono) values ('Diego','Maradona','1234');

create table if not exists marcas
(
    id            bigint GENERATED ALWAYS AS IDENTITY,
    nombre        varchar(100),
    ubicacion     varchar(100)
);

insert into marcas (nombre, ubicacion) values ('Nike','USA');
insert into marcas (nombre, ubicacion) values ('Reebok','UK');
insert into marcas (nombre, ubicacion) values ('Adidas','Nigeria');
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqzi58uwj8bbypeasxv1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqzi58uwj8bbypeasxv1.png" alt="DMS56278" width="688" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step #7: Create the Source and Target endpoints in DMS.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to the DMS service and, under the Endpoints option, create 2 endpoints:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Source Endpoint ⇒ PostgreSQL Source &lt;br&gt;
Target Endpoint ⇒ PostgreSQL Target&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8blrwcib95r4exk8ao8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8blrwcib95r4exk8ao8.png" alt="SDM" width="800" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2v7coby5lpwhej58dh6l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2v7coby5lpwhej58dh6l.png" alt="SOURCE" width="723" height="919"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1mlgh2xpold6m5nqkm0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1mlgh2xpold6m5nqkm0.png" alt="TARGET" width="696" height="919"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step #8: Test the connection to the Target DB and verify it contains no data&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Inside the Cloud9 instance, run the command to connect to the Target DB, just as in step # 5.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To verify that the target DB is empty, run the following command:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;SELECT *&lt;br&gt;
FROM pg_catalog.pg_tables&lt;br&gt;
WHERE   schemaname != 'pg_catalog' AND &lt;br&gt;
        schemaname != 'information_schema';&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffru85e75mp9ystturxgg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffru85e75mp9ystturxgg.png" alt="MM" width="800" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step #9: Create a replication task (FULL LOAD) and run it.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the DMS service, create a Migration Task with the following parameters: &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiz0qb3ii38upbd6of0dl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiz0qb3ii38upbd6of0dl.png" alt="DEME" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;
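&lt;p&gt;If you prefer to script this step, the same kind of task can be defined through the DMS API. The sketch below builds a table-mapping document that selects every table in the &lt;code&gt;public&lt;/code&gt; schema and excludes one of them, which is how the scenario's "migrate everything except some entities" requirement could be expressed. The excluded table name is just an illustrative example.&lt;/p&gt;

```python
import json

# Hedged sketch of a DMS table-mapping document (selection-rule format).
# Excluding "marcas" is an example of leaving some entities out.
table_mappings = {
    "rules": [
        {"rule-type": "selection", "rule-id": "1", "rule-name": "include-all",
         "object-locator": {"schema-name": "public", "table-name": "%"},
         "rule-action": "include"},
        {"rule-type": "selection", "rule-id": "2", "rule-name": "exclude-marcas",
         "object-locator": {"schema-name": "public", "table-name": "marcas"},
         "rule-action": "exclude"},
    ]
}
mappings_json = json.dumps(table_mappings)

# This JSON would be passed as TableMappings to
# dms.create_replication_task(..., MigrationType='full-load').
```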

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step #10: Verify that the data was replicated to the target.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run the following command on the Target DB, confirm that the personas table was created, and then query it:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;SELECT *&lt;br&gt;
FROM pg_catalog.pg_tables&lt;br&gt;
WHERE   schemaname != 'pg_catalog' AND &lt;br&gt;
        schemaname != 'information_schema';&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhi669eycam9xll0u1v7u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhi669eycam9xll0u1v7u.png" alt="DMSK" width="800" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step #11: Create a replication task (CDC) and run it.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the DMS service, create a Migration Task with the following parameters:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxq6o8rqwsa4b7iaqt13n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxq6o8rqwsa4b7iaqt13n.png" alt="MKL" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step # 12: Insert new records into the Source DB table.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For this example we will insert 2 records and update an existing one &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn966ithd7mwmnt5y2dna.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn966ithd7mwmnt5y2dna.png" alt="IUY" width="775" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step #13: Validate that the data was replicated to the target.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As shown in the image below, the records were migrated correctly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrmwj5fqr830s2rh5b0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrmwj5fqr830s2rh5b0g.png" alt="KIU" width="577" height="230"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step #14: Delete resources.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delete the resources in the following order:&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Stop the CDC migration task.&lt;/li&gt;
&lt;li&gt;Wait for it to stop, then delete both the CDC and FULL migration tasks.&lt;/li&gt;
&lt;li&gt;Wait for both migration tasks to finish deleting, then delete the Replication Instance.&lt;/li&gt;
&lt;li&gt;Delete the two endpoints (Source &amp;amp; Target) created in DMS.&lt;/li&gt;
&lt;li&gt;Manually delete the two RDS databases, making sure not to take a final backup.&lt;/li&gt;
&lt;li&gt;In CloudFormation, delete the RDS stack (StackRDS).&lt;/li&gt;
&lt;li&gt;In CloudFormation, delete the stack generated by the Cloud9 creation (aws-cloud9-datapath-xxxxx).&lt;/li&gt;
&lt;/ol&gt;
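&lt;p&gt;If you prefer scripting this cleanup instead of clicking through the console, the same order can be sketched as AWS CLI invocations. This is a sketch, not a tested teardown script: every ARN, identifier, and stack name below is a placeholder, and a real script would also wait between steps (for example with the &lt;code&gt;aws dms wait&lt;/code&gt; subcommands) before moving on.&lt;/p&gt;

```python
# Placeholder ARNs and identifiers; replace with the values from your account.
CDC_TASK_ARN = "arn:aws:dms:REGION:ACCOUNT:task:CDC-TASK-ID"
FULL_TASK_ARN = "arn:aws:dms:REGION:ACCOUNT:task:FULL-TASK-ID"
REPLICATION_INSTANCE_ARN = "arn:aws:dms:REGION:ACCOUNT:rep:INSTANCE-ID"
SOURCE_ENDPOINT_ARN = "arn:aws:dms:REGION:ACCOUNT:endpoint:SOURCE-ID"
TARGET_ENDPOINT_ARN = "arn:aws:dms:REGION:ACCOUNT:endpoint:TARGET-ID"

# Teardown commands in the order described above; each step must finish
# (task stopped / resource deleted) before the next one starts.
teardown = [
    ["aws", "dms", "stop-replication-task", "--replication-task-arn", CDC_TASK_ARN],
    ["aws", "dms", "delete-replication-task", "--replication-task-arn", CDC_TASK_ARN],
    ["aws", "dms", "delete-replication-task", "--replication-task-arn", FULL_TASK_ARN],
    ["aws", "dms", "delete-replication-instance", "--replication-instance-arn", REPLICATION_INSTANCE_ARN],
    ["aws", "dms", "delete-endpoint", "--endpoint-arn", SOURCE_ENDPOINT_ARN],
    ["aws", "dms", "delete-endpoint", "--endpoint-arn", TARGET_ENDPOINT_ARN],
    ["aws", "rds", "delete-db-instance", "--db-instance-identifier", "SOURCE-DB-ID", "--skip-final-snapshot"],
    ["aws", "rds", "delete-db-instance", "--db-instance-identifier", "TARGET-DB-ID", "--skip-final-snapshot"],
    ["aws", "cloudformation", "delete-stack", "--stack-name", "StackRDS"],
    ["aws", "cloudformation", "delete-stack", "--stack-name", "aws-cloud9-datapath-xxxxx"],
]

for cmd in teardown:
    print(" ".join(cmd))
```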

</description>
    </item>
    <item>
      <title>How to use "redshift-data" API with AWS CLI</title>
      <dc:creator>Cristian Carballo</dc:creator>
      <pubDate>Wed, 03 Aug 2022 14:58:18 +0000</pubDate>
      <link>https://forem.com/criscarba/how-to-use-redshift-data-api-with-aws-cli-1hbb</link>
      <guid>https://forem.com/criscarba/how-to-use-redshift-data-api-with-aws-cli-1hbb</guid>
      <description>&lt;p&gt;In this post i will list the steps to use the "&lt;strong&gt;redshift-data&lt;/strong&gt;" thru AWS CLI. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2e7buyrrfqex255h7c5k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2e7buyrrfqex255h7c5k.png" alt="RedshiftDataApi" width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Redshift Data API enables you to run statements against the cluster and fetch their results from outside of it.&lt;/p&gt;

&lt;p&gt;There are at least 2 popular ways to use it: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;BOTO3&lt;/strong&gt; Library &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/redshift-data.html" rel="noopener noreferrer"&gt;Boto3 Documentation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS CLI&lt;/strong&gt; &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/redshift-data/index.html" rel="noopener noreferrer"&gt;CLI Documentation&lt;/a&gt; (In this post we will use this method)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;First of all, you need to create a Redshift cluster. If you just want to test the "&lt;strong&gt;redshift-data&lt;/strong&gt;" API, I highly recommend creating it with the default configuration, using the smallest instance type and node count.&lt;/p&gt;

&lt;p&gt;AWS recommends creating the user "&lt;strong&gt;redshift_data_api_user&lt;/strong&gt;" within the cluster, because the AWS managed policy ("&lt;strong&gt;AmazonRedshiftDataFullAccess&lt;/strong&gt;") already contains the grants needed to connect from outside and gives them to that user by default. If you want to grant access to another user, copy the entire policy document and replace the default user with the one you prefer.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;DEFAULT&lt;/strong&gt;&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;{&lt;br&gt;
            "Sid": "GetCredentialsForAPIUser",&lt;br&gt;
            "Effect": "Allow",&lt;br&gt;
            "Action": "redshift:GetClusterCredentials",&lt;br&gt;
            "Resource": [&lt;br&gt;
                "arn:aws:redshift:*:*:dbname:*/*",&lt;br&gt;
                "arn:aws:redshift:*:*:dbuser:*/redshift_data_api_user"&lt;br&gt;
            ]&lt;br&gt;
}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;IF YOU WANT TO CHANGE DEFAULT USER&lt;/strong&gt;&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;{&lt;br&gt;
            "Sid": "GetCredentialsForAPIUser",&lt;br&gt;
            "Effect": "Allow",&lt;br&gt;
            "Action": "redshift:GetClusterCredentials",&lt;br&gt;
            "Resource": [&lt;br&gt;
                "arn:aws:redshift:*:*:dbname:*/*",&lt;br&gt;
                "arn:aws:redshift:*:*:dbuser:*/&amp;lt;CUSTOM_USER/&amp;gt;"&lt;br&gt;
            ]&lt;br&gt;
}&lt;/code&gt;&lt;/p&gt;
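&lt;p&gt;As a quick sketch of that substitution, the snippet below rewrites the default statement's dbuser ARN for a custom user ("etl_user" is a hypothetical name used only for illustration):&lt;/p&gt;

```python
import copy
import json

# Statement from the AmazonRedshiftDataFullAccess managed policy (shown above).
default_stmt = {
    "Sid": "GetCredentialsForAPIUser",
    "Effect": "Allow",
    "Action": "redshift:GetClusterCredentials",
    "Resource": [
        "arn:aws:redshift:*:*:dbname:*/*",
        "arn:aws:redshift:*:*:dbuser:*/redshift_data_api_user",
    ],
}

def with_custom_user(stmt, user):
    """Return a copy of the statement whose dbuser ARN targets a custom user."""
    new = copy.deepcopy(stmt)
    new["Resource"] = [
        r.replace("redshift_data_api_user", user) for r in new["Resource"]
    ]
    return new

# "etl_user" is a hypothetical user name.
print(json.dumps(with_custom_user(default_stmt, "etl_user"), indent=2))
```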

&lt;p&gt;Once you decide whether to use the default user or a custom one, make sure your user/role has the policy attached. As you can see in the image below, I am attaching the default managed policy to my IAM user:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyazxrfok3clfvxiixgo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyazxrfok3clfvxiixgo.png" alt="IAM" width="800" height="153"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;Note&lt;/strong&gt;&lt;/u&gt;: The user also has the policies to change its password &amp;amp; to access the Redshift console to query data. No other IAM permissions are attached to my example user.&lt;/p&gt;

&lt;p&gt;After attaching the permission to the user, perform the following steps to enable the redshift-data CLI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create the user within the cluster: &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;create user redshift_data_api_user password 'Password1234';&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grant USAGE and CREATE permissions on an example schema:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;GRANT USAGE on SCHEMA example to redshift_data_api_user;&lt;/code&gt;&lt;br&gt;
&lt;code&gt;GRANT CREATE on SCHEMA example to redshift_data_api_user;&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Finally, we can use "&lt;strong&gt;redshift-data&lt;/strong&gt;" through the CLI to execute a statement. In this example I will create a sample table within the schema "&lt;strong&gt;example&lt;/strong&gt;":&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;aws redshift-data execute-statement \&lt;br&gt;
                  --region AWS-REGION \&lt;br&gt;
                  --db-user redshift_data_api_user \&lt;br&gt;
                  --cluster-identifier CLUSTER-ID \&lt;br&gt;
                  --database DATABASE \&lt;br&gt;
                  --sql "create table example.customer(name varchar(10), surname varchar(10))"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;Note&lt;/strong&gt;&lt;/u&gt;: Replace the region, cluster-identifier, and database values with those of your own cluster. Keep the "&lt;strong&gt;db-user&lt;/strong&gt;" parameter set to "&lt;strong&gt;redshift_data_api_user&lt;/strong&gt;", since that is the user created above and the one the policy allows us to get credentials for.&lt;/p&gt;

&lt;p&gt;I hope this example is helpful for you!&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br&gt;
Cristian.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Create an AWS Cloud9 Environment</title>
      <dc:creator>Cristian Carballo</dc:creator>
      <pubDate>Wed, 03 Aug 2022 14:02:09 +0000</pubDate>
      <link>https://forem.com/criscarba/create-an-aws-cloud9-environment-3a1f</link>
      <guid>https://forem.com/criscarba/create-an-aws-cloud9-environment-3a1f</guid>
      <description>&lt;p&gt;In this post i will explain you how to create in simple steps a Clou9 Environment for delopment within your AWS Account. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7i4vu3z1qorobgz4myc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7i4vu3z1qorobgz4myc.png" alt="Cloud9" width="600" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First things first. What is Cloud9?.&lt;/strong&gt;&lt;br&gt;
AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser. It includes a code editor, debugger, and terminal. Cloud9 comes prepackaged with essential tools for popular programming languages, including JavaScript, Python, PHP, and more, so you don’t need to install files or configure your development machine to start new projects. Since your Cloud9 IDE is cloud-based, you can work on your projects from your office, home, or anywhere using an internet-connected machine. Cloud9 also provides a seamless experience for developing serverless applications enabling you to easily define resources, debug, and switch between local and remote execution of serverless applications. With Cloud9, you can quickly share your development environment with your team, enabling you to pair program and track each other's inputs in real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Enables you to develop using your web browser. The IDE is really similar to VS Code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Docker &amp;amp; Git are installed by default. You can develop and test your Dockerfiles using the terminal. In addition, you can clone your own repositories and work with them in the Cloud9 environment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F345y4whiiyeyg77t8o0e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F345y4whiiyeyg77t8o0e.png" alt="Docker&amp;amp;Git" width="700" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Linux-based OS, which enables you to install packages into your environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;By using Cloud9 you can easily collaborate with other teammates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsmb80p7j9wv1laorvswa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsmb80p7j9wv1laorvswa.png" alt="Collaboration" width="800" height="616"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating a Cloud9 Environment&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You need to have an AWS Account. &lt;a href="https://signin.aws.amazon.com/" rel="noopener noreferrer"&gt;https://signin.aws.amazon.com/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the Services box, search for "Cloud9":&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhefr7dz2zwmlyiycg8b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhefr7dz2zwmlyiycg8b.png" alt="C9" width="800" height="172"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click in "Create environment"&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkc3lx0r3bxkfksvs6uj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkc3lx0r3bxkfksvs6uj.png" alt="C92" width="185" height="69"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It is highly recommended to keep the default "&lt;strong&gt;Environment Type&lt;/strong&gt;" of "&lt;strong&gt;EC2 instance&lt;/strong&gt;" and "&lt;strong&gt;Instance type&lt;/strong&gt;" of "&lt;strong&gt;t2.micro&lt;/strong&gt;", since "&lt;strong&gt;t2.micro&lt;/strong&gt;" is part of the free tier. It might be enough for testing; however, depending on your use case, you may need to scale up the instance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdm4czdn68ysyb31ozfcq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdm4czdn68ysyb31ozfcq.png" alt="C93" width="800" height="805"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the VPC in which you want to provision your environment. If you want to use the default VPC, you don't need to change anything; just click &lt;strong&gt;NEXT&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5klp671u00k4qznmkrn8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5klp671u00k4qznmkrn8.png" alt="C94" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Finally, you will see a summary of the Cloud9 configuration. To create it, click "&lt;strong&gt;Create&lt;/strong&gt;".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8g2e0p671a2ta6dafhur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8g2e0p671a2ta6dafhur.png" alt="C95" width="800" height="809"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once it is created, you can log in to your environment by clicking "&lt;strong&gt;Open in Cloud9&lt;/strong&gt;".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmcvtaoadss48kdzz3cb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmcvtaoadss48kdzz3cb.png" alt="C96" width="800" height="165"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvugmsc2629jzs8lpvgk9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvugmsc2629jzs8lpvgk9.png" alt="C97" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, it already has Docker &amp;amp; Git:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9woak83pwxm7w4usfa3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9woak83pwxm7w4usfa3.png" alt="D&amp;amp;G" width="444" height="144"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I hope this information is useful for you!&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br&gt;
Cristian.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Payload Validation in AWS REST API using PYDANTIC</title>
      <dc:creator>Cristian Carballo</dc:creator>
      <pubDate>Wed, 27 Jul 2022 18:18:34 +0000</pubDate>
      <link>https://forem.com/criscarba/payload-validation-in-aws-rest-api-using-pydantic-2c7n</link>
      <guid>https://forem.com/criscarba/payload-validation-in-aws-rest-api-using-pydantic-2c7n</guid>
      <description>&lt;p&gt;This post will show an example of how to validate the payload received in a REST API developed in Python using SAM (Serverless Application Model). In addition i will show how to deploy it locally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PYDANTIC&lt;/strong&gt; provides data validation and settings management using Python type annotations. Furthermore, &lt;strong&gt;PYDANTIC&lt;/strong&gt; enforces type hints at runtime and provides user-friendly errors when data is invalid.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faibmn9ivxpf0w8us7p4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faibmn9ivxpf0w8us7p4b.png" alt="SAM" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prerequisites for this example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up a &lt;a href="https://dev.to/criscarba/aws-local-serverless-environment-setup-using-aws-sam-6h8"&gt;Local Serverless Environment&lt;/a&gt; (click to open).&lt;/li&gt;
&lt;li&gt;Create a GitHub Account and Fork the following Repo: &lt;a href="https://github.com/criscarba/aws_sam_app_public/tree/master/payload-validator" rel="noopener noreferrer"&gt;https://github.com/criscarba/aws_sam_app_public/tree/master/payload-validator&lt;/a&gt; . Finally clone/download it locally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Please make sure you have successfully completed the prerequisites listed above before continuing. Reach out to me if you have any trouble.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP #1&lt;/strong&gt; — Open VSCode in the directory where the repository was downloaded or cloned:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjylch8ukbzw7h139fyk9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjylch8ukbzw7h139fyk9.png" alt="1" width="578" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frifnj18flovyi2ttkwg5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frifnj18flovyi2ttkwg5.png" alt="2" width="542" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP #2&lt;/strong&gt;— Build the REST API with AWS SAM&lt;/p&gt;

&lt;p&gt;Open a new terminal in VSCode, make sure you are in the REST API directory where SAM can find the “template.yaml” file, and run the command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sam build&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5hkab5og079xam8y0xp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5hkab5og079xam8y0xp.png" alt="3" width="528" height="70"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP #3&lt;/strong&gt; — Start the API Locally&lt;/p&gt;

&lt;p&gt;Within the terminal in VSCode, run the command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sam local start-api&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70yqxm30akq2b0vgwacd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70yqxm30akq2b0vgwacd.png" alt="4" width="553" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfi1nqlqf33u9c4wjti4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfi1nqlqf33u9c4wjti4.png" alt="5" width="800" height="108"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You will see that the application is running at &lt;a href="http://127.0.0.1:3000" rel="noopener noreferrer"&gt;http://127.0.0.1:3000&lt;/a&gt; (localhost)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP #4:&lt;/strong&gt; — Execute API using POSTMAN&lt;/p&gt;

&lt;p&gt;Download POSTMAN from their official site &lt;a href="https://www.postman.com/" rel="noopener noreferrer"&gt;https://www.postman.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create a new workspace and test the API:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0aufxo8jcwppbqacbpl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0aufxo8jcwppbqacbpl.png" alt="Postman" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP #5&lt;/strong&gt;: Understand PYDANTIC&lt;/p&gt;

&lt;p&gt;As explained above, &lt;strong&gt;PYDANTIC&lt;/strong&gt; provides data validation and settings management using Python type annotations, enforces type hints at runtime, and provides user-friendly errors when data is invalid.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ud9vv1k1wzyirulfjnx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ud9vv1k1wzyirulfjnx.png" alt="py" width="348" height="209"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Within the “services” folder there is a Python file called “event_check.py”; it defines a PYDANTIC class that models the expected event payload.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5bsjna14o3cpz43qiia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5bsjna14o3cpz43qiia.png" alt="py2" width="588" height="152"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP #6&lt;/strong&gt;: Test PYDANTIC Behavior&lt;/p&gt;

&lt;p&gt;Now that you understand how to create the model of your event, which allows PYDANTIC to enforce the validation, it is time to call the API with invalid payloads and see the outcomes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Send an unexpected KEY (table_2):&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sx06j9pc6ke152drtyt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sx06j9pc6ke152drtyt.png" alt="py5" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Send fewer keys than expected (only table):&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7jdzlnl7k3ojxhj7xl23.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7jdzlnl7k3ojxhj7xl23.png" alt="py6" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I hope the content is useful for everyone. Thanks a lot!&lt;br&gt;
Cheers!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cristian Carballo&lt;br&gt;
&lt;a href="mailto:cristian.carballo3@gmail.com"&gt;cristian.carballo3@gmail.com&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>AWS Local Serverless Environment Setup using AWS SAM</title>
      <dc:creator>Cristian Carballo</dc:creator>
      <pubDate>Wed, 27 Jul 2022 18:10:01 +0000</pubDate>
      <link>https://forem.com/criscarba/aws-local-serverless-environment-setup-using-aws-sam-6h8</link>
      <guid>https://forem.com/criscarba/aws-local-serverless-environment-setup-using-aws-sam-6h8</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhisvkv89lwx2b47agm3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhisvkv89lwx2b47agm3.png" alt="SAM" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This post is intended to list the steps for setting up your local development environment for creating serverless applications using the AWS SAM CLI.&lt;/p&gt;

&lt;p&gt;I will list below all the pre-requisites that you need to have installed in your local machine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A valid AWS Account&lt;/strong&gt; — To build and deploy our serverless function to AWS Lambda, you must have a valid AWS account. If you are new and do not have an account yet, you can navigate to &lt;a href="http://console.aws.amazon.com/" rel="noopener noreferrer"&gt;http://console.aws.amazon.com/&lt;/a&gt; and sign up for a new account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Python&lt;/strong&gt; — My example is developed in Python, so I recommend installing Python if you want to use my code. You can download the latest version by visiting &lt;a href="https://www.python.org/downloads/" rel="noopener noreferrer"&gt;https://www.python.org/downloads/&lt;/a&gt; and install it for your operating system. It is important to mention that you can build your serverless functions in any language of your choice; AWS Lambda supports quite a few, like Python, C#, Ruby, NodeJS, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS CLI&lt;/strong&gt; — In addition to building the serverless apps locally, we will also need to access the AWS services programmatically. This can be achieved by installing the AWS CLI or the command-line interface, using which you can perform many administrative activities on your AWS Account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS SAM CLI&lt;/strong&gt; — In order to develop and test the applications locally, you need to install the AWS SAM CLI on your machine. The AWS SAM CLI will provide an AWS Lambda like execution environment using which you can run your code locally and get the output&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Docker&lt;/strong&gt; — Finally, you also need to get Docker installed on your machine if you want to test the application locally. The AWS SAM CLI will use Docker to mount an image where the execution will be performed. You can install Docker by visiting &lt;a href="https://docs.docker.com/desktop/" rel="noopener noreferrer"&gt;https://docs.docker.com/desktop/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visual Studio Code&lt;/strong&gt; — For developing the code, we are going to use Visual Studio Code as the editor. You can download it from &lt;a href="http://code.visualstudio.com/" rel="noopener noreferrer"&gt;http://code.visualstudio.com/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you have installed all the prerequisites on your machine, you can check the installed versions by running the following commands:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Python — python --version&lt;br&gt;
AWS CLI — aws --version&lt;br&gt;
AWS SAM CLI — sam --version&lt;br&gt;
Docker — docker --version&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjviqw80g2sgfu950x5e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjviqw80g2sgfu950x5e.png" alt="versions" width="800" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I hope the content is useful for everyone. Thanks a lot!&lt;br&gt;
Cheers!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cristian Carballo&lt;br&gt;
&lt;a href="mailto:cristian.carballo3@gmail.com"&gt;cristian.carballo3@gmail.com&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Develop your AWS Glue Jobs Locally using Jupyter Notebook</title>
      <dc:creator>Cristian Carballo</dc:creator>
      <pubDate>Wed, 27 Jul 2022 15:28:31 +0000</pubDate>
      <link>https://forem.com/criscarba/develop-your-aws-glue-jobs-locally-using-jupyter-notebook-2pb4</link>
      <guid>https://forem.com/criscarba/develop-your-aws-glue-jobs-locally-using-jupyter-notebook-2pb4</guid>
      <description>&lt;p&gt;This post is mainly intended for professionals who are Data Engineers and use AWS as a cloud provider. It will be covered how to create a local experimental environment step by step.&lt;/p&gt;

&lt;p&gt;As you well know, AWS offers multiple data oriented services, where AWS Glue stands out as a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there’s no infrastructure to set up or manage.&lt;/p&gt;

&lt;p&gt;AWS Glue is designed to work with semi-structured data. It introduces a component called a dynamic frame, which you can use in your ETL scripts. A dynamic frame is similar to an Apache Spark dataframe, which is a data abstraction used to organize data into rows and columns, except that each record is self-describing so no schema is required initially. With dynamic frames, you get schema flexibility and a set of advanced transformations specifically designed for dynamic frames. You can convert between dynamic frames and Spark dataframes, so that you can take advantage of both AWS Glue and Spark transformations to do the kinds of analysis that you want.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;What is Jupyter Notebook?&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjc4ecra9us1gavadc8m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjc4ecra9us1gavadc8m.png" alt="jupyter" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How can we take advantage of Jupyter Notebook?&lt;/strong&gt; Basically, inside a Jupyter notebook we can perform all the necessary experimentation for our pipeline (transformations, aggregations, cleansing, enrichment, etc.) and then export it as a Python script (.py) for use in AWS Glue.&lt;/p&gt;
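
&lt;p&gt;One way to do that export, assuming the notebook is named &lt;code&gt;my_glue_job.ipynb&lt;/code&gt; (a placeholder; adjust to your file), is the &lt;code&gt;nbconvert&lt;/code&gt; tool that ships with Jupyter:&lt;/p&gt;

```shell
# Convert the notebook into a plain .py script for AWS Glue
# (my_glue_job.ipynb is a placeholder name)
jupyter nbconvert --to script my_glue_job.ipynb
```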

&lt;p&gt;&lt;strong&gt;Let’s Get Started!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1) Install the Anaconda environment with Python 3.x&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For Windows 64-Bit: &lt;a href="https://repo.anaconda.com/archive/Anaconda3-2020.11-Windows-x86_64.exe" rel="noopener noreferrer"&gt;https://repo.anaconda.com/archive/Anaconda3-2020.11-Windows-x86_64.exe&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For Windows 32-Bit: &lt;a href="https://repo.anaconda.com/archive/Anaconda3-2020.11-Windows-x86.exe" rel="noopener noreferrer"&gt;https://repo.anaconda.com/archive/Anaconda3-2020.11-Windows-x86.exe&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For other OS, find the version here: &lt;a href="https://www.anaconda.com/products/individual" rel="noopener noreferrer"&gt;https://www.anaconda.com/products/individual&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; I recommend using Python 3.7&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2) Install Apache Maven&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Download Link: &lt;a href="https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-common/apache-maven-3.6.0-bin.tar.gz" rel="noopener noreferrer"&gt;https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-common/apache-maven-3.6.0-bin.tar.gz&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unzip the file into C:\apache-maven-3.6.0 (Recommended)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkg7av3x6s8a5g2ghlhtq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkg7av3x6s8a5g2ghlhtq.png" alt="zip" width="583" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create the &lt;strong&gt;MAVEN_HOME&lt;/strong&gt; system variable (Windows =&amp;gt; Edit the system environment variables =&amp;gt; Environment Variables). Follow the instructions below:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkfvb04ilrnq1ti3ow0e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkfvb04ilrnq1ti3ow0e.png" alt="maven" width="764" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fviuxq6wnorhjo5x8axxa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fviuxq6wnorhjo5x8axxa.png" alt="maven" width="601" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Modify the PATH Environment Variable in order to make the MAVEN_HOME variable visible:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftx13vdux78d2wvevwa7r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftx13vdux78d2wvevwa7r.png" alt="path" width="731" height="411"&gt;&lt;/a&gt;&lt;/p&gt;
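
&lt;p&gt;The same setup can be done from a Windows command prompt. A minimal sketch, assuming Maven was unzipped into the recommended C:\apache-maven-3.6.0 (the same pattern applies to the JAVA_HOME, SPARK_HOME, and HADOOP_HOME variables created in the later steps):&lt;/p&gt;

```shell
:: Windows CMD sketch -- the path assumes the recommended location above;
:: adjust if you unzipped Maven elsewhere.
setx MAVEN_HOME "C:\apache-maven-3.6.0"
setx PATH "%PATH%;%MAVEN_HOME%\bin"
```

&lt;p&gt;Note that &lt;code&gt;setx&lt;/code&gt; persists the variables for new sessions only, so open a fresh prompt afterwards.&lt;/p&gt;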

&lt;p&gt;&lt;strong&gt;3) Install Java 8 Version&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Download the product for your OS Version &lt;a href="https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html" rel="noopener noreferrer"&gt;https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;IMPORTANT:&lt;/strong&gt; During the installation, make sure to set the installation directory to C:\jdk for the Java Development Kit and C:\jre for the Java Runtime (otherwise, use the directories you chose during the installation).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnbsxgyvlipch1q6p5ta.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnbsxgyvlipch1q6p5ta.png" alt="java" width="467" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs34u66axj43b59utlddi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs34u66axj43b59utlddi.png" alt="java2" width="469" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create the &lt;strong&gt;JAVA_HOME&lt;/strong&gt; environment variable and make sure to add it to the PATH variable (same process as for MAVEN_HOME).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98psbh0s7b728f24n1bn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98psbh0s7b728f24n1bn.png" alt="pathvar" width="368" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4) Install the Spark distribution from the following location, based on the Glue version:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Glue version 1.0: &lt;a href="https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-1.0/spark-2.4.3-bin-hadoop2.8.tgz" rel="noopener noreferrer"&gt;https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-1.0/spark-2.4.3-bin-hadoop2.8.tgz&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unzip the file into the C:\spark_glue directory (or another directory of your choice).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgpmrsy1vrjkxuu6xx3sr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgpmrsy1vrjkxuu6xx3sr.png" alt="spark" width="513" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create the &lt;strong&gt;SPARK_HOME&lt;/strong&gt; environment variable and add it to the PATH variable (same process as for MAVEN_HOME).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faeb9zc69446lg26dmhwx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faeb9zc69446lg26dmhwx.png" alt="sparkhome" width="371" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5) Download the Hadoop Binaries&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Download Link: &lt;a href="https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz" rel="noopener noreferrer"&gt;https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create the HADOOP_HOME environment variable pointing to the folder where you extracted the binaries (e.g. “C:\hadoop”), and add %HADOOP_HOME%\bin to the PATH variable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You also need to download “winutils.exe”, available from this link: &lt;a href="https://github.com/steveloughran/winutils/blob/master/hadoop-3.0.0/bin/winutils.exe" rel="noopener noreferrer"&gt;https://github.com/steveloughran/winutils/blob/master/hadoop-3.0.0/bin/winutils.exe&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: Make sure that the “winutils.exe” file is within the “bin” folder of the Hadoop directory.&lt;/p&gt;
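
&lt;p&gt;A quick way to verify that note is a small Python check (pure standard library; the function name is just for illustration):&lt;/p&gt;

```python
import os
from pathlib import Path

# Illustrative helper: True when winutils.exe sits in HADOOP_HOME\bin.
def winutils_present(hadoop_home=None):
    home = hadoop_home or os.environ.get("HADOOP_HOME")
    if not home:
        return False
    return (Path(home) / "bin" / "winutils.exe").is_file()

print(winutils_present())
```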

&lt;p&gt;&lt;strong&gt;6) Install Python 3.7 in your Anaconda virtual environment&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open an Anaconda Prompt and execute the command &lt;code&gt;conda install python=3.7&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F50fo2nokdgy610e8lo53.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F50fo2nokdgy610e8lo53.png" alt="Anaconda" width="538" height="73"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: This Process will take ~30 min&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7) Install “awsglue-local” in your Anaconda virtual environment&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open an Anaconda Prompt and run the command &lt;code&gt;pip install awsglue-local&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxryt2cymgn11j44obw2f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxryt2cymgn11j44obw2f.png" alt="ana" width="534" height="75"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8) Download the Pre_Build_Glue_Jar dependencies (REQUIRED FOR CREATING THE INSTANCE OF THE SPARK SESSION)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Download Link: &lt;a href="https://drive.google.com/file/d/19JlxsFykugjDXeRSK5zwQ8M0nWzpGHdt/view" rel="noopener noreferrer"&gt;https://drive.google.com/file/d/19JlxsFykugjDXeRSK5zwQ8M0nWzpGHdt/view&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unzip the jar file into the same folder where you will create your .ipynb (Jupyter Notebook). This .jar file is required for creating the Spark session instance. Below is an example of how to create the Spark session:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1k0aolqqgw01yly0q9x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1k0aolqqgw01yly0q9x.png" alt="sp" width="440" height="90"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9) Confirm that you have installed everything successfully&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open a new Anaconda Prompt and execute the following commands:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;conda list awsglue-local&lt;br&gt;
java -version&lt;br&gt;
mvn -version&lt;br&gt;
pyspark&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxtopzanq0x5wnb86cdq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxtopzanq0x5wnb86cdq.png" alt="glue" width="552" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggqzke8p2fuiri9mb24y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggqzke8p2fuiri9mb24y.png" alt="javav" width="552" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2dnssc7zck6z8pecy8qd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2dnssc7zck6z8pecy8qd.png" alt="mvnv" width="552" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmiuxx86x6v3va3yan57s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmiuxx86x6v3va3yan57s.png" alt="sparkver" width="552" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10) Once everything is completed, open a Jupyter notebook.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Open a new Anaconda Prompt, run the command &lt;code&gt;pip install findspark&lt;/code&gt;, and wait until it completes. Once it has completed, close the Anaconda Prompt. In most cases this is only required the first time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Re-open the Anaconda Prompt and run the command &lt;code&gt;jupyter-lab&lt;/code&gt; to open a Jupyter notebook.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjnkyuwxy6mw0l0kvjj0u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjnkyuwxy6mw0l0kvjj0u.png" alt="consola" width="543" height="104"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a Jupyter notebook and execute the following commands (this is one time only):&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ft555elgyeimwwm9vdx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ft555elgyeimwwm9vdx.png" alt="oto" width="166" height="60"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;import findspark&lt;br&gt;
findspark.init()&lt;br&gt;
import pyspark&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You won’t need to execute this code again, since this is a typical step for the initial installation of Spark. The findspark library generates some references on the local machine to link the pyspark library with the bin files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I hope the content is useful for everyone. Thanks a lot!&lt;br&gt;
Cheers!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cristian Carballo&lt;br&gt;
&lt;a href="mailto:cristian.carballo3@gmail.com"&gt;cristian.carballo3@gmail.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/cristianrcarballo/" rel="noopener noreferrer"&gt;LinkedIn Profile&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
