<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rad Huda</title>
    <description>The latest articles on Forem by Rad Huda (@radhuda).</description>
    <link>https://forem.com/radhuda</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F471505%2F7cf05912-77d5-463e-971b-f5aad76e8eed.png</url>
      <title>Forem: Rad Huda</title>
      <link>https://forem.com/radhuda</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/radhuda"/>
    <language>en</language>
    <item>
      <title>Event-Driven Python on AWS</title>
      <dc:creator>Rad Huda</dc:creator>
      <pubDate>Thu, 15 Oct 2020 18:26:16 +0000</pubDate>
      <link>https://forem.com/radhuda/event-driven-python-on-aws-1e41</link>
      <guid>https://forem.com/radhuda/event-driven-python-on-aws-1e41</guid>
      <description>&lt;p&gt;Since gaining my Cloud Practitioner Cert in early September I really wanted to delve into a project to solidify my understanding of the AWS cloud products. That is when I found this challenge online by &lt;a href="https://www.linkedin.com/in/forrestbrazeal/" rel="noopener noreferrer"&gt;Forrest Brazeal&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;Overview of the Challenge&lt;/h1&gt;

&lt;p&gt;The challenge, created by Forrest Brazeal, is called Event-Driven Python: automate an ETL processing pipeline for COVID-19 data using Python and cloud services. It was the perfect template for building real Python and AWS skills, helping me cement what I learned for the Cloud Practitioner certification and build a solid foundation with AWS tooling before I attempt the Solutions Architect Associate exam. It also strengthens my portfolio, which should help with interviews and job applications.&lt;/p&gt;

&lt;h1&gt;The Challenge Steps&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;ETL-JOB&lt;/li&gt;
&lt;li&gt;EXTRACTION&lt;/li&gt;
&lt;li&gt;TRANSFORMATION&lt;/li&gt;
&lt;li&gt;CODE-CLEANUP&lt;/li&gt;
&lt;li&gt;LOAD&lt;/li&gt;
&lt;li&gt;NOTIFICATION&lt;/li&gt;
&lt;li&gt;ERROR-HANDLING&lt;/li&gt;
&lt;li&gt;TESTS&lt;/li&gt;
&lt;li&gt;DASHBOARD&lt;/li&gt;
&lt;li&gt;BLOG-POST&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;Diagram of My Approach&lt;/h1&gt;

&lt;p&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fradhuda%2FEventDrivenPython%2Fmaster%2FEventDrivenPython.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fradhuda%2FEventDrivenPython%2Fmaster%2FEventDrivenPython.png"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;h1&gt;ETL-JOB&lt;/h1&gt;

&lt;p&gt;I scheduled the ETL job to run daily via a CloudWatch Events rule. This only required adding the following under the Lambda function's properties in &lt;code&gt;template.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Events:
        dataDownload:
          Type: Schedule
          Properties:
            Schedule: 'rate(1 day)'
            Description: daily schedule
            Enabled: True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;EXTRACTION&lt;/h1&gt;

&lt;p&gt;Having used pandas before, I found this task simple. Extracting from an online CSV is a single &lt;code&gt;pd.read_csv&lt;/code&gt; call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nytimes_df = pd.read_csv(nytimes_link)
john_hopkins_df = pd.read_csv(john_hopkins_link)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;TRANSFORMATION&lt;/h1&gt;

&lt;p&gt;Again, having worked with pandas in the past, the cleaning, joining, and filtering used functions I had already been exposed to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;john_hopkins_df = john_hopkins_df[john_hopkins_df["Country/Region"]=='US']  
john_hopkins_df.columns = [x.lower() for x in john_hopkins_df.columns]

#Converting to datetime object     
nytimes_df['date'] = pd.to_datetime(nytimes_df['date'], infer_datetime_format=True)     
john_hopkins_df['date'] = pd.to_datetime(john_hopkins_df['date'], infer_datetime_format=True)

#changing index to date
nytimes_df = nytimes_df.set_index(['date'])     
john_hopkins_df = john_hopkins_df.set_index(['date'])

#dropping all columns on john hopkins except recovered 
john_hopkins_df = john_hopkins_df[['recovered']]

#Joining dataframes
df = nytimes_df.join(john_hopkins_df)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
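&lt;p&gt;On toy data (the column names below are illustrative and match the real CSVs only loosely), the index-aligned join behaves like this:&lt;/p&gt;

```python
import pandas as pd

# Tiny stand-ins for the two source frames; the real CSVs have more columns.
nytimes_df = pd.DataFrame(
    {"date": ["2020-10-01", "2020-10-02"], "cases": [100, 120], "deaths": [2, 3]}
).set_index("date")
john_hopkins_df = pd.DataFrame(
    {"date": ["2020-10-01", "2020-10-02"], "recovered": [50, 60]}
).set_index("date")

# join() aligns rows on the shared date index, adding the recovered column.
df = nytimes_df.join(john_hopkins_df)
print(df.columns.tolist())  # ['cases', 'deaths', 'recovered']
```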



&lt;h1&gt;CODE-CLEANUP&lt;/h1&gt;

&lt;p&gt;I moved the code above into a new file called &lt;code&gt;dataDownload.py&lt;/code&gt; and added an &lt;code&gt;__init__.py&lt;/code&gt; so that I could import dataDownload as a function in my main &lt;code&gt;app.py&lt;/code&gt; file.&lt;/p&gt;
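&lt;p&gt;The refactor can be sketched roughly like this (the function names here are illustrative, not necessarily the repo's exact API); in the project the two parts live in &lt;code&gt;dataDownload.py&lt;/code&gt; and &lt;code&gt;app.py&lt;/code&gt;:&lt;/p&gt;

```python
import json

# --- dataDownload.py (sketch) ---
def run_etl():
    """Extract, transform, and load the COVID data; stubbed out here."""
    return {"rows_loaded": 0}

# --- app.py (sketch) ---
def lambda_handler(event, context):
    # The handler just orchestrates: call the ETL function, report the result.
    result = run_etl()
    return {"statusCode": 200, "body": json.dumps(result)}
```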

&lt;h1&gt;LOAD&lt;/h1&gt;

&lt;p&gt;I originally used DynamoDB to load my transformed data, but further into the steps, when I wanted to use QuickSight for dashboarding, a few incompatibility errors arose. I quickly switched to RDS PostgreSQL. This is where I truly learned what a blessing CloudFormation is: I literally just had to change a few lines and I was done.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  PostgreSQL:
    Type: AWS::RDS::DBInstance
    Properties: 
      DBName : CovidDB
      Engine: postgres
      AllocatedStorage: 50
      DBInstanceClass: db.t2.micro
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To load the data, I used SQLAlchemy to create the SQL engine that pandas writes through, with psycopg2 as the PostgreSQL driver:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;engine = create_engine(F"postgresql://{db_param['user']}:{db_param['password']}@{db_param['hostname']}:{db_param['port']}/{db_param['dbname']}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
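&lt;p&gt;With the engine in hand, the actual write is a single &lt;code&gt;to_sql&lt;/code&gt; call. A minimal local sketch, using an in-memory SQLite engine as a stand-in for the RDS PostgreSQL URL (the table and column names are illustrative):&lt;/p&gt;

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the real postgresql:// engine URL.
engine = create_engine("sqlite://")

df = pd.DataFrame({"date": ["2020-10-01"], "cases": [100], "recovered": [50]})

# if_exists="append" adds new rows instead of recreating the table each run.
df.to_sql("covid_data", engine, if_exists="append", index=False)

loaded = pd.read_sql("SELECT * FROM covid_data", engine)
```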



&lt;h1&gt;NOTIFICATION&lt;/h1&gt;

&lt;p&gt;This step was also simple thanks to CloudFormation. All I had to do was add a few lines to the &lt;code&gt;template.yaml&lt;/code&gt; file under the Lambda function's properties:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      EventInvokeConfig:
        DestinationConfig:
          OnSuccess:
            Type: SNS
            TopicArn: !Ref CovidDataUpdateSuccess
          OnFailure:
            Type: SNS
            TopicArn: !Ref CovidDataUpdateFailure

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
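&lt;p&gt;For the &lt;code&gt;!Ref&lt;/code&gt;s above to resolve, the two topics must also be declared as resources in the template. A minimal sketch, with the logical IDs inferred from the references above:&lt;/p&gt;

```yaml
  CovidDataUpdateSuccess:
    Type: AWS::SNS::Topic
  CovidDataUpdateFailure:
    Type: AWS::SNS::Topic
```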



&lt;h1&gt;ERROR-HANDLING&lt;/h1&gt;

&lt;p&gt;For error handling I wrapped each step in try/except, with each except clause returning an error message. Finally, in the Lambda handler I added:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  if error != None:
        return {
        "statusCode": 301,
        "body": json.dumps({
            "message": error,
        }),
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
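&lt;p&gt;The overall pattern can be sketched as follows (the function names are illustrative, not the project's exact ones): each step runs inside try/except and hands back an error string instead of raising, and the handler short-circuits if any step failed.&lt;/p&gt;

```python
import json

def safe_step(step_fn):
    """Run one pipeline step; return an error string instead of raising."""
    try:
        step_fn()
        return None
    except Exception as exc:
        return str(exc)

def lambda_handler(event, context):
    # A deliberately failing step, just to demonstrate the error path.
    error = safe_step(lambda: 1 / 0)
    if error is not None:
        return {"statusCode": 301, "body": json.dumps({"message": error})}
    return {"statusCode": 200, "body": json.dumps({"message": "ok"})}
```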



&lt;h1&gt;TESTS&lt;/h1&gt;

&lt;p&gt;I created a few unit-test files to exercise the error handling. I checked for errors when creating the DataFrame as well as when loading, and I also verified that data was appended rather than the tables being recreated from scratch each run.&lt;/p&gt;
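&lt;p&gt;A sketch of what one such unit test can look like with the standard &lt;code&gt;unittest&lt;/code&gt; module (the function under test is a hypothetical stand-in for the real DataFrame-creation step):&lt;/p&gt;

```python
import unittest

def build_dataframe(rows):
    """Hypothetical stand-in for the real DataFrame-creation step."""
    if not rows:
        raise ValueError("no data extracted")
    return rows

class TestErrorHandling(unittest.TestCase):
    def test_empty_input_raises(self):
        # The error path: empty input should raise, not silently succeed.
        with self.assertRaises(ValueError):
            build_dataframe([])

    def test_valid_input_passes(self):
        self.assertEqual(build_dataframe([{"date": "2020-10-01"}]),
                         [{"date": "2020-10-01"}])

if __name__ == "__main__":
    unittest.main()
```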

&lt;h1&gt;IaC&lt;/h1&gt;

&lt;p&gt;I defined my infrastructure as code in a CloudFormation YAML file, listing every resource in &lt;code&gt;template.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;SOURCE CONTROL&lt;/h1&gt;

&lt;p&gt;All of my project files are on my GitHub: &lt;a href="https://github.com/radhuda/EventDrivenPython" rel="noopener noreferrer"&gt;radhuda/EventDrivenPython&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;DASHBOARD&lt;/h1&gt;

&lt;p&gt;I connected my database to QuickSight.&lt;/p&gt;

&lt;p&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fradhuda%2FEventDrivenPython%2Fmaster%2Fcovid%2520dashboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fradhuda%2FEventDrivenPython%2Fmaster%2Fcovid%2520dashboard.png"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;h1&gt;CHALLENGES&lt;/h1&gt;

&lt;p&gt;I faced multiple challenges during this project, but two stuck out because of how long I spent trying to resolve them.&lt;br&gt;
Problem 1: DynamoDB did not integrate well with QuickSight.&lt;br&gt;
Solution 1: I switched over to PostgreSQL, which was much easier to connect.&lt;/p&gt;

&lt;p&gt;Problem 2: I was using the SAM CLI via the terminal, but I wanted it integrated with VSCode. The error was not something I could even find by searching: no matter what I did, VSCode could not locate the SAM CLI, so I had to run most SAM CLI commands manually from the terminal. Eventually I figured out the issue: because I was using VSCode over SSH into my desktop, the SAM CLI would only work from the desktop computer itself. It might be a VSCode bug.&lt;/p&gt;

&lt;h1&gt;FUN-FACT&lt;/h1&gt;

&lt;p&gt;I am a pharmacist learning cloud tools so that I can become a solutions architect for healthcare. With Operation Warp Speed changing the face of healthcare today, I can see tomorrow's healthcare becoming far more cloud-dependent. There is so much healthcare can build once it starts integrating with the cloud.&lt;/p&gt;

&lt;h1&gt;CONCLUSION&lt;/h1&gt;

&lt;p&gt;I am very grateful to Forrest for initiating such an amazing idea. This project was very helpful.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>python</category>
    </item>
  </channel>
</rss>
