<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Gaurav Thalpati</title>
    <description>The latest articles on Forem by Gaurav Thalpati (@gauravthalpati).</description>
    <link>https://forem.com/gauravthalpati</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F684804%2F7940c09e-8535-404b-9d23-198f52cd796b.jpeg</url>
      <title>Forem: Gaurav Thalpati</title>
      <link>https://forem.com/gauravthalpati</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/gauravthalpati"/>
    <language>en</language>
    <item>
      <title>AWS Cloud9 for Data Engineers</title>
      <dc:creator>Gaurav Thalpati</dc:creator>
      <pubDate>Fri, 17 Mar 2023 04:53:37 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-cloud9-for-data-engineers-5epl</link>
      <guid>https://forem.com/aws-builders/aws-cloud9-for-data-engineers-5epl</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally posted on &lt;a href="https://gauravthalpati.substack.com/" rel="noopener noreferrer"&gt;my substack&lt;/a&gt;. Sharing it here with fellow community builders.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I usually do multiple quick PoCs for my day-to-day analysis and R&amp;amp;D work, which often means installing various software, applications, databases, and tools. I had been running them with Docker Desktop on my Windows laptop, but with only 8 GB of RAM, that laptop is not ideal for this kind of work. That’s why I’ve shifted to AWS Cloud9, an AWS service that helps you do PoC work quickly.&lt;/p&gt;

&lt;p&gt;Here is a quick guide on using Cloud9 for Data Engineering PoCs.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Cloud9?
&lt;/h3&gt;

&lt;p&gt;AWS Cloud9 is a cloud-based IDE for development work. It runs on an Amazon EC2 instance, and you can choose the instance size based on the workload you want to run.&lt;/p&gt;

&lt;p&gt;It provides an IDE to write, run, and debug code, and it supports Python, JavaScript, and many other languages. Best of all, it integrates with AWS services like S3, so you can easily upload files to and download files from S3 without leaving Cloud9. It also supports collaborative development, including chat with other developers.&lt;/p&gt;

&lt;p&gt;AWS Cloud9 is not a “data”-specific service and is rarely discussed in the data community. But it is one of the best services for making data engineering (DE) work easier and quicker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why should you use Cloud9 for DE PoCs?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A single interface for many activities: writing code, running bash commands, transferring files to S3, running AWS CLI commands, and pushing code to git.&lt;/li&gt;
&lt;li&gt;Easy installation of new tools using Docker.&lt;/li&gt;
&lt;li&gt;Provision an EC2 instance that matches your needs, so you don’t need a powerful laptop with 16 GB+ RAM. (I generally use m5.xlarge, which has 16 GB RAM.)&lt;/li&gt;
&lt;li&gt;Stop and start the instance without losing your installed software, and pay for compute only while it runs (the attached EBS storage is still billed while the instance is stopped).&lt;/li&gt;
&lt;li&gt;All the good features of EC2, plus the simplicity of doing everything in one place.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below are some of the DE activities you can use Cloud9 for.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Case #1 | Editing S3 files quickly
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Scenario: You want to create and upload some dummy data to S3.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can easily create a new file in Cloud9 and upload it to your S3 bucket in just a couple of clicks.&lt;/p&gt;
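&lt;p&gt;If you would rather generate the dummy data than type it by hand, a short script does the job. The sketch below writes a small CSV of fake orders using Python’s standard &lt;code&gt;csv&lt;/code&gt; module; the function name, file name, columns, and values are hypothetical placeholders for whatever your PoC needs.&lt;/p&gt;

```python
import csv
import random


def write_dummy_csv(path, rows=10, seed=42):
    """Write a small CSV of fake order records for a PoC (hypothetical schema)."""
    rng = random.Random(seed)  # fixed seed so reruns produce the same file
    statuses = ["NEW", "SHIPPED", "CANCELLED"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "customer_id", "amount", "status"])
        for order_id in range(1, rows + 1):
            writer.writerow([
                order_id,
                rng.randint(100, 199),           # fake customer id
                round(rng.uniform(5, 500), 2),   # fake order amount
                rng.choice(statuses),
            ])
    return path
```

&lt;p&gt;For example, &lt;code&gt;write_dummy_csv("orders.csv", rows=20)&lt;/code&gt; creates the file in your Cloud9 environment, which you can then upload through the UI or with &lt;code&gt;aws s3 cp orders.csv s3://my-poc-bucket/raw/orders.csv&lt;/code&gt; (the bucket and prefix are placeholders).&lt;/p&gt;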

&lt;p&gt;If you want to add more columns or records to this file, you can download it, make your changes, and upload it back, all without leaving Cloud9.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Supports multiple AWS Services along with S3&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2a0mg7kca945gamo1vjg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2a0mg7kca945gamo1vjg.png" alt="Supports multiple AWS Services along with S3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Browse the folder where you want to upload the file&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmw1pk7r2bxluqrk6jbr2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmw1pk7r2bxluqrk6jbr2.png" alt="Browse the folder where you want to upload the file"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upload the file from Cloud9 to S3&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7bozqhrgawsgl8ogtauh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7bozqhrgawsgl8ogtauh.png" alt="Upload the file from Cloud9 to S3"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can also run simple shell commands to modify files. If you love sed or awk one-liners, you can try them out here!&lt;/p&gt;
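&lt;p&gt;As an illustration (not the command from the screenshot below), the one-liner &lt;code&gt;awk -F, 'BEGIN{OFS=","} NR==1{print $0,"is_cancelled"; next} {print $0, ($4=="CANCELLED")}' orders.csv&lt;/code&gt; appends a 0/1 flag column to a hypothetical orders file. Here is a rough Python equivalent, handy once the logic outgrows a one-liner; the function, column, and file names are assumptions.&lt;/p&gt;

```python
import csv


def add_flag_column(in_path, out_path, name, predicate):
    """Copy a CSV, appending a 0/1 column computed from each row.

    `predicate` receives each data row as a dict keyed by header name.
    Unlike the awk one-liner, csv handles quoted fields containing commas.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + [name])
        writer.writeheader()
        for row in reader:
            row[name] = 1 if predicate(row) else 0
            writer.writerow(row)
```

&lt;p&gt;For example: &lt;code&gt;add_flag_column("orders.csv", "orders_flagged.csv", "is_cancelled", lambda r: r["status"] == "CANCELLED")&lt;/code&gt;.&lt;/p&gt;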

&lt;blockquote&gt;
&lt;p&gt;awk command - my all-time favorite!&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ji5nbv6kpgi5zlhfzih.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ji5nbv6kpgi5zlhfzih.png" alt="awk command - my all-time favorite!"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Use Case #2 | Running AWS CLI commands
&lt;/h3&gt;

&lt;p&gt;You can run AWS CLI commands directly from the Cloud9 terminal without configuring any credentials yourself.&lt;/p&gt;

&lt;p&gt;The AWS CLI comes preinstalled on Amazon Linux 2 environments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Validate AWS CLI version&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrlhlyzev7e047o7n083.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrlhlyzev7e047o7n083.png" alt="Validate AWS CLI version"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
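&lt;p&gt;To see which identity and account the Cloud9-provided credentials belong to, you can run &lt;code&gt;aws sts get-caller-identity&lt;/code&gt;. A minimal Python sketch that wraps the call and parses its JSON output (the &lt;code&gt;Account&lt;/code&gt; and &lt;code&gt;Arn&lt;/code&gt; fields) might look like this; the optional &lt;code&gt;raw_json&lt;/code&gt; parameter is a hypothetical hook so the parsing can be tried without AWS access.&lt;/p&gt;

```python
import json
import subprocess


def caller_identity(raw_json=None):
    """Return (account, arn) for the credentials the AWS CLI is using.

    If raw_json is None, run `aws sts get-caller-identity` (needs the AWS CLI);
    otherwise parse the given JSON text instead.
    """
    if raw_json is None:
        raw_json = subprocess.run(
            ["aws", "sts", "get-caller-identity", "--output", "json"],
            check=True, capture_output=True, text=True,
        ).stdout
    identity = json.loads(raw_json)
    return identity["Account"], identity["Arn"]
```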

&lt;p&gt;&lt;em&gt;Scenario: You want to check the IAM users in your account.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can run AWS CLI commands for the IAM service, such as listing users.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;List users using IAM CLI&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxpvjvckc0bmajye71nw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxpvjvckc0bmajye71nw.png" alt="List users in this account"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
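&lt;p&gt;For a quick list of names only, &lt;code&gt;aws iam list-users --query "Users[].UserName" --output text&lt;/code&gt; trims the output with a JMESPath query. If you want the names inside a Python script instead, here is a small sketch built on the same JSON structure (a top-level &lt;code&gt;Users&lt;/code&gt; array whose items carry &lt;code&gt;UserName&lt;/code&gt;); the optional &lt;code&gt;raw_json&lt;/code&gt; parameter is a hypothetical hook for offline testing.&lt;/p&gt;

```python
import json
import subprocess


def list_iam_user_names(raw_json=None):
    """Return the user names reported by `aws iam list-users`.

    If raw_json is None, the AWS CLI is called; otherwise the text is parsed.
    """
    if raw_json is None:
        raw_json = subprocess.run(
            ["aws", "iam", "list-users", "--output", "json"],
            check=True, capture_output=True, text=True,
        ).stdout
    return [user["UserName"] for user in json.loads(raw_json).get("Users", [])]
```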

&lt;h3&gt;
  
  
  Use Case #3 | Creating Python Scripts
&lt;/h3&gt;

&lt;p&gt;If you want to create quick Python scripts for your DE work, you don’t need to open PyCharm or another editor; you can write them in Cloud9 itself.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Change “Text” to “Python” to switch the editor to Python mode&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0d9hnhex7xjz6l0ove5a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0d9hnhex7xjz6l0ove5a.png" alt="Change the “Text” to “Python” to switch to Python Compiler."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Save the file with .py extension and execute it in the console&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsjswmgalc7zm84yftlhw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsjswmgalc7zm84yftlhw.png" alt="Save the file with .py extension and execute it in the console itself.&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
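&lt;p&gt;As an example of the kind of quick script this is useful for, the sketch below profiles a CSV file: it counts rows and empty values per column using only the standard library. The function name and output shape are illustrative choices, not a Cloud9 feature.&lt;/p&gt;

```python
import csv
from collections import Counter


def profile_csv(path):
    """Return the row count and the number of empty values per column."""
    empty = Counter()
    rows = 0
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            rows += 1
            for col, value in row.items():
                if col is None:
                    continue  # extra fields beyond the header are ignored here
                if value is None or not value.strip():
                    empty[col] += 1  # missing or blank value
        columns = reader.fieldnames or []
    return {"rows": rows, "empty": {col: empty[col] for col in columns}}
```

&lt;p&gt;Save it as, say, &lt;code&gt;profile_csv.py&lt;/code&gt;, add a line such as &lt;code&gt;print(profile_csv("orders.csv"))&lt;/code&gt; at the bottom, and run &lt;code&gt;python3 profile_csv.py&lt;/code&gt; in the Cloud9 terminal.&lt;/p&gt;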

&lt;h3&gt;
  
  
  Use Case #4 | Running dockers
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Scenario: You want to run Spark quickly and try out some simple commands for learning purposes. There are many options, such as Glue, EMR, and Databricks. One of the easiest is to run Spark in Docker on Cloud9.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Docker comes preinstalled if you selected Amazon Linux 2 when creating the Cloud9 environment.&lt;/p&gt;

&lt;p&gt;To confirm that Docker is installed, run the command below.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Validate that docker is installed&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77nlp1gez5uoro59h7ib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77nlp1gez5uoro59h7ib.png" alt="Validate that docker is installed."&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now you can pull the Spark (Python) image from Docker Hub and start the PySpark shell with the commands below.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker pull apache/spark-py&lt;br&gt;
docker run -it apache/spark-py /opt/spark/bin/pyspark&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Run the container, and the Spark shell is ready&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1oy9ptqu4498vnjcaf23.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1oy9ptqu4498vnjcaf23.png" alt="Run the docker, and you are ready with the Spark Shell."&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
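&lt;p&gt;One practical tweak is to mount your Cloud9 files into the container so the Spark shell can read data you created earlier, and to add &lt;code&gt;--rm&lt;/code&gt; so stopped containers don’t pile up. The Python sketch below only builds and prints such a &lt;code&gt;docker run&lt;/code&gt; command. It assumes &lt;code&gt;/opt/spark/work-dir&lt;/code&gt; is the image’s working directory, &lt;code&gt;/home/ec2-user/environment&lt;/code&gt; is your Cloud9 project folder, and the container’s user can read the mounted files, so adjust these if your setup differs.&lt;/p&gt;

```python
import os
import shlex


def spark_shell_command(image="apache/spark-py", host_dir=None):
    """Build the argument list for starting the PySpark shell in Docker.

    Assumption: /opt/spark/work-dir is the image's working directory, so
    mounted files can be read there with relative paths.
    """
    host_dir = host_dir or os.getcwd()  # default: the current directory
    return [
        "docker", "run", "-it", "--rm",            # interactive; remove on exit
        "-v", f"{host_dir}:/opt/spark/work-dir",   # mount Cloud9 files
        image, "/opt/spark/bin/pyspark",
    ]


# Print a copy-pasteable shell command.
print(shlex.join(spark_shell_command(host_dir="/home/ec2-user/environment")))
# docker run -it --rm -v /home/ec2-user/environment:/opt/spark/work-dir apache/spark-py /opt/spark/bin/pyspark
```

&lt;p&gt;Inside the shell, a file from your project folder can then be loaded with something like &lt;code&gt;spark.read.csv("orders.csv", header=True).show()&lt;/code&gt; (the file name is a placeholder).&lt;/p&gt;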

&lt;p&gt;You can follow the same approach to run other tools like Kafka, MySQL, and many others.&lt;/p&gt;

&lt;p&gt;The Cloud9 instance does not come with docker-compose, which some tools require.&lt;/p&gt;

&lt;p&gt;To install docker-compose, run the commands below:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-linux-$(uname -m) -o /usr/local/bin/docker-compose&lt;br&gt;
sudo chmod +x /usr/local/bin/docker-compose&lt;br&gt;
docker-compose version # Validate that it works&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: For Amazon Linux 2 on x86_64, the file to download is docker-compose-linux-x86_64 (with a lowercase "linux").&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Case #5 | Uploading code to git
&lt;/h3&gt;

&lt;p&gt;Finally, when your work is done and you want to save it for future reference, you can push it to Git. Cloud9 integrates easily with Git, so you can quickly pull and push code to your repos.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Scenario: You want to push the Python code you created earlier to your Git repo.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Configure Git from the left-hand pane using the “Source Control” option. The first time, clone the repo by providing its URL. Cloud9 then detects your changes and marks them accordingly. You can also run the usual commands manually: add, commit, and push.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Configure the git repo in the source control&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i2df4svzuayjxgxh6dp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i2df4svzuayjxgxh6dp.png" alt="Configure the git repo in the source control."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Commit the changes, and add the appropriate message&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ql6t2m4i03j32ouk4j0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ql6t2m4i03j32ouk4j0.png" alt="Commit the changes, and add the appropriate message."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Push the change using manual commands or from the UI&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38g6s4dsv3xu29eqn504.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38g6s4dsv3xu29eqn504.png" alt="Push the change using manual commands or from the UI"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
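&lt;p&gt;If you prefer the manual route, the flow is just &lt;code&gt;git add&lt;/code&gt;, &lt;code&gt;git commit&lt;/code&gt;, and &lt;code&gt;git push&lt;/code&gt;. The small Python helper below scripts that sequence for repeat use. The remote name &lt;code&gt;origin&lt;/code&gt; and branch &lt;code&gt;main&lt;/code&gt; are assumptions, and the helper only prints the commands unless you pass &lt;code&gt;dry_run=False&lt;/code&gt;.&lt;/p&gt;

```python
import shlex
import subprocess


def git_commit_and_push(files, message, remote="origin", branch="main", dry_run=True):
    """Run the manual add/commit/push flow for the given files.

    With dry_run=True (the default) the commands are only printed.
    """
    commands = [
        ["git", "add", *files],
        ["git", "commit", "-m", message],
        ["git", "push", remote, branch],
    ]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # show exactly what would run
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```

&lt;p&gt;For example, &lt;code&gt;git_commit_and_push(["hello.py"], "Add hello script")&lt;/code&gt; prints the three commands so you can review them before running them for real.&lt;/p&gt;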

&lt;p&gt;&lt;em&gt;Note: You will need to provide your Git username and a personal access token when pushing new changes to your repo.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Validate the changes in your git repo.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Confirm the new files are added to your repo&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbr3zu2ay22azw2erv0d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbr3zu2ay22azw2erv0d.png" alt="Confirm the new files are added to your repo."&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Note: Once you finish your work, close the Cloud9 IDE tab so that the environment's cost-saving timeout (30 minutes by default) can stop the instance; otherwise, it will keep running. You can also stop the Cloud9 instance directly from the EC2 console to save some $.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
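&lt;p&gt;If you would rather script the stop, one option is to look for running instances that carry Cloud9’s tags in the output of &lt;code&gt;aws ec2 describe-instances --output json&lt;/code&gt; and pass their IDs to &lt;code&gt;aws ec2 stop-instances --instance-ids INSTANCE_ID&lt;/code&gt;. The Python sketch below does the filtering step. It assumes Cloud9-created instances have a tag key starting with &lt;code&gt;aws:cloud9:&lt;/code&gt; (such as &lt;code&gt;aws:cloud9:environment&lt;/code&gt;), so check your instance’s tags before relying on it.&lt;/p&gt;

```python
import json


def cloud9_instance_ids(describe_json, state="running"):
    """Return IDs of Cloud9-tagged instances in the given state.

    describe_json is the text printed by `aws ec2 describe-instances --output json`.
    Assumption: Cloud9 instances carry a tag whose key starts with "aws:cloud9:".
    """
    data = json.loads(describe_json)
    ids = []
    for reservation in data.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            tags = inst.get("Tags", [])  # untagged instances have no Tags key
            is_cloud9 = any(t["Key"].startswith("aws:cloud9:") for t in tags)
            if is_cloud9 and inst["State"]["Name"] == state:
                ids.append(inst["InstanceId"])
    return ids
```

&lt;p&gt;For example, a small hypothetical script &lt;code&gt;find_cloud9.py&lt;/code&gt; that prints &lt;code&gt;" ".join(cloud9_instance_ids(sys.stdin.read()))&lt;/code&gt; could be used as &lt;code&gt;aws ec2 describe-instances --output json | python3 find_cloud9.py&lt;/code&gt;.&lt;/p&gt;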

&lt;p&gt;These are just a few use cases of Cloud9 for DE work. You can explore and leverage its other features for your day-to-day R&amp;amp;D work, PoCs, learning, and training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;And Cloud9 is not just for PoCs or educational work; you can also use it in your actual projects&lt;/strong&gt;. It offers collaborative coding, chat with fellow developers, and many more useful features.&lt;/p&gt;

&lt;p&gt;You can try these out, and if you have any comments/suggestions/questions, please let me know.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
