<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jessica Tiwari</title>
    <description>The latest articles on Forem by Jessica Tiwari (@jessica_tiwari_dec39541e2).</description>
    <link>https://forem.com/jessica_tiwari_dec39541e2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3692779%2F546a0cb5-76f1-4792-9192-23d310b8820c.png</url>
      <title>Forem: Jessica Tiwari</title>
      <link>https://forem.com/jessica_tiwari_dec39541e2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jessica_tiwari_dec39541e2"/>
    <language>en</language>
    <item>
      <title>Automating the EC2 Instance</title>
      <dc:creator>Jessica Tiwari</dc:creator>
      <pubDate>Thu, 12 Feb 2026 04:20:24 +0000</pubDate>
      <link>https://forem.com/jessica_tiwari_dec39541e2/automating-the-ec2-instance-abc</link>
      <guid>https://forem.com/jessica_tiwari_dec39541e2/automating-the-ec2-instance-abc</guid>
      <description>&lt;p&gt;This week, as a part of MS2V Technologies, we embarked on a small yet informative project focusing on automating the E2 web server using user data. The main objective was to launch an EC2 instance that would automatically install NGINX and display a simple message in the browser, eliminating the need for manual intervention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture Design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The architecture design was straightforward. The user would connect to the public IP through which the EC2 instance was configured. Once connected, NGINX would be installed, and our simple HTML page would load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Launching the EC2 Instance&lt;/strong&gt;&lt;br&gt;
To initiate the project, I launched an EC2 instance by selecting an appropriate AMI (Amazon Machine Image). Given that this was a basic project with minimal content, I opted for a t2.micro instance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network and Security Setup&lt;/strong&gt;&lt;br&gt;
While configuring the network and security group, I allowed SSH access on port 22 only from my IP, and HTTP traffic on port 80 from anywhere. Restricting SSH keeps administrative access secure, while the open HTTP rule lets any browser load my web page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User Data Script&lt;/strong&gt;&lt;br&gt;
The user data script created for this project was minimal, focusing on displaying a simple text message upon accessing the web server. &lt;/p&gt;
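
&lt;p&gt;The script itself isn't reproduced here, but a minimal sketch of the idea, assuming Amazon Linux and a hypothetical page message, could be generated in Python like this:&lt;/p&gt;

```python
# Sketch of the kind of user-data script described above; the package
# manager, paths, and page text are assumptions, not the author's originals.
def build_user_data(message: str) -> str:
    """Return a bash user-data script that installs NGINX on Amazon Linux
    and serves a simple message from the default index page."""
    lines = [
        "#!/bin/bash",
        "yum update -y",           # refresh Amazon Linux packages
        "yum install -y nginx",    # install the web server
        "systemctl enable nginx",  # start NGINX on every boot
        "systemctl start nginx",
        # Overwrite the default index page with our message.
        'echo "{}" | tee /usr/share/nginx/html/index.html'.format(message),
    ]
    return "\n".join(lines)

print(build_user_data("Hello from my automated EC2 instance!"))
```

&lt;p&gt;Pasting a script like this into the user-data field at launch is what lets the instance configure itself with no manual intervention.&lt;/p&gt;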

&lt;p&gt;&lt;strong&gt;Creating a Baked AMI&lt;/strong&gt;&lt;br&gt;
After successfully connecting to the instance and verifying that NGINX was operational, I created a baked AMI from the existing EC2 instance configuration. This process takes a snapshot of the instance's volumes, producing a new AMI that can be reused to launch future instances without manual setup.&lt;/p&gt;
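
&lt;p&gt;As a sketch, this step maps onto EC2's &lt;code&gt;CreateImage&lt;/code&gt; API; the instance ID and image name below are hypothetical, and the actual call would go through a boto3 EC2 client:&lt;/p&gt;

```python
# Hedged sketch of the baked-AMI step. With boto3 the call would be:
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.create_image(**ami_request("i-0123456789abcdef0"))
def ami_request(instance_id: str) -> dict:
    """Build the parameters for EC2 CreateImage, which snapshots the
    instance's volumes and registers a reusable AMI."""
    return {
        "InstanceId": instance_id,
        "Name": "nginx-baked-ami",                       # hypothetical name
        "Description": "NGINX pre-installed via user data",
        "NoReboot": True,  # snapshot without stopping the running instance
    }
```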

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Overall, this project was an excellent opportunity to understand the features of AMIs and the automation capabilities within AWS. The ability to launch instances with pre-installed software and configurations greatly simplifies the process, making rapid deployment of web servers both efficient and reliable.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloudnative</category>
      <category>fellowship</category>
    </item>
    <item>
      <title>Building a Batch Data Pipeline on AWS</title>
      <dc:creator>Jessica Tiwari</dc:creator>
      <pubDate>Mon, 05 Jan 2026 15:19:22 +0000</pubDate>
      <link>https://forem.com/jessica_tiwari_dec39541e2/building-a-batch-data-pipeline-on-aws-2fkb</link>
      <guid>https://forem.com/jessica_tiwari_dec39541e2/building-a-batch-data-pipeline-on-aws-2fkb</guid>
      <description>&lt;p&gt;This is how I approached as a beginner. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Define the Data Flow and Storage&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Created an S3-based data lake with three zones: raw for incoming data, processed for cleaned data, and curated for query-ready datasets.&lt;/li&gt;
&lt;li&gt;Enabled versioning on the raw bucket to preserve original data for reprocessing.&lt;/li&gt;
&lt;/ol&gt;
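
&lt;p&gt;The zone layout above can be sketched as a key-building helper; the dataset name and file names are illustrative assumptions, but the date-folder structure is what the partition discovery mentioned below relies on:&lt;/p&gt;

```python
from datetime import date

# Illustrative sketch of the three-zone data-lake layout; bucket-internal
# prefixes and the "orders" dataset name are assumptions.
ZONES = ("raw", "processed", "curated")

def object_key(zone: str, dataset: str, day: date, filename: str) -> str:
    """Build a date-partitioned S3 key such as raw/orders/2026/01/05/part-0.csv,
    so date folders can be discovered as partitions."""
    if zone not in ZONES:
        raise ValueError("unknown zone: " + zone)
    return "{}/{}/{:%Y/%m/%d}/{}".format(zone, dataset, day, filename)

print(object_key("raw", "orders", date(2026, 1, 5), "part-0.csv"))
```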

&lt;p&gt;&lt;strong&gt;Catalog and Schema&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Created a Glue Data Catalog database.&lt;/li&gt;
&lt;li&gt;Used Glue Crawlers to scan raw data and infer schemas.&lt;/li&gt;
&lt;li&gt;Enabled automatic partition discovery based on date folders.&lt;/li&gt;
&lt;li&gt;Scheduled crawlers to run after each data ingestion.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;ETL Transformation&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Implemented AWS Glue Jobs using PySpark.&lt;/li&gt;
&lt;li&gt;Transformation steps: read raw CSV/JSON data from S3, standardize column names and data types, handle null and malformed records, and convert the data to Parquet with Snappy compression.&lt;/li&gt;
&lt;li&gt;Enabled job bookmarks to ensure incremental processing.&lt;/li&gt;
&lt;/ol&gt;
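
&lt;p&gt;The real job runs as Glue PySpark; this pure-Python sketch just mirrors the same cleaning steps on a list of dicts, with hypothetical column names:&lt;/p&gt;

```python
# Pure-Python mirror of the Glue cleaning steps; "order_id" and "amount"
# are hypothetical column names, not the author's actual schema.
def clean_records(rows):
    """Standardize column names, cast types, and drop malformed records."""
    cleaned = []
    for row in rows:
        # Standardize column names: lower-case with underscores.
        row = {k.strip().lower().replace(" ", "_"): v for k, v in row.items()}
        # Drop records missing the required key.
        if not row.get("order_id"):
            continue
        # Handle nulls and cast amount to float; skip malformed values.
        try:
            row["amount"] = float(row.get("amount") or 0.0)
        except (TypeError, ValueError):
            continue
        cleaned.append(row)
    return cleaned

rows = [
    {"Order ID": "A1", "Amount": "10.5"},
    {"Order ID": "", "Amount": "3"},       # malformed: empty id
    {"Order ID": "A2", "Amount": "oops"},  # malformed: bad amount
]
print(clean_records(rows))
```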

&lt;p&gt;&lt;strong&gt;Query and Validation&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Configured Amazon Athena to use the Glue Data Catalog.&lt;/li&gt;
&lt;li&gt;Ran validation queries on processed and curated datasets.&lt;/li&gt;
&lt;li&gt;Used partition filters to minimize scanned data and reduce cost.&lt;/li&gt;
&lt;li&gt;Verified record counts and schema consistency.&lt;/li&gt;
&lt;/ol&gt;
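
&lt;p&gt;A validation query with a partition filter might be built like this; the table and partition column names are assumptions, but the point is that the predicate restricts how much S3 data Athena scans:&lt;/p&gt;

```python
# Hypothetical sketch of a partition-filtered Athena validation query;
# "curated.orders" and the year/month partition columns are assumptions.
def validation_query(table: str, year: int, month: int) -> str:
    """Count records for a single partition so the query scans only that
    slice of the data instead of the whole table."""
    return (
        "SELECT COUNT(*) AS record_count "
        "FROM {} "
        "WHERE year = '{}' AND month = '{:02d}'"
    ).format(table, year, month)

print(validation_query("curated.orders", 2026, 1))
```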

&lt;p&gt;&lt;strong&gt;Automation&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Triggered Glue Jobs using EventBridge schedules.&lt;/li&gt;
&lt;li&gt;Monitored job execution and failures via CloudWatch.&lt;/li&gt;
&lt;li&gt;Configured SNS alerts for ETL failures.&lt;/li&gt;
&lt;li&gt;Archived older raw data to lower-cost S3 storage classes.&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>How I implemented ETL Pipeline Using AWS Glue</title>
      <dc:creator>Jessica Tiwari</dc:creator>
      <pubDate>Sun, 04 Jan 2026 15:22:24 +0000</pubDate>
      <link>https://forem.com/jessica_tiwari_dec39541e2/how-i-implemented-etl-pipeline-using-aws-glue-nh6</link>
      <guid>https://forem.com/jessica_tiwari_dec39541e2/how-i-implemented-etl-pipeline-using-aws-glue-nh6</guid>
      <description>&lt;p&gt;&lt;strong&gt;- Step 1: I considered:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Spark on EC2 (high control, high ops)&lt;/li&gt;
&lt;li&gt;Databricks &lt;/li&gt;
&lt;li&gt;AWS Glue
I selected AWS Glue to minimize operational complexity.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Ingestion Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data lands in raw/&lt;/li&gt;
&lt;li&gt;Glue Crawlers detect schema changes&lt;/li&gt;
&lt;li&gt;Catalog updated automatically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Transformation Logic&lt;/strong&gt;&lt;br&gt;
Glue Jobs perform:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Type casting&lt;/li&gt;
&lt;li&gt;Null handling&lt;/li&gt;
&lt;li&gt;Deduplication&lt;/li&gt;
&lt;li&gt;Format conversion (CSV → Parquet)&lt;/li&gt;
&lt;/ol&gt;
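
&lt;p&gt;The job itself runs in Glue PySpark, but the deduplication step can be sketched in plain Python; the &lt;code&gt;order_id&lt;/code&gt; key is a hypothetical example:&lt;/p&gt;

```python
# Plain-Python sketch of the deduplication step, analogous to calling
# dropDuplicates() on a Spark DataFrame. The key name is hypothetical.
def deduplicate(rows, key="order_id"):
    """Keep the first record seen for each key, dropping later duplicates."""
    seen = set()
    unique = []
    for row in rows:
        k = row.get(key)
        if k in seen:
            continue
        seen.add(k)
        unique.append(row)
    return unique

rows = [{"order_id": "A1"}, {"order_id": "A2"}, {"order_id": "A1"}]
print(deduplicate(rows))
```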

&lt;p&gt;&lt;strong&gt;Step 4: Performance Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enabled job bookmarks&lt;/li&gt;
&lt;li&gt;Tuned DPUs&lt;/li&gt;
&lt;li&gt;Used Parquet + Snappy compression&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Output Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Processed data written to S3&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>aws</category>
      <category>awsdatalake</category>
      <category>awsglue</category>
    </item>
  </channel>
</rss>
