<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shreyash Singh</title>
    <description>The latest articles on Forem by Shreyash Singh (@shreyash333).</description>
    <link>https://forem.com/shreyash333</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F296607%2F7e35ff1f-ec4f-48be-98f9-ecec7c26e586.png</url>
      <title>Forem: Shreyash Singh</title>
      <link>https://forem.com/shreyash333</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shreyash333"/>
    <language>en</language>
    <item>
      <title>Building a Scalable Data Architecture with Apache Tools: A Free and Open-Source Solution</title>
      <dc:creator>Shreyash Singh</dc:creator>
      <pubDate>Sun, 13 Apr 2025 20:26:26 +0000</pubDate>
      <link>https://forem.com/shreyash333/building-a-scalable-data-architecture-with-apache-tools-a-free-and-open-source-solution-34hm</link>
      <guid>https://forem.com/shreyash333/building-a-scalable-data-architecture-with-apache-tools-a-free-and-open-source-solution-34hm</guid>
      <description>&lt;p&gt;In today's data-driven world, organizations need a robust and scalable data architecture to handle large volumes of data. Apache offers a suite of free and open-source tools that can help build a comprehensive data architecture. In this article, we'll explore how to build a scalable data architecture using Apache tools, highlighting the benefits of using free and open-source software.&lt;/p&gt;




&lt;h2&gt;&lt;strong&gt;Technology&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Here's a brief overview of the technologies used in this architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Apache NiFi:&lt;/strong&gt; A data integration tool that ingests data from various sources and writes it to destinations like HDFS and Kafka. NiFi provides a scalable and flexible way to manage data flows.&lt;br&gt;
&lt;strong&gt;2. Apache Kafka:&lt;/strong&gt; A distributed streaming platform that handles high-throughput and provides low-latency, fault-tolerant, and scalable data processing. Kafka is ideal for real-time data processing and event-driven architectures.&lt;br&gt;
&lt;strong&gt;3. Apache Flink:&lt;/strong&gt; A distributed stream-processing engine for stateful computations over both unbounded (streaming) and bounded (batch) data. Flink is designed for high throughput, low latency, and scalability.&lt;br&gt;
&lt;strong&gt;4. Apache HDFS:&lt;/strong&gt; A distributed file system that provides a scalable and reliable way to store large amounts of data. HDFS is designed for big data storage and processing.&lt;br&gt;
&lt;strong&gt;5. Apache Spark:&lt;/strong&gt; A unified analytics engine that provides high-level APIs in Java, Python, and Scala for large-scale data processing. Spark is ideal for batch processing and machine learning.&lt;br&gt;
&lt;strong&gt;6. Apache Iceberg:&lt;/strong&gt; A table format that provides a scalable and efficient way to manage large datasets. Iceberg supports ACID transactions and is designed for big data analytics.&lt;br&gt;
&lt;strong&gt;7. Apache Atlas:&lt;/strong&gt; A data governance and metadata management platform that provides a centralized repository for storing and managing metadata. Atlas enables data governance, data discovery, and data lineage.&lt;br&gt;
&lt;strong&gt;8. Apache Airflow:&lt;/strong&gt; A workflow management platform that schedules and manages workflows. Airflow provides a flexible way to manage dependencies and monitor workflows.&lt;br&gt;
&lt;strong&gt;9. Apache Superset:&lt;/strong&gt; A business intelligence web application that provides data visualization capabilities. Superset supports a variety of data sources and provides an intuitive interface for users.&lt;/p&gt;




&lt;h2&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmo2htavz8duuaal08r4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmo2htavz8duuaal08r4.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our data architecture will consist of the following components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Data Ingestion:&lt;/strong&gt; Apache NiFi will be used to ingest data from various sources, such as logs, APIs, and databases.&lt;br&gt;
&lt;strong&gt;2. Real-time Processing:&lt;/strong&gt; Apache Kafka will stream data to Apache Flink for real-time processing.&lt;br&gt;
&lt;strong&gt;3. Batch Processing:&lt;/strong&gt; Apache Spark will be used for batch processing of large datasets.&lt;br&gt;
&lt;strong&gt;4. Data Storage:&lt;/strong&gt; Apache Iceberg will be used to store processed data in a scalable and efficient table format.&lt;br&gt;
&lt;strong&gt;5. Data Governance:&lt;/strong&gt; Apache Atlas will be used to manage metadata, provide data governance, and enable data discovery and lineage.&lt;br&gt;
&lt;strong&gt;6. Data Querying:&lt;/strong&gt; Apache Spark and Apache Superset will be used to query data stored in Apache Iceberg.&lt;br&gt;
&lt;strong&gt;7. Workflow Management:&lt;/strong&gt; Apache Airflow will be used to schedule and manage workflows.&lt;/p&gt;




&lt;h2&gt;&lt;strong&gt;How it Works&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Here's an overview of how the architecture works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data is ingested into Apache NiFi from various sources.&lt;/li&gt;
&lt;li&gt;NiFi writes data to Apache Kafka for real-time processing and Apache HDFS for batch processing.&lt;/li&gt;
&lt;li&gt;Apache Kafka streams data to Apache Flink for real-time processing.&lt;/li&gt;
&lt;li&gt;Flink processes data in real-time and writes processed data to Apache Iceberg.&lt;/li&gt;
&lt;li&gt;Apache Spark reads data from Apache HDFS, applies transformations and aggregations, and writes processed data to Apache Iceberg.&lt;/li&gt;
&lt;li&gt;Apache Atlas manages metadata and provides data governance, data discovery, and data lineage capabilities.&lt;/li&gt;
&lt;li&gt;Apache Spark and Apache Superset query data stored in Apache Iceberg.&lt;/li&gt;
&lt;li&gt;Apache Airflow schedules and manages workflows, ensuring that data is processed and queried efficiently.&lt;/li&gt;
&lt;/ol&gt;
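&lt;p&gt;The flow above can be sketched as a toy Python simulation. The lists and functions below are simplified stand-ins for NiFi, Kafka, Flink, HDFS, Spark, and Iceberg, not their real APIs; in a real deployment each step runs on the actual service.&lt;/p&gt;

```python
# Toy stand-ins for the real components: a stream topic, a batch store, a table.
stream_topic, batch_store, iceberg_table = [], [], []

def nifi_ingest(records):
    """Steps 1-2: ingest records and fan them out to both processing paths."""
    for record in records:
        stream_topic.append(record)  # real-time path (Kafka)
        batch_store.append(record)   # batch path (HDFS)

def flink_process():
    """Steps 3-4: consume the stream and write processed rows to the table."""
    while stream_topic:
        record = stream_topic.pop(0)
        iceberg_table.append({**record, "path": "realtime"})

def spark_process():
    """Step 5: batch-transform stored records and write them to the table."""
    for record in batch_store:
        iceberg_table.append({**record, "path": "batch"})

nifi_ingest([{"id": 1}, {"id": 2}])
flink_process()
spark_process()
```

&lt;p&gt;Every record lands in the table twice, once per path; the Lambda-style merge of the two views is what queries through Spark or Superset would see.&lt;/p&gt;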




&lt;h2&gt;&lt;strong&gt;Benefits&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;The benefits of this architecture include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Scalability:&lt;/strong&gt; The architecture is designed to handle large volumes of data and scale horizontally.&lt;br&gt;
&lt;strong&gt;2. Real-time Processing:&lt;/strong&gt; Apache Kafka and Apache Flink enable real-time processing and analysis of data.&lt;br&gt;
&lt;strong&gt;3. Batch Processing:&lt;/strong&gt; Apache Spark enables batch processing of large datasets.&lt;br&gt;
&lt;strong&gt;4. Data Visualization:&lt;/strong&gt; Apache Superset provides data visualization capabilities, enabling users to gain insights into their data.&lt;br&gt;
&lt;strong&gt;5. Free and Open-Source:&lt;/strong&gt; All the tools used in this architecture are free and open-source, reducing costs and increasing flexibility.&lt;br&gt;
&lt;strong&gt;6. Data Governance:&lt;/strong&gt; Apache Atlas provides data governance, data discovery, and data lineage capabilities.&lt;/p&gt;




&lt;p&gt;Apache offers a suite of powerful tools for building a scalable and efficient data architecture. By leveraging Apache NiFi, Apache Kafka, Apache Flink, Apache HDFS, Apache Spark, Apache Iceberg, Apache Atlas, Apache Airflow, and Apache Superset, organizations can build a comprehensive platform that supports both real-time and batch processing, along with data governance, querying, and visualization. And the best part? All of these tools are free and open-source, making the stack an attractive option for organizations looking to reduce costs and increase flexibility.&lt;/p&gt;

&lt;p&gt;I will soon be starting a project based on this architecture.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>architecture</category>
      <category>etl</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Building a Self-Optimizing Data Pipeline</title>
      <dc:creator>Shreyash Singh</dc:creator>
      <pubDate>Tue, 01 Apr 2025 21:56:47 +0000</pubDate>
      <link>https://forem.com/shreyash333/building-a-self-optimizing-data-pipeline-59da</link>
      <guid>https://forem.com/shreyash333/building-a-self-optimizing-data-pipeline-59da</guid>
      <description>&lt;p&gt;As a data engineer, have you ever dreamed of building a data pipeline that autonomously adjusts its performance and reliability in real-time? Sounds like science fiction, right? Well, it's not! In this article, we'll explore the concept of self-optimizing data pipelines.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;What is a Self-Optimizing Data Pipeline?&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;A self-optimizing data pipeline is an automated system that dynamically adjusts its performance and reliability in real time based on incoming data volume, system load, and other factors. It's like having a super-smart, self-driving car that navigates the data landscape with ease!&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;&lt;strong&gt;Concept Overview&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;A self-optimizing data pipeline automates the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Performance Optimization:&lt;/strong&gt; Dynamically adjusts parameters like partition sizes, parallelism, and resource allocation based on incoming data volume and system load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Handling:&lt;/strong&gt; Detects and resolves pipeline failures without manual intervention (e.g., retrying failed tasks, rerouting data).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring and Feedback:&lt;/strong&gt; Continuously monitors system performance and learns from past runs to improve future executions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptability:&lt;/strong&gt; Adapts to varying data types, sources, and loads.&lt;/li&gt;
&lt;/ol&gt;
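&lt;p&gt;Point 1 can be sketched as a small sizing heuristic. This function and its defaults (a ~128&amp;nbsp;MB target per partition, with clamping bounds) are illustrative assumptions, not any engine's actual API:&lt;/p&gt;

```python
import math

def choose_partition_count(data_size_bytes,
                           target_partition_bytes=128 * 1024 * 1024,
                           min_partitions=1,
                           max_partitions=10_000):
    """Pick a partition count so each partition lands near the target size."""
    needed = math.ceil(data_size_bytes / target_partition_bytes)
    # Clamp so tiny inputs still get one partition and huge inputs stay bounded.
    return max(min_partitions, min(max_partitions, needed))
```

&lt;p&gt;A pipeline can call this on each run with the measured input size, so partition counts track the incoming data volume instead of staying fixed.&lt;/p&gt;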




&lt;h2&gt;&lt;strong&gt;Self-Optimization in ETL Process&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Data Ingestion&lt;/strong&gt;&lt;br&gt;
    &lt;em&gt;&lt;strong&gt;• Goal:&lt;/strong&gt; Ingest data from multiple sources in real time.&lt;br&gt;
    &lt;strong&gt;• Implementation:&lt;/strong&gt;&lt;br&gt;
        ◦ Use Apache Kafka or AWS Kinesis for real-time streaming.&lt;br&gt;
        ◦ Ingest batch data using tools like Apache NiFi or custom Python scripts.&lt;br&gt;
    &lt;strong&gt;• Self-Optimization:&lt;/strong&gt;&lt;br&gt;
        ◦ Dynamically scale consumers to handle varying data volumes.&lt;br&gt;
        ◦ Monitor lag in Kafka partitions and scale producers or consumers to maintain low latency.&lt;/em&gt;&lt;/p&gt;
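&lt;p&gt;The lag-based scaling rule can be sketched as follows; the function name and parameters are illustrative assumptions, and a real system would read the lag from consumer-group metrics:&lt;/p&gt;

```python
import math

def desired_consumer_count(total_lag_msgs, msgs_per_consumer_per_sec,
                           target_catchup_secs=60, max_consumers=32):
    """Consumers needed to drain the current lag within the target window."""
    capacity = msgs_per_consumer_per_sec * target_catchup_secs
    needed = math.ceil(total_lag_msgs / capacity)
    # Keep at least one consumer running and cap the scale-out.
    return max(1, min(max_consumers, needed))
```

&lt;p&gt;Running this periodically against observed lag gives the autoscaler a target group size that rises with bursts and falls back when the backlog clears.&lt;/p&gt;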

&lt;p&gt;&lt;strong&gt;2. Data Transformation&lt;/strong&gt;&lt;br&gt;
    &lt;em&gt;&lt;strong&gt;• Goal:&lt;/strong&gt; Process and transform data into a usable format.&lt;br&gt;
    &lt;strong&gt;• Implementation:&lt;/strong&gt;&lt;br&gt;
        ◦ Use Apache Spark for batch processing or Apache Flink for stream processing.&lt;br&gt;
        ◦ Implement transformations like filtering, joining, aggregating, and deduplication.&lt;br&gt;
    &lt;strong&gt;• Self-Optimization:&lt;/strong&gt;&lt;br&gt;
        ◦ Partitioning: Automatically adjust partition sizes based on input data volume.&lt;br&gt;
        ◦ Resource Allocation: Dynamically allocate Spark executors, memory, and cores using workload metrics.&lt;br&gt;
        ◦ Adaptive Query Execution (AQE): Leverage Spark AQE to optimize joins, shuffles, and partition sizes at runtime.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Data Storage&lt;/strong&gt;&lt;br&gt;
   &lt;em&gt;&lt;strong&gt;• Goal:&lt;/strong&gt; Store transformed data for analysis.&lt;br&gt;
    &lt;strong&gt;• Implementation:&lt;/strong&gt;&lt;br&gt;
        ◦ Write data to a data lake (e.g., S3, HDFS) or data warehouse (e.g., Snowflake, Redshift).&lt;br&gt;
    &lt;strong&gt;• Self-Optimization:&lt;/strong&gt;&lt;br&gt;
        ◦ Use lifecycle policies to move old data to cheaper storage tiers.&lt;br&gt;
        ◦ Optimize file formats (e.g., convert to Parquet/ORC for compression and query efficiency).&lt;br&gt;
        ◦ Dynamically adjust compaction jobs to reduce small file issues.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Monitoring and Feedback&lt;/strong&gt;&lt;br&gt;
    &lt;em&gt;&lt;strong&gt;• Goal:&lt;/strong&gt; Track pipeline performance and detect inefficiencies.&lt;br&gt;
    &lt;strong&gt;• Implementation:&lt;/strong&gt;&lt;br&gt;
        ◦ Use Prometheus and Grafana for real-time monitoring.&lt;br&gt;
        ◦ Log key metrics like latency, throughput, and error rates.&lt;br&gt;
    &lt;strong&gt;• Self-Optimization:&lt;/strong&gt;&lt;br&gt;
        ◦ Implement an anomaly detection system to identify bottlenecks.&lt;br&gt;
        ◦ Use feedback from historical runs to adjust configurations automatically (e.g., retry logic, timeout settings).&lt;/em&gt;&lt;/p&gt;
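&lt;p&gt;A minimal anomaly check over a logged metric such as latency can be sketched with a z-score test; the 3-sigma threshold here is a common default, not a prescription:&lt;/p&gt;

```python
from statistics import mean, stdev

def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` when it sits more than `threshold` std devs above the mean."""
    if len(history) >= 2:
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any increase is suspicious.
            return latest > mu
        return (latest - mu) / sigma > threshold
    return False  # not enough history to judge
```

&lt;p&gt;Feeding the detector from the same metrics that Prometheus scrapes lets the pipeline raise a bottleneck alert before users notice the slowdown.&lt;/p&gt;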

&lt;p&gt;&lt;strong&gt;5. Error Handling&lt;/strong&gt;&lt;br&gt;
    &lt;em&gt;&lt;strong&gt;• Goal:&lt;/strong&gt; Automatically detect and recover from pipeline failures.&lt;br&gt;
    &lt;strong&gt;• Implementation:&lt;/strong&gt;&lt;br&gt;
        ◦ Build retries for transient errors and alerts for critical failures.&lt;br&gt;
        ◦ Use Apache Airflow or Prefect for workflow orchestration and fault recovery.&lt;br&gt;
    &lt;strong&gt;• Self-Optimization:&lt;/strong&gt;&lt;br&gt;
        ◦ Classify errors into recoverable and unrecoverable categories.&lt;br&gt;
        ◦ Automate retries with exponential backoff and adaptive retry limits.&lt;/em&gt;&lt;/p&gt;
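&lt;p&gt;The retry policy can be sketched like this: recoverable error types get exponential backoff, everything else surfaces immediately. The set of recoverable exceptions is an assumption for illustration:&lt;/p&gt;

```python
import time

# Error classification: only these exception types are treated as transient.
RECOVERABLE = (TimeoutError, ConnectionError)

def run_with_retries(task, max_attempts=5, base_delay=0.01):
    """Retry recoverable failures with exponential backoff; re-raise the rest."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except RECOVERABLE:
            if attempt == max_attempts:
                raise  # retry budget exhausted, escalate to alerting
            time.sleep(base_delay * 2 ** (attempt - 1))
```

&lt;p&gt;An orchestrator like Airflow or Prefect provides this behavior via task-level retry settings; the sketch shows the logic those settings encode.&lt;/p&gt;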

&lt;p&gt;&lt;strong&gt;6. User Dashboard&lt;/strong&gt;&lt;br&gt;
    &lt;em&gt;&lt;strong&gt;• Goal:&lt;/strong&gt; Provide real-time insights into pipeline performance and optimizations.&lt;br&gt;
    &lt;strong&gt;• Implementation:&lt;/strong&gt;&lt;br&gt;
        ◦ Use Streamlit, Dash, or Tableau Public to create an interactive dashboard.&lt;br&gt;
    &lt;strong&gt;• Self-Optimization:&lt;/strong&gt;&lt;br&gt;
        ◦ Allow users to adjust pipeline parameters directly from the dashboard.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;&lt;strong&gt;Tech Stack&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Ingestion:&lt;/strong&gt; Kafka, Kinesis, or Apache NiFi.&lt;br&gt;
&lt;strong&gt;2. Processing:&lt;/strong&gt; Apache Spark (for batch), Apache Flink (for streaming), Python (Pandas) for small-scale transformations.&lt;br&gt;
&lt;strong&gt;3. Storage:&lt;/strong&gt; AWS S3, Snowflake, or HDFS.&lt;br&gt;
&lt;strong&gt;4. Monitoring:&lt;/strong&gt; Prometheus, Grafana, or CloudWatch.&lt;br&gt;
&lt;strong&gt;5. Workflow Orchestration:&lt;/strong&gt; Apache Airflow, Prefect.&lt;br&gt;
&lt;strong&gt;6. Visualization:&lt;/strong&gt; Streamlit, Dash, or Tableau Public.&lt;/p&gt;




&lt;h2&gt;&lt;strong&gt;Example Self-Optimization Scenarios&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Scaling Spark Executors:&lt;/strong&gt;&lt;br&gt;
        ◦ Scenario: A spike in data volume causes jobs to run slowly.&lt;br&gt;
        ◦ Action: Automatically increase executor cores and memory.&lt;br&gt;
&lt;strong&gt;2. Handling Data Skew:&lt;/strong&gt;&lt;br&gt;
        ◦ Scenario: Some partitions have significantly more data than others.&lt;br&gt;
        ◦ Action: Dynamically repartition data to balance load.&lt;br&gt;
&lt;strong&gt;3. Retrying Failed Jobs:&lt;/strong&gt;&lt;br&gt;
        ◦ Scenario: A task fails due to transient network issues.&lt;br&gt;
        ◦ Action: Retry with exponential backoff without manual intervention.&lt;/p&gt;
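&lt;p&gt;Scenario 2 can be sketched as a skew check plus a rebalance. The 2x threshold and the round-robin redistribution (a stand-in for key salting) are illustrative choices:&lt;/p&gt;

```python
def should_repartition(partition_sizes, threshold=2.0):
    """True when the largest partition exceeds `threshold` times the average size."""
    avg = sum(partition_sizes) / len(partition_sizes)
    return avg > 0 and max(partition_sizes) / avg > threshold

def repartition(records, num_partitions):
    """Redistribute records round-robin so partition sizes even out."""
    parts = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        parts[i % num_partitions].append(record)
    return parts
```

&lt;p&gt;A monitoring loop would run the check on per-partition sizes after each stage and trigger the rebalance only when skew actually appears, avoiding needless shuffles.&lt;/p&gt;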

</description>
      <category>dataengineering</category>
      <category>datapipeline</category>
      <category>etl</category>
      <category>selfoptimizing</category>
    </item>
    <item>
      <title>Read, Like &amp; Share</title>
      <dc:creator>Shreyash Singh</dc:creator>
      <pubDate>Fri, 20 Dec 2024 18:57:27 +0000</pubDate>
      <link>https://forem.com/shreyash333/-5f75</link>
      <guid>https://forem.com/shreyash333/-5f75</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/shreyash333" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F296607%2F7e35ff1f-ec4f-48be-98f9-ecec7c26e586.png" alt="shreyash333"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="/shreyash333/data-warehousing-architectures-53ej" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Data Warehousing Architectures&lt;/h2&gt;
      &lt;h3&gt;Shreyash Singh ・ Dec 20&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#datascience&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#dataengineering&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#architecture&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#database&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>datascience</category>
      <category>architecture</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Data Warehousing Architectures</title>
      <dc:creator>Shreyash Singh</dc:creator>
      <pubDate>Fri, 20 Dec 2024 18:52:50 +0000</pubDate>
      <link>https://forem.com/shreyash333/data-warehousing-architectures-53ej</link>
      <guid>https://forem.com/shreyash333/data-warehousing-architectures-53ej</guid>
      <description>&lt;p&gt;Data warehousing architectures are essential frameworks that guide the organization, storage, and retrieval of data in a business environment. They play a crucial role in enabling businesses to make informed decisions by providing a structured way to manage large volumes of data. In this article, we will explore four prominent data warehousing architectures: Inmon Architecture, Kimball Architecture, Data Lake Architecture, and Lambda Architecture. &lt;/p&gt;

&lt;h2&gt;1. Inmon Architecture&lt;/h2&gt;

&lt;p&gt;Inmon Architecture, also known as the Corporate Information Factory, is a top-down approach to data warehousing. It involves creating a centralized data warehouse that serves as the single source of truth for the organization. From this central repository, dependent data marts are created to serve specific business needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table Modeling&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
In Inmon Architecture, the centralized data warehouse is typically modeled using a normalized structure. The focus is on creating a well-organized, comprehensive data repository with minimized redundancy, which resembles an Entity-Relationship (ER) model in a 3NF (Third Normal Form) schema.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core Tables (Entities):&lt;/strong&gt; These are highly normalized tables representing core business entities such as Customer, Product, and Order.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reference Tables:&lt;/strong&gt; Contain static or slow-changing information, e.g., Product Categories.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transaction Tables:&lt;/strong&gt; Store operational transaction details, maintaining integrity and consistency across the data warehouse.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Data Marts&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Once the main data warehouse is built, dependent data marts are created. These marts might adopt a denormalized structure for better query performance specific to business functions like marketing or sales.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Flow Explanation&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Data is extracted from various operational systems and transformed into a consistent format before being loaded into the centralized data warehouse. From there, data marts are created to cater to specific departments or business functions, such as marketing or finance, by extracting relevant data from the central warehouse.&lt;/p&gt;
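&lt;p&gt;The normalization step can be sketched in Python: flat operational rows are split into a Customer entity table and an Order transaction table, so customer attributes are stored once. The field names are hypothetical:&lt;/p&gt;

```python
def load_warehouse(source_rows):
    """Normalize flat operational rows into 3NF-style Customer and Order tables."""
    customers, orders = {}, []
    for row in source_rows:
        # Each customer appears once, keyed by ID (redundancy minimized).
        customers[row["customer_id"]] = {
            "customer_id": row["customer_id"],
            "name": row["customer_name"],
        }
        # Orders reference the customer by key instead of repeating its attributes.
        orders.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "amount": row["amount"],
        })
    return list(customers.values()), orders
```

&lt;p&gt;Dependent data marts would then denormalize these tables back out for their specific reporting needs.&lt;/p&gt;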

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides a single, consistent view of the enterprise data.&lt;/li&gt;
&lt;li&gt;Ensures data integrity and reduces redundancy.&lt;/li&gt;
&lt;li&gt;Scalable and can handle large volumes of data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can be complex and time-consuming to implement.&lt;/li&gt;
&lt;li&gt;Requires significant upfront investment and planning.&lt;/li&gt;
&lt;li&gt;Changes in business requirements can be challenging to accommodate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Companies Using Inmon Architecture&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Large enterprises with complex data needs, such as banks and insurance companies, often use Inmon Architecture. Examples include Citibank and American Express.&lt;/p&gt;

&lt;h2&gt;2. Kimball Architecture&lt;/h2&gt;

&lt;p&gt;Kimball Architecture, also known as the Data Mart Bus Architecture, is a bottom-up approach. It focuses on creating independent data marts for specific business processes, which are later integrated into a comprehensive data warehouse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table Modeling&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Kimball Architecture employs dimensional modeling, commonly utilizing Star Schema or Snowflake Schema designs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fact Tables:&lt;/strong&gt; Central to the schema, these tables hold quantitative data for analysis and contain measurements like sales revenue or quantity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dimension Tables:&lt;/strong&gt; These are denormalized tables that provide context to the facts, such as Time, Geography, Product, Customer, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each data mart is designed to address specific analytical needs and is connected through common dimensions if needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Flow Explanation&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Data is extracted from operational systems and directly loaded into data marts after transformation. These data marts are designed to meet the needs of specific business processes. Over time, these marts are integrated to form a cohesive data warehouse.&lt;/p&gt;
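&lt;p&gt;A star-schema query boils down to joining a fact table to a dimension and aggregating a measure. The following toy example uses hypothetical sales and customer tables to show the shape of that access pattern:&lt;/p&gt;

```python
def revenue_by_region(fact_sales, dim_customer):
    """Join the fact table to a customer dimension and aggregate revenue by region."""
    region_of = {c["customer_key"]: c["region"] for c in dim_customer}
    totals = {}
    for row in fact_sales:
        region = region_of[row["customer_key"]]  # dimension lookup by surrogate key
        totals[region] = totals.get(region, 0) + row["revenue"]
    return totals
```

&lt;p&gt;The denormalized dimension keeps the lookup a single hop, which is why dimensional models query faster than normalized ones for this kind of analysis.&lt;/p&gt;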

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster implementation as data marts can be developed independently.&lt;/li&gt;
&lt;li&gt;Flexibility to adapt to changing business needs.&lt;/li&gt;
&lt;li&gt;Easier to manage and maintain.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Potential for data inconsistency across different data marts.&lt;/li&gt;
&lt;li&gt;Integration of data marts can be complex.&lt;/li&gt;
&lt;li&gt;May lead to data redundancy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Companies Using Kimball Architecture&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Organizations that require quick deployment and flexibility, such as retail and e-commerce companies, often use Kimball Architecture. Examples include Amazon and Walmart.&lt;/p&gt;

&lt;h2&gt;3. Data Lake Architecture&lt;/h2&gt;

&lt;p&gt;Data Lake Architecture is a modern approach that involves storing raw, unprocessed data in a centralized repository. It allows organizations to store structured, semi-structured, and unstructured data in its native format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table Modeling&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
In a Data Lake Architecture, traditional table structures may not be explicitly used. Instead, data is stored in its raw format using a variety of storage formats, e.g., JSON, CSV, Avro, or even Parquet files if some structuring is needed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Raw Data Storage:&lt;/strong&gt; Data is stored as-is from sources without any transformation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Curated Zones:&lt;/strong&gt; Sometimes, after initial usage in raw zones, data is processed and moved into a curated zone for more structured querying and reporting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Advanced indexing or metadata tagging is often used to make sense of the enormous variety of data types and formats within a data lake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Flow Explanation&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Data is ingested from various sources and stored in the data lake without transformation. When needed, data is processed and analyzed using various tools and frameworks, allowing for flexible and on-demand data processing.&lt;/p&gt;
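&lt;p&gt;The raw-to-curated promotion can be sketched as a validation pass over raw JSON lines; the required fields are hypothetical, and a real lake would write the outputs back to zone-specific paths:&lt;/p&gt;

```python
import json

def promote_to_curated(raw_lines, required_fields=("id", "event")):
    """Keep well-formed, complete records for the curated zone; reject the rest."""
    curated, rejected = [], []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected.append(line)  # malformed data stays in the raw zone
            continue
        if all(field in record for field in required_fields):
            curated.append(record)
        else:
            rejected.append(line)
    return curated, rejected
```

&lt;p&gt;Gatekeeping like this, plus metadata tagging of what passed and what failed, is what keeps a lake from turning into the "data swamp" described below.&lt;/p&gt;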

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Highly scalable and cost-effective for storing large volumes of data.&lt;/li&gt;
&lt;li&gt;Supports a wide variety of data types and formats.&lt;/li&gt;
&lt;li&gt;Facilitates advanced analytics and machine learning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can become a "data swamp" if not managed properly.&lt;/li&gt;
&lt;li&gt;Requires sophisticated tools and skills for data processing.&lt;/li&gt;
&lt;li&gt;Data governance and security can be challenging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Companies Using Data Lake Architecture&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Tech giants and data-driven companies, such as Netflix and Facebook, leverage Data Lake Architecture to handle vast amounts of diverse data.&lt;/p&gt;

&lt;h2&gt;4. Lambda Architecture&lt;/h2&gt;

&lt;p&gt;Lambda Architecture is designed to handle both batch and real-time data processing. It combines a batch layer for processing large volumes of historical data and a speed layer for real-time data processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table Modeling&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Lambda Architecture integrates different data modeling approaches for its batch and speed layers.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Batch Layer:&lt;/strong&gt; Often modeled similarly to Inmon’s centralized data warehouse, focusing on historical data storage using normalized tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed Layer:&lt;/strong&gt; Typically uses a simpler, often schema-less structure focused on storing streaming data in real time. NoSQL databases are common here, allowing for flexible data modeling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serving Layer:&lt;/strong&gt; Where results from both batch and speed layers are accessed. This could resemble a traditional star schema or even a more flattened table structure for quick data access.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each approach in Lambda focuses on optimizing for either latency (speed layer) or throughput and accuracy (batch layer).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Flow Explanation&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Data flows into two layers: the batch layer processes data in large volumes at scheduled intervals, while the speed layer processes data in real-time to provide immediate insights. The results from both layers are merged to provide a comprehensive view.&lt;/p&gt;
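&lt;p&gt;The serving-layer merge can be sketched for a counting workload: the batch view holds precomputed totals, and the speed view overlays the deltas that have arrived since the last batch run. The dict-of-counts representation is a simplifying assumption:&lt;/p&gt;

```python
def merged_view(batch_view, speed_view):
    """Overlay fresh real-time deltas on top of the precomputed batch counts."""
    view = dict(batch_view)
    for key, delta in speed_view.items():
        view[key] = view.get(key, 0) + delta
    return view
```

&lt;p&gt;When the next batch run completes, the speed view is discarded and rebuilt, which is how Lambda reconciles low latency with batch-level accuracy.&lt;/p&gt;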

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides both historical and real-time insights.&lt;/li&gt;
&lt;li&gt;Fault-tolerant and scalable.&lt;/li&gt;
&lt;li&gt;Supports complex analytics and machine learning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex architecture with multiple layers to manage.&lt;/li&gt;
&lt;li&gt;Requires expertise in both batch and real-time processing.&lt;/li&gt;
&lt;li&gt;Higher operational costs due to dual processing layers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Companies Using Lambda Architecture&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Organizations that require real-time analytics, such as LinkedIn and Twitter, use Lambda Architecture to process and analyze data efficiently.&lt;/p&gt;

&lt;p&gt;In conclusion, each data warehousing architecture has its unique strengths and challenges. The choice of architecture depends on the specific needs and goals of an organization, as well as its data processing requirements and resources.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>dataengineering</category>
      <category>architecture</category>
      <category>database</category>
    </item>
    <item>
      <title>Azure DevOps Services and Exploring Alternatives</title>
      <dc:creator>Shreyash Singh</dc:creator>
      <pubDate>Sun, 23 Jun 2024 08:46:16 +0000</pubDate>
      <link>https://forem.com/shreyash333/understanding-azure-devops-services-and-exploring-alternatives-1ln1</link>
      <guid>https://forem.com/shreyash333/understanding-azure-devops-services-and-exploring-alternatives-1ln1</guid>
      <description>&lt;p&gt;DevOps is a set of practices that combines software development (Dev) and IT operations (Ops) to streamline the software development lifecycle. Azure DevOps is a suite of services offered by Microsoft to support DevOps practices. In this article, we will explore the services offered in Azure DevOps, their alternatives, and their pros and cons.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Services in Azure DevOps:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Azure Boards:&lt;/strong&gt; A project management tool for tracking work items and projects.&lt;br&gt;
&lt;strong&gt;2. Azure Repos:&lt;/strong&gt; A version control system for managing code repositories.&lt;br&gt;
&lt;strong&gt;3. Azure Pipelines:&lt;/strong&gt; A continuous integration and delivery tool for automating build, test, and deployment processes.&lt;br&gt;
&lt;strong&gt;4. Azure Test Plans:&lt;/strong&gt; A testing tool for managing and executing tests.&lt;br&gt;
&lt;strong&gt;5. Azure Artifacts:&lt;/strong&gt; A package management tool for managing and sharing packages.&lt;br&gt;
&lt;strong&gt;6. GitHub Advanced Security for Azure DevOps:&lt;/strong&gt; A security tool for identifying vulnerabilities in code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Azure DevOps Services Cost:&lt;/strong&gt; &lt;br&gt;
$30/user/month (basic plan) = $30/month&lt;br&gt;
Annual cost: $360&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Integrated platform for development, delivery, and collaboration&lt;/li&gt;
&lt;li&gt;Automated pipelines and continuous integration/continuous deployment (CI/CD)&lt;/li&gt;
&lt;li&gt;Advanced project management and tracking capabilities&lt;/li&gt;
&lt;li&gt;Scalable and secure&lt;/li&gt;
&lt;li&gt;Integrates with other Azure services&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Steep learning curve&lt;/li&gt;
&lt;li&gt;Can be expensive for large teams or enterprises&lt;/li&gt;
&lt;li&gt;Limited customization options for some features&lt;/li&gt;
&lt;li&gt;Some users find the UI cluttered&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Best Overall:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Offers a comprehensive, integrated platform for development, delivery, and collaboration&lt;/li&gt;
&lt;li&gt;Scalable, secure, and flexible&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Alternatives to Azure DevOps:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set 1:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Jira&lt;/strong&gt; (Azure Boards alternative)&lt;br&gt;
&lt;strong&gt;- GitHub&lt;/strong&gt; (Azure Repos alternative)&lt;br&gt;
&lt;strong&gt;- Jenkins&lt;/strong&gt; (Azure Pipelines alternative)&lt;br&gt;
&lt;strong&gt;- TestRail&lt;/strong&gt; (Azure Test Plans alternative)&lt;br&gt;
&lt;strong&gt;- Artifactory&lt;/strong&gt; (Azure Artifacts alternative)&lt;br&gt;
&lt;strong&gt;- SonarQube&lt;/strong&gt; (GitHub Advanced Security for Azure DevOps alternative)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Price:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jira: $7/user/month (standard plan) = $7/month&lt;/li&gt;
&lt;li&gt;GitHub: $21/user/month (team plan) = $21/month&lt;/li&gt;
&lt;li&gt;Jenkins: free (open-source) = $0/month&lt;/li&gt;
&lt;li&gt;TestRail: $25/user/month (premium plan) = $25/month&lt;/li&gt;
&lt;li&gt;Artifactory: $22/user/month (pro plan) = $22/month&lt;/li&gt;
&lt;li&gt;SonarQube: $15/month (individual plan)&lt;/li&gt;
&lt;li&gt;Monthly total: $90; annual cost: $1,080&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Jira offers robust project management capabilities&lt;/li&gt;
&lt;li&gt;GitHub provides a popular version control system&lt;/li&gt;
&lt;li&gt;Jenkins offers flexible automation options&lt;/li&gt;
&lt;li&gt;TestRail provides comprehensive test management&lt;/li&gt;
&lt;li&gt;Artifactory offers advanced artifact management&lt;/li&gt;
&lt;li&gt;SonarQube provides detailed code analysis&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Multiple tools require multiple subscriptions and integrations&lt;/li&gt;
&lt;li&gt;Can be costly for large teams or enterprises&lt;/li&gt;
&lt;li&gt;Steep learning curve for some tools&lt;/li&gt;
&lt;li&gt;Limited integration between tools&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Best for Large Enterprises:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Offers advanced features, scalability, and security&lt;/li&gt;
&lt;li&gt;Supports large teams and complex projects&lt;/li&gt;
&lt;li&gt;Set 1 offers a range of robust tools, but requires multiple subscriptions and integrations&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Set 2:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Asana&lt;/strong&gt; (Azure Boards alternative)&lt;br&gt;
&lt;strong&gt;- Bitbucket&lt;/strong&gt; (Azure Repos alternative)&lt;br&gt;
&lt;strong&gt;- Travis CI&lt;/strong&gt; (Azure Pipelines alternative)&lt;br&gt;
&lt;strong&gt;- PractiTest&lt;/strong&gt; (Azure Test Plans alternative)&lt;br&gt;
&lt;strong&gt;- Google Container Registry&lt;/strong&gt; (Azure Artifacts alternative)&lt;br&gt;
&lt;strong&gt;- Veracode&lt;/strong&gt; (GitHub Advanced Security for Azure DevOps alternative)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Price:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Asana: $13.49/user/month (premium plan) = $13.49/month&lt;/li&gt;
&lt;li&gt;Bitbucket: $6/user/month (standard plan) = $6/month&lt;/li&gt;
&lt;li&gt;Travis CI: $6/month (pro plan)&lt;/li&gt;
&lt;li&gt;PractiTest: $29/user/month (pro plan) = $29/month&lt;/li&gt;
&lt;li&gt;Google Container Registry: $6/month (standard plan)&lt;/li&gt;
&lt;li&gt;Veracode: $10/month (pro plan)&lt;/li&gt;
&lt;li&gt;Monthly total: $70.49; annual cost: $845.88&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Asana offers user-friendly project management&lt;/li&gt;
&lt;li&gt;Bitbucket provides a cloud-based version control system&lt;/li&gt;
&lt;li&gt;Travis CI offers easy automation options&lt;/li&gt;
&lt;li&gt;PractiTest provides comprehensive test management&lt;/li&gt;
&lt;li&gt;Google Container Registry offers secure artifact management&lt;/li&gt;
&lt;li&gt;Veracode provides advanced code analysis&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Multiple tools require multiple subscriptions and integrations&lt;/li&gt;
&lt;li&gt;Can be costly for large teams or enterprises&lt;/li&gt;
&lt;li&gt;Limited customization options for some tools&lt;/li&gt;
&lt;li&gt;Some tools have limited features compared to Set 1&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Best for Budget-Constrained Teams:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Offers a range of tools at a lower cost than Set 1&lt;/li&gt;
&lt;li&gt;Ideal for teams with a limited budget that still need robust features&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Set 3 (Free alternatives):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Trello&lt;/strong&gt; (Azure Boards alternative)&lt;br&gt;
&lt;strong&gt;- GitLab&lt;/strong&gt; (Azure Repos alternative)&lt;br&gt;
&lt;strong&gt;- CircleCI&lt;/strong&gt; (Azure Pipelines alternative)&lt;br&gt;
&lt;strong&gt;- TestLink&lt;/strong&gt; (Azure Test Plans alternative)&lt;br&gt;
&lt;strong&gt;- Docker Hub&lt;/strong&gt; (Azure Artifacts alternative)&lt;br&gt;
&lt;strong&gt;- CodeCoverage&lt;/strong&gt; (GitHub Advanced Security for Azure DevOps alternative)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Price:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trello: free = $0/month&lt;/li&gt;
&lt;li&gt;GitLab: free = $0/month&lt;/li&gt;
&lt;li&gt;CircleCI: free = $0/month&lt;/li&gt;
&lt;li&gt;TestLink: free = $0/month&lt;/li&gt;
&lt;li&gt;Docker Hub: free = $0/month&lt;/li&gt;
&lt;li&gt;CodeCoverage: free = $0/month&lt;/li&gt;
&lt;li&gt;Annual cost: $0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Completely free&lt;/li&gt;
&lt;li&gt;Trello offers a user-friendly project management system&lt;/li&gt;
&lt;li&gt;GitLab provides a comprehensive version control system&lt;/li&gt;
&lt;li&gt;CircleCI offers easy automation options&lt;/li&gt;
&lt;li&gt;TestLink provides comprehensive test management&lt;/li&gt;
&lt;li&gt;Docker Hub offers secure artifact management&lt;/li&gt;
&lt;li&gt;CodeCoverage provides detailed code analysis&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Limited features compared to paid tools&lt;/li&gt;
&lt;li&gt;Limited support and documentation&lt;/li&gt;
&lt;li&gt;May require additional setup and configuration&lt;/li&gt;
&lt;li&gt;Some tools have limited scalability&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Best for Small Teams/Startups:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Completely free&lt;/li&gt;
&lt;li&gt;Offers a range of tools for project management, version control, automation, testing, and artifact management&lt;/li&gt;
&lt;li&gt;Ideal for small teams or startups with a limited budget&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Based on this calculation, the annual cost for a single user is:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Azure DevOps: $360&lt;/li&gt;
&lt;li&gt;Set 1: $1,080&lt;/li&gt;
&lt;li&gt;Set 2: $845.88&lt;/li&gt;
&lt;li&gt;Set 3 (Free alternatives): $0&lt;/li&gt;
&lt;/ul&gt;
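The per-user totals above can be reproduced with a short calculation. This is a sketch that simply sums the monthly per-user prices quoted in this article; real vendor pricing changes over time, so treat the figures as illustrative:

```python
def annual_cost(monthly_prices):
    """Sum monthly per-user prices and project over 12 months."""
    return round(sum(monthly_prices) * 12, 2)

# Monthly per-user prices as listed above (illustrative only).
azure_devops = annual_cost([30])                      # basic plan
set1 = annual_cost([7, 21, 0, 25, 22, 15])            # Jira ... SonarQube
set2 = annual_cost([13.49, 6, 6, 29, 6, 10])          # Asana ... Veracode
set3 = annual_cost([0, 0, 0, 0, 0, 0])                # free tools

for name, cost in [("Azure DevOps", azure_devops), ("Set 1", set1),
                   ("Set 2", set2), ("Set 3", set3)]:
    print(f"{name}: ${cost}/year")
```

Running it makes the comparison easy to audit: changing any single plan price updates the annual figure automatically.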

&lt;p&gt;Azure DevOps is the best overall choice for its comprehensive, integrated platform, scalability, and security. For small teams or startups with a limited budget, Set 3 (the free alternatives) is a cost-effective option. Set 1 suits large enterprises that need advanced features, while Set 2 is the more budget-friendly of the two paid alternatives.&lt;br&gt;
Using individual services, as in Set 1 or Set 2, can be worth the effort if your team has specific needs that an integrated platform like Azure DevOps doesn't meet; for example, if your team requires advanced project management, Jira might be a better choice. However, integrating individual services can be time-consuming and costly, and may not provide the same seamless experience as an all-in-one platform. Ultimately, weigh the pros and cons of each option against your team's specific needs before making a decision.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>devops</category>
      <category>cicd</category>
      <category>development</category>
    </item>
    <item>
      <title>DARQ - The Future Technology</title>
      <dc:creator>Shreyash Singh</dc:creator>
      <pubDate>Wed, 11 Aug 2021 15:34:59 +0000</pubDate>
      <link>https://forem.com/shreyash333/darq-the-future-technology-10hh</link>
      <guid>https://forem.com/shreyash333/darq-the-future-technology-10hh</guid>
      <description>&lt;p&gt;Many of us already know a lot of technologies irrespective of time and domain. But everyone wants to know the future technology. The technology I am talking about is not a single technology but a group of technologies which has not been explored yet. Till the date you won’t be able to find any projects on internet using this technology, but a lot of tech experts and giant firms believes that this will be one of the leading technologies in the future. I would like to relate this with a famous movie series AVENGERS. As you already know in Avengers all the superheroes come together and save the world.  Similarly, here four super technologies are combining to form a technology known as DARQ. DARQ stands for Distributed Ledger Technologies, Artificial Intelligence, Extended Reality, and Quantum computing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lMbncr-R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y26wdcrtkvvbkrljlht6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lMbncr-R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y26wdcrtkvvbkrljlht6.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Distributed ledger Technology (DLT)
&lt;/h2&gt;

&lt;p&gt;In traditional systems, data is stored in a centralized system. Nowadays privacy is a myth: companies assure us that their data is safe, yet breaches still happen, and users' personal, public, and business data ends up for sale on the internet or the dark web. Hackers break into systems or servers, steal user data, and sell it online. To prevent this, we must increase system security, but in a conventional setup all user data sits on one or a few servers in one place; if someone hacks one server, it is easy to gain access to the others as well. DLT overcomes these drawbacks.&lt;br&gt;
DLT is a digital system for recording transactions in which the same information is recorded at multiple places at once. There is no central data store or designated administrator; instead, each node in the network verifies new records before they are added, which keeps the ledger secure. This has revolutionized recordkeeping by combining data gathering with proper communication. All network traffic is encrypted, and no one can edit the data while it is in transit. The best examples are blockchain and cryptocurrencies, where all communication is encrypted and transaction details are not readable by outsiders.&lt;/p&gt;
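To make the tamper-evidence idea concrete, here is a toy sketch in Python (a simplified illustration, not any real DLT or blockchain implementation): each record embeds the hash of the previous record, so editing any past entry breaks every later link and the whole chain fails verification.

```python
import hashlib
import json

def record_hash(record):
    """Deterministic SHA-256 hash of a ledger record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def append(ledger, data):
    """Append a record linked to the hash of the previous record."""
    prev = record_hash(ledger[-1]) if ledger else "0" * 64
    ledger.append({"data": data, "prev": prev})

def verify(ledger):
    """Check that every record still points at its predecessor's hash."""
    return all(ledger[i]["prev"] == record_hash(ledger[i - 1])
               for i in range(1, len(ledger)))

ledger = []
append(ledger, "Alice pays Bob 5")
append(ledger, "Bob pays Carol 2")
print(verify(ledger))                      # chain is intact

ledger[0]["data"] = "Alice pays Bob 500"   # tamper with history
print(verify(ledger))                      # verification now fails
```

Real distributed ledgers add networking and consensus on top of this linking idea, but the core tamper-evidence mechanism is the same.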

&lt;h2&gt;
  
  
  Artificial Intelligence
&lt;/h2&gt;

&lt;p&gt;Artificial Intelligence is the ability of a machine to think and react like a human in complex situations, and to work faster and better than humans. A set of instructions known as a program is given to the machine, which uses it to analyze or study large amounts of information quickly and then pick actions from among many choices. AI comes close to human intelligence in that it helps with planning, learning, reasoning, problem-solving, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extended Reality
&lt;/h2&gt;

&lt;p&gt;Extended Reality is a combination of AR (Augmented Reality), VR (Virtual Reality), and MR (Mixed Reality). We already use VR for gaming, watching movies, military training, and in medical fields. AR is something we see daily; the best examples are the Instagram and Snapchat filters, which are AR filters. MR, Mixed Reality, combines AR and VR to produce new environments and visuals where physical and digital objects meet and interact in real time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quantum Computing
&lt;/h2&gt;

&lt;p&gt;You must be aware of how our computer systems store and handle data in individual bits, i.e., 0 and 1, known as binary digits. Unlike a normal computer, a quantum computer stores data in qubits, each of which can exist in a superposition of both 0 and 1, as described by quantum theory. This increases computing capacity and makes computers far more efficient. Many industries now use these computers for jobs such as space communication or high-level data processing.&lt;/p&gt;
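As a rough illustration (a simplified single-qubit model, not a simulation of real quantum hardware), a qubit's state can be written as two complex amplitudes over 0 and 1; measuring it yields each bit with probability equal to the squared magnitude of its amplitude:

```python
import math

# A qubit state |psi> = a|0> + b|1>, with |a|^2 + |b|^2 = 1.
# Unlike a classical bit, both amplitudes can be nonzero at once.
def measurement_probabilities(a, b):
    """Probability of reading 0 or 1 when the qubit is measured."""
    norm = abs(a) ** 2 + abs(b) ** 2
    assert math.isclose(norm, 1.0), "amplitudes must be normalized"
    return abs(a) ** 2, abs(b) ** 2

# A classical-looking state: definitely 0.
print(measurement_probabilities(1, 0))

# Equal superposition: the qubit is "both" 0 and 1 until measured.
a = b = 1 / math.sqrt(2)
print(measurement_probabilities(a, b))   # both probabilities are ~0.5
```

This is why n qubits can represent a superposition over 2^n classical bit patterns at once, which is the source of the capacity gain mentioned above.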

&lt;p&gt;Individually, these technologies offer a wealth of opportunities for business, but when used together they produce a tremendous amount of value. Many big firms agree that the combination of all four DARQ technologies will bring extensive change to their business. So, what do you think the possibilities of DARQ technology could be?&lt;/p&gt;

</description>
      <category>distributedledgertechnologies</category>
      <category>ai</category>
      <category>extendedreality</category>
      <category>quantumcomputing</category>
    </item>
    <item>
      <title>My Journey towards FLUTTER</title>
      <dc:creator>Shreyash Singh</dc:creator>
      <pubDate>Thu, 25 Jun 2020 05:05:42 +0000</pubDate>
      <link>https://forem.com/shreyash333/my-journey-towards-flutter-45p7</link>
      <guid>https://forem.com/shreyash333/my-journey-towards-flutter-45p7</guid>
      <description>&lt;p&gt;One year ago, in the month of July 2019 I just entered into the second year of my Engineering with some core subjects. I had started learning a new Programming language called “Java”. Some of my classmates already knew a lot of things like one of them was Microsoft student partner, then there were other students who already had started with new things like Web Development, Data Science etc. Also, in second year the diploma students were supposed to join us who already knew Java and Python. At that time, I thought have I made the right choice by choosing computer Engineering? I was little bit insecure, nervous, depressed and confused about where am I going to land up. Because I had not started anything new or significant and I hadn't decided the domain in which I would go in future as well. In September I had went for IV in a company where they suggested us to do some internship part time with college. So, I decided to join any company as intern after semester 3. But now the big problem was how and when. How would I get an internship by only knowing the C, C++ and Java language? When would I do work as intern? In regular day I had 8 hrs. of college and 4 hrs. of travelling and after that I am completely exhausted. So, I thought I would do a full-time internship only for one month which I will get after my semester 3. So now I had to find how I may get an Internship Opportunity. There are plenty of online platforms which provide internship and jobs. But the probability of getting internship from such platforms were very less. So, I needed some new idea to get an internship. &lt;/p&gt;

&lt;p&gt;I worked it out and found a way. I made a list of 25 IT companies near my place using Google Maps, so I wouldn't have to travel much. I visited every company's website and noted its name, address, email ID, and phone number. I shortlisted 10-15 companies based on their distance from my residence and their ratings on Google. I shared this idea with my peers, but they didn't show much interest, so I decided to apply for internships alone. Before my second-to-last exam of semester 3, I sent a mail to all the companies I had shortlisted. I was expecting a few positive replies, but nothing seemed to come of it at that point. I still waited, hoping for better responses. I finished my exams and there was still no response. I started losing hope and was quite upset, but then a new strategy struck me: this time I would call each company and ask whether they had any place for interns. I started making calls; some numbers were out of service, some were wrong. Only two numbers rang, but no one picked up. A few minutes later, I got a call back from one of the numbers I had dialled. I introduced myself and asked whether they had any place for interns. The person on the call heard me out and asked me to call his partner, whose number he forwarded to me. I made the call, and he asked me to send my resume. So, I mailed him my resume with a few lines I hoped would impress him, because this was my only shot at an internship. &lt;/p&gt;

&lt;p&gt;I was not expecting anything but wished I would get this chance. I even started planning what to do if I got rejected: I had applied for a maths scholarship and planned to learn Python. After two days I got a call asking me to come for an interview. I was delighted but tense at the same time, as I had not prepared anything. I didn't sleep that night; I revised all the basic concepts of the programming languages I knew, which were C, C++, and Java. I was very clear on OOP concepts and also brushed up on some basics of HTML and Visual Basic that I had learned in junior college. I had gone through the company's website, so I was expecting an internship in web development. Since I had no previous experience or certifications, I didn't expect a paid internship. They asked me some basic Java concepts like functions, classes, and data types, as well as OOP concepts. After a while they said that since I didn't know JavaScript and HTML well, they couldn't take me in web development. Instead, they offered to let me learn an upcoming technology called “Flutter”. Their organization had also just started with Flutter, so this was a great opportunity for me. As I needed the internship to gain experience and learn something new, I said yes. They showed me my desk and asked me to join from the next day. I was really happy that day, and a little more confident. &lt;/p&gt;

&lt;p&gt;On 3 December 2019, my first day of the internship, I was really nervous but confident as well. I met some new people at the company. I started learning the Dart language at 10 am, finished at 3 pm, and made my own notes. Then Sir gave me links to some YouTube videos to watch and take notes on. The next day he cleared my doubts about Dart and took me deeper into it with asynchronous programming. He also explained general programming concepts that I had never learned in college. I think the knowledge I gained in that one week was greater than everything I had picked up in the previous six months. After that he gave me videos from Udemy courses. For the first 10-15 days I only watched videos and gained theoretical knowledge; I had not written a single line of Dart code. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0zVPl0sI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3928vesqn5d4r6gnq1ei.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0zVPl0sI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3928vesqn5d4r6gnq1ei.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, after 15-16 days, I started my first Flutter project. My laptop was not that good, so I was unable to run an emulator on it and had to debug on my phone. I can't describe the happiness of making just a splash screen. Then I built my first app, which took input, displayed it on the next page, stored it in Firebase, and retrieved it back. It took me around a week to complete; I spent 1-1.5 days just passing data from one screen to another. Anyway, I finished my first app and was really happy with it. I then upgraded the app to use routes, and in the next version I added a new screen that displayed all the stored data, using Provider. After that I added a local database to the same app, using a library called Moor. Because the Flutter community was not huge at the time, I ran into problems that weren't answered on any platform, not even Stack Overflow, and there were no YouTube videos on Moor. And since my laptop's configuration was poor, it would crash repeatedly, once an hour or sometimes once a day. But I really enjoyed my month at the company.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ywBrqoTJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jcihcbtd3uxubbsztkmi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ywBrqoTJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jcihcbtd3uxubbsztkmi.png" alt="Alt Text"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;I had also decided to work as a part-time intern once college restarted. In January there was an NSS camp; after returning, I started my new project. Since I needed to focus on UI, Sir and I decided to build a widget directory: a list of all the widgets, where clicking one takes you to a page showing its implementation. It became really difficult to complete this project on my old laptop, so on 31 January I bought a new one with a great configuration. This boosted my speed: I could now use an emulator, and build times dropped. I completed the project over the next month with around 70 widgets, some of them animations. After completing this project, I was really confident. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5AL7XD4z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/a0bav67extupf1ux2o7x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5AL7XD4z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/a0bav67extupf1ux2o7x.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now I knew which widgets to use where, and which widget has which properties. I started working on a food app. At the time, it had the best user interface I had ever made: four screens with a bottom navigator. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pG0tnige--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zmks1kmf8azrsxyoyj5m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pG0tnige--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zmks1kmf8azrsxyoyj5m.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In between, my college shut down due to the COVID-19 pandemic. This was an opportunity for me: now I could give more time to Flutter. I made a COVID-19 app by following YouTube videos, and by then I was very confident that I could build any screen without animation. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hUKF_o4N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ohvmk3868cohj3b12ms1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hUKF_o4N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ohvmk3868cohj3b12ms1.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I then started another food app and completed its UI in 6 days, around 20 screens in all, and this UI was really nice compared to my last food app. I am now working on a new app, which is in progress, and I hope it will be completed soon. I love Flutter, wish to be the best in the business, and look forward to learning new things in it.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
