<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: maninekkalapudi</title>
    <description>The latest articles on Forem by maninekkalapudi (@maninekkalapudi).</description>
    <link>https://forem.com/maninekkalapudi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F86522%2Ff1196183-724b-486b-a22c-2433dc8f01b4.jpg</url>
      <title>Forem: maninekkalapudi</title>
      <link>https://forem.com/maninekkalapudi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/maninekkalapudi"/>
    <language>en</language>
    <item>
      <title>Book Review- Fundamentals of Data Engineering</title>
      <dc:creator>maninekkalapudi</dc:creator>
      <pubDate>Mon, 31 Jul 2023 17:02:49 +0000</pubDate>
      <link>https://forem.com/maninekkalapudi/book-review-fundamentals-of-data-engineering-3f98</link>
      <guid>https://forem.com/maninekkalapudi/book-review-fundamentals-of-data-engineering-3f98</guid>
      <description>&lt;p&gt;Hi! Hope youre doing well. Let me walk you through whats going on in my head when I need to explain What is Data Engineering? And What has been going on with it recently?.&lt;/p&gt;

&lt;p&gt;Where do I start? And how do I avoid killing an enthusiast or a friend with a tedious explanation? Should I start from the beginning of time? Or from the beginning of all data? Although I did not live through all the eras of data, I tend to wander off onto a lot of things in the field while explaining it.&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;After more than 4 years in the data field and two major industry shifts, I'm somewhat comfortable talking about my experience in the Big Data era (the Hadoop ecosystem) and in modern data engineering (Spark on Databricks, the cloud and data modelling).&lt;/p&gt;

&lt;p&gt;But how can all of this be contextualized by beginners and experienced folks alike, whether for skills or for where the industry is headed? A lot of buzzwords and fancy tools are thrown around without the right context, IMO.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/"&gt;The Fundamentals of Data Engineering book&lt;/a&gt; provides the rightful context about Data Engineering (DE) from the past, the present and the future. Lets dive in!&lt;/p&gt;

&lt;h1&gt;
  
  
  What is this book about?
&lt;/h1&gt;

&lt;p&gt;At the very beginning of the book, the authors mention that they are recovering data scientists turned data engineers. Much like many in the analytics industry, they started with data science before moving into data engineering roles.&lt;/p&gt;

&lt;p&gt;The book expresses the fundamentals of data engineering in a relatively unopinionated manner. Everything about data platforms, architectures, processes, tools and managed services is put into the context of the broader data engineering lifecycle, without arguing that one tool is better than another.&lt;/p&gt;

&lt;p&gt;The book also introduces some of the ideas central to the DE field, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Architecture is a living entity within the data engineering lifecycle and evolves based on the requirements&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use standardized tools and services mostly and build custom tools and services for competitive advantage&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The data maturity of an organization, i.e., how well it utilizes data for business use cases, matters more than having the latest and greatest tools in the data architecture&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Data Engineering team must collaborate with all the stakeholders from upstream systems to downstream data consumers to understand systems and automate the data serving&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and many more.&lt;/p&gt;

&lt;h1&gt;
  
  
  Data Engineering Lifecycle
&lt;/h1&gt;

&lt;p&gt;The bulk of the book covers the different phases of the data engineering lifecycle. As stated in the book, the data engineering lifecycle comprises stages that turn raw data ingredients into a useful end product, ready for consumption by analysts, data scientists, ML engineers, and others.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KexrA-v8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1690822238904/be7e9cd7-cf60-4714-8bf5-8819a9507241.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KexrA-v8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1690822238904/be7e9cd7-cf60-4714-8bf5-8819a9507241.png" alt="" width="600" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The stages of the data engineering lifecycle are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Generation&lt;/strong&gt; - How the data is generated, type of the system, frequency of data generation etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Storage&lt;/strong&gt; - How to store the generated data, choose the right data storage system etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ingestion&lt;/strong&gt; - Types of ingestion, frequency, ETL vs ELT, CDC etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Transformation&lt;/strong&gt; - Transform the data to a required format, tools available for transformation, the role of SQL in data transformations and data modeling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serving&lt;/strong&gt; - Data served to different stakeholders, Reverse ETL etc.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
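
&lt;p&gt;As a rough illustration, the stages above can be strung together in a few lines of Python. This is a toy sketch of my own, not from the book; every function and record here is hypothetical:&lt;/p&gt;

```python
# A toy sketch of the data engineering lifecycle stages.
# All names and data are hypothetical, for illustration only.

def generate():
    # Generation: a source system emits raw events
    return [
        {"user": "a", "amount": 30},
        {"user": "b", "amount": 70},
        {"user": "a", "amount": 30},  # duplicate event
    ]

def ingest(events, raw_store):
    # Ingestion: land the events, as received, in raw storage
    raw_store.extend(events)

def transform(raw_store):
    # Transformation: deduplicate and shape the data
    seen = []
    for event in raw_store:
        if event not in seen:
            seen.append(event)
    return seen

def serve(clean):
    # Serving: expose an aggregate for analysts
    total = sum(event["amount"] for event in clean)
    return {"total_amount": total}

raw = []                       # Storage: the raw layer
ingest(generate(), raw)
report = serve(transform(raw))
print(report)                  # {'total_amount': 100}
```

&lt;p&gt;Real pipelines replace each of these functions with whole systems (queues, object stores, warehouses, BI tools), but the shape of the lifecycle stays the same.&lt;/p&gt;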

&lt;p&gt;While performing all the necessary activities to serve data to the stakeholders, data engineering teams must also take care of the undercurrents like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Security - Security of the data at rest, in transit, and in some cases during processing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Management - How data is stored and exposed to various stakeholders&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DataOps - data quality, governance, and security&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Architecture - reflects the current and future state of the data systems that support an organization's long-term data needs and strategy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Orchestration - the central hub that coordinates workflows across various systems&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Software Engineering - Coding and DevOps practices. Writing production-grade code and scaling backend systems for data applications&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The authors mention a variety of tools for each topic in the data engineering lifecycle and explain their benefits and tradeoffs at a high level. This gives us enough context to understand the tools, but one should go much deeper than what's mentioned to understand them completely.&lt;/p&gt;

&lt;h1&gt;
  
  
  Future of Data Engineering
&lt;/h1&gt;

&lt;p&gt;The data industry has seen many small to large transformations thanks to managed services on the cloud. &lt;a href="https://aws.amazon.com/what-is/batch-processing/#:~:text=Batch%20processing%20is%20the%20method,run%20on%20individual%20data%20transactions."&gt;Batch processing&lt;/a&gt; and &lt;a href="https://aws.amazon.com/compare/the-difference-between-etl-and-elt"&gt;ETL and ELT&lt;/a&gt; have served the data industry well for a long time, but they are showing their age in recent times.&lt;/p&gt;

&lt;p&gt;The next major transformation around the corner is near &lt;a href="https://www.hpe.com/us/en/what-is/real-time-processing.html#:~:text=Real%2Dtime%20processing%20is%20a,to%20maintain%20real%2Dtime%20insights."&gt;real-time data processing&lt;/a&gt; and serving. The book discusses this at length and offers a practical view of the trend.&lt;/p&gt;

&lt;p&gt;One thing that won't change for the foreseeable future is the data engineering lifecycle itself. It is fundamental to how data is generated, stored, processed and served, and it will remain relevant in the streaming era as well.&lt;/p&gt;

&lt;p&gt;The picture below illustrates the essence and complexity of data processes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XQ5bAksX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1690822323601/1180e647-66af-4f6e-b792-11319d461dad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XQ5bAksX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1690822323601/1180e647-66af-4f6e-b792-11319d461dad.png" alt="" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The role of data engineers currently looks like the picture below. There are many definitions of what a data engineer is and what they should do; I think it depends highly on the organization, the team and who they primarily serve the data to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--t5adH_uh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1690822342927/732caf41-4a1f-4d27-9d6d-3b6ddc1b6c83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--t5adH_uh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1690822342927/732caf41-4a1f-4d27-9d6d-3b6ddc1b6c83.png" alt="" width="600" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Expect the definitions and roles to change a bit as the industry finds new ways to produce and consume data in the future.&lt;/p&gt;

&lt;h1&gt;
  
  
  Who is this book for?
&lt;/h1&gt;

&lt;p&gt;This book is for data engineers, of course, and for any data professionals who would like to understand where the industry is headed.&lt;/p&gt;

&lt;p&gt;This book offers a very broad view of the data engineering field. If you are an experienced professional who would like to go in-depth on certain topics, it's better to pick books or material specific to those topics.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;If you're a beginner or someone trying to understand what's going on in the data engineering field, this book helps you contextualize a lot of the jargon and trends in it. From there, you can expand your knowledge on the topics that are relevant and in demand.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Mani&lt;/p&gt;

</description>
    </item>
    <item>
      <title>What is Data Engineering?</title>
      <dc:creator>maninekkalapudi</dc:creator>
      <pubDate>Sat, 12 Nov 2022 17:21:13 +0000</pubDate>
      <link>https://forem.com/maninekkalapudi/what-is-data-engineering-209h</link>
      <guid>https://forem.com/maninekkalapudi/what-is-data-engineering-209h</guid>
      <description>&lt;h1&gt;
  
  
  Intro
&lt;/h1&gt;

&lt;p&gt;Hello all! Hope you are doing well.&lt;/p&gt;

&lt;p&gt;In the last couple of years, you might have heard a lot about data engineering. It has gained a lot of buzz in recent times, and every company wanted data engineers. Needless to say, the demand for data engineers was at an all-time high.&lt;/p&gt;

&lt;p&gt;But what is data engineering though? And why do we need it? Let's understand all of that in this post with the following topics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What is Data Engineering? And why do we need it?&lt;/li&gt;
&lt;li&gt;Responsibilities of a Data Engineering Team&lt;/li&gt;
&lt;li&gt;Challenges&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  What is Data Engineering? And why do we need it?
&lt;/h1&gt;

&lt;p&gt;Simply put, data engineering deals with collecting, storing and processing data in a data warehouse, and serving that data to various stakeholders.&lt;/p&gt;

&lt;p&gt;Data is generated in a company by different teams in a variety of systems like databases, APIs, streaming events, file servers, etc. This is the data required by different teams to carry out various analyses.&lt;/p&gt;

&lt;p&gt;Generally, the incoming data arrives in different formats and sizes from different sources and is stored in an archival/analytics system like a data warehouse or a data lake. Once the data is in the warehouse, it is cleaned and transformed into a format mutually agreed upon by the stakeholders.&lt;/p&gt;

&lt;p&gt;The data engineering team builds and maintains the pipelines and processes, like ETL/ELT, for ingesting and transforming all the data received into the data warehouse.&lt;/p&gt;

&lt;p&gt;The end goal of data engineering efforts is analytics-ready, clean data.&lt;/p&gt;
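
&lt;p&gt;To make that flow concrete, here is a minimal, hypothetical sketch in Python: two sources in different formats are cleaned into one mutually agreed shape, and records that fail cleaning are dropped. The data and names are invented for illustration:&lt;/p&gt;

```python
# Hypothetical records from two different source systems
db_rows = [("2023-01-01", "a", "30"), ("2023-01-01", "b", "x")]   # "x" is a bad amount
api_rows = [{"date": "2023-01-02", "user": "c", "amount": 50}]

def to_common(row):
    # Agreed format: one dict shape for every source
    if isinstance(row, dict):
        return row
    date, user, amount = row
    return {"date": date, "user": user, "amount": int(amount)}

warehouse = []
for row in db_rows + api_rows:
    try:
        warehouse.append(to_common(row))
    except ValueError:
        pass   # drop records that fail cleaning

print(len(warehouse))   # 2 analytics-ready records
```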

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Why a centralized system like data warehouse?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To ensure that all of the company's data is in one single system, so that any team looking for particular data can easily access it. This removes the overhead for any team of obtaining the required data in a common format used across the entire organization.&lt;/p&gt;

&lt;p&gt;This also means that there is no duplication of effort across teams when creating datasets from multiple sources.&lt;/p&gt;

&lt;h1&gt;
  
  
  Responsibilities of a Data Engineering Team
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Identify data sources, analyze the data and ingest it into the data warehouse&lt;/li&gt;
&lt;li&gt;Build and maintain the data pipelines for periodic ingestion and processing of the data&lt;/li&gt;
&lt;li&gt;Add resiliency to the pipelines against failures&lt;/li&gt;
&lt;li&gt;Build and maintain the data warehouse tables and specialized datasets&lt;/li&gt;
&lt;li&gt;Maintain data quality and integrity&lt;/li&gt;
&lt;li&gt;Last but not least, maintain and scale the data infrastructure&lt;/li&gt;
&lt;/ul&gt;
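
&lt;p&gt;As a small example of the resiliency point above, here is one possible (hypothetical) retry wrapper around a flaky ingestion step; real pipelines would typically rely on their orchestrator's retry support instead:&lt;/p&gt;

```python
# A minimal sketch of one way to add resiliency: retry a flaky
# ingestion step a few times before failing. Names are hypothetical.

def with_retries(task, attempts=3):
    for attempt in range(attempts):
        try:
            return task()
        except ConnectionError:
            if attempt == attempts - 1:
                raise          # give up after the last attempt

calls = {"n": 0}

def flaky_ingest():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("source unavailable")   # first try fails
    return "ingested"

result = with_retries(flaky_ingest)
print(result)   # succeeds on the second attempt
```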

&lt;h1&gt;
  
  
  Challenges
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Data engineering effort is mostly internal to a company. There is no customer interaction, nor any direct revenue generated, so there will be plenty of questions about its viability, credibility and ROI&lt;/li&gt;
&lt;li&gt;There can be a lot of incoming data requests from various teams across the entire company&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://maximebeauchemin.medium.com/the-downfall-of-the-data-engineer-5bfb701e5d6b"&gt;Context switching&lt;/a&gt;. The data engineering team handles a good number of pipelines, and they will be taking up further tasks collaborating with different teams to fulfill their data requests. Handling all of these things at once will require context switching and that might affect the quality in the long run.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find more details, examples and a few important questions about data engineering in the video below.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/dEDc25k7Kck"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h1&gt;
  
  
  Resources
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=VdR2WxQNnwg"&gt;The Harsh Reality of Being a Data Engineer - YouTube&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=goT7gN1lwBI"&gt;Why Are Data Teams Still Struggling to Answer Basic Questions - YouTube&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=4NXzeZYaZqQ"&gt;Bloomberg Doesn't Understand My Job (As An Ex-Meta Data Engineer) - Triggered Data Guy - YouTube&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>A Typical Data Pipeline</title>
      <dc:creator>maninekkalapudi</dc:creator>
      <pubDate>Tue, 01 Nov 2022 06:33:02 +0000</pubDate>
      <link>https://forem.com/maninekkalapudi/a-typical-data-pipeline-2717</link>
      <guid>https://forem.com/maninekkalapudi/a-typical-data-pipeline-2717</guid>
      <description>&lt;h1&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Hello people, Hope you are doing well.&lt;/p&gt;

&lt;p&gt;As data engineers, we build data pipelines to collect data from different source systems and place it in an analytics system, i.e., a data warehouse/data lake. The data is usually sourced from systems like a database, web events or an API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;In this post, we will talk about&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What is a data pipeline?&lt;/li&gt;
&lt;li&gt;Stages of a data pipeline&lt;/li&gt;
&lt;li&gt;What is ETL and ELT?&lt;/li&gt;
&lt;li&gt;What is the difference between data warehouse and data lake?&lt;/li&gt;
&lt;li&gt;Why do we need a data pipeline?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  1. &lt;strong&gt;&lt;em&gt;What is a data pipeline?&lt;/em&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We know that a data pipeline collects data from different source systems and moves it to the analytics system. Let's refine that definition a bit:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;A data pipeline is a series of interconnected systems that passes data in only one direction, i.e., from source to serving layer, with increasing clarity and value in the data.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The data received from a source may contain duplicate records, test records and otherwise problematic records. A data pipeline should be designed to eliminate these issues; only then is the data moved from raw to staging to serving, increasing its clarity and value along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Stages of a data pipeline
&lt;/h2&gt;

&lt;p&gt;A typical data pipeline has the following stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sources&lt;/li&gt;
&lt;li&gt;Raw&lt;/li&gt;
&lt;li&gt;Staging&lt;/li&gt;
&lt;li&gt;Serving&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The below picture shows the different stages of a data pipeline&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_ddcueQU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1667283797078/bcbdThrWT.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_ddcueQU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1667283797078/bcbdThrWT.png" alt="Untitled.png" width="880" height="557"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Data Sources&lt;/em&gt;&lt;/strong&gt; : Where data is generated, recorded or obtained for the data pipeline. For example, a database, an SFTP file server or an API. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Raw layer&lt;/em&gt;&lt;/strong&gt; : It is the first level of data storage in the data warehouse. This is an archival storage layer and the data stored in it will not be modified at any time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Staging/Transform layer&lt;/em&gt;&lt;/strong&gt; : The second level of data storage in the data warehouse. The data in this layer, sourced from the raw layer, is cleaned, transformed into a certain format and then stored in various staging tables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Serving layer&lt;/em&gt;&lt;/strong&gt; : This is the aggregated data layer, and its data is sourced from different staging tables. For example, the average amount spent by customers over the years.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
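
&lt;p&gt;The layers above can be sketched in a few lines of Python, using the average-spend example from the serving layer. The table names and numbers are made up for illustration:&lt;/p&gt;

```python
# A toy walk through the raw, staging and serving layers.

raw_orders = [                      # Raw layer: stored as received, never modified
    {"customer": "a", "year": 2021, "spend": 100},
    {"customer": "a", "year": 2022, "spend": 300},
    {"customer": "a", "year": 2022, "spend": 300},   # duplicate record
]

# Staging layer: a cleaned copy of the raw data
staging_orders = []
for row in raw_orders:
    if row not in staging_orders:
        staging_orders.append(row)

# Serving layer: an aggregate sourced from staging
years = {row["year"] for row in staging_orders}
avg_spend_per_year = sum(r["spend"] for r in staging_orders) / len(years)
print(avg_spend_per_year)   # 200.0
```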

&lt;h2&gt;
  
  
  3. &lt;strong&gt;What is ETL and ELT?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;ETL stands for Extract, Transform, Load; ELT for Extract, Load, Transform. Both are processes to get data into an analytical system like a data warehouse or data lake, preferably on a schedule.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;ETL&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In ETL, the data is extracted from a source, transformed into its final format and loaded into a data warehouse (DWH) table. This is well suited when the data is in a database and needs minimal changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;ELT&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In ELT, the data is extracted from a source and loaded into the raw layer as-is. When transformations are defined later, the data is transformed and loaded into the final tables in the data warehouse.&lt;/p&gt;

&lt;p&gt;ELT is well suited when the data comes from different sources and the transformations are not yet completely defined. The data is transformed as transformations become available, and the corresponding staging tables are modified accordingly.&lt;/p&gt;
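
&lt;p&gt;A tiny, hypothetical sketch of the difference: in both cases the same work happens, but ETL transforms before loading while ELT loads the raw data first and transforms later. The functions below are stand-ins, not any real tool's API:&lt;/p&gt;

```python
# Contrast of ETL and ELT orderings with stand-in functions.

def extract():
    return ["  Alice ", "  Bob "]

def transform(rows):
    return [r.strip().lower() for r in rows]

# ETL: transform before the warehouse sees the data
etl_table = transform(extract())

# ELT: load raw first; transform later, once the rules are defined
raw_layer = extract()
elt_table = transform(raw_layer)      # runs whenever transformations are ready

print(etl_table == elt_table)   # True; only the ordering differs
```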

&lt;h2&gt;
  
  
  4. &lt;strong&gt;What is the difference between a data warehouse and a data lake?&lt;/strong&gt;
&lt;/h2&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Data Warehouse&lt;/th&gt;&lt;th&gt;Data Lake&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Stores processed data&lt;/td&gt;&lt;td&gt;Stores raw data&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Transformations are fully defined&lt;/td&gt;&lt;td&gt;Transformations are not fully defined&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Stores structured data in tabular format in data warehouse tables&lt;/td&gt;&lt;td&gt;Stores structured, semi-structured and unstructured data in raw form&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;More complicated and costly to make changes to the tables&lt;/td&gt;&lt;td&gt;Highly accessible and quick to update&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;File formats like Parquet, ORC and Delta are used&lt;/td&gt;&lt;td&gt;File formats like CSV, text files, Parquet files, PDFs etc. are used&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;5. Why do we need a data pipeline?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Large volumes of data come in from different sources (apps, web events, transactions, images/videos, telemetry data). These data are of different types and sizes.&lt;/p&gt;

&lt;p&gt;These characteristics, i.e., volume, variety and velocity, are the defining attributes of Big Data. Processing big data in a consistent manner while serving it to an entire organization is a huge challenge, and data engineers build data pipelines to solve this problem.&lt;/p&gt;

&lt;p&gt;Once the data is processed, it is stored in a centralized data repository like a data warehouse. This ensures that the data is not siloed and anyone with the right access can always access the data and perform the analysis.&lt;/p&gt;

&lt;p&gt;Data pipelines can also ensure data quality at scale. Any checks that need to be performed on the data can be applied in the transformation stage, ensuring quality.&lt;/p&gt;
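
&lt;p&gt;As an illustration of such checks, here is one possible (hypothetical) shape for a quality rule applied in the transformation stage, where failing rows are set aside instead of being served:&lt;/p&gt;

```python
# A hypothetical quality rule: every row needs a user and an amount.

def check(row):
    return row["amount"] is not None and row["user"] != ""

rows = [
    {"user": "a", "amount": 10},
    {"user": "", "amount": 5},        # fails: missing user
    {"user": "b", "amount": None},    # fails: missing amount
]

good = [r for r in rows if check(r)]       # moves on to serving
bad = [r for r in rows if not check(r)]    # quarantined for inspection
print(len(good), len(bad))   # 1 2
```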

&lt;p&gt;Since we also retain the raw data, pipelines can be made resilient to failures: any data loss or corruption can be remedied by reprocessing the existing raw data.&lt;/p&gt;

&lt;p&gt;Check out the video on the same topic:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/0TYxyAqPZto"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.talend.com/resources/data-lake-vs-data-warehouse/#:~:text=Data%20lakes%20and%20data%20warehouses,processed%20for%20a%20specific%20purpose."&gt;Data Lake vs Data Warehouse: Key Differences | Talend&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.mongodb.com/databases/data-lake-vs-data-warehouse-vs-database"&gt;Databases Vs. Data Warehouses Vs. Data Lakes | MongoDB&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>Process management with Linux CLI</title>
      <dc:creator>maninekkalapudi</dc:creator>
      <pubDate>Sat, 17 Sep 2022 12:27:56 +0000</pubDate>
      <link>https://forem.com/maninekkalapudi/process-management-with-linux-cli-2e0</link>
      <guid>https://forem.com/maninekkalapudi/process-management-with-linux-cli-2e0</guid>
      <description>&lt;p&gt;Hello! In my last post I have written about &lt;a href="https://maninekkalapudi.com/permissions-in-linux"&gt;permissions in linux&lt;/a&gt;. In this post we will explore about &lt;strong&gt;processes in linux&lt;/strong&gt; and how to manage them from cli. Let's go!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topics covered in this post:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Intro to modern systems&lt;/li&gt;
&lt;li&gt;What is a process?&lt;/li&gt;
&lt;li&gt;Processes in Linux&lt;/li&gt;
&lt;li&gt;Interacting with processes in the CLI&lt;/li&gt;
&lt;li&gt;Signals&lt;/li&gt;
&lt;li&gt;Shutting down the system from the CLI&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  1. Intro to modern systems
&lt;/h2&gt;

&lt;p&gt;Modern systems are usually multitasking, meaning they can perform more than one task at once, or at least they appear to. In reality, the kernel in the operating system rapidly switches from one process to another, giving the impression that the system is multitasking.&lt;/p&gt;

&lt;p&gt;This is true for Linux as well. When we switch from a text document to a terminal, the Linux kernel also switches the underlying processes. Switching to a process means allotting CPU execution time and resources like memory to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. What is a process?
&lt;/h2&gt;

&lt;p&gt;A program or a script is stored in a file, and a process is nothing but a program in motion. Whenever we execute a program, the Linux kernel has to allocate certain resources like CPU and memory to it. The kernel also tracks the execution by assigning an ID called the PID, or process ID.&lt;/p&gt;
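
&lt;p&gt;We can observe this from Python as well. A minimal sketch (assuming a system with Python installed): the running interpreter has a PID, and any child process we launch gets its own:&lt;/p&gt;

```python
import os
import subprocess
import sys

# Every running program gets a PID from the kernel, including
# child processes we launch ourselves.

print("this interpreter runs as PID", os.getpid())

# Launch a child process (another Python interpreter) and observe its PID
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getpid())"],
    capture_output=True, text=True,
)
print("the child process ran as PID", child.stdout.strip())
```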

&lt;p&gt;This is shown graphically in operating systems like Windows, as below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cu1YPJiP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416108171/F6OUQIN6D.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cu1YPJiP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416108171/F6OUQIN6D.png" alt="Untitled.png" width="880" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Processes in Linux
&lt;/h2&gt;

&lt;p&gt;When a linux system boots up, the kernel starts a few processes using &lt;code&gt;init&lt;/code&gt;. &lt;code&gt;init&lt;/code&gt; is the first program that is launched and it also launches a series of shell scripts (located in &lt;code&gt;/etc&lt;/code&gt;) called &lt;strong&gt;&lt;em&gt;init scripts&lt;/em&gt;&lt;/strong&gt; which start all the system services. The following image shows these init scripts in my system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9MFPPDOr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416139531/SYygkliop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9MFPPDOr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416139531/SYygkliop.png" alt="Untitled 1.png" width="880" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Many of these services run as &lt;strong&gt;daemon programs&lt;/strong&gt;, programs that run in the background and generally don't have any UI or require user interaction.&lt;/p&gt;

&lt;p&gt;Even when we just log in to the system and don't perform any tasks, it runs a few processes to keep the system up.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;init&lt;/code&gt; process here becomes the &lt;strong&gt;&lt;em&gt;parent&lt;/em&gt;&lt;/strong&gt; or "&lt;strong&gt;&lt;em&gt;grandparent&lt;/em&gt;&lt;/strong&gt;" of all the processes launched through it. Processes launched by other processes are called &lt;strong&gt;&lt;em&gt;child processes&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Interacting with processes in cli
&lt;/h2&gt;

&lt;p&gt;The following commands are widely used for monitoring and managing processes in Linux.&lt;/p&gt;

&lt;p&gt;a. &lt;code&gt;ps&lt;/code&gt; - prints a snapshot of the running processes to the terminal&lt;/p&gt;

&lt;p&gt;b. &lt;code&gt;top&lt;/code&gt; - provides a dynamic view of all the processes running on the system within the terminal. The output looks similar to the GUI of a modern task manager&lt;/p&gt;

&lt;p&gt;c. &lt;code&gt;&amp;lt;command/script&amp;gt; &amp;amp;&lt;/code&gt; - runs the process in the background&lt;/p&gt;

&lt;p&gt;d. &lt;code&gt;jobs&lt;/code&gt; - shows the list of processes running in the background&lt;/p&gt;

&lt;p&gt;e. &lt;code&gt;fg %&amp;lt;job_number&amp;gt;&lt;/code&gt; - returns a background process to the foreground&lt;/p&gt;

&lt;p&gt;a. &lt;strong&gt;&lt;code&gt;ps&lt;/code&gt; command:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is the most commonly used command to view processes. It shows a snapshot of the processes running on the system at that moment. The result looks like the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--99yOpAEE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416160519/SlM7AED7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--99yOpAEE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416160519/SlM7AED7m.png" alt="Untitled 2.png" width="378" height="138"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;PID&lt;/em&gt;&lt;/strong&gt;, or process ID, is the number assigned by the kernel to the process to track the resources allotted to it, e.g., CPU time (TIME) and memory. &lt;strong&gt;&lt;em&gt;TTY&lt;/em&gt;&lt;/strong&gt;, or teletype, refers to the controlling terminal of the process. &lt;strong&gt;&lt;em&gt;TIME&lt;/em&gt;&lt;/strong&gt; is the amount of CPU time consumed by the process.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ps aux&lt;/code&gt; displays processes from all users (&lt;code&gt;a&lt;/code&gt;), including those not attached to any terminal (&lt;code&gt;x&lt;/code&gt;), in a user-oriented format with owner info (&lt;code&gt;u&lt;/code&gt;) &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;b. &lt;strong&gt;&lt;code&gt;top&lt;/code&gt; (table of processes) command:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It displays a real-time view of the running processes, their resource consumption and also displays kernel-managed tasks. The output is refreshed every 3 seconds by default.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CdUmAdGN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416217439/V2uDFil9a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CdUmAdGN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416217439/V2uDFil9a.png" alt="Untitled 4.png" width="880" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To exit the &lt;code&gt;top&lt;/code&gt; command output prompt and get back to the terminal, press &lt;code&gt;q&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;c. &lt;code&gt;&amp;lt;command/script&amp;gt; &amp;amp;&lt;/code&gt; command&lt;/p&gt;

&lt;p&gt;When we run a GUI program like a text editor (&lt;code&gt;gedit&lt;/code&gt;) from the CLI, it opens a window and the terminal stays busy until the window is closed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v1USXD-2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416237725/XnUkPknVB.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v1USXD-2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416237725/XnUkPknVB.png" alt="Untitled 5.png" width="880" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Any command/script suffixed with the &lt;code&gt;&amp;amp;&lt;/code&gt; operator will run in the background, leaving the terminal available to the user. For example, &lt;code&gt;gedit &amp;amp;&lt;/code&gt; will launch the program and return the prompt right after it.&lt;/p&gt;
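&lt;p&gt;The same idea works for any long-running command; here is a minimal sketch using &lt;code&gt;sleep&lt;/code&gt; as a GUI-free stand-in for gedit:&lt;/p&gt;

```shell
# Launch a long-running command in the background; the prompt returns
# immediately and the shell records the new process's PID in $!.
sleep 60 &
echo "background PID: $!"
```

&lt;p&gt;&lt;code&gt;$!&lt;/code&gt; always holds the PID of the most recently launched background process.&lt;/p&gt;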

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Mditstwm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416252454/ESZZfM4q7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Mditstwm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416252454/ESZZfM4q7.png" alt="Untitled 6.png" width="880" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice the numbers printed on the terminal after the command. A shell feature called &lt;strong&gt;&lt;em&gt;job control&lt;/em&gt;&lt;/strong&gt; prints the job number in brackets, followed by the PID of the new background process.&lt;/p&gt;

&lt;p&gt;d. &lt;code&gt;jobs&lt;/code&gt; command&lt;/p&gt;

&lt;p&gt;It will show the list of background processes/jobs launched from the current terminal, as shown below&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--t1WEHmXF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416282536/w9KSUPyEj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--t1WEHmXF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416282536/w9KSUPyEj.png" alt="Untitled 7.png" width="661" height="98"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ps&lt;/code&gt; command will also show info about the above process&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mOLHeDj0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416298994/thYI05Mj3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mOLHeDj0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416298994/thYI05Mj3.png" alt="Untitled 8.png" width="600" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;e. &lt;code&gt;fg %&amp;lt;job_number&amp;gt;&lt;/code&gt; command&lt;/p&gt;

&lt;p&gt;Our process (job number 1) is running in the background, and any process running in the background is immune to terminal keyboard input, including any attempt to interrupt it with &lt;code&gt;Ctrl+C&lt;/code&gt;. The &lt;code&gt;fg %&amp;lt;job_number&amp;gt;&lt;/code&gt; command will bring the process to the foreground&lt;/p&gt;
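&lt;p&gt;The whole flow can be sketched in a script (job control is on by default in interactive shells; a script must enable it explicitly with &lt;code&gt;set -m&lt;/code&gt;):&lt;/p&gt;

```shell
set -m            # enable job control (on by default in interactive shells)
sleep 2 &
jobs              # lists the background job, e.g. "[1]+ Running  sleep 2 &"
fg %1             # bring job 1 to the foreground; the shell now waits for it
```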

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v0Hk9ouC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416339173/2qilrZv3Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v0Hk9ouC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416339173/2qilrZv3Q.png" alt="Untitled 9.png" width="640" height="202"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we press &lt;code&gt;Ctrl+C&lt;/code&gt;, the process will terminate&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Z0UF6bCL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416357122/JMvkfPQGz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Z0UF6bCL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416357122/JMvkfPQGz.png" alt="Untitled 10.png" width="626" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Signals
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;kill&lt;/code&gt; command is used to terminate processes. Here's an example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JrcbbFAT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416390172/rMdRnuCmy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JrcbbFAT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1663416390172/rMdRnuCmy.png" alt="Untitled 11.png" width="719" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gedit &amp;amp;&lt;/code&gt; will launch the program in the background and print its PID on the terminal. Next, we use the &lt;code&gt;kill &amp;lt;PID&amp;gt;&lt;/code&gt; command to terminate the process.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;kill&lt;/code&gt; command here doesn't actually kill the program; rather, it sends a signal asking the process to terminate. This gives the process a chance to save its work in progress, since processes can listen for and handle these signals.&lt;/p&gt;

&lt;p&gt;When a process is running in the foreground, &lt;code&gt;CTRL+C&lt;/code&gt; sends a signal to interrupt (terminate) it, while &lt;code&gt;CTRL+Z&lt;/code&gt; sends a signal to suspend it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;killall&lt;/code&gt; command will terminate multiple processes at once, matched by name or by username.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;killall -u &amp;lt;username&amp;gt;&lt;/code&gt;- will kill all the processes under the username provided&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;killall name&lt;/code&gt;- will kill all the processes which matches the provided name&lt;/li&gt;
&lt;/ul&gt;
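&lt;p&gt;A minimal sketch of terminating processes by name (assuming the &lt;code&gt;killall&lt;/code&gt; from the psmisc package, and using &lt;code&gt;sleep&lt;/code&gt; as a disposable stand-in process):&lt;/p&gt;

```shell
# Start two throwaway background processes with the same name,
# then terminate both at once by name.
sleep 300 &
sleep 300 &
killall sleep     # sends SIGTERM to every "sleep" process we own
```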

&lt;p&gt;In the above example, multiple instances of the &lt;code&gt;gedit&lt;/code&gt; program are launched in the background, and with the &lt;code&gt;killall&lt;/code&gt; command we can terminate all those instances at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If a user wants to terminate processes that don't belong to them, they need superuser privileges.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Shutting down the system with CLI
&lt;/h2&gt;

&lt;p&gt;Yes, we can do it! Shutting down a system involves the orderly termination of all processes on the system, and it requires admin privileges to perform this action.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sudo reboot&lt;/code&gt;- restart the system&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sudo shutdown -h now&lt;/code&gt;- shuts down the system without any delay&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sudo shutdown -r now&lt;/code&gt;- reboots the system without any delay&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;shutdown&lt;/code&gt; command options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;-h&lt;/code&gt;- halt (power off) the system&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;now&lt;/code&gt;- the time argument. It may be a string in the format "hh:mm" (24h clock) specifying when to execute the shutdown, or &lt;code&gt;+m&lt;/code&gt; meaning m minutes from now. &lt;code&gt;now&lt;/code&gt; is an alias for &lt;code&gt;+0&lt;/code&gt;; when no time is specified in the command, the default is &lt;code&gt;+1&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Process management is a task usually handled by sysadmins and DevOps engineers, among others. It helps maintain the health of machines and servers and supports resource monitoring. Managing a linux system from the cli is effective and swift.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://linuxcommand.org/tlcl.php"&gt;Linux Command Line Books by William Shotts&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.redhat.com/sysadmin/linux-command-basics-7-commands-process-management"&gt;Linux Command Basics: 7 commands for process management | Enable Sysadmin (redhat.com)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://unix.stackexchange.com/questions/106847/what-does-aux-mean-in-ps-aux"&gt;linux - What does aux mean in &lt;code&gt;ps aux&lt;/code&gt;? - Unix &amp;amp; Linux Stack Exchange&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://phoenixnap.com/kb/top-command-in-linux"&gt;How to Use the top Command in Linux (phoenixnap.com)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opensource.com/article/18/9/linux-commands-process-management"&gt;8 Linux commands for effective process management | Opensource.com&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>Permissions in Linux</title>
      <dc:creator>maninekkalapudi</dc:creator>
      <pubDate>Mon, 18 Jul 2022 04:54:48 +0000</pubDate>
      <link>https://forem.com/maninekkalapudi/permissions-in-linux-1k68</link>
      <guid>https://forem.com/maninekkalapudi/permissions-in-linux-1k68</guid>
      <description>&lt;p&gt;Hello! In my &lt;a href="https://dev.to/maninekkalapudi/redirecting-linux-command-output-6lj-temp-slug-9172606"&gt;last post&lt;/a&gt; I wrote about how I/O redirection works in linux. A command in linux is ultimately a file, and every file is assigned a set of permissions; only users with the right access will be able to run the command. All of this will be detailed in the following post. Let's go!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topics covered in this post:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Preface&lt;/li&gt;
&lt;li&gt;Permission Groups&lt;/li&gt;
&lt;li&gt;Permission Types&lt;/li&gt;
&lt;li&gt;Modifying Permissions&lt;/li&gt;
&lt;li&gt;Executing Commands with sudo&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  1. Preface
&lt;/h2&gt;

&lt;p&gt;Linux, like any UNIX-like operating system, is built from the ground up with a multi-user model in mind. Before computers were personal, they filled entire rooms, as was often seen in universities. A practical way to utilize such a computer was to connect it to multiple &lt;a href="https://dev.to/maninekkalapudi/the-linux-command-line-experience-4ihg-temp-slug-4874114"&gt;terminals&lt;/a&gt;, say one per department.&lt;/p&gt;

&lt;p&gt;This is a multi-user model in a nutshell. Multiple users can connect to the same computer via terminals using relevant credentials and each user will have certain permissions for certain actions only.&lt;/p&gt;

&lt;p&gt;In a modern setting like the cloud, multiple users connect to a remote server using SSH (Secure SHell). Each user accessing the server has a separate account with permissions relevant to their role. For example, a developer may have SSH access to the dev server while a tester may not.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Permission Groups
&lt;/h2&gt;

&lt;p&gt;The way permissions are assigned can be grouped into 3 major categories.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Owner - Users may own files and directories and they have control over their access. The owner level access will not impact the actions of other users.&lt;/li&gt;
&lt;li&gt;Group - Users can be grouped based on their role, say developers or admins; and assign access to files and directories to all the users under the specific group.&lt;/li&gt;
&lt;li&gt;All Users - In addition to the above two groups, any user who can access the system will have access to some files and directories granted by the owner.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's see all of this in action. First, let's check what permissions are assigned to the current user in the cli. We use the &lt;code&gt;id&lt;/code&gt; command and it shows&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BF-xFvb3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118027105/IhTLUis2N.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BF-xFvb3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118027105/IhTLUis2N.png" alt="Untitled.png" width="874" height="66"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;uid - user ID. A number (1000) is assigned when user is created and it is mapped to the user ID.&lt;/li&gt;
&lt;li&gt;gid - primary group ID. User is assigned a primary group ID (gid) and may belong to additional groups&lt;/li&gt;
&lt;li&gt;groups- Different groups that the user is part of. Example: 27 is the sudo(root) user group&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;id&lt;/code&gt; command shows how permissions are granted to a user through different permission groups. Access to any resource within the system can be assigned via a group that shares a common function. Ex: SSH access to a machine for developers only.&lt;/p&gt;
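&lt;p&gt;A few related &lt;code&gt;id&lt;/code&gt; invocations worth knowing:&lt;/p&gt;

```shell
id          # full summary: uid, gid and all group memberships
id -u       # numeric user ID only
id -un      # user name
id -nG      # names of all groups the current user belongs to
```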

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The uid and gid start at 1000 on ubuntu and might differ on other linux operating systems. Ex: Fedora starts at 500&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Permission Types
&lt;/h2&gt;

&lt;p&gt;Every file or a directory in linux has 3 basic permissions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read(&lt;code&gt;r&lt;/code&gt;)- To read the contents of a file or a directory (if execute(x) permission is also set for the directory)&lt;/li&gt;
&lt;li&gt;Write(&lt;code&gt;w&lt;/code&gt;)- Create, Edit, rename or delete the contents of a file or a directory (if execute(x) permission is also set for the directory)&lt;/li&gt;
&lt;li&gt;Execute(&lt;code&gt;x&lt;/code&gt;)- Run or execute a file or view the contents of a directory. This allows a file to be treated as a program and executed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's take a look at these permissions in the cli. When we run the long list command i.e., &lt;code&gt;ls -l&lt;/code&gt; command on a file or a directory, the first column in the resulting list is the permission to that object (highlighted in the below image).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--N9fE2V-c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118079766/UUmqU6HPp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--N9fE2V-c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118079766/UUmqU6HPp.png" alt="Untitled 1.png" width="838" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first character in the permissions is the object indicator, directory is &lt;code&gt;d&lt;/code&gt; and file is &lt;code&gt;-&lt;/code&gt;. The rest of the characters represent permission groups.&lt;/p&gt;

&lt;p&gt;The first set of three characters after the object indicator are Owner permission, next set are Group permissions and last set are All User permissions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ekfn-Qsf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118091388/Mh9TALXFC.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ekfn-Qsf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118091388/Mh9TALXFC.png" alt="Untitled 2.png" width="880" height="254"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, we can understand the permission in the above long list as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;dir1&lt;/code&gt;- This is a directory(&lt;code&gt;d&lt;/code&gt;). The owner has all the permissions(&lt;code&gt;rwx&lt;/code&gt;), the group and all users have read and execute permissions(&lt;code&gt;r-x&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;logs.txt&lt;/code&gt;- This is a file (&lt;code&gt;-&lt;/code&gt;). The owner has only read and write access(&lt;code&gt;rw-&lt;/code&gt;). The group and all users have only read permission(&lt;code&gt;r--&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
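&lt;p&gt;We can reproduce this reading exercise ourselves (&lt;code&gt;stat -c&lt;/code&gt; is a GNU coreutils option, assumed available here):&lt;/p&gt;

```shell
# Recreate the example objects and read their permission strings.
mkdir -p dir1
touch logs.txt
ls -ld dir1 logs.txt       # first column: object indicator + 3 permission sets
stat -c '%A %a %n' dir1    # symbolic and octal modes side by side
```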

&lt;h2&gt;
  
  
  4. Modifying Permissions
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;chmod&lt;/code&gt; command is used to change the permissions of a file or a directory. Only the file's owner or the superuser can change the mode of a file or directory.&lt;/p&gt;

&lt;p&gt;The permission groups for the &lt;code&gt;chmod&lt;/code&gt; command are denoted as&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;u&lt;/code&gt;- Owner&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;g&lt;/code&gt;- Group&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;o&lt;/code&gt;- Others&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;a&lt;/code&gt;- All users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additionally, the &lt;code&gt;+&lt;/code&gt; and &lt;code&gt;-&lt;/code&gt; operators add or remove permissions respectively, and &lt;code&gt;=&lt;/code&gt; sets them exactly. Again, the permissions here are read, write and execute(&lt;code&gt;rwx&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The syntax for &lt;code&gt;chmod&lt;/code&gt; command is as follows:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;chmod &amp;lt;permission_group&amp;gt;&amp;lt;assignment_operator&amp;gt;&amp;lt;permission&amp;gt; &amp;lt;file_name/directory_name&amp;gt;&lt;/code&gt;&lt;/p&gt;
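&lt;p&gt;A short sketch of the syntax in action (the file name here is just an example):&lt;/p&gt;

```shell
touch notes.txt
chmod 644 notes.txt          # start from a known mode: rw-r--r--
chmod u+x notes.txt          # add execute for the owner
chmod g-r notes.txt          # remove read from the group
chmod a+r notes.txt          # add read back for everyone
chmod u+x,go=rx notes.txt    # multiple clauses separated by commas
ls -l notes.txt              # final mode: rwxr-xr-x
```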

&lt;p&gt;&lt;strong&gt;Scenario 1&lt;/strong&gt; :&lt;/p&gt;

&lt;p&gt;Remove the execution permission for all users for &lt;code&gt;dir1&lt;/code&gt; (from previous example).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jBaTcwkq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118126607/jusjxQiGm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jBaTcwkq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118126607/jusjxQiGm.png" alt="Untitled 3.png" width="732" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2&lt;/strong&gt; :&lt;/p&gt;

&lt;p&gt;Assign execution permission to only group for &lt;code&gt;dir1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--U-Zm-UZF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118136740/92xiRwjnE.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--U-Zm-UZF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118136740/92xiRwjnE.png" alt="Untitled 4.png" width="753" height="202"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 3&lt;/strong&gt; :&lt;/p&gt;

&lt;p&gt;Add execute permission for the owner and set the permissions for the group and others to read and execute. Multiple specifications may be separated by commas&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BppAnMEg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118144866/rG3SdGJY4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BppAnMEg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118144866/rG3SdGJY4.png" alt="Untitled 5.png" width="786" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Note&lt;/em&gt;&lt;/strong&gt; : Assigning the right permissions to the relevant users is essential for maintaining good security practices. In general, the fewer permissions given to a user or a group, the better, to avoid any mishaps.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Executing Commands with sudo
&lt;/h2&gt;

&lt;p&gt;Sudo (su do) allows a system administrator to delegate authority to give certain users (or groups of users) the ability to run some (or all) commands as root or another user while providing an audit trail of the commands and their arguments.&lt;/p&gt;

&lt;p&gt;In linux, some resources that are fundamental to the system are managed by administrators only, to ensure its integrity. The simplest example requiring sudo is the update command.&lt;/p&gt;

&lt;p&gt;In ubuntu, &lt;code&gt;apt-get update&lt;/code&gt; refreshes the list of available packages. &lt;code&gt;apt&lt;/code&gt; is the package manager here and &lt;code&gt;update&lt;/code&gt; is the option that tells the system to fetch the latest package index.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lRKnUJVf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118156916/nBx9CXr8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lRKnUJVf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1658118156916/nBx9CXr8t.png" alt="Untitled 6.png" width="880" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The current user does not have the necessary permission to run the command successfully. The update command touches system files (&lt;code&gt;/var/lib/apt/lists/lock&lt;/code&gt;) that only admin users have permission to modify.&lt;/p&gt;

&lt;p&gt;To execute the update command successfully, prefix it with &lt;code&gt;sudo&lt;/code&gt;. This temporarily allows us to execute the command as an admin. This is demonstrated in the second half of the example image above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thoughts on using &lt;code&gt;sudo&lt;/code&gt; command:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;sudo&lt;/code&gt; command gives a user the privileges of a sysadmin. In a well-designed system, admin privileges are not handed out to every user. As a general rule of thumb, fewer privileges per user are better, to avoid any chance of full system compromise under an attack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Multi-user systems like UNIX/linux are designed so that multiple users can share one machine. This raises interesting questions about who gets to access what data on that machine.&lt;/p&gt;

&lt;p&gt;Linux has always been developed with this in mind. Using the &lt;code&gt;chmod&lt;/code&gt; command in the cli to assign the right permissions is straightforward, especially in server environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://linuxcommand.org/tlcl.php"&gt;Linux Command Line Books by William Shotts&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://linuxfoundation.org/blog/classic-sysadmin-understanding-linux-file-permissions/"&gt;Classic SysAdmin: Understanding Linux File Permissions - Linux Foundation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://linuxize.com/post/how-to-create-users-in-linux-using-the-useradd-command/"&gt;How to Create Users in Linux (useradd Command) | Linuxize&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://linuxize.com/post/how-to-create-a-sudo-user-on-ubuntu/"&gt;How To Create a Sudo User on Ubuntu | Linuxize&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://osr507doc.xinuos.com/en/OSUserG/_Changing_file_permissions.html"&gt;Changing file permissions (xinuos.com)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.sudo.ws/"&gt;Sudo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.redhat.com/sysadmin/sudo"&gt;Linux command line basics: sudo | Enable Sysadmin (redhat.com)&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>Redirecting linux command output</title>
      <dc:creator>maninekkalapudi</dc:creator>
      <pubDate>Tue, 21 Jun 2022 03:55:24 +0000</pubDate>
      <link>https://forem.com/maninekkalapudi/redirecting-linux-command-output-38lg</link>
      <guid>https://forem.com/maninekkalapudi/redirecting-linux-command-output-38lg</guid>
      <description>&lt;p&gt;Hello! Hope you're doing great. In my &lt;a href="https://dev.to/maninekkalapudi/know-your-linux-commands-36pb-temp-slug-2353119"&gt;last post&lt;/a&gt;, I wrote about commands in the linux CLI. In this post we will understand how to play with any command's output, store it in files, or even connect multiple commands together into command pipelines. Let's go!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topics covered in this post:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What are Standard Input, Output and Error in Linux?&lt;/li&gt;
&lt;li&gt;File Descriptors&lt;/li&gt;
&lt;li&gt;Redirecting stdout&lt;/li&gt;
&lt;li&gt;Redirecting stderr&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cat&lt;/code&gt; command&lt;/li&gt;
&lt;li&gt;Command pipelines&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  1. What are Standard Input, Output and Error in Linux?
&lt;/h2&gt;

&lt;p&gt;Let's say we use a command like &lt;code&gt;ls&lt;/code&gt; in the linux cli. &lt;code&gt;ls&lt;/code&gt; will list all the files and directories in a given path, and if the path is non-existent or incorrect it will throw an error. This is shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--av29MS4A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655778030942/oWjt0LoGZ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--av29MS4A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655778030942/oWjt0LoGZ.png" alt="Untitled.png" width="880" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, every linux command like &lt;code&gt;ls&lt;/code&gt; is designed to produce the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Command output&lt;/li&gt;
&lt;li&gt;Status messages and error messages&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The output and the error messages are displayed on the screen, and we should remember that &lt;a href="https://www.tecmint.com/explanation-of-everything-is-a-file-and-types-of-files-in-linux/"&gt;everything is a file in linux&lt;/a&gt;. This means that a command sends its output to a special file called &lt;strong&gt;&lt;em&gt;Standard Output&lt;/em&gt;&lt;/strong&gt; (&lt;em&gt;stdout&lt;/em&gt;) and its errors to &lt;strong&gt;&lt;em&gt;Standard Error&lt;/em&gt;&lt;/strong&gt; (&lt;em&gt;stderr&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;By default, both &lt;em&gt;stdout&lt;/em&gt; and &lt;em&gt;stderr&lt;/em&gt; are linked to the screen and not saved into a disk file. Also, many commands or programs take input from &lt;strong&gt;&lt;em&gt;Standard Input&lt;/em&gt;&lt;/strong&gt; (&lt;em&gt;stdin&lt;/em&gt;), which is by default attached to the keyboard.&lt;/p&gt;

&lt;p&gt;So stdout and stderr are files attached to the screen, and stdin is a file attached to the keyboard. When a command runs in the cli, these files receive its output, errors and input.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rFp7j8ti--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655782412571/LFqNe81-t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rFp7j8ti--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655782412571/LFqNe81-t.png" alt="image.png" width="880" height="537"&gt;&lt;/a&gt;Source: &lt;a href="https://linuxhint.com/redirect-stderr-stdout-bash/"&gt;https://linuxhint.com/redirect-stderr-stdout-bash/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. File Descriptors
&lt;/h2&gt;

&lt;p&gt;A file descriptor is a unique number that identifies an open file in an operating system. The operating system keeps a record of all open files and their locations in a global table, along with their permissions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mTBbzMS5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655778149967/p9C8a3ZGb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mTBbzMS5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655778149967/p9C8a3ZGb.png" alt="Untitled 1.png" width="225" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://www.computerhope.com/jargon/f/file-descriptor.htm#:~:text=A%20file%20descriptor%20is%20a,Grants%20access."&gt;What is a File Descriptor? (computerhope.com)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On a Unix-like operating system, the first three file descriptors, by default, are STDIN (standard input), STDOUT (standard output), and STDERR (standard error).&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;File descriptor&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Standard input (stdin)&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;The default data stream for input. In the terminal, this defaults to keyboard input from the user&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Standard output (stdout)&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;The default data stream for output, for example when a command prints text. In the terminal, this defaults to the user's screen&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Standard error (stderr)&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;The default data stream for output that relates to an error occurring. In the terminal, this defaults to the user's screen&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The file descriptors are used for input/output (I/O) redirection in the linux cli. Let's discuss this next.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Redirecting stdout
&lt;/h2&gt;

&lt;p&gt;I/O redirection basically means that we can choose where a command's output finally ends up, using the redirection operator &lt;code&gt;&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To redirect the stdout to a file, we will use &lt;code&gt;&amp;lt;command&amp;gt; &amp;gt; &amp;lt;output_file_name&amp;gt;&lt;/code&gt;. Let's take a look at this with an example:&lt;/p&gt;

&lt;p&gt;Here we use the command &lt;code&gt;ls -l &amp;gt; ls_op.txt&lt;/code&gt; to long-list the files and write the output to the &lt;code&gt;ls_op.txt&lt;/code&gt; file. The &lt;code&gt;cat&lt;/code&gt; command will then display its contents on the screen&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0HTpSDSs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780473971/VVcnIs_Gf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0HTpSDSs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780473971/VVcnIs_Gf.png" alt="Untitled 2.png" width="747" height="638"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What happens if we provide a non-existing path to the ls command?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6deORtZs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780489646/v51_lZrhh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6deORtZs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780489646/v51_lZrhh.png" alt="Untitled 3.png" width="671" height="92"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, the &lt;code&gt;/bn/usr/&lt;/code&gt; path doesn't exist and we anticipated an error message written to the &lt;code&gt;ls_op.txt&lt;/code&gt; file. Instead, the error is shown on the screen. What about the output file?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4arGnSlv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780508658/Hn9EvBFsF.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4arGnSlv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780508658/Hn9EvBFsF.png" alt="Untitled 4.png" width="629" height="106"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ls_op.txt&lt;/code&gt; file is empty and the error is displayed on the screen. What happened here?&lt;/p&gt;

&lt;p&gt;First, the non-existent path produced an error message, which is sent to &lt;em&gt;stderr&lt;/em&gt; instead of &lt;em&gt;stdout&lt;/em&gt;. Next, the redirection operator (&lt;code&gt;&amp;gt;&lt;/code&gt;) overwrites the output file. Since the command produced no output on stdout, the file was overwritten with nothing, and the error appeared on the screen because stderr was never redirected.&lt;/p&gt;
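&lt;p&gt;A minimal sketch of this behavior (the path &lt;code&gt;/no/such/path&lt;/code&gt; is a stand-in for any non-existent path):&lt;/p&gt;

```shell
printf 'stale contents\n' > ls_op.txt
# stdout is redirected into the file, but the error message travels on
# stderr, so it still appears on the screen while the file is truncated
ls /no/such/path > ls_op.txt || true   # || true: ignore the failing exit status
```

&lt;p&gt;Afterwards &lt;code&gt;ls_op.txt&lt;/code&gt; exists but is empty: the shell truncated it before running &lt;code&gt;ls&lt;/code&gt;, and no stdout ever arrived to fill it.&lt;/p&gt;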

&lt;p&gt;Interestingly, we can use &lt;code&gt;&amp;gt;&lt;/code&gt; to create a new file or even truncate an existing file.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;gt; &amp;lt;filename&amp;gt;&lt;/code&gt; will create a new file; if the file already exists, its contents will be truncated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XwqAbkKR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780527995/nlqfmZEu6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XwqAbkKR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780527995/nlqfmZEu6.png" alt="Untitled 5.png" width="880" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What if we just want to append output to an existing file? &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; appends data to an existing file, and creates the file if it is not present.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vXYpTwgc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780540075/EsHkIA292.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vXYpTwgc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780540075/EsHkIA292.png" alt="Untitled 6.png" width="663" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Redirecting stderr
&lt;/h2&gt;

&lt;p&gt;Redirecting stderr requires its file descriptor. As discussed above, the file descriptor for stderr is 2, and it is placed directly before the redirection operator.&lt;/p&gt;

&lt;p&gt;The stderr can be redirected to a file as mentioned below:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ls -l /bn/usr 2&amp;gt; ls-err.txt&lt;/code&gt;. Here 2 is the file descriptor for stderr, followed by the redirection operator. Let's see the output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ax7Tg70C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780554951/9y5AkcLxA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ax7Tg70C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780554951/9y5AkcLxA.png" alt="Untitled 7.png" width="713" height="117"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Redirecting both stdout and stderr to a single file&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
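&lt;p&gt;To capture both streams in one file, redirect stdout first and then point stderr (fd 2) at stdout (fd 1). A small sketch, using one valid and one invalid path:&lt;/p&gt;

```shell
# stdout goes to the file, then fd 2 is duplicated onto fd 1
ls /usr/bin /no/such/path > ls_all.txt 2>&1 || true
# bash also offers the shorthand: ls ... &> ls_all.txt
```

&lt;p&gt;Now &lt;code&gt;ls_all.txt&lt;/code&gt; contains both the directory listing and the error message. The order matters: writing &lt;code&gt;2&amp;gt;&amp;amp;1&lt;/code&gt; before the file redirection would send stderr to the screen instead.&lt;/p&gt;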

&lt;h2&gt;
  
  
  5. cat command
&lt;/h2&gt;

&lt;p&gt;cat, which is short for concatenate, can perform multiple operations like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Display file contents&lt;/li&gt;
&lt;li&gt;Concatenate multiple files into a new file using the redirection operator&lt;/li&gt;
&lt;li&gt;Create a new file using the cat command and the redirection operator&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Display file contents&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;cat &amp;lt;filename&amp;gt;&lt;/code&gt; will display the contents of the given file. To display multiple files, pass the file names one after the other; adding the &lt;code&gt;-n&lt;/code&gt; option numbers the output lines.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qhUAQQhv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780599100/OQmdPV5K1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qhUAQQhv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780599100/OQmdPV5K1.png" alt="Untitled 9.png" width="868" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Concatenate multiple files into a new file using the redirection operator&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What the &lt;code&gt;cat&lt;/code&gt; command essentially did in the above step is read the file contents and pass them to stdout, which is attached to the screen. We can use the I/O redirection technique to redirect the output of the &lt;code&gt;cat&lt;/code&gt; command to a file. Let's see this in action.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GjmPr0HT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780613132/6vTTQlZQq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GjmPr0HT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780613132/6vTTQlZQq.png" alt="Untitled 10.png" width="587" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;cat testfile &amp;gt; catfile&lt;/code&gt; command wrote the contents of testfile to catfile using the &lt;code&gt;&amp;gt;&lt;/code&gt; operator. The catfile was also created by the command on the fly.&lt;/p&gt;

&lt;p&gt;This is handy when we want to write the contents of multiple files to a single file: &lt;code&gt;cat &amp;lt;input1&amp;gt; &amp;lt;input2&amp;gt; &amp;gt; &amp;lt;outputfile&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---nU1I0a_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780625131/9URRCDixH.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---nU1I0a_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780625131/9URRCDixH.png" alt="Untitled 11.png" width="796" height="100"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Create a new file using cat command and redirection operator&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What happens when we don't pass a file name to the &lt;code&gt;cat&lt;/code&gt; command? &lt;code&gt;cat&lt;/code&gt; expects a file name and, when none is provided, it reads from stdin. This behavior is documented in the manual (&lt;code&gt;man cat&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lK2slWdS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780638901/uUJxfFFTa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lK2slWdS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780638901/uUJxfFFTa.png" alt="Untitled 12.png" width="880" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, the &lt;code&gt;cat&lt;/code&gt; command continuously accepts input from the keyboard and displays it back on the screen, since stdout is still attached to the screen. Pressing &lt;code&gt;ctrl+d&lt;/code&gt; signals end of input and exits the prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--F57jhh2T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780648095/2R6gcQPkQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--F57jhh2T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780648095/2R6gcQPkQ.png" alt="Untitled 13.png" width="621" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What if we redirect the above example's output to a file? The command for this scenario is &lt;code&gt;cat &amp;gt; &amp;lt;filename&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FvgSPPrI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780658819/eBrXC6ejN.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FvgSPPrI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1655780658819/eBrXC6ejN.png" alt="Untitled 14.png" width="880" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Input is accepted continuously from the keyboard and written to the file. Again, pressing &lt;code&gt;ctrl+d&lt;/code&gt; exits the prompt.&lt;/p&gt;
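&lt;p&gt;In a script we can't type at the keyboard, but piping into &lt;code&gt;cat&lt;/code&gt; stands in for stdin and shows the same mechanism:&lt;/p&gt;

```shell
# the pipe plays the role of the keyboard; at a real prompt,
# ctrl+d marks the end of input
printf 'Hi, there!\n' | cat > greeting.txt
cat greeting.txt
```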

&lt;h2&gt;
  
  
  6. Command Pipelines
&lt;/h2&gt;

&lt;p&gt;So far, we have run single commands in the CLI and played with their output (&lt;code&gt;&amp;gt;&lt;/code&gt;). Pipelines in the shell let the stdout of one command be fed into the stdin of another with the pipe (&lt;code&gt;|&lt;/code&gt;) operator.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ls -l /usr/bin | less&lt;/code&gt;. The &lt;code&gt;ls -l&lt;/code&gt; command lists all the contents of the path &lt;code&gt;/usr/bin&lt;/code&gt;, and its output is passed to the &lt;code&gt;less&lt;/code&gt; command, which displays the contents one page (one screen) at a time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filters&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ls -l /usr/bin | sort&lt;/code&gt; - This command provides the sorted listing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ls -l /usr/bin | sort | uniq | less&lt;/code&gt; - This command provides the sorted, de-duplicated list in pages. The &lt;code&gt;uniq&lt;/code&gt; command is often used in conjunction with the &lt;code&gt;sort&lt;/code&gt; command; &lt;code&gt;uniq&lt;/code&gt; accepts either stdin or a single filename argument.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;ls -l | sort | uniq | grep zip&lt;/code&gt; - This command searches the sorted, unique list for a pattern using the &lt;code&gt;grep&lt;/code&gt; command. &lt;code&gt;grep&lt;/code&gt; accepts a pattern and searches a file or stdin; a pattern here means a word or a regex.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;ls /usr/bin | tee ls.txt | grep zip&lt;/code&gt; - The &lt;code&gt;tee&lt;/code&gt; command reads stdin from the previous command and copies it both to stdout and to one or more files. This lets us capture the output of intermediate steps in the pipeline.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
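&lt;p&gt;The filter chain above can be sketched end to end with a small sample file (contents chosen purely for illustration):&lt;/p&gt;

```shell
printf 'zip\nunzip\nzip\nbzip2\n' > tools.txt
# sort the lines, drop duplicates, save a copy with tee, then filter with grep
sort tools.txt | uniq | tee sorted.txt | grep zip
```

&lt;p&gt;&lt;code&gt;sorted.txt&lt;/code&gt; keeps the full de-duplicated list, while the screen shows only the lines matching &lt;code&gt;zip&lt;/code&gt;.&lt;/p&gt;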

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I/O redirection is a powerful technique, and pairing it with pipelines helps us build quick scripts or first versions of a data pipeline. Commands like &lt;code&gt;grep&lt;/code&gt; can help trigger other commands based on the search results, or simply help make sense of the output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://linuxcommand.org/tlcl.php"&gt;Linux Command Line Books by William Shotts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.computerhope.com/jargon/f/file-descriptor.htm#:~:text=A%20file%20descriptor%20is%20a,Grants%20access."&gt;What is a File Descriptor? (computerhope.com)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://phoenixnap.com/kb/linux-cat-command"&gt;Cat Command in Linux {15 Commands with Examples} | phoenixNAP KB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.geeksforgeeks.org/less-command-linux-examples/#:~:text=Less%20command%20is%20a%20Linux,accesses%20it%20page%20by%20page."&gt;less command in Linux with Examples - GeeksforGeeks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://linuxhint.com/redirect-stderr-stdout-bash/"&gt;How to Redirect stderr to stdout in Bash (linuxhint.com)&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>Know Your Linux Commands</title>
      <dc:creator>maninekkalapudi</dc:creator>
      <pubDate>Sun, 08 May 2022 14:11:57 +0000</pubDate>
      <link>https://forem.com/maninekkalapudi/know-your-linux-commands-l34</link>
      <guid>https://forem.com/maninekkalapudi/know-your-linux-commands-l34</guid>
      <description>&lt;p&gt;Hello! Hope youre doing great. In my &lt;a href="https://dev.to/maninekkalapudi/working-with-files-and-directories-in-linux-cli-24g7-temp-slug-8707293"&gt;last post&lt;/a&gt; I have written about working with files and directories in linux CLI. In this post, lets discuss what actually is a command and how to create a command of our own.&lt;/p&gt;

&lt;p&gt;Topics covered in this post:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What is a command?&lt;/li&gt;
&lt;li&gt;Identifying a command&lt;/li&gt;
&lt;li&gt;Know your commands via CLI&lt;/li&gt;
&lt;li&gt;Create your own command using alias&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  1. What is a command?
&lt;/h3&gt;

&lt;p&gt;A command, in general, is an instruction or a set of instructions given to a machine to perform an action. A command in the Linux world can be any of the following:&lt;/p&gt;

&lt;p&gt;a. &lt;strong&gt;An executable program&lt;/strong&gt; - &lt;code&gt;/usr/bin&lt;/code&gt; in Linux holds the compiled binaries (installed programs). These are written in C, C++, Python, shell, etc.&lt;/p&gt;

&lt;p&gt;b. &lt;strong&gt;Shell built-in&lt;/strong&gt; - the bash shell supports a number of commands called &lt;em&gt;shell built-ins&lt;/em&gt;. Ex: the &lt;code&gt;cd&lt;/code&gt; command&lt;/p&gt;

&lt;p&gt;c. &lt;strong&gt;Shell function&lt;/strong&gt; - Shell scripts that are included in the environment.&lt;/p&gt;

&lt;p&gt;d. &lt;strong&gt;Alias&lt;/strong&gt; - as the name suggests, an alias is an alternate name we can give to a command or command string&lt;/p&gt;
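&lt;p&gt;The &lt;code&gt;type&lt;/code&gt; built-in (covered next) can tell these categories apart; a quick sketch in bash:&lt;/p&gt;

```shell
type cd           # reports: cd is a shell builtin
type ls           # in an interactive shell this is often an alias for 'ls --color=auto'
command -v ls     # prints the path of the executable, e.g. /usr/bin/ls
```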

&lt;h3&gt;
  
  
  2. Identifying a command
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;type&lt;/code&gt; command&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZmRb_qd2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018553718/dg17-mK-L.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZmRb_qd2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018553718/dg17-mK-L.png" alt="Untitled.png" width="880" height="147"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;type ls&lt;/code&gt; shows that &lt;code&gt;ls&lt;/code&gt; is in fact an alias for the command &lt;code&gt;ls --color=auto&lt;/code&gt;. When we use the &lt;code&gt;ls&lt;/code&gt; command, the results are displayed with color coding, as above. An alias works just like any command; when we use it, it invokes the command it points to.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;which&lt;/code&gt; command&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GLg69YUO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018567838/qxTa9qirU.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GLg69YUO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018567838/qxTa9qirU.png" alt="Untitled 1.png" width="813" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;type&lt;/code&gt; and &lt;code&gt;which&lt;/code&gt; commands are two ways to determine the type of a command and where it is referenced (installed) from.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Know your command
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;--help&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To learn more about any command, we can use its &lt;code&gt;--help&lt;/code&gt; option. &lt;code&gt;&amp;lt;command&amp;gt; --help&lt;/code&gt; shows all the options for the command. In the example below, we see the documentation for the &lt;code&gt;mv&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PcKCnY8D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018594863/VkL8DX-Eb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PcKCnY8D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018594863/VkL8DX-Eb.png" alt="Untitled 2.png" width="880" height="671"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each option adds functionality to the command. For example, the &lt;code&gt;mv&lt;/code&gt; command with the &lt;code&gt;-u&lt;/code&gt; option moves a file from the source directory only when it is newer than the file in the destination directory, or when the destination file is missing.&lt;/p&gt;
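&lt;p&gt;A small sketch of &lt;code&gt;-u&lt;/code&gt; in action, assuming GNU coreutils (the file and directory names are made up):&lt;/p&gt;

```shell
mkdir -p srcdir destdir
printf 'new version\n' > srcdir/report.txt
printf 'old version\n' > destdir/report.txt
touch -d '2020-01-01' destdir/report.txt   # backdate the destination copy
mv -u srcdir/report.txt destdir/           # moves, because the source is newer
```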

&lt;ul&gt;
&lt;li&gt;manual (&lt;code&gt;man&lt;/code&gt; command)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;man&lt;/em&gt;, short for &lt;em&gt;manual&lt;/em&gt;, provides the formal documentation for executable programs. The man page covers a command in sections such as name, synopsis, description and others.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--B0ZPne01--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018609297/5CK-qPRbW.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--B0ZPne01--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018609297/5CK-qPRbW.png" alt="Untitled 3.png" width="880" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;apropos command&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;apropos &amp;lt;search_term&amp;gt;&lt;/code&gt; command will show the appropriate commands by scanning the man pages based on the search term.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VU-pbFBp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018622143/d1SyxYnJA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VU-pbFBp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018622143/d1SyxYnJA.png" alt="Untitled 4.png" width="880" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because apropos scans all man pages, the results cover a wide range of cases and can look very different from each other. Pick the command that suits your scenario; a brief description follows each command in the results.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Create your own command using alias
&lt;/h3&gt;

&lt;p&gt;So far, we have seen examples with only one command. We can place a semicolon (&lt;code&gt;;&lt;/code&gt;) between commands to run them one after another (&lt;code&gt;cmd1; cmd2; cmd3&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;For example: &lt;code&gt;echo "Hi, there!"; ls; ls destdir&lt;/code&gt;. The &lt;code&gt;echo&lt;/code&gt; command is the print statement of the Linux CLI, and the &lt;code&gt;ls&lt;/code&gt; command lists the files and directories.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--22s48xLR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018637130/F9LZiZSGh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--22s48xLR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018637130/F9LZiZSGh.png" alt="Untitled 5.png" width="880" height="95"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we can combine these commands into an alias and run the same sequence every time. Note that a user-defined alias is specific to the machine and, unless added to a startup file such as &lt;code&gt;.bashrc&lt;/code&gt;, lasts only for the current shell session.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;alias &amp;lt;name&amp;gt;='&amp;lt;command_string&amp;gt;'&lt;/code&gt; creates an alias with the supplied name (note: no spaces around the &lt;code&gt;=&lt;/code&gt;). Now, let's create the alias and see it in action.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WDGqxPTO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018649740/Gio47OMe2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WDGqxPTO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018649740/Gio47OMe2.png" alt="Untitled 6.png" width="880" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After creating the alias with the &lt;code&gt;alias mycommand="echo \"Hi, there\"; ls; ls destdir"&lt;/code&gt; command, we can invoke the alias like any Linux command, as shown above. When we check the type of the alias using &lt;code&gt;type mycommand&lt;/code&gt;, it shows &lt;code&gt;mycommand is aliased to `echo "Hi, there"; ls; ls destdir'&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To list all the aliases currently defined, use the &lt;code&gt;alias&lt;/code&gt; command; to remove one, use &lt;code&gt;unalias &amp;lt;alias_name&amp;gt;&lt;/code&gt;. For example, &lt;code&gt;unalias mycommand&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NcYCX53y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018665580/IvaMO3_dW.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NcYCX53y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1652018665580/IvaMO3_dW.png" alt="Untitled 7.png" width="880" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://linuxcommand.org/tlcl.php"&gt;Linux Command Line Books by William Shotts&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>Working with Files and Directories in Linux CLI</title>
      <dc:creator>maninekkalapudi</dc:creator>
      <pubDate>Mon, 21 Feb 2022 08:45:26 +0000</pubDate>
      <link>https://forem.com/maninekkalapudi/working-with-files-and-directories-in-linux-cli-3koh</link>
      <guid>https://forem.com/maninekkalapudi/working-with-files-and-directories-in-linux-cli-3koh</guid>
      <description>&lt;p&gt;Hello! Hope youre doing great. In my &lt;a href="https://dev.to/maninekkalapudi/the-linux-command-line-experience-4ihg-temp-slug-4874114"&gt;last post&lt;/a&gt; I have written about the how to get started with linux command line(cli) and terms like shell, terminal and etc. We also tried few basic commands to list the files and directories in a path. In this post, we will take this further and discuss about how we can interact with files and directories. Lets dive in!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topics covered in this post&lt;/strong&gt; :&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Files and directories in Linux&lt;/li&gt;
&lt;li&gt;Create and edit files in cli&lt;/li&gt;
&lt;li&gt;Create directories with cli&lt;/li&gt;
&lt;li&gt;File permissions in linux&lt;/li&gt;
&lt;li&gt;Manipulating files and directories in cli&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  1. Files and directories in Linux
&lt;/h3&gt;

&lt;p&gt;Files are the basic entities in linux which can store some data, text or a script/program. Directories (folders in other operating systems) contain either files or other directories. Both files and directories are common among all the operating systems.&lt;/p&gt;

&lt;p&gt;The Linux filesystem, shown in the diagram below, has different files and directories for various operations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ky0wno7s--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424413545/Kt5j7r4gA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ky0wno7s--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424413545/Kt5j7r4gA.png" alt="Untitled.png" width="550" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Linux Filesystem Source: &lt;a href="http://www2.hawaii.edu/~walbritt/ics240/materials/module2-session07.htm"&gt;ICS 240: Operating Systems by William McDaniel Albritton (hawaii.edu)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a user logs in, they land in &lt;code&gt;/home/&amp;lt;username&amp;gt;&lt;/code&gt; (shown as &lt;code&gt;~&lt;/code&gt; in the CLI). The user can create and delete files and directories within their home directory. Some files and directories there are created by the sysadmin (the &lt;code&gt;root&lt;/code&gt; or admin user) and cannot be modified or deleted.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ls&lt;/code&gt; command shows the contents of the current directory. When we run &lt;code&gt;ls&lt;/code&gt; with the all option (&lt;code&gt;-a&lt;/code&gt;), it also shows hidden files and directories (filenames starting with &lt;code&gt;.&lt;/code&gt;) along with the regular content. Hidden files include configuration files (&lt;code&gt;.bashrc&lt;/code&gt;), environment files (&lt;code&gt;.profile&lt;/code&gt;), etc. More on this in upcoming posts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LjxYr-zv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424428038/qKyo7VHlV.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LjxYr-zv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424428038/qKyo7VHlV.png" alt="Untitled 1.png" width="880" height="208"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also, Linux has no real concept of file extensions. When no extension is provided, the type of a file is determined by its contents (or file header). Operating systems like Windows, by contrast, rely on the file extension to determine the file type: for example, &lt;code&gt;.txt&lt;/code&gt; is a text file, &lt;code&gt;.exe&lt;/code&gt; an executable program, &lt;code&gt;.jpg&lt;/code&gt; an image file, etc.&lt;/p&gt;

&lt;p&gt;To check the type of a file, we can use the &lt;code&gt;file &amp;lt;filename&amp;gt;&lt;/code&gt; command. In the example below, we have a file named &lt;code&gt;OMENCity&lt;/code&gt; with no extension. When we run &lt;code&gt;file OMENCity&lt;/code&gt;, we get the file's metadata (file information).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iOoYGOOu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424459275/DAmEjYeW7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iOoYGOOu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424459275/DAmEjYeW7.png" alt="Untitled 2.png" width="858" height="102"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Create and edit files in cli
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Creating a file with &lt;code&gt;touch&lt;/code&gt; command:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;touch &amp;lt;filename&amp;gt;&lt;/code&gt; will create a new file in the current working directory. Optionally, we can pass the &lt;code&gt;&amp;lt;path&amp;gt;/to/&amp;lt;file&amp;gt;/&amp;lt;filename&amp;gt;&lt;/code&gt; to the touch command to create a file in the specific location (shown in the next example).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3y2zifqx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424481114/1dPaoedzN.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3y2zifqx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424481114/1dPaoedzN.png" alt="Untitled 3.png" width="880" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice that we didn't pass any file extension to the &lt;code&gt;touch&lt;/code&gt; command, like &lt;code&gt;touch &amp;lt;filename.extension&amp;gt;&lt;/code&gt;. We can do that as well; for example, &lt;code&gt;touch index.html&lt;/code&gt;. Let's try this example.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5f7wT_df--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424493365/ojx6ylhpZ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5f7wT_df--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424493365/ojx6ylhpZ.png" alt="Untitled 4.png" width="880" height="127"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Editing a file with Vim(&lt;code&gt;vi&lt;/code&gt;) editor:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we have created the files, let's edit one from the command line. We can use command-line editors like Vim (personal preference) or nano. The command to open a file in the Vim editor is &lt;code&gt;vim &amp;lt;filename&amp;gt;&lt;/code&gt; (&lt;code&gt;vi&lt;/code&gt; also works instead of &lt;code&gt;vim&lt;/code&gt;). The nano editor uses a similar command, &lt;code&gt;nano &amp;lt;filename&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let's edit the &lt;code&gt;testfile&lt;/code&gt; in the home directory. Once we run the &lt;code&gt;vim testfile&lt;/code&gt; command and press the return (enter) key, the screen below is presented. We can't edit the file just yet. Alternatively, we can use &lt;code&gt;vim /path/to/&amp;lt;filename&amp;gt;&lt;/code&gt; to edit a file outside the current working directory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--W27KmiLI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424518535/eKEohTjbp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--W27KmiLI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424518535/eKEohTjbp.png" alt="Untitled 5.png" width="710" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, to edit the file we press the &lt;code&gt;esc&lt;/code&gt; key and then the &lt;code&gt;i&lt;/code&gt; key. This switches Vim to insert mode; we can observe INSERT at the bottom of the screen. Now we can write text to the file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4-JjzuV3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424539874/GKGREvyVe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4-JjzuV3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424539874/GKGREvyVe.png" alt="Untitled 6.png" width="880" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After entering the text, we need to save the latest changes. Press the &lt;code&gt;esc&lt;/code&gt; key, type &lt;code&gt;:wq&lt;/code&gt;, and press enter. &lt;code&gt;:wq&lt;/code&gt; is the command to write the file (&lt;code&gt;w&lt;/code&gt;) and quit the vim editor (&lt;code&gt;q&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c4cASbPq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424547246/oBVjNYT4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c4cASbPq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424547246/oBVjNYT4w.png" alt="Untitled 7.png" width="702" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Viewing the file content with &lt;code&gt;cat&lt;/code&gt; command&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To view the contents of a file, we can use the &lt;code&gt;cat&lt;/code&gt; (concatenate) command. &lt;code&gt;cat &amp;lt;filename&amp;gt;&lt;/code&gt; prints the contents of the file directly to the command line.&lt;/p&gt;
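&lt;p&gt;The create-and-view workflow can also be sketched non-interactively; the file name &lt;code&gt;testfile&lt;/code&gt; below is just an example:&lt;/p&gt;

```shell
# Create a file and write a line to it without opening an editor
printf 'hello from the cli\n' > testfile

# Print the file's contents to the terminal
cat testfile
```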

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1u-w8HlE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424561293/bqyTHGALm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1u-w8HlE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424561293/bqyTHGALm.png" alt="Untitled 8.png" width="880" height="98"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating files with Vim editor:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We have seen an example of creating a file with the &lt;code&gt;touch&lt;/code&gt; command. We can also use the vim editor for this and eliminate the separate file-creation step altogether. Let's see this with an example.&lt;/p&gt;

&lt;p&gt;Previously, we used the &lt;code&gt;vim &amp;lt;filename&amp;gt;&lt;/code&gt; or &lt;code&gt;vim /path/to/&amp;lt;filename&amp;gt;&lt;/code&gt; commands to edit an existing file in the vim editor. The same &lt;code&gt;vim &amp;lt;filename&amp;gt;&lt;/code&gt; command also works for a file that does not exist yet. Let's see this in action.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--l7OVJ77L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424636237/pO4uffGQR.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--l7OVJ77L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424636237/pO4uffGQR.png" alt="Untitled 9.png" width="880" height="99"&gt;&lt;/a&gt; &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--p_jVLrDk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424612365/4EDeAWTWR.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--p_jVLrDk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424612365/4EDeAWTWR.png" alt="Untitled 10.png" width="706" height="154"&gt;&lt;/a&gt; &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ahRI5D5g--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424655716/y1H8dLEU0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ahRI5D5g--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424655716/y1H8dLEU0.png" alt="Untitled 11.png" width="709" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following steps were taken in the above example:&lt;/p&gt;

&lt;p&gt;a. List files using &lt;code&gt;ls&lt;/code&gt; command&lt;/p&gt;

&lt;p&gt;b. Open vim editor for the new file &lt;code&gt;vi vinewfile&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;c. Change the vim editor to insert mode using the &lt;code&gt;esc&lt;/code&gt; key and then the &lt;code&gt;i&lt;/code&gt; key&lt;/p&gt;

&lt;p&gt;d. Edit the contents of the file and save it using &lt;code&gt;:wq&lt;/code&gt; command&lt;/p&gt;

&lt;p&gt;e. &lt;code&gt;cat&lt;/code&gt; command to view the contents of the file.&lt;/p&gt;

&lt;p&gt;We can use the above steps to create a hidden file. For example, &lt;code&gt;vim .vimhiddenfile&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; : While using the &lt;code&gt;vim&lt;/code&gt; command to create a file on the fly, we must save it (&lt;code&gt;:wq&lt;/code&gt;) for the file to appear in the path. Otherwise the file will not be created.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Create directories with cli
&lt;/h3&gt;

&lt;p&gt;Creating a directory is pretty straightforward. &lt;code&gt;mkdir&lt;/code&gt;, short for make directory, is the command to create the directories. Let's see this in action.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--H4nhSy8J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424698499/yEXempBS5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--H4nhSy8J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424698499/yEXempBS5.png" alt="Untitled 12.png" width="862" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following steps were taken in the above example:&lt;/p&gt;

&lt;p&gt;a. &lt;code&gt;ls&lt;/code&gt; command to list all the files&lt;/p&gt;

&lt;p&gt;b. &lt;code&gt;mkdir &amp;lt;dirname&amp;gt;&lt;/code&gt;(&lt;code&gt;mkdir newdir&lt;/code&gt;) command to create the directory in the current working directory&lt;/p&gt;

&lt;p&gt;c. &lt;code&gt;mkdir ./relative/path/&amp;lt;dirname&amp;gt;&lt;/code&gt; command to create a directory in the specified path&lt;/p&gt;

&lt;p&gt;d. To create a nested directory path, i.e., a directory within a directory that does not exist yet, we should use the &lt;code&gt;-p&lt;/code&gt; option with the &lt;code&gt;mkdir&lt;/code&gt; command. Otherwise we get an error similar to &lt;code&gt;mkdir: cannot create directory ./newnewdir/dir1: No such file or directory&lt;/code&gt;. For example, &lt;code&gt;mkdir -p ./newnewdir/dir1&lt;/code&gt; creates both &lt;code&gt;newnewdir&lt;/code&gt; and &lt;code&gt;dir1&lt;/code&gt; within it&lt;/p&gt;

&lt;p&gt;e. To change the directory we use &lt;code&gt;cd /path/to/dir&lt;/code&gt; command&lt;/p&gt;
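&lt;p&gt;The steps above can be sketched as follows (the directory names are illustrative):&lt;/p&gt;

```shell
# Create a directory in the current working directory
mkdir newdir

# -p creates any missing parent directories in the path
mkdir -p ./newnewdir/dir1

# Change into the newly created path and confirm where we are
cd ./newnewdir/dir1
pwd
```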

&lt;h3&gt;
  
  
  4. File permissions in linux
&lt;/h3&gt;

&lt;p&gt;Now, let's run the &lt;code&gt;ls -al&lt;/code&gt; or &lt;code&gt;ll&lt;/code&gt; (long list) command in the home directory and check the output. Both commands produce similar output, displaying the following information in their columns:&lt;/p&gt;

&lt;p&gt;a. File permissions&lt;/p&gt;

&lt;p&gt;b. &lt;a href="https://www.redhat.com/sysadmin/linking-linux-explained"&gt;File's number of hard links&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;c. File owner username&lt;/p&gt;

&lt;p&gt;d. name of the group that owns the file&lt;/p&gt;

&lt;p&gt;e. Size of the file in bytes&lt;/p&gt;

&lt;p&gt;f. Date and time of the file's last modification&lt;/p&gt;

&lt;p&gt;g. Name of the file&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--24aAWNdC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424722642/8Powtftwz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--24aAWNdC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424722642/8Powtftwz.png" alt="Untitled 13.png" width="834" height="919"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the above list, directories are marked with &lt;code&gt;d&lt;/code&gt; as the first letter of the file permissions, while regular files are marked with &lt;code&gt;-&lt;/code&gt;. Each file and directory in linux has read(&lt;code&gt;r&lt;/code&gt;), write(&lt;code&gt;w&lt;/code&gt;) and execute(&lt;code&gt;x&lt;/code&gt;) permissions for the following categories of users:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;a. Owner - Who owns a file or directory
b. Group - A group of users with the same permissions provided by the owner
c. World - Any user who is granted some permissions provided by the owner

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2eFsuURp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424745209/l4qXZ0ef8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2eFsuURp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424745209/l4qXZ0ef8.png" alt="Untitled 14.png" width="880" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first 3 letters after the directory/file indicator are the permissions for the owner, followed by the group and finally the world. Let's look at two examples of the files/directories highlighted in the above picture.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.bashrc&lt;/code&gt; file (&lt;code&gt;-rw-r--r--&lt;/code&gt;) - The owner has read and write permissions for the file, but no execute permission. The group and the world share the same permission, i.e., read only&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;newdir&lt;/code&gt; directory (&lt;code&gt;drwxr-xr-x&lt;/code&gt;) - The owner has all 3 permissions for this directory, i.e., read, write and execute. The group and the world share the same permissions, i.e., read and execute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;File permissions, changing permissions, and user access in linux is a pretty interesting topic, and we have barely scratched the surface. We will dive deeper in an upcoming post.&lt;/p&gt;
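&lt;p&gt;A quick way to inspect the permission string for a single file (the file name &lt;code&gt;demo&lt;/code&gt; is illustrative):&lt;/p&gt;

```shell
# Create an empty file and look at its long listing
touch demo
ls -l demo
# The first column is the permission string, e.g. -rw-r--r--:
# '-' for a regular file, then rwx triplets for owner, group and world
```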

&lt;h3&gt;
  
  
  5. Manipulating files and directories in cli
&lt;/h3&gt;

&lt;p&gt;The basic operations we perform in any filesystem (graphical or cli based) are creating, copying, moving, deleting and renaming files and directories. Let's see how that works in a cli.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating a file with &lt;code&gt;touch&lt;/code&gt; and &lt;code&gt;vim&lt;/code&gt; commands:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9hWbDjVw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424762989/UUYqI6QmF.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9hWbDjVw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424762989/UUYqI6QmF.png" alt="Untitled 15.png" width="880" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copy files and directories with &lt;code&gt;cp&lt;/code&gt; command&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5AivTq2l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424800755/0WbEQsw6j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5AivTq2l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424800755/0WbEQsw6j.png" alt="Untitled 16.png" width="732" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copying a file onto another file will overwrite the file in the destination path. For example, the command &lt;code&gt;cp /path/to/sourcefile /path/to/destinationfile&lt;/code&gt; will overwrite the contents of the destination file with the source file's contents&lt;/li&gt;
&lt;/ul&gt;
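&lt;p&gt;A minimal sketch of this overwrite behavior (the file names are illustrative):&lt;/p&gt;

```shell
printf 'source\n' > srcfile
printf 'destination\n' > destfile

# Copying onto an existing file silently replaces its contents
cp srcfile destfile
cat destfile    # now prints "source"
```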

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--59UHKzRY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424810965/jrbLoItq1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--59UHKzRY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424810965/jrbLoItq1.png" alt="Untitled 17.png" width="659" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To copy a directory to a path, we need an additional option &lt;code&gt;-r&lt;/code&gt; which stands for recursive and it will allow us to copy a directory and its contents recursively. For example, &lt;code&gt;cp -r ./srcdir ./destdir&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xngs-D4A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424821807/cvw3r_J5e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xngs-D4A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424821807/cvw3r_J5e.png" alt="Untitled 18.png" width="590" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point one might wonder why we need a cli to perform these simple tasks, which could be done very easily in a GUI, like dragging and dropping a file/directory to copy it. The answer is power and flexibility.&lt;/p&gt;

&lt;p&gt;Let's say we have thousands of files common between two directories (&lt;code&gt;src&lt;/code&gt; and &lt;code&gt;dest&lt;/code&gt;), and we need to copy only those files from &lt;code&gt;src&lt;/code&gt; that are not yet in &lt;code&gt;dest&lt;/code&gt;. Doing this in a GUI is tedious, and if we had to repeat it every day or even every hour, it would be nearly impossible to get consistent results. In the cli it is a single command.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cp -u srcdir/* destdir/&lt;/code&gt; copies from &lt;code&gt;srcdir&lt;/code&gt; only the files that are missing from &lt;code&gt;destdir&lt;/code&gt;, plus any files whose copy in &lt;code&gt;srcdir&lt;/code&gt; is newer than the one in &lt;code&gt;destdir&lt;/code&gt;.&lt;/p&gt;
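&lt;p&gt;A small sketch of this sync-style copy, using illustrative directory names:&lt;/p&gt;

```shell
mkdir -p srcdir destdir
printf 'a\n' > srcdir/a
printf 'b\n' > srcdir/b

# destdir already has a copy of a
cp srcdir/a destdir/

# -u copies only files missing from destdir,
# or newer in srcdir than in destdir
cp -u srcdir/* destdir/
ls destdir    # now lists both a and b
```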

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VHHztmc0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424833378/c9PIL5j7Z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VHHztmc0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424833378/c9PIL5j7Z.png" alt="Untitled 19.png" width="657" height="647"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Move files/directories with &lt;code&gt;mv&lt;/code&gt; command:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--w3-n5EbO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424845180/c4CHjS63m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--w3-n5EbO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424845180/c4CHjS63m.png" alt="Untitled 20.png" width="787" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove/delete a file or directory with &lt;code&gt;rm&lt;/code&gt; command:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rm /path/to/file&lt;/code&gt; deletes the file at the given path. The following shows how the &lt;code&gt;rm&lt;/code&gt; command works&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4y91FXA1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424857119/_8dUU_XXa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4y91FXA1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424857119/_8dUU_XXa.png" alt="Untitled 21.png" width="798" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rm -d /path/to/dir&lt;/code&gt; deletes an empty directory at the path. If we try the same command on a non-empty directory, we see a &lt;code&gt;Directory not empty&lt;/code&gt; error. To remove a directory along with its contents, we use the &lt;code&gt;-r&lt;/code&gt; (recursive) option, which deletes the contents recursively.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GYJItf4V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424865705/YIt7bXmeG.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GYJItf4V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424865705/YIt7bXmeG.png" alt="Untitled 22.png" width="819" height="536"&gt;&lt;/a&gt;&lt;/p&gt;
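&lt;p&gt;The difference between &lt;code&gt;-d&lt;/code&gt; and &lt;code&gt;-r&lt;/code&gt; can be sketched as follows (the directory names are illustrative):&lt;/p&gt;

```shell
mkdir emptydir fulldir
touch fulldir/file1

rm -d emptydir    # succeeds: the directory is empty
rm -d fulldir     # fails with "Directory not empty"
rm -r fulldir     # removes the directory and its contents recursively
```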

&lt;ul&gt;
&lt;li&gt;To see how the delete process is carried out, we can use the &lt;code&gt;-i&lt;/code&gt; option with the &lt;code&gt;rm&lt;/code&gt; command, which shows which file or directory is being deleted. As in the example below, it prompts for confirmation (yes-&lt;code&gt;y&lt;/code&gt;, no-&lt;code&gt;n&lt;/code&gt;) for every file and subdirectory in the directory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9HelzEL0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424874928/Tl7jF7ry4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9HelzEL0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1645424874928/Tl7jF7ry4.png" alt="Untitled 23.png" width="809" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; To perform copy, delete or move operations, the user must have the necessary permissions (rwx), as discussed in the section above.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://linuxcommand.org/tlcl.php"&gt;Linux Command Line Books by William Shotts&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.tutorialspoint.com/unix/unix-file-management.htm"&gt;Unix / Linux - File Management (tutorialspoint.com)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freecodecamp.org/news/vim-editor-modes-explained/"&gt;Vim Editor Modes Explained (freecodecamp.org)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.redhat.com/sysadmin/linking-linux-explained"&gt;Hard links and soft links in Linux explained | Enable Sysadmin (redhat.com)&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>The Linux Command Line Experience</title>
      <dc:creator>maninekkalapudi</dc:creator>
      <pubDate>Wed, 02 Feb 2022 05:33:13 +0000</pubDate>
      <link>https://forem.com/maninekkalapudi/the-linux-command-line-experience-f29</link>
      <guid>https://forem.com/maninekkalapudi/the-linux-command-line-experience-f29</guid>
      <description>&lt;p&gt;Hello! Hope youre doing well. In this post well talk about Command Line Interface(CLI) in Linux. In the &lt;a href="https://dev.to/maninekkalapudi/what-is-linux-2aki-temp-slug-7977297"&gt;last post&lt;/a&gt; we discussed about what Linux is. In this one we will get a taste of working with Linux command line. Lets go!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topics covered in this post&lt;/strong&gt; :&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What is a Shell?&lt;/li&gt;
&lt;li&gt;Terminal Emulators&lt;/li&gt;
&lt;li&gt;Linux Filesystem&lt;/li&gt;
&lt;li&gt;Navigating Linux filesystem in CLI&lt;/li&gt;
&lt;li&gt;Linux command behavior&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  1. What is a Shell?
&lt;/h3&gt;

&lt;p&gt;When we refer to the command line what we really mean is shell. The shell is a program that takes keyboard commands and passes them to the operating system to carry out. Almost all Linux distributions supply a shell program from the GNU Project called &lt;strong&gt;bash&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The name bash is an acronym for Bourne Again SHell. It is an enhanced replacement for the Bourne shell (sh), the original Unix shell program written by Stephen Bourne.&lt;/p&gt;

&lt;p&gt;The popular shells used in linux are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;C Shell (csh)&lt;/li&gt;
&lt;li&gt;Korn Shell (ksh)&lt;/li&gt;
&lt;li&gt;Z Shell(zsh)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Linux shell offers a way to interact with the kernel through commands. Ex: &lt;code&gt;ls&lt;/code&gt; lists all the files and folders in a directory. Each command represents a task to be performed.&lt;/p&gt;

&lt;p&gt;Every shell, whether bash or zsh, offers similar functionality with mostly the same commands, along with some additional features of its own. One example is the difference between &lt;a href="https://askanydifference.com/difference-between-bash-and-shell/"&gt;bash and the original shell&lt;/a&gt;: the original shell didn't offer command history (a list of previously executed commands), while bash does.&lt;/p&gt;
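&lt;p&gt;To check which shell you are using (the exact output varies by system):&lt;/p&gt;

```shell
# The login shell recorded for the current user
echo "$SHELL"

# The version banner of bash, the default on most distributions
bash --version
```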

&lt;h3&gt;
  
  
  2. Terminal Emulators
&lt;/h3&gt;

&lt;p&gt;A terminal is a program that passes the user's commands to the shell and displays the shell's output back to the user. A number of terminal emulators are available for Linux, but they all basically do the same thing: give us access to the shell.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ODr-mzMl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643778638093/6wXWORLLv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ODr-mzMl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643778638093/6wXWORLLv.png" alt="Untitled.png" width="880" height="539"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Terminal offers a way to customize the appearance of the text(commands, progress bars, icons and output) displayed and much more. The possibilities for customization are endless. One such example is &lt;a href="https://itsfoss.com/customize-linux-terminal/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Linux Filesystem
&lt;/h3&gt;

&lt;p&gt;The Linux filesystem is a hierarchical directory structure: every directory can contain files and other directories. Represented pictorially, it looks like a tree (the data structure). The root directory is the first directory in the filesystem, and it contains various directories, each assigned a purpose, as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2xzNnOdN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779473147/Ym4S66ydi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2xzNnOdN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779473147/Ym4S66ydi.png" alt="Linux Filesystem. Source:[Linux File Hierarchy Structure - GeeksforGeeks](https://www.geeksforgeeks.org/linux-file-hierarchy-structure/)" width="602" height="623"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Linux Filesystem. Source:&lt;a href="https://www.geeksforgeeks.org/linux-file-hierarchy-structure/"&gt;Linux File Hierarchy Structure - GeeksforGeeks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we first log in to a Linux machine as a typical user, we land in the &lt;code&gt;/home/&amp;lt;username&amp;gt;&lt;/code&gt; directory. The root user is the exception: its home directory is &lt;code&gt;/root&lt;/code&gt;.&lt;/p&gt;
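&lt;p&gt;This is easy to verify from a fresh login (the output depends on your user):&lt;/p&gt;

```shell
whoami          # prints the current username
echo "$HOME"    # the user's home directory, typically /home/USERNAME
```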

&lt;h3&gt;
  
  
  4. Navigating Linux filesystem in CLI
&lt;/h3&gt;

&lt;p&gt;One of the fundamental actions we perform in any operating system is navigating the filesystem. We are familiar with the graphical file managers in Windows and macOS, and even in desktop linux distros. But how about a server with only a CLI? Let's dive in.&lt;/p&gt;

&lt;p&gt;As soon as you log in to a linux machine (ex: a server) as a user, you will be in the &lt;code&gt;~&lt;/code&gt; (&lt;code&gt;/home/&amp;lt;username&amp;gt;&lt;/code&gt;) directory. In the picture below, I've logged into ubuntu using &lt;a href="https://pureinfotech.com/install-windows-subsystem-linux-2-windows-10/"&gt;WSL on Windows&lt;/a&gt;. We will be using this setup going forward.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1c0AXcq0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779534665/3ajL2N0l1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1c0AXcq0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779534665/3ajL2N0l1.png" alt="Untitled 2.png" width="880" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At any given time, we are inside a single directory per terminal session. We can see the files contained in that directory, the pathway to the directory above us (called the parent directory), and any subdirectories below us. The directory we are standing in is called the &lt;em&gt;current working directory&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Let's try a few basic commands after logging in to the linux terminal&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;whoami&lt;/code&gt;- shows the username&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pwd&lt;/code&gt;- prints the current working directory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ls&lt;/code&gt;- gives the list of files and directories in the current directory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--w-AkZ2eE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779632786/Ppu7TBbVA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--w-AkZ2eE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779632786/Ppu7TBbVA.png" alt="Untitled 3.png" width="880" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gives us a pretty good understanding of who we are (&lt;code&gt;whoami&lt;/code&gt;), where we are (&lt;code&gt;pwd&lt;/code&gt;) and what we have in our current directory (&lt;code&gt;ls&lt;/code&gt;). To navigate the filesystem we use the &lt;code&gt;cd&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd&lt;/code&gt; is short for change directory; it changes the current working directory. The &lt;code&gt;cd&lt;/code&gt; command expects a directory name or a path in the filesystem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2vN8PWO9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779642273/Ev5brHvEQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2vN8PWO9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779642273/Ev5brHvEQ.png" alt="Untitled 4.png" width="880" height="657"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the above example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cd /usr/bin&lt;/code&gt;- change the current working directory to &lt;code&gt;/usr/bin&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cd ~&lt;/code&gt;- &lt;code&gt;~&lt;/code&gt; is notation for the user's home directory, which is &lt;code&gt;/home/&amp;lt;username&amp;gt;/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cd ./test&lt;/code&gt;- &lt;code&gt;.&lt;/code&gt; represents the current directory and &lt;code&gt;/test&lt;/code&gt; represents the test directory under it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notice that the path is provided in two different ways. There are two ways to define a path in linux:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Absolute path:&lt;/strong&gt; begins at the root directory (&lt;code&gt;/&lt;/code&gt;) and spells out the full location, e.g. &lt;code&gt;/usr/bin&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Relative path:&lt;/strong&gt; starts from the current working directory, using &lt;code&gt;.&lt;/code&gt; (current directory) and &lt;code&gt;..&lt;/code&gt; (parent directory), e.g. &lt;code&gt;./test&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A few tips:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;While navigating subdirectories within the current working directory, the &lt;code&gt;.&lt;/code&gt; can be omitted from the &lt;code&gt;cd&lt;/code&gt; command. For example: we are in pdir, pdir contains dir1, and dir1 contains dir2. To navigate to dir2 from pdir, we can use &lt;code&gt;cd dir1/dir2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Just enter the &lt;code&gt;cd&lt;/code&gt; command and hit return to navigate to the user's home directory from any directory.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cd ~username&lt;/code&gt;- changes the working directory to the home directory of username&lt;/li&gt;
&lt;/ol&gt;
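&lt;p&gt;The tips above can be sketched as follows (the directory names are illustrative):&lt;/p&gt;

```shell
mkdir -p pdir/dir1/dir2
cd pdir

# Relative path without the leading ./
cd dir1/dir2
pwd    # ends in pdir/dir1/dir2

# cd with no argument returns to the home directory
cd
pwd    # the user's home directory
```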

&lt;h3&gt;
  
  
  5. Linux command behavior
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;ls&lt;/code&gt; command gives the list of files and directories in the current working directory. But what if we want more detail in the output? This is where command options and arguments come in handy. The options for a command modify its behavior, giving different results.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ls -a&lt;/code&gt;- displays all files, including hidden ones, in the current working directory. Hidden files and directories start with &lt;code&gt;.&lt;/code&gt;. For example: &lt;code&gt;.profile&lt;/code&gt; is a hidden file and &lt;code&gt;.landscape&lt;/code&gt; is a hidden directory in the example below.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6LRwQoSs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779667495/0jWGu7T3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6LRwQoSs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779667495/0jWGu7T3u.png" alt="Untitled 5.png" width="880" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ls -altr&lt;/code&gt;- displays all the files in the current working directory(&lt;code&gt;a&lt;/code&gt;), in a long list(&lt;code&gt;l&lt;/code&gt;), sorted by modification time(&lt;code&gt;t&lt;/code&gt;) and in reverse order(&lt;code&gt;r&lt;/code&gt;). We can combine multiple options for the desired behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qq3XPv2C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779680942/L97yK2IOp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qq3XPv2C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779680942/L97yK2IOp.png" alt="Untitled 6.png" width="880" height="943"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ls /usr . -altr&lt;/code&gt; command takes two paths as arguments (&lt;code&gt;/usr&lt;/code&gt; and &lt;code&gt;.&lt;/code&gt;) and displays the results for both paths, as shown below&lt;/li&gt;
&lt;/ul&gt;
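&lt;p&gt;A small sketch of the multi-path behavior: when given more than one path, &lt;code&gt;ls&lt;/code&gt; prints each directory's listing under a header line ending with a colon.&lt;/p&gt;

```shell
mkdir -p /tmp/demo-a /tmp/demo-b
touch /tmp/demo-a/one /tmp/demo-b/two

# Each argument gets its own "path:" header, followed by its contents.
ls /tmp/demo-a /tmp/demo-b
```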

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pWWBtX1t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779692385/kW62sBhyOZ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pWWBtX1t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779692385/kW62sBhyOZ.png" alt="Untitled 7.png" width="880" height="794"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An important question here is... do we have to remember all of the commands and their options available in Linux? No! You can't possibly do that, but there is a command for that as well.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;man&lt;/code&gt;, short for manual, gives the full documentation for almost any command in Linux. We can get the documentation for a command using &lt;code&gt;man &amp;lt;command&amp;gt;&lt;/code&gt;. Let's try it with the &lt;code&gt;ls&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Fe1fhmcr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779704367/ozZMf0cK5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Fe1fhmcr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1643779704367/ozZMf0cK5.png" alt="Untitled 8.png" width="880" height="953"&gt;&lt;/a&gt;&lt;/p&gt;
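&lt;p&gt;Since &lt;code&gt;man&lt;/code&gt; opens an interactive pager, a quick non-interactive alternative on GNU/Linux systems is the &lt;code&gt;--help&lt;/code&gt; option that most commands support (this is a convention, not a guarantee, so treat it as a sketch):&lt;/p&gt;

```shell
# Full manual, in a pager (press q to quit):
#   man ls
# Quick usage summary, printed straight to the terminal:
ls --help | head -n 5
```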

&lt;p&gt;In the upcoming posts we will dive deep into working with files and directories in Linux. Stay tuned!&lt;/p&gt;

&lt;p&gt;Sources:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://linuxcommand.org/tlcl.php"&gt;Linux Command Line Books by William Shotts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.journaldev.com/39194/different-types-of-shells-in-linux"&gt;What are the Different Types of Shells in Linux? - JournalDev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cs.dartmouth.edu/~campbell/cs50/shell.html#:~:text=The%20shell%20is%20the%20Linux,shell%20executes%20the%20ls%20command."&gt;https://www.cs.dartmouth.edu/~campbell/cs50/shell.html#:~:text=The shell is the Linux,shell executes the ls command.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.tecmint.com/linux-terminal-emulators/"&gt;22 Useful Terminal Emulators for Linux Desktop (tecmint.com)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://askanydifference.com/difference-between-bash-and-shell/"&gt;Difference Between Bash and Shell (With Table) Ask Any Difference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.geeksforgeeks.org/linux-file-hierarchy-structure/"&gt;Linux File Hierarchy Structure - GeeksforGeeks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pureinfotech.com/install-windows-subsystem-linux-2-windows-10/"&gt;How to install WSL2 (Windows Subsystem for Linux 2) on Windows 10 Pureinfotech&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>What is Linux?</title>
      <dc:creator>maninekkalapudi</dc:creator>
      <pubDate>Sun, 23 Jan 2022 06:33:50 +0000</pubDate>
      <link>https://forem.com/maninekkalapudi/what-is-linux-5aj5</link>
      <guid>https://forem.com/maninekkalapudi/what-is-linux-5aj5</guid>
      <description>&lt;p&gt;Hello! Hope youre doing well. In this post well talk about Linux. Linux is a free, opensource software that is and highly customizable and ubiquitous in the computing world. Large parts of the internet as we know it is runs on Linux based operating systems(OS). So, knowing Linux and working with Linux command line will take us a long way in the software industry and others as well.&lt;/p&gt;

&lt;p&gt;Have you ever wondered why there is no "Linux OS" out there? I'm sure you have heard of macOS, Windows and even Linux distros, but never a Linux OS. Let's find out in this blog post.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topics covered in this post&lt;/strong&gt; :&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Types of software development and distribution&lt;/li&gt;
&lt;li&gt;What is Linux?&lt;/li&gt;
&lt;li&gt;What is a Linux distro?&lt;/li&gt;
&lt;li&gt;What is a Linux command line?&lt;/li&gt;
&lt;li&gt;Linux commands&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  1. Types of software development and distribution
&lt;/h3&gt;

&lt;p&gt;Let's understand the following terms to get a better idea of how software is developed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Opensource- The source code of the software is available to view and modify. A community of developers contributes to building and maintaining it.&lt;/li&gt;
&lt;li&gt;Free- Software that is free to use for the individual. One may not be able to view the source code, and in some cases the code cannot be modified or distributed.&lt;/li&gt;
&lt;li&gt;Closed source- The source code is not visible, and the end user cannot modify anything in the product or redistribute it. Users have to pay to obtain it.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  2. What is Linux?
&lt;/h3&gt;

&lt;p&gt;Linux is a free and opensource, unix-like operating system (actually a kernel) developed by Linus Torvalds as a free operating system &lt;a href="https://www.cs.cmu.edu/~awb/linux.history.html"&gt;in 1991&lt;/a&gt;. It was based on (but is not a copy of) the Unix operating system, which was developed by AT&amp;amp;T (Bell Labs) as a proprietary OS (in some versions).&lt;/p&gt;

&lt;p&gt;Linux, on the other hand, was developed as a free and opensource alternative to Unix. We can get the publicly available source code for Linux, modify it and even redistribute it without any cost involved. Developers across the globe also contribute to Linux development.&lt;/p&gt;

&lt;p&gt;The opensource nature of Linux has allowed it to be modified for all kinds of purposes, ranging from microcontrollers to massive supercomputers, and even the space vehicles on the Moon and Mars.&lt;/p&gt;

&lt;p&gt;Linux, contrary to what you probably think, is not a full-fledged OS but a kernel. A kernel is the part of the OS that talks to hardware components like the CPU, RAM etc., and to the other components of the OS, as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--s8RhK31q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642918933035/VldxwDIXk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--s8RhK31q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642918933035/VldxwDIXk.png" alt="Linux Architecture" width="880" height="677"&gt;&lt;/a&gt; &lt;strong&gt;Linux Architecture. Source: &lt;a href="https://www.interviewbit.com/linux-interview-questions/"&gt;https://www.interviewbit.com/linux-interview-questions/&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. What is a Linux distro?
&lt;/h3&gt;

&lt;p&gt;When the first version of the Linux kernel was developed, it was distributed with a set of &lt;a href="https://en.wikipedia.org/wiki/GNU_project"&gt;GNU&lt;/a&gt; utilities and tools for setting up a file system, a Graphical User Interface (GUI) and apps like the terminal. This is where the name Linux distribution (Linux distro) comes from.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uuNHvC-l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642919106362/1EZ9sIwb9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uuNHvC-l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642919106362/1EZ9sIwb9.png" alt="Untitled 1.png" width="600" height="600"&gt;&lt;/a&gt;Linux distribution. Source: &lt;a href="https://www.suse.com/c/how-suse-builds-its-enterprise-linux-distribution-part-2/"&gt;How SUSE builds its Enterprise Linux distribution PART 2 | SUSE Communities&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is still the case to this day: there are &lt;a href="https://distrowatch.com/dwres.php?resource=popularity"&gt;hundreds of Linux distributions&lt;/a&gt; available, and every distro has the Linux kernel and GNU components. One can download them for free and even customize them to one's heart's content. There are also distros derived from other distros. Needless to say, trying out various distros and tinkering with them is a huge hobby among enthusiasts.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. What is a Linux command line?
&lt;/h3&gt;

&lt;p&gt;A Linux command line is a text interface that allows us to interact with the computer using commands. It is often referred to as the shell, terminal or console; these names and their definitions are given below:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Terminal&lt;/strong&gt; : A text-based environment where you input commands and see the output. A terminal passes the input commands to a shell for execution and displays the output from it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Shell&lt;/strong&gt; : A shell is the program that the terminal sends user input to. The shell generates output and passes it back to the terminal for display.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Console&lt;/strong&gt; : A console is a physical device that housed a terminal with a screen and keyboard. In the software world, the terms terminal and console are used interchangeably.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
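&lt;p&gt;To see the distinction in practice, you can ask the system which shell program your terminal is actually talking to:&lt;/p&gt;

```shell
echo "$SHELL"        # the login shell configured for your user
ps -p $$ -o comm=    # the shell process executing this very command
```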

&lt;p&gt;Now you might ask, how is this useful? Well, in a normal desktop environment you get all the GUI components installed, which looks like the picture below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--l9NypWAx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642919220933/Fi3Fl2BhV.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--l9NypWAx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642919220933/Fi3Fl2BhV.png" alt="Untitled 2.png" width="800" height="450"&gt;&lt;/a&gt;Ubuntu GUI&lt;/p&gt;

&lt;p&gt;This is great for personal use and everything in the GUI seems to be laid out perfectly. But when it comes to &lt;a href="https://en.wikipedia.org/wiki/Server_(computing)"&gt;servers&lt;/a&gt;, where Linux is a primary choice; there will be no GUI and all the work should be done through the terminal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5yeiacsG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642919241274/d6xPIcT6e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5yeiacsG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642919241274/d6xPIcT6e.png" alt="Untitled 3.png" width="660" height="442"&gt;&lt;/a&gt;Ubuntu command line&lt;/p&gt;

&lt;p&gt;When you log in to a server or open the terminal app in your Linux distro, you'll see the above window. What does the text &lt;code&gt;me@linuxbox:~$&lt;/code&gt; mean?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;me&lt;/code&gt;: the username you logged in as&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;linuxbox&lt;/code&gt;: the name of the machine or server&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;~&lt;/code&gt;: the home directory (folder) of the user. At any point, a terminal session is in exactly one directory; likewise, multiple terminals can point to different directories&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;$&lt;/code&gt;: indicates that you are a normal user. In most shells, the admin/root user sees &lt;code&gt;#&lt;/code&gt; instead of the &lt;code&gt;$&lt;/code&gt; sign; &lt;code&gt;#&lt;/code&gt; therefore signals elevated privileges on the system.&lt;/li&gt;
&lt;/ul&gt;
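&lt;p&gt;Each piece of a prompt like &lt;code&gt;me@linuxbox:~$&lt;/code&gt; can also be queried directly (a small sketch; the exact prompt format varies by shell configuration):&lt;/p&gt;

```shell
whoami         # the "me" part: the logged-in user
hostname       # the "linuxbox" part: the machine name
echo "$HOME"   # the directory that ~ abbreviates
```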

&lt;p&gt;You can enter any command after this prompt appears and hit the &lt;code&gt;Return&lt;/code&gt; key to display the output. We will discuss user types, permissions and other topics in future posts.&lt;/p&gt;

&lt;p&gt;I'm using &lt;a href="https://www.youtube.com/watch?v=Owrk9UxnMdI"&gt;WSL with Ubuntu&lt;/a&gt; from here on, and it looks like the picture below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FBOlw3cL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642919255093/ulhynOYs1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FBOlw3cL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642919255093/ulhynOYs1.png" alt="Untitled 4.png" width="880" height="525"&gt;&lt;/a&gt;Ubuntu on Windows Terminal with WSL2&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/mnt/c/Users/manik&lt;/code&gt; is the home directory for the user.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Linux commands
&lt;/h3&gt;

&lt;p&gt;Commands are reserved keywords that signify an action in the system. Here are a few example commands that we can try out in a Linux command line.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;date&lt;/code&gt;- displays the current date and time&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cal&lt;/code&gt;- displays a calendar for the current month, with the current date highlighted&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ls&lt;/code&gt;- lists all the directories and files in the current folder&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pwd&lt;/code&gt;- prints the present working directory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;clear&lt;/code&gt;- clears (hides) all the contents on the terminal window&lt;/li&gt;
&lt;/ul&gt;
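&lt;p&gt;These can be tried in any terminal session; a minimal sketch (&lt;code&gt;cal&lt;/code&gt; may need an extra package such as &lt;code&gt;bsdmainutils&lt;/code&gt; on minimal installs):&lt;/p&gt;

```shell
date    # current date and time
pwd     # absolute path of the present working directory
ls      # contents of that directory
# clear wipes the visible screen; it produces no output to show here.
```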

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--g1aY-YPw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642919277926/BOv5JlJRs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--g1aY-YPw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642919277926/BOv5JlJRs.png" alt="Untitled 5.png" width="880" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Meanwhile, if you type some random gibberish into the command line, it will throw an error saying &lt;code&gt;command not found&lt;/code&gt;. We can use the arrow keys to go through the command history (up arrow key) and to navigate within a command (left and right arrow keys).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--H-YnkUMF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642919294265/itMXQivmK.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--H-YnkUMF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1642919294265/itMXQivmK.png" alt="Untitled 6.png" width="880" height="58"&gt;&lt;/a&gt;&lt;/p&gt;
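&lt;p&gt;The shell also records this failure in the special &lt;code&gt;$?&lt;/code&gt; variable, which holds the exit status of the last command; in bash, "command not found" is reported as status 127:&lt;/p&gt;

```shell
# Run a command that does not exist, suppressing the error message,
# and capture the exit status the shell reports for it.
status=0
some-gibberish-command 2>/dev/null || status=$?
echo "$status"   # 127 means the shell could not find the command
```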

&lt;p&gt;We can perform almost any action within the operating system using commands. We will explore all the important commands further in future blog posts.&lt;/p&gt;

&lt;h3&gt;
  
  
  References:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://linuxcommand.org/tlcl.php"&gt;Linux Command Line Books by William Shotts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ubuntu.com/tutorials/command-line-for-beginners#1-overview"&gt;The Linux command line for beginners | Ubuntu&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.hanselman.com/blog/whats-the-difference-between-a-console-a-terminal-and-a-shell"&gt;What's the difference between a console, a terminal, and a shell? - Scott Hanselman's Blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.geeksforgeeks.org/difference-between-terminal-console-shell-and-command-line/"&gt;Difference between Terminal, Console, Shell, and Command Line - GeeksforGeeks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://askubuntu.com/questions/506510/what-is-the-difference-between-terminal-console-shell-and-command-line"&gt;What is the difference between Terminal, Console, Shell, and Command Line? - Ask Ubuntu&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://linuxcommand.org/"&gt;LinuxCommand.org: Learn The Linux Command Line. Write Shell Scripts.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.suse.com/c/how-suse-builds-its-enterprise-linux-distribution-part-2/"&gt;How SUSE builds its Enterprise Linux distribution PART 2 | SUSE Communities&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>WordCount Example with MapReduce</title>
      <dc:creator>maninekkalapudi</dc:creator>
      <pubDate>Sat, 14 Aug 2021 05:10:29 +0000</pubDate>
      <link>https://forem.com/maninekkalapudi/wordcount-example-with-mapreduce-4ngf</link>
      <guid>https://forem.com/maninekkalapudi/wordcount-example-with-mapreduce-4ngf</guid>
      <description>&lt;p&gt;Hello! Hope you're doing well. In my last &lt;a href="https://dev.to/maninekkalapudi/hadoop-mapreduce-a-programming-paradigm-5h4g-temp-slug-1504943"&gt;post&lt;/a&gt; I've explained about internals of Hadoop MapReduce. As promised in that post, we will write and execute a MapReduce program in Java for a simple wordcount example. Let's dive in!&lt;/p&gt;

&lt;h3&gt;
  
  
  Topics covered in this post
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Pre-requisites&lt;/li&gt;
&lt;li&gt;Hadoop cluster setup on local machine and on Cloud&lt;/li&gt;
&lt;li&gt;Writing a MapReduce program on Eclipse&lt;/li&gt;
&lt;li&gt;Create a JAR file for the MapReduce Program and Uploading to HDFS&lt;/li&gt;
&lt;li&gt;Executing the MapReduce Program on the Hadoop Cluster&lt;/li&gt;
&lt;li&gt;Results&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  1. Pre-requisites
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Admin access to the machine (local preferably)&lt;/li&gt;
&lt;li&gt;Hadoop Cluster (Single/Multi node cluster) on local machine or on cloud&lt;/li&gt;
&lt;li&gt;Install &lt;a href="https://www.oracle.com/in/java/technologies/javase/javase-jdk8-downloads.html"&gt;JDK 1.8 or later&lt;/a&gt; on the local machine&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.eclipse.org/downloads/"&gt;Eclipse IDE&lt;/a&gt; or any Java IDE installed on the local machine&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  2. Hadoop cluster setup on local machine and on Cloud
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;i. Single Node cluster setup&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As we already discussed, the DataNodes store and process the data. We need at least a single node Hadoop cluster to run the MapReduce program and process the data.&lt;/p&gt;

&lt;p&gt;Setting up a single-node Hadoop cluster on a local machine is a somewhat lengthy process and can often lead to errors. Below I'm sharing the guides that I've used to set up the cluster on my local machine for testing, for both Windows and Linux.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Windows- &lt;a href="https://towardsdatascience.com/installing-hadoop-3-2-1-single-node-cluster-on-windows-10-ac258dd48aef"&gt;https://towardsdatascience.com/installing-hadoop-3-2-1-single-node-cluster-on-windows-10-ac258dd48aef&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Linux (Ubuntu)- &lt;a href="https://phoenixnap.com/kb/install-hadoop-ubuntu"&gt;https://phoenixnap.com/kb/install-hadoop-ubuntu&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ii. Multi node Cluster setup&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Alternatively, we can use a cloud-based Hadoop cluster like &lt;a href="https://cloud.google.com/dataproc"&gt;DataProc&lt;/a&gt; on Google Cloud Platform (GCP), which doesn't require any setup other than selecting the configuration of the NameNode and the DataNodes. The GCP account setup is covered &lt;a href="https://www.youtube.com/watch?v=W5mPX1-015o"&gt;here&lt;/a&gt;. We'll see the cluster setup in the following steps.&lt;/p&gt;

&lt;p&gt;Before going any further you should consider two important steps while operating in any cloud environment.&lt;/p&gt;

&lt;p&gt;a. Setting up the &lt;a href="https://www.youtube.com/watch?v=F4omjjMZ54k"&gt;billing alerts&lt;/a&gt; to avoid any unexpected bills.&lt;/p&gt;

&lt;p&gt;b. Turn off/delete the resources soon after the work is done&lt;/p&gt;

&lt;p&gt;a. Sign up to the &lt;a href="https://cloud.google.com/"&gt;Google Cloud&lt;/a&gt; and login to your account&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qucTIds1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628912768681/2_2-FDIWB.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qucTIds1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628912768681/2_2-FDIWB.png" alt="Untitled.png" width="880" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;b. Search for " &lt;strong&gt;DataProc&lt;/strong&gt;" and select the option with the same name in the results&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EPRk0htN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628912893364/S69V9zPhp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EPRk0htN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628912893364/S69V9zPhp.png" alt="Untitled 1.png" width="880" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;c. Select the " &lt;strong&gt;Create Cluster&lt;/strong&gt;" option&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yglbQ_Is--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628912915773/AqE0VVMnE.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yglbQ_Is--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628912915773/AqE0VVMnE.png" alt="Untitled 2.png" width="880" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;d. Provide the following details in the create cluster page under "setup a cluster page"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;i. Cluster name - test-cluster

ii. Cluster region and Zone - us-central1, us-central1-a

iii. Cluster Type - Standard (1 master, N workers)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oTzAMSpV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913390141/DG7bLloJU.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oTzAMSpV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913390141/DG7bLloJU.png" alt="Untitled 3.png" width="880" height="397"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;iv. Autoscaling Policy - None

v. Image type and version - 2.0-debian10 (default)

vi. Select Enable Component gateway

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xKVU4qRX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913685987/yPy-I9sDR.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xKVU4qRX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913685987/yPy-I9sDR.png" alt="Untitled 4.png" width="880" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--S-3o86V---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913703498/C0h2AYaUx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--S-3o86V---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913703498/C0h2AYaUx.png" alt="Untitled 5.png" width="880" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;e. Under " &lt;strong&gt;Configure nodes&lt;/strong&gt;" select the following for Master node&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;i. Machine family - General-Purpose (default)

ii. Series - N1 (default)

iii. Machine type - n1-standard-2 (2 vCPU, 7.5 GB memory)

iv. Primary disk size (min 15 GB) - 100GB

v. Primary disk type - Standard Persistent Disk

vi. Number of local SSDs - 0

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jBYj9FcJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913833256/cKcr_XIwO.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jBYj9FcJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913833256/cKcr_XIwO.png" alt="Untitled 6.png" width="880" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;f. Select the following for "Worker Nodes"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;i. Machine family - General-Purpose (default)

ii. Series - N1 (default)

iii. Machine type - n1-standard-2 (2 vCPU, 7.5 GB memory)

iv. Number of worker nodes - 2

v. Primary disk size (min 15 GB) - 100GB

vi. Primary disk type - Standard Persistent Disk

vii. Number of local SSDs - 0

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ttx-J1qB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913858019/9Y_b3vHrixP.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ttx-J1qB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913858019/9Y_b3vHrixP.png" alt="Untitled 7.png" width="880" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;g. Leave the rest of the config as is and click "CREATE"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--T9AML453--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913890948/aTYipyWQX.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--T9AML453--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913890948/aTYipyWQX.png" alt="Untitled 8.png" width="880" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;h. Click on the cluster name and select the "VM Instances" tab in the page&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c4nePUiH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913933980/aE70UXDXp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c4nePUiH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913933980/aE70UXDXp.png" alt="Untitled 9.png" width="880" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;i. Click on "SSH" for the master node and you'll be presented with a new browser window connected to the master node of our HDFS cluster. I've used a local terminal to connect to the master node for the rest of the post.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TJb1mXXw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913980494/qIVrZF_BI.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TJb1mXXw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628913980494/qIVrZF_BI.png" alt="Untitled 10.png" width="880" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; In real-world scenarios, we would connect to the Hadoop cluster via a gateway node or edge node. We don't use the NameNode for connecting to the cluster, since it is busy managing the cluster.&lt;/p&gt;
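&lt;p&gt;For reference, the console steps above can also be expressed as a single gcloud CLI command. This is a sketch, not something run in this post: the flag names assume a recent Google Cloud SDK, so verify them with &lt;code&gt;gcloud dataproc clusters create --help&lt;/code&gt; before relying on it.&lt;/p&gt;

```shell
# Approximate gcloud equivalent of the console steps above. Verify the
# flags against your SDK version; running this will incur GCP charges.
gcloud dataproc clusters create test-cluster \
  --region=us-central1 \
  --zone=us-central1-a \
  --image-version=2.0-debian10 \
  --enable-component-gateway \
  --master-machine-type=n1-standard-2 \
  --master-boot-disk-size=100GB \
  --num-workers=2 \
  --worker-machine-type=n1-standard-2 \
  --worker-boot-disk-size=100GB
```

&lt;p&gt;Remember the billing caveat above: delete the cluster (&lt;code&gt;gcloud dataproc clusters delete test-cluster --region=us-central1&lt;/code&gt;) as soon as you are done.&lt;/p&gt;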

&lt;h3&gt;
  
  
  3. Writing a MapReduce program on Eclipse
&lt;/h3&gt;

&lt;p&gt;a. Create a new Java project called "wordcountmapreduce" in the Eclipse IDE on your local machine. Here I'm using a Linux (Ubuntu) machine to create the project; the rest of the steps should stay the same on a Windows machine as well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TNdLf-wF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914002133/RA0y2kZlC.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TNdLf-wF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914002133/RA0y2kZlC.png" alt="Untitled 11.png" width="880" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;b. Create a new class for the Map step by right-clicking on the project and selecting "Class". Enter the name of the Map class as "WordCountMapper" and hit Finish.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bjfj2BNq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914016333/ZlKA4zogp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bjfj2BNq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914016333/ZlKA4zogp.png" alt="Untitled 12.png" width="880" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;c. Once the &lt;code&gt;WordCountMapper&lt;/code&gt; class is created, use the mapper, reducer and partitioner implementations for the wordcount example from this &lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples/blob/main/Hadoop/MapReduce/eclipseprojects/wordcountmapreduce/src/wordcountpackage/WordCountMapper.java"&gt;GitHub link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yoLXkTRi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914077782/BuL35ZKAT.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yoLXkTRi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914077782/BuL35ZKAT.png" alt="Untitled 13.png" width="880" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;d. To remove the errors in the IDE, we must add the Hadoop libraries to the project build path. The following are the libraries (only the jar files) that should be added to the project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;hadoop_dir&amp;gt;&lt;/code&gt;/share/hadoop/mapreduce (&lt;code&gt;&amp;lt;hadoop_dir&amp;gt;&lt;/code&gt; is the path where you saved the hadoop distribution. Ex: /home/&lt;code&gt;&amp;lt;username&amp;gt;&lt;/code&gt;/hadoop-3.3.1)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;hadoop_dir&amp;gt;&lt;/code&gt;/share/hadoop/hdfs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;hadoop_dir&amp;gt;&lt;/code&gt;/share/hadoop/client&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;hadoop_dir&amp;gt;&lt;/code&gt;/share/hadoop/common&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;hadoop_dir&amp;gt;&lt;/code&gt;/share/hadoop/yarn&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ncCA74Cb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914131283/3Aq2q4jYgI.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ncCA74Cb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914131283/3Aq2q4jYgI.png" alt="Untitled 14.png" width="880" height="472"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on "Add External JARs" and navigate to the paths mentioned in the above list. After all the required JARs, click "Apply and Close"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TvNIEcnx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914152194/-sP8awahw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TvNIEcnx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914152194/-sP8awahw.png" alt="Untitled 15.png" width="880" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;e. After adding the jars to the project build path, the errors in the IDE disappear, as shown in the image below. Use the code for the reducer (WordCountReducer.java), partitioner (WordCountPartitioner.java) and driver (WordCount.java) classes from the &lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples/tree/main/Hadoop/MapReduce/eclipseprojects/wordcountmapreduce/src/wordcountpackage"&gt;GitHub link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tsKw-QKJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914175506/-p_LXluEg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tsKw-QKJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914175506/-p_LXluEg.png" alt="Untitled 16.png" width="880" height="472"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;f. Once the project setup is done, we'll have a look at the "WordCount.java" class. This driver class executes the Map, Reduce, Combiner and Partitioner classes on the cluster. It includes configuration such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Job name - &lt;code&gt;setJobName&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Driver class - &lt;code&gt;setJarByClass&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Mapper class - &lt;code&gt;setMapperClass&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Combiner class - &lt;code&gt;setCombinerClass&lt;/code&gt; (same as the Reducer class for the wordcount example)&lt;/li&gt;
&lt;li&gt;Reducer class - &lt;code&gt;setReducerClass&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Number of reducers - &lt;code&gt;setNumReduceTasks&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Output data types from each class - &lt;code&gt;setOutputKeyClass&lt;/code&gt;, &lt;code&gt;setOutputValueClass&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Input and output paths - &lt;code&gt;addInputPath&lt;/code&gt;, &lt;code&gt;setOutputPath&lt;/code&gt; respectively&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GDCr4RbW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914189286/bDF_Ajx4Ys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GDCr4RbW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914189286/bDF_Ajx4Ys.png" alt="Untitled 17.png" width="880" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This completes the project and code setup required for the wordcount problem in MapReduce.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Create a JAR File for the MapReduce Program and Upload It to HDFS
&lt;/h3&gt;

&lt;p&gt;Once the project and the MapReduce code setup is done, there are two ways we could execute the MapReduce Java program:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run the Java program within Eclipse. You can find a guide for this &lt;a href="https://projectgurukul.org/create-hadoop-mapreduce-project-eclipse/"&gt;here&lt;/a&gt;. &lt;/li&gt;
&lt;li&gt;Package the Java program as a JAR file with all the dependencies and execute it on the Hadoop cluster. We'll follow this method in this guide.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Steps to package the wordcount MapReduce Java program as a JAR file:&lt;/p&gt;

&lt;p&gt;a. Right-click on the project and select the "Export" option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IvfHxPFG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914624601/YX6KLCX0L.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IvfHxPFG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914624601/YX6KLCX0L.png" alt="Untitled 18.png" width="880" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;b. Under Java, select "JAR" option and click Next.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UkXmFFqO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914218157/Zq1YKqSVb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UkXmFFqO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914218157/Zq1YKqSVb.png" alt="Untitled 19.png" width="880" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;c. Select the path for saving the JAR file. Click Next until the final step.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_vF7U3dd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914700912/fS9cJp-_N.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_vF7U3dd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914700912/fS9cJp-_N.png" alt="Untitled 20.png" width="613" height="663"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;d. Select the Main class as "WordCount" using the Browse window.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7Gacj50m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914249227/mcdUibZtq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7Gacj50m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914249227/mcdUibZtq.png" alt="Untitled 21.png" width="880" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;e. Click Finish to create the JAR file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--41FtpOUd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914263820/lVsQh02LR.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--41FtpOUd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914263820/lVsQh02LR.png" alt="Untitled 22.png" width="610" height="663"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;f. The JAR file will be created as shown below. Once it is created, we'll upload it to the GCP Hadoop cluster and run it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FySVlYSI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914277414/f66rco09u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FySVlYSI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628914277414/f66rco09u.png" alt="Untitled 23.png" width="880" height="63"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;g. Now, we'll upload this to the master node of the Hadoop cluster using &lt;code&gt;scp&lt;/code&gt;. You can configure SSH to connect to the cluster instance on GCP using this &lt;a href="https://www.youtube.com/watch?v=2ibBF9YqveY"&gt;link&lt;/a&gt;. I've used Windows with Windows Terminal, following the same steps mentioned below. To copy the jar file(s) to the master node on the cluster, we use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SCP -i "`&amp;lt;Path/to/SSH/key/ssh-key&amp;gt;`" Path/to/jar/file/wordcountmapperonly.jar username@`&amp;lt;master-ip&amp;gt;`:/path/on/server

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JnotHqvb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915047434/nPBEsRKFz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JnotHqvb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915047434/nPBEsRKFz.png" alt="Untitled 24.png" width="880" height="40"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;h. Once the jar file is available on the master node instance, we can use the following commands to copy it to the HDFS cluster. Note that the master node instance and the HDFS cluster are different.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SSH -i "`&amp;lt;Path/to/SSH/key/&amp;gt;`" username@`&amp;lt;master-ip&amp;gt;`
hadoop fs -put -f Path/to/jar/file/wordcountmapperonly.jar `&amp;lt;hdfs_path&amp;gt;`
hadoop fs -ls `&amp;lt;hdfs_path&amp;gt;`

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5wbusUuS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915383503/kTFe6Y4UY.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5wbusUuS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915383503/kTFe6Y4UY.png" alt="Untitled 25.png" width="880" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, we are copying the jar files &lt;code&gt;wordcountmapperonly.jar&lt;/code&gt;, &lt;code&gt;wordcountmapreduce.jar&lt;/code&gt; and &lt;code&gt;wordcountmapreducepartitioner.jar&lt;/code&gt;, along with the input data folder &lt;code&gt;HadoopInputFiles&lt;/code&gt;, to the Hadoop directory &lt;code&gt;'/'&lt;/code&gt;. The input folder contains 3 text files.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Executing the MapReduce Program on the Hadoop Cluster
&lt;/h3&gt;

&lt;p&gt;As we've seen already, the MapReduce driver class (WordCount.java) is configured to execute the Mapper, Combiner, Reducer and Partitioner. We'll run the MapReduce program with different configurations using the driver class:&lt;/p&gt;

&lt;p&gt;i. Only Mapper&lt;/p&gt;

&lt;p&gt;ii. Mapper and Reducer&lt;/p&gt;

&lt;p&gt;iii. Mapper, Reducer and Partitioner&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;i. Only Mapper&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To run Mapper only, we need to comment out the Combiner, Reducer and Partitioner classes configured in the driver class and package the jar file as shown in the above step. The driver class should look like the below picture. The code for the same is &lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples/blob/main/Hadoop/MapReduce/eclipseprojects/wordcountmapperonly/src/wordcountpackage/WordCount.java"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rytryJt6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915418933/-c9HKnRfM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rytryJt6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915418933/-c9HKnRfM.png" alt="Untitled 26.png" width="880" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The input files are in "/HadoopInputFiles", with the data split across three files as shown below. You can find the input files &lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples/tree/main/Hadoop/MapReduce/HadoopInputFiles"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BYjrORud--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915438667/hI3ABHCJJ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BYjrORud--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915438667/hI3ABHCJJ.png" alt="Untitled 27.png" width="880" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, run the jar file "wordcountmapperonly.jar" on the Hadoop cluster with the following command and the above input files. The steps to copy the jar file to the HDFS location are shown in the section above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hadoop jar `&amp;lt;hdfs_path&amp;gt;`/wordcountmapperonly.jar `&amp;lt;input_file_or_dir_path&amp;gt;` `&amp;lt;output_path&amp;gt;`

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following image shows how to run the MapReduce jars on the Hadoop cluster. The full output log of the run is &lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples/blob/main/Hadoop/MapReduce/logs/mapperonly-log.log"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--S-6cLmWx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915462139/ZoGafYArR.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--S-6cLmWx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915462139/ZoGafYArR.png" alt="Untitled 28.png" width="880" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The output of the mapper-only phase contains all the words with count 1, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RYby4ZLv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915545193/-GjPz7VsJ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RYby4ZLv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915545193/-GjPz7VsJ.png" alt="Untitled 29.png" width="880" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once we run the MapReduce job, we can see the application tracked under YARN, the resource manager for the cluster. Every run gets an entry here. The default YARN URL is &lt;strong&gt;&lt;code&gt;&amp;lt;cluster-hostname&amp;gt;&lt;/code&gt;:8088&lt;/strong&gt;. For a Dataproc cluster, though, we need to go to the cluster details in the GCP console, select the "&lt;strong&gt;Web Interfaces&lt;/strong&gt;" tab and select "&lt;strong&gt;YARN ResourceManager&lt;/strong&gt;" to get to the YARN web interface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1I7yGwl0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915560469/7f4jqAw71.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1I7yGwl0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915560469/7f4jqAw71.png" alt="Untitled 30.png" width="880" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kKFF0fGq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915573450/4_SWn0iL_.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kKFF0fGq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915573450/4_SWn0iL_.png" alt="Untitled 31.png" width="880" height="966"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In case the output path in the &lt;code&gt;hadoop jar&lt;/code&gt; command already exists, the MapReduce framework throws an "&lt;strong&gt;Output directory already exists&lt;/strong&gt;" error as shown below. This prevents accidental overwriting of output data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3_bzsjge--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915587019/4Q5Pspg0A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3_bzsjge--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915587019/4Q5Pspg0A.png" alt="Untitled 32.png" width="880" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; &lt;code&gt;-D mapred.reduce.tasks&lt;/code&gt; is set to 3 by default on this cluster, and we need only the map phase to run. We can force the reducer count to zero using this property.&lt;/p&gt;

&lt;p&gt;In the output path, we can see four different files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;_SUCCESS&lt;/strong&gt; - indicates the job completed successfully&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;part-m-00000 to part-m-00002&lt;/strong&gt; - one output file corresponding to each input file. Here, 'm' in the output filename indicates the 'mapper' phase. Since no reduce phase is configured for this run, each input file produces its own output file&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FHOCt9yJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915598287/X0LhV63YQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FHOCt9yJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915598287/X0LhV63YQ.png" alt="Untitled 33.png" width="880" height="105"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we already know, each mapper emits the key-value pair &lt;code&gt;&amp;lt;word,1&amp;gt;&lt;/code&gt; for every word in the input, as shown in the output below.&lt;/p&gt;
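&lt;p&gt;As a quick illustration, the map phase can be simulated in a few lines of Python. This is a simplified sketch, not the actual Hadoop Java code (which lives in the GitHub repository linked above):&lt;/p&gt;

```python
# Simplified, non-Hadoop simulation of the wordcount map phase:
# for every word in a line of input, emit the key-value pair (word, 1).
def word_count_mapper(line):
    return [(word, 1) for word in line.split()]

print(word_count_mapper("big data is big"))
# [('big', 1), ('data', 1), ('is', 1), ('big', 1)]
```

&lt;p&gt;Note that the word "big" appears twice, each time with count 1; summing those counts is the job of the reduce phase, not the mapper.&lt;/p&gt;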

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qAtqAVxH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915612631/f-ihJ81jr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qAtqAVxH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915612631/f-ihJ81jr.png" alt="Untitled 34.png" width="880" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ii. Mapper and Reducer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now, let's run '&lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples/tree/main/Hadoop/MapReduce/jarfiles"&gt;wordcountmapreduce.jar&lt;/a&gt;' with the same input files and a different output path. This has both the map and reduce phases configured in the driver class. Logs for the run are &lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples/tree/main/Hadoop/MapReduce/logs"&gt;here&lt;/a&gt; and the code is &lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples/tree/main/Hadoop/MapReduce/eclipseprojects/wordcountmapreduce"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nA2Fd_wo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915634488/gd7Jl_TIg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nA2Fd_wo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915634488/gd7Jl_TIg.png" alt="Untitled 35.png" width="880" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The reduce phase generates the output in a single file, since we have only one reducer by default in the cluster.&lt;/p&gt;
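&lt;p&gt;The shuffle and reduce behavior can also be sketched in Python (again a simplified simulation, not the actual Hadoop code):&lt;/p&gt;

```python
from collections import defaultdict

# Simplified, non-Hadoop simulation of the shuffle and reduce phases:
# group the mapper's (word, 1) pairs by key, then sum the counts per key.
def word_count_reduce(pairs):
    grouped = defaultdict(list)
    for word, count in pairs:  # shuffle: group values by key
        grouped[word].append(count)
    # reduce: sum the grouped counts for each word
    return {word: sum(counts) for word, counts in grouped.items()}

mapper_output = [("big", 1), ("data", 1), ("is", 1), ("big", 1)]
print(word_count_reduce(mapper_output))
# {'big': 2, 'data': 1, 'is': 1}
```

&lt;p&gt;With a single reducer, all grouped keys land in one place, which is why the job above produces one part-r-00000 output file.&lt;/p&gt;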

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jSMD0D6u--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915647512/d2IJte2vn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jSMD0D6u--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915647512/d2IJte2vn.png" alt="Untitled 36.png" width="880" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GkmsT3z4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915662432/RF4SRauMR.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GkmsT3z4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915662432/RF4SRauMR.png" alt="Untitled 37.png" width="880" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;iii. Mapper, Reducer and Partitioner&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now, let's run '&lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples/tree/main/Hadoop/MapReduce/jarfiles"&gt;wordcountmapreducepartitioner.jar&lt;/a&gt;' with the same input files and a different output path. This has the map, partition and reduce phases configured in the driver class. Logs for the run are &lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples/tree/main/Hadoop/MapReduce/logs"&gt;here&lt;/a&gt; and the code is &lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples/tree/main/Hadoop/MapReduce/eclipseprojects/wordcountmapreducepartitioner"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GPcQXzsX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915695861/b6luheBzE.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GPcQXzsX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915695861/b6luheBzE.png" alt="Untitled 38.png" width="880" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The output of MapReduce with the partitioner is as follows. As per the partitioner logic &lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples/blob/main/Hadoop/MapReduce/eclipseprojects/wordcountmapreducepartitioner/src/wordcountpackage/WordCountPartitioner.java"&gt;here&lt;/a&gt;, a separate output file is created for each starting letter of a word. This means we are creating 26 partitions, and the same number of reducers process the records. For example, all the words starting with the letter 'a' end up in the &lt;code&gt;'part-r-00001'&lt;/code&gt; file along with their counts.&lt;/p&gt;
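&lt;p&gt;The partition assignment can be sketched in Python. This is a simplified, 0-based simulation; the exact partition numbering depends on the Java implementation linked above, and the fallback for non-alphabetic words is an assumption of this sketch:&lt;/p&gt;

```python
import string

# Simplified simulation of a letter-based partitioner: each word is
# assigned to one of 26 partitions based on its first letter, so all
# words sharing a first letter go to the same reducer.
def letter_partition(word):
    first = word.lower()[0]
    if first in string.ascii_lowercase:
        return ord(first) - ord("a")
    return 0  # assumption: non-alphabetic words fall back to partition 0

print(letter_partition("apple"))   # 0
print(letter_partition("banana"))  # 1
```

&lt;p&gt;Because the partition number is a pure function of the key, every occurrence of a word is guaranteed to reach the same reducer, which is what makes the per-letter output files possible.&lt;/p&gt;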

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---ajSpk0T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915707491/GZ7ZeN-dI.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---ajSpk0T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915707491/GZ7ZeN-dI.png" alt="Untitled 39.png" width="880" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---4WU-EH7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915714608/3pywbng6r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---4WU-EH7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1628915714608/3pywbng6r.png" alt="Untitled 40.png" width="880" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;We have seen a practical wordcount example with MapReduce, as promised in my last &lt;a href="https://dev.to/maninekkalapudi/hadoop-mapreduce-a-programming-paradigm-5h4g-temp-slug-1504943"&gt;post&lt;/a&gt;. This is an exhaustive guide capturing the best-known ways to create and execute MapReduce programs in Java.&lt;/p&gt;

&lt;p&gt;MapReduce as a compute framework has lost its edge to newer frameworks like Spark. But did you know that we can use MapReduce to ingest data into HDFS from an RDBMS source, or write SQL-like queries that execute as MapReduce jobs? We will discuss those in detail in my next blog posts. Stay tuned!&lt;/p&gt;

&lt;p&gt;For now though, I'll delete the cloud resources that I've spun up for this tutorial. If you did the same, please delete the resources you have created, or else you'll end up with something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/forrestbrazeal/status/1389622850567421952?s=20"&gt;https://twitter.com/forrestbrazeal/status/1389622850567421952?s=20&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://trendytech.in/"&gt;Big Data course&lt;/a&gt; by &lt;a href="https://www.linkedin.com/in/bigdatabysumit/"&gt;Sumit M&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/maninekkalapudi/dataengineeringbyexamples"&gt;https://github.com/maninekkalapudi/dataengineeringbyexamples&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If my content helped you in any way and you'd like to contribute to my knowledge quest and sharing, you can contribute to me here.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Mani&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Hadoop MapReduce - A Programming Paradigm</title>
      <dc:creator>maninekkalapudi</dc:creator>
      <pubDate>Sun, 01 Aug 2021 12:43:33 +0000</pubDate>
      <link>https://forem.com/maninekkalapudi/hadoop-mapreduce-a-programming-paradigm-4cpd</link>
      <guid>https://forem.com/maninekkalapudi/hadoop-mapreduce-a-programming-paradigm-4cpd</guid>
      <description>&lt;p&gt;Hello! Hope you're doing well. In my last &lt;a href="https://dev.to/maninekkalapudi/hdfs-hadoop-distributed-filesystem-f26-temp-slug-182135"&gt;post&lt;/a&gt; I've explained about the internals of HDFS in detail with hands-on examples. In this post we will discuss about MapReduce, a big data processing framework. It is not a mere compute framework or a tool. It is a completely new programming paradigm that simplifies the big data processing in parallel with key-value pairs. We'll discuss everything in detail with examples in this post. Let's dive in!&lt;/p&gt;

&lt;h3&gt;
  
  
  Topics covered in this post
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;What is MapReduce?&lt;/li&gt;
&lt;li&gt;Traditional Programming vs MapReduce&lt;/li&gt;
&lt;li&gt;Higher Order Functions &lt;/li&gt;
&lt;li&gt;MapReduce Framework Components&lt;/li&gt;
&lt;li&gt;MapReduce on Hadoop Cluster&lt;/li&gt;
&lt;li&gt;MapReduce with Combiner&lt;/li&gt;
&lt;li&gt;MapReduce with Partitioner&lt;/li&gt;
&lt;li&gt;Wordcount example in MapReduce&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  1. What is MapReduce?
&lt;/h3&gt;

&lt;p&gt;MapReduce is a distributed parallel compute framework developed by engineers at Google around 2004. This new framework addressed the challenges Google was facing at the time in processing large volumes of data to index websites for its search engine.&lt;/p&gt;

&lt;p&gt;Suppose a user searches for "shopping" on Google: they will receive the shopping websites or businesses most relevant to the term. To produce such relevant search results, Google must crawl through every website on the internet, understand what a user might be looking for in each website, and group similar websites.&lt;/p&gt;

&lt;p&gt;Of course, this is an oversimplification of how search works. But our focus is to understand how Google engineers came up with a solution to understand every website (search engine indexing) at planetary scale through big data processing.&lt;/p&gt;

&lt;p&gt;This is probably the first time a group of inexpensive computers was connected over a network (in the form of a cluster) to perform data processing in parallel. Distribution among the nodes alone was not a sufficient answer [2]. This distribution of work had to be performed in parallel for the following three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The processing must be able to expand and contract automatically&lt;/li&gt;
&lt;li&gt;The processing must be able to proceed regardless of failures in the network or the individual systems&lt;/li&gt;
&lt;li&gt;Developers leveraging this approach must be able to create services that are easy to leverage by other developers. Therefore, this approach must be independent of where the data and computations have executed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;MapReduce solved all the above problems by abstracting away job orchestration, providing out-of-the-box APIs so the end user doesn't have to manage any of the steps it performs.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Traditional Programming vs MapReduce
&lt;/h3&gt;

&lt;p&gt;In the traditional programming style, when we write a program in any programming language of our choice, the program runs on a machine where the data is also present. This is a very efficient way to process small-scale data, and it can scale up to a few GBs easily.&lt;/p&gt;

&lt;p&gt;However, in the MapReduce style, the data is present on a group of machines, and the program is moved to all the machines where the relevant data is present, so the data is processed locally on those machines. This avoids transferring data over the network (a precious resource in datacenters) to the machine that has the program, which matters especially when the data size is massive.&lt;/p&gt;

&lt;p&gt;In my last &lt;a href="https://dev.to/maninekkalapudi/hdfs-hadoop-distributed-filesystem-f26-temp-slug-182135"&gt;post&lt;/a&gt; we understood how a distributed filesystem works. A Hadoop cluster can also perform big data processing using the MapReduce framework on the same nodes that store the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Higher Order Functions
&lt;/h3&gt;

&lt;p&gt;Before we understand how MapReduce works, we need to understand a programming concept called higher-order functions. All modern programming languages support higher-order functions. A &lt;a href="https://en.wikipedia.org/wiki/Higher-order_function"&gt;higher order function&lt;/a&gt; is a function that does at least one of the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;takes one or more functions as arguments&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;returns a function as its result&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
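&lt;p&gt;The &lt;code&gt;map&lt;/code&gt; example below covers the first case. As a minimal sketch of the second case, here is a Python function that returns a function (the &lt;code&gt;multiplier&lt;/code&gt; name is just an illustration, not part of MapReduce):&lt;/p&gt;

```python
# A function that returns a function (the second kind of higher order function)
def multiplier(n):
    def multiply(a):
        return a * n
    return multiply

double = multiplier(2)   # 'double' is itself a function
print(double(5))         # 10
```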

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;map&lt;/code&gt; is a higher order function. It takes two parameters:

&lt;ol&gt;
&lt;li&gt;A function which performs a task (Ex: multiply number by 2)&lt;/li&gt;
&lt;li&gt;List of values (Ex: List of numbers)&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;map&lt;/code&gt; function takes a list of values and returns the same number of values after applying the specified function to each one. Here is an example of the map function in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Map function - Python example:
# Define a function which takes a parameter(a) and returns 2*a
def double(a):
    return a*2

# Create a list with some numbers
lst = [1,3,5,7,9]

# map the double function to the list of values in list 'lst'
double_lst = map(double, lst)

# Printing the map object double_lst gives a lazy map object, not the values
print(double_lst)
# &amp;lt;map object at 0x000001756B5FACF8&amp;gt;

# Convert the map object to list and print the results
print(list(double_lst))

#[2, 6, 10, 14, 18] -&amp;gt; result of map function. Every number in the list is doubled

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;reduce&lt;/code&gt; is also a higher order function: it takes a list of values along with a function as parameters and returns a single value. Here is an example of the reduce function in Python:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import functools # This is a python library which has reduce function
# create a list of numbers
lst = [100, 353, 565, 976, 128, 232]

# Define a function which takes two numbers and returns the greater one
def greater(a,b):
    if a &amp;gt; b:
        return a
    else:
        return b

print(functools.reduce(greater, lst))

# 976 -&amp;gt; Output of reduce function. It returns the highest number in the list

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the above examples we get an understanding of how a map and a reduce function work independently. This idea generalizes to many kinds of tasks, as we'll see with the wordcount problem.&lt;/p&gt;
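&lt;p&gt;As a hedged sketch (plain Python, not the Hadoop API), here is wordcount expressed with only the &lt;code&gt;map&lt;/code&gt; and &lt;code&gt;reduce&lt;/code&gt; higher order functions:&lt;/p&gt;

```python
import functools

# Sample input lines (made up for illustration)
lines = ["big data is big", "data is data"]

# "Map": emit a (word, 1) pair for every word in every line
pairs = []
for words in map(str.split, lines):
    pairs.extend((word, 1) for word in words)

# "Reduce": fold the pairs into a dictionary of per-word counts
def add_pair(counts, pair):
    word, one = pair
    counts[word] = counts.get(word, 0) + one
    return counts

counts = functools.reduce(add_pair, pairs, {})
print(counts)  # {'big': 2, 'data': 3, 'is': 2}
```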

&lt;h3&gt;
  
  
  4. MapReduce Framework Components
&lt;/h3&gt;

&lt;p&gt;The three important components of MapReduce framework are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Mapper&lt;/li&gt;
&lt;li&gt;Reducer&lt;/li&gt;
&lt;li&gt;Combiner&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The MapReduce library is written in Java, and the above components are Java classes. Every component in the MapReduce library works only with &lt;code&gt;&amp;lt;key-value&amp;gt;&lt;/code&gt; pairs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Mapper&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mappers/Maps are individual tasks that map the input &lt;code&gt;&amp;lt;key-value&amp;gt;&lt;/code&gt; pairs to intermediate &lt;code&gt;&amp;lt;key-value&amp;gt;&lt;/code&gt; pairs. The transformed intermediate records do not need to be of the same type as the input records. &lt;strong&gt;The output of the Mapper is not the final output; it will be passed to a Reducer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We will understand this with a word count problem. For the Mapper, the input will be &lt;code&gt;&amp;lt;rk (randomkey), line&amp;gt;&lt;/code&gt; pairs, and the output will be &lt;code&gt;&amp;lt;word, 1&amp;gt;&lt;/code&gt; pairs. Every line in the input will be split into words, and each word will get a count of 1, even if the word repeats. The input key (randomkey) will be ignored.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FD5b7Yqs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627820806932/9RMxVrTDu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FD5b7Yqs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627820806932/9RMxVrTDu.png" alt="Untitled 3.png" width="768" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our input to the Mapper or MapReduce program will be raw text for a wordcount problem. But how did we get the &lt;code&gt;&amp;lt;rk (randomkey), line&amp;gt;&lt;/code&gt; as input to the Mapper? We'll see that in the next section.&lt;/p&gt;
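&lt;p&gt;The Mapper logic described above can be sketched in Python as follows (an illustration only; the real Mapper is a Java class):&lt;/p&gt;

```python
# Sketch of the wordcount Mapper: input is (random_key, line);
# output is one (word, 1) pair per word, repeats included.
def mapper(random_key, line):
    # the input key is ignored; only the line is processed
    for word in line.split():
        yield (word, 1)

print(list(mapper(0, "this is a line this is")))
# [('this', 1), ('is', 1), ('a', 1), ('line', 1), ('this', 1), ('is', 1)]
```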

&lt;p&gt;&lt;strong&gt;2. Reducer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Reducer works on the intermediate output from the Mappers and aggregates the results&lt;/strong&gt;. &lt;strong&gt;The output of the Reducer is the final output&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the below example, the output from the Mapper contains every word with count 1. The Reducer takes this input and aggregates (sums up) the counts for each distinct word.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cfpJDaAK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627820843244/JBIamRSbU.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cfpJDaAK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627820843244/JBIamRSbU.png" alt="Untitled 1.png" width="635" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Combiner&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A Combiner will have the same aggregation logic as the Reducer (in most cases), and it runs along with the Mapper on the same machine. It performs local aggregations at the Mapper level before sending the data to a Reducer for the final aggregation, which decreases the amount of data transferred from Mapper to Reducer by a huge degree.&lt;/p&gt;

&lt;p&gt;Combiners work fine for aggregations like count and sum, but one must be careful when implementing an aggregation like average: an average of per-Mapper averages is not the overall average, so the Combiner and the Reducer should not both compute an average. Instead, the Combiner should emit partial sums and counts, and only the Reducer should divide.&lt;/p&gt;
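&lt;p&gt;A small Python sketch of why this matters (the numbers are made up for illustration):&lt;/p&gt;

```python
# Values seen by two different Mappers
mapper_outputs = [[10, 20], [30, 40, 50]]

# Wrong: averaging the per-Mapper averages
wrong = sum(sum(v) / len(v) for v in mapper_outputs) / len(mapper_outputs)

# Right: Combiners emit (sum, count) pairs; the Reducer adds them
# up and divides only once at the end
partials = [(sum(v), len(v)) for v in mapper_outputs]
total = sum(s for s, c in partials)
count = sum(c for s, c in partials)
print(wrong, total / count)  # 27.5 30.0
```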

&lt;h3&gt;
  
  
  5. MapReduce on Hadoop Cluster
&lt;/h3&gt;

&lt;p&gt;Typically, the compute nodes and the storage nodes are the same in an HDFS cluster; that is, the MapReduce framework and HDFS run on the same set of nodes.&lt;/p&gt;

&lt;p&gt;As shown below, a client will send the program (a jar file containing the Mapper, Reducer and Combiner classes along with all necessary libraries) in &lt;a href="https://en.wikipedia.org/wiki/Serialization"&gt;serialized format&lt;/a&gt; to each node where the relevant data is present. The computations will take place on those same machines. Now we need to focus on how the computations (map and reduce) are done.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SEk9FUcg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627820880430/h4Vc6x8_v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SEk9FUcg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627820880430/h4Vc6x8_v.png" alt="Untitled 2.png" width="880" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When the MapReduce program (Mapper and Reducer classes) is sent to the nodes, the framework will split the data into logical &lt;code&gt;InputSplit&lt;/code&gt;s and assign them to the Mappers using the &lt;code&gt;InputFormat&lt;/code&gt; class. So, each block (128 MB by default) will be the input to one Mapper, and the number of Mappers that run equals the number of input splits. When the default logical &lt;code&gt;InputSplit&lt;/code&gt;s are not enough, we can customize the &lt;code&gt;RecordReader&lt;/code&gt; class to split the input as our special case requires.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RecordReader&lt;/code&gt; class typically, converts the byte-oriented view of the input, provided by the InputSplit, and presents a record-oriented view for the Mapper &amp;amp; Reducer tasks for processing.&lt;/li&gt;
&lt;li&gt;What are the byte-oriented and record-oriented views? HDFS splits files into blocks by byte offsets (the byte-oriented view). Since this is not a logical split, part of the last record may reside in one block while the rest of it is in the next block. This is fine for storage, but for processing, the partial record in a block cannot be processed as it is. So the record-oriented view comes into play: it fetches the remaining part of the last record from the other block to make a set of complete records. This is called an input split (the record-oriented view).[5]&lt;/li&gt;
&lt;li&gt;When the &lt;code&gt;RecordReader&lt;/code&gt; reads each line in the file, it converts the lines into key-value pairs; the key assigned to each line is generated by the framework (referred to as a random key here). This will be the input (&lt;code&gt;&amp;lt;rk (randomkey), line&amp;gt;&lt;/code&gt;) to the Mapper class in our MapReduce program. The above steps are taken care of by the MapReduce framework itself, and we can also customize the behavior of the &lt;code&gt;RecordReader&lt;/code&gt; as per our requirements. More on that &lt;a href="https://data-flair.training/blogs/hadoop-recordreader/"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
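&lt;p&gt;The two views can be simulated in a few lines of Python (a sketch with a made-up file and block size, not HDFS itself):&lt;/p&gt;

```python
# Byte-oriented view: a file is cut into fixed-size byte blocks,
# which can split a record (line) in the middle.
data = b"first line\nsecond line\nthird line\n"
block_size = 16
blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
print(blocks[0])  # b'first line\nsecon'  (second record cut mid-way)

# Record-oriented view: re-join the bytes and split on record boundaries,
# emitting (byte_offset, line) pairs as the key-value input for the Mapper
records, offset = [], 0
for line in b"".join(blocks).split(b"\n"):
    if line:
        records.append((offset, line.decode()))
    offset += len(line) + 1
print(records)
# [(0, 'first line'), (11, 'second line'), (23, 'third line')]
```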

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wOKvw65u--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627820985583/xAROPK_Db.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wOKvw65u--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627820985583/xAROPK_Db.png" alt="Untitled 3.png" width="768" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Once the Mapper receives the output from the &lt;code&gt;RecordReader&lt;/code&gt;, the randomly generated keys in the input are ignored and only the values are considered by the Mapper for processing. In our wordcount example, we consider only the lines and ignore the random keys&lt;/li&gt;
&lt;li&gt;The output of each Mapper is intermediate output, and it is stored on the local disk once the Mapper finishes processing. Since Mappers run on each node of the HDFS cluster, the map stage executes in parallel and is monitored by the &lt;code&gt;JobTracker&lt;/code&gt; in the framework. The output of the map stage for the wordcount example is as mentioned below:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--h6ogvyHs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627821009620/j8RfsynlL.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--h6ogvyHs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627821009620/j8RfsynlL.png" alt="Untitled.png" width="631" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After all the Mappers are done with their processing, the data is sorted on disk and sent to another node within the cluster (possibly one of the nodes that performed the map stage). This operation is called &lt;strong&gt;Sort and Shuffle,&lt;/strong&gt; and the node to which the data is sent is called the &lt;strong&gt;Reducer&lt;/strong&gt;. Without a Reducer, there is no sort-and-shuffle phase, and every Mapper simply writes its own output file.&lt;/li&gt;
&lt;/ul&gt;
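&lt;p&gt;The sort-and-shuffle step can be sketched in Python as grouping all the Mappers' &lt;code&gt;&amp;lt;word, 1&amp;gt;&lt;/code&gt; pairs by key (an illustration only):&lt;/p&gt;

```python
from collections import defaultdict

# Intermediate (word, 1) pairs produced by two Mappers (made-up data)
mapper_outputs = [[("big", 1), ("data", 1)], [("data", 1), ("big", 1)]]

# Shuffle: group all values by key so each word ends up at one Reducer
grouped = defaultdict(list)
for output in mapper_outputs:
    for word, one in output:
        grouped[word].append(one)

# Sort: keys arrive at the Reducer in sorted order
for word in sorted(grouped):
    print(word, grouped[word])
# big [1, 1]
# data [1, 1]
```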

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gfzjSxOz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627821060643/uT2f5K8RD.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gfzjSxOz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627821060643/uT2f5K8RD.png" alt="Untitled 4.png" width="880" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Reducer aggregates the results it receives from all the Mappers as key-value pairs. The final output is stored in the location provided by the user. For the wordcount example, the output will be &lt;code&gt;&amp;lt;word, count of all the occurrences&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;As much work as possible should be done in the Mappers, because they run in parallel; only the final aggregation should take place at the Reducer. The output of the reduce phase is stored in a single output file per Reducer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. MapReduce with Combiner
&lt;/h3&gt;

&lt;p&gt;A Combiner is an optional class that operates by accepting the inputs from the Map and thereafter passing the output key-value pairs to the Reducer. The Combiner has the exact same logic (in most cases) as the Reducer, and it performs the local aggregations at each Map level.&lt;/p&gt;

&lt;p&gt;Since it runs along with the Mapper, the Combiner also runs in parallel. Having a Combiner with the Mappers will reduce the amount of data shuffled between Mappers and Reducer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--km-7GMaD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627821105669/x-KSueePz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--km-7GMaD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627821105669/x-KSueePz.png" alt="Untitled 5.png" width="880" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The output of the MapReduce with combiner looks as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gsTTh19B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627821123204/E2vcDAKEz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gsTTh19B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627821123204/E2vcDAKEz.png" alt="Untitled 6.png" width="880" height="494"&gt;&lt;/a&gt;Source: &lt;a href="https://tutorials.freshersnow.com/map-reduce-tutorial/combiner-in-hadoop-mapreduce/"&gt;https://tutorials.freshersnow.com/map-reduce-tutorial/combiner-in-hadoop-mapreduce/&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  7. MapReduce with Partitioner
&lt;/h3&gt;

&lt;p&gt;A Partitioner partitions the intermediate output from the Mappers into different groups. The partitions are created by a user-defined function that works like a hash function.&lt;/p&gt;

&lt;p&gt;In a typical MapReduce application we have only one Reducer that aggregates all the data in the final stage, but with the addition of a Partitioner, each partition is aggregated by a separate Reducer. So, the number of partitions is equal to the number of Reducers. Now, let's discuss the user-defined function for a partition.&lt;/p&gt;

&lt;p&gt;In our wordcount example, let's suppose we want to group words alphabetically, i.e., all the words starting with the letter 'a' should be in one group, and likewise for every other letter. The partitioner will require 26 conditions (one per letter) to achieve this, and we'll have 26 output files, one from each of the 26 Reducers.&lt;/p&gt;
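&lt;p&gt;A minimal Python sketch of such a partition function (an illustration; a real Hadoop Partitioner is a Java class):&lt;/p&gt;

```python
import string

# Route each intermediate word to one of 26 partitions by its first
# letter, so each partition (and hence each Reducer) handles one letter.
def partition(word, num_partitions=26):
    return string.ascii_lowercase.index(word[0].lower()) % num_partitions

print(partition("apple"), partition("banana"), partition("zebra"))
# 0 1 25
```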

&lt;p&gt;MapReduce with Partitioner will look as mentioned below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--J0pq-_0F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627821187394/bB-Va-0UJ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--J0pq-_0F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1627821187394/bB-Va-0UJ.png" alt="Untitled 7.png" width="880" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Wordcount example in MapReduce
&lt;/h3&gt;

&lt;p&gt;Well, that's a lot of content for a single blog post. I will cover the Wordcount example with great detail in my next post.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;As in the wordcount example, Google processes data from pages across the internet in a similar fashion and later uses it to create inverted indices for search engine indexing. More on that &lt;a href="https://www.deepcrawl.com/knowledge/technical-seo-library/search-engine-indexing/#:~:text=An%20inverted%20index%20is%20a,to%20store%20and%20retrieve%20data"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;MapReduce is a pioneering big data processing framework. Though it does most of its processing in parallel in the Mappers, it is certainly slow due to its frequent disk writes and reads at each stage. Newer compute engines like &lt;a href="https://spark.apache.org/"&gt;Apache Spark&lt;/a&gt; do a better job with in-memory computing, but an understanding of the core functionality of distributed data processing comes from MapReduce.&lt;/p&gt;

&lt;p&gt;I've not covered the &lt;code&gt;JobTracker&lt;/code&gt; for MapReduce jobs in this post, since it would be incomplete to talk about it without mentioning YARN (an operating system for big data applications). We'll discuss YARN in a separate post.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sources
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://trendytech.in/"&gt;Big Data course&lt;/a&gt; by &lt;a href="https://www.linkedin.com/in/bigdatabysumit/"&gt;Sumit M&lt;/a&gt;. All drawn pictures are courtesy of the course&lt;/li&gt;
&lt;li&gt;&lt;a href="https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf"&gt;MapReduce Paper by Google&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.dummies.com/programming/big-data/engineering/big-data-and-the-origins-of-mapreduce/"&gt;https://www.dummies.com/programming/big-data/engineering/big-data-and-the-origins-of-mapreduce/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html"&gt;https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://data-flair.training/blogs/hadoop-recordreader/"&gt;https://data-flair.training/blogs/hadoop-recordreader/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/questions/34871351/hadoop-input-splits-and-record-reader"&gt;https://stackoverflow.com/questions/34871351/hadoop-input-splits-and-record-reader&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.tutorialspoint.com/map_reduce/map_reduce_combiners.htm"&gt;https://www.tutorialspoint.com/map_reduce/map_reduce_combiners.htm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.tutorialspoint.com/map_reduce/map_reduce_partitioner.htm"&gt;https://www.tutorialspoint.com/map_reduce/map_reduce_partitioner.htm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nareshit.com/mapreduce-online-training/"&gt;Logo image&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Thanks, Mani&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
