<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Piyush Raj</title>
    <description>The latest articles on Forem by Piyush Raj (@piyushraj).</description>
    <link>https://forem.com/piyushraj</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F910441%2Fb6b62488-6650-4d19-8dc0-941d3dc83850.jpeg</url>
      <title>Forem: Piyush Raj</title>
      <link>https://forem.com/piyushraj</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/piyushraj"/>
    <language>en</language>
    <item>
      <title>Pandas - EDA Case Study - 7 Days of Pandas</title>
      <dc:creator>Piyush Raj</dc:creator>
      <pubDate>Tue, 27 Dec 2022 10:25:26 +0000</pubDate>
      <link>https://forem.com/piyushraj/pandas-eda-case-study-7-days-of-pandas-4le4</link>
      <guid>https://forem.com/piyushraj/pandas-eda-case-study-7-days-of-pandas-4le4</guid>
      <description>&lt;p&gt;Welcome to the seventh (and final) article in the "7 Days of Pandas" series, where we cover the &lt;code&gt;pandas&lt;/code&gt; library in Python, which is used for data manipulation.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-read-and-write-data-from-csv-files-7-days-of-pandas-1o4f"&gt;first article&lt;/a&gt; of the series, we looked at how to read and write CSV files with Pandas.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-basic-data-manipulation-7-days-of-pandas-4c47"&gt;second article&lt;/a&gt;, we looked at how to perform basic data manipulation.   &lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-basic-exploratory-data-analysis-7-days-of-pandas-3816"&gt;third article&lt;/a&gt;, we looked at how to perform EDA (exploratory data analysis) with Pandas.  &lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-handling-missing-values-7-days-of-pandas-a16"&gt;fourth article&lt;/a&gt;, we looked at how to handle missing values in a dataframe.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-aggregating-and-grouping-data-7-days-of-pandas-p60"&gt;fifth article&lt;/a&gt;, we looked at how to aggregate and group data in Pandas.&lt;/p&gt;

&lt;p&gt;In the &lt;a href=""&gt;sixth article&lt;/a&gt;, we looked at how to visualize the data in a pandas dataframe.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will apply the methods learned so far in a case study. We'll be working with a &lt;a href="https://www.kaggle.com/code/kashnitsky/a1-demo-pandas-and-uci-adult-dataset/notebook"&gt;demo assignment&lt;/a&gt; on performing EDA from the open-source &lt;a href="https://mlcourse.ai/book/index.html"&gt;mlcourse.ai&lt;/a&gt; project.&lt;/p&gt;

&lt;h2&gt;The Task&lt;/h2&gt;

&lt;p&gt;In this task you should use Pandas to answer a few questions about the &lt;a href="https://archive.ics.uci.edu/ml/datasets/Adult"&gt;Adult dataset&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Unique values of all features (for more information, please see the links above):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;age&lt;/code&gt;: continuous.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;workclass&lt;/code&gt;: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;fnlwgt&lt;/code&gt;: continuous.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;education&lt;/code&gt;: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;education-num&lt;/code&gt;: continuous.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;marital-status&lt;/code&gt;: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;occupation&lt;/code&gt;: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;relationship&lt;/code&gt;: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;race&lt;/code&gt;: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sex&lt;/code&gt;: Female, Male.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;capital-gain&lt;/code&gt;: continuous.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;capital-loss&lt;/code&gt;: continuous.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;hours-per-week&lt;/code&gt;: continuous.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;native-country&lt;/code&gt;: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&amp;amp;Tobago, Peru, Hong, Holand-Netherlands.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;salary&lt;/code&gt;: &amp;gt;50K, &amp;lt;=50K.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's now read the data as a dataframe.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# read data from csv file
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"adult.data.csv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# display the first five rows
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;age&lt;/th&gt;
      &lt;th&gt;workclass&lt;/th&gt;
      &lt;th&gt;fnlwgt&lt;/th&gt;
      &lt;th&gt;education&lt;/th&gt;
      &lt;th&gt;education-num&lt;/th&gt;
      &lt;th&gt;marital-status&lt;/th&gt;
      &lt;th&gt;occupation&lt;/th&gt;
      &lt;th&gt;relationship&lt;/th&gt;
      &lt;th&gt;race&lt;/th&gt;
      &lt;th&gt;sex&lt;/th&gt;
      &lt;th&gt;capital-gain&lt;/th&gt;
      &lt;th&gt;capital-loss&lt;/th&gt;
      &lt;th&gt;hours-per-week&lt;/th&gt;
      &lt;th&gt;native-country&lt;/th&gt;
      &lt;th&gt;salary&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;39&lt;/td&gt;
      &lt;td&gt;State-gov&lt;/td&gt;
      &lt;td&gt;77516&lt;/td&gt;
      &lt;td&gt;Bachelors&lt;/td&gt;
      &lt;td&gt;13&lt;/td&gt;
      &lt;td&gt;Never-married&lt;/td&gt;
      &lt;td&gt;Adm-clerical&lt;/td&gt;
      &lt;td&gt;Not-in-family&lt;/td&gt;
      &lt;td&gt;White&lt;/td&gt;
      &lt;td&gt;Male&lt;/td&gt;
      &lt;td&gt;2174&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;40&lt;/td&gt;
      &lt;td&gt;United-States&lt;/td&gt;
      &lt;td&gt;&amp;lt;=50K&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;50&lt;/td&gt;
      &lt;td&gt;Self-emp-not-inc&lt;/td&gt;
      &lt;td&gt;83311&lt;/td&gt;
      &lt;td&gt;Bachelors&lt;/td&gt;
      &lt;td&gt;13&lt;/td&gt;
      &lt;td&gt;Married-civ-spouse&lt;/td&gt;
      &lt;td&gt;Exec-managerial&lt;/td&gt;
      &lt;td&gt;Husband&lt;/td&gt;
      &lt;td&gt;White&lt;/td&gt;
      &lt;td&gt;Male&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;13&lt;/td&gt;
      &lt;td&gt;United-States&lt;/td&gt;
      &lt;td&gt;&amp;lt;=50K&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;38&lt;/td&gt;
      &lt;td&gt;Private&lt;/td&gt;
      &lt;td&gt;215646&lt;/td&gt;
      &lt;td&gt;HS-grad&lt;/td&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;Divorced&lt;/td&gt;
      &lt;td&gt;Handlers-cleaners&lt;/td&gt;
      &lt;td&gt;Not-in-family&lt;/td&gt;
      &lt;td&gt;White&lt;/td&gt;
      &lt;td&gt;Male&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;40&lt;/td&gt;
      &lt;td&gt;United-States&lt;/td&gt;
      &lt;td&gt;&amp;lt;=50K&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;53&lt;/td&gt;
      &lt;td&gt;Private&lt;/td&gt;
      &lt;td&gt;234721&lt;/td&gt;
      &lt;td&gt;11th&lt;/td&gt;
      &lt;td&gt;7&lt;/td&gt;
      &lt;td&gt;Married-civ-spouse&lt;/td&gt;
      &lt;td&gt;Handlers-cleaners&lt;/td&gt;
      &lt;td&gt;Husband&lt;/td&gt;
      &lt;td&gt;Black&lt;/td&gt;
      &lt;td&gt;Male&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;40&lt;/td&gt;
      &lt;td&gt;United-States&lt;/td&gt;
      &lt;td&gt;&amp;lt;=50K&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;28&lt;/td&gt;
      &lt;td&gt;Private&lt;/td&gt;
      &lt;td&gt;338409&lt;/td&gt;
      &lt;td&gt;Bachelors&lt;/td&gt;
      &lt;td&gt;13&lt;/td&gt;
      &lt;td&gt;Married-civ-spouse&lt;/td&gt;
      &lt;td&gt;Prof-specialty&lt;/td&gt;
      &lt;td&gt;Wife&lt;/td&gt;
      &lt;td&gt;Black&lt;/td&gt;
      &lt;td&gt;Female&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;40&lt;/td&gt;
      &lt;td&gt;Cuba&lt;/td&gt;
      &lt;td&gt;&amp;lt;=50K&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;1. How many men and women (sex feature) are represented in this dataset?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# we need to get the value counts in the "sex" column
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"sex"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Male      21790
Female    10771
Name: sex, dtype: int64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;2. What is the average age (age feature) of women?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# filter for women and then get their average age
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"sex"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="s"&gt;"Female"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;36.85823043357163
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;3. What is the percentage of German citizens (native-country feature)?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# find the number of German citizens and divide that by the total population
&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"native-country"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="s"&gt;"Germany"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.42074874850281013
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only about 0.42% of the people in the dataset have Germany as their native country.&lt;/p&gt;
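
&lt;p&gt;As an alternative sketch, comparing a column to a value gives a boolean Series, and the mean of that Series is the fraction of matching rows. The small &lt;code&gt;toy&lt;/code&gt; dataframe below is made up for illustration and is not part of the Adult dataset.&lt;/p&gt;

```python
import pandas as pd

# Made-up stand-in for the "native-country" column (not real rows from the dataset)
toy = pd.DataFrame(
    {"native-country": ["United-States", "Germany", "Cuba", "United-States"]}
)

# True counts as 1 and False as 0, so the mean is the share of matching rows
german_share = (toy["native-country"] == "Germany").mean() * 100
print(german_share)  # 25.0
```

&lt;p&gt;On the real data, the same expression with &lt;code&gt;df&lt;/code&gt; in place of &lt;code&gt;toy&lt;/code&gt; should agree with the &lt;code&gt;len&lt;/code&gt;-based calculation above, up to floating-point rounding.&lt;/p&gt;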

&lt;p&gt;&lt;strong&gt;4-5. What are the mean and standard deviation of age for those who earn more than 50K per year (salary feature) and those who earn less than 50K per year?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# group on salary and then calculate the mean and std for the age
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"salary"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;'mean'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'std'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;mean&lt;/th&gt;
      &lt;th&gt;std&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;salary&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;&amp;lt;=50K&lt;/th&gt;
      &lt;td&gt;36.783738&lt;/td&gt;
      &lt;td&gt;14.020088&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;&amp;gt;50K&lt;/th&gt;
      &lt;td&gt;44.249841&lt;/td&gt;
      &lt;td&gt;10.519028&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;6. Is it true that people who earn more than 50K have at least high school education? (education – Bachelors, Prof-school, Assoc-acdm, Assoc-voc, Masters or Doctorate feature)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# filter the dataframe for &amp;gt;50k and see the distribution of education
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"salary"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"&amp;gt;50K"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;"education"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Bachelors       2221
HS-grad         1675
Some-college    1387
Masters          959
Prof-school      423
Assoc-voc        361
Doctorate        306
Assoc-acdm       265
10th              62
11th              60
7th-8th           40
12th              33
9th               27
5th-6th           16
1st-4th            6
Name: education, dtype: int64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;No, we can see that there are individuals with less than high-school education in the &amp;gt;50K bucket.&lt;/p&gt;
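
&lt;p&gt;The same check can be made without reading the counts by eye. The sketch below assumes that the eight levels from Preschool through 12th count as "less than high school"; that grouping is an assumption made here, not something the assignment defines. The helper name &lt;code&gt;count_high_earners_below_high_school&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import pandas as pd

# Assumed set of education levels below a high school diploma (HS-grad)
BELOW_HIGH_SCHOOL = [
    "Preschool", "1st-4th", "5th-6th", "7th-8th",
    "9th", "10th", "11th", "12th",
]


def count_high_earners_below_high_school(df: pd.DataFrame) -> int:
    """Count rows with salary ">50K" whose education is below high school."""
    high_earners = df[df["salary"] == ">50K"]
    # isin() marks each row whose education appears in the list
    return int(high_earners["education"].isin(BELOW_HIGH_SCHOOL).sum())
```

&lt;p&gt;Called on the dataframe loaded earlier, it should return 244, which is the sum of the 10th, 11th, 7th-8th, 12th, 9th, 5th-6th and 1st-4th rows in the output above. Any non-zero result means the answer to the question is "no".&lt;/p&gt;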

&lt;p&gt;&lt;strong&gt;7. Display age statistics for each race (race feature) and each gender (sex feature). Use groupby() and describe(). Find the maximum age of men of Amer-Indian-Eskimo race.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# for each race
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"race"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;count&lt;/th&gt;
      &lt;th&gt;mean&lt;/th&gt;
      &lt;th&gt;std&lt;/th&gt;
      &lt;th&gt;min&lt;/th&gt;
      &lt;th&gt;25%&lt;/th&gt;
      &lt;th&gt;50%&lt;/th&gt;
      &lt;th&gt;75%&lt;/th&gt;
      &lt;th&gt;max&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;race&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;Amer-Indian-Eskimo&lt;/th&gt;
      &lt;td&gt;311.0&lt;/td&gt;
      &lt;td&gt;37.173633&lt;/td&gt;
      &lt;td&gt;12.447130&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;28.0&lt;/td&gt;
      &lt;td&gt;35.0&lt;/td&gt;
      &lt;td&gt;45.5&lt;/td&gt;
      &lt;td&gt;82.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Asian-Pac-Islander&lt;/th&gt;
      &lt;td&gt;1039.0&lt;/td&gt;
      &lt;td&gt;37.746872&lt;/td&gt;
      &lt;td&gt;12.825133&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;28.0&lt;/td&gt;
      &lt;td&gt;36.0&lt;/td&gt;
      &lt;td&gt;45.0&lt;/td&gt;
      &lt;td&gt;90.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Black&lt;/th&gt;
      &lt;td&gt;3124.0&lt;/td&gt;
      &lt;td&gt;37.767926&lt;/td&gt;
      &lt;td&gt;12.759290&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;28.0&lt;/td&gt;
      &lt;td&gt;36.0&lt;/td&gt;
      &lt;td&gt;46.0&lt;/td&gt;
      &lt;td&gt;90.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Other&lt;/th&gt;
      &lt;td&gt;271.0&lt;/td&gt;
      &lt;td&gt;33.457565&lt;/td&gt;
      &lt;td&gt;11.538865&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;25.0&lt;/td&gt;
      &lt;td&gt;31.0&lt;/td&gt;
      &lt;td&gt;41.0&lt;/td&gt;
      &lt;td&gt;77.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;White&lt;/th&gt;
      &lt;td&gt;27816.0&lt;/td&gt;
      &lt;td&gt;38.769881&lt;/td&gt;
      &lt;td&gt;13.782306&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;28.0&lt;/td&gt;
      &lt;td&gt;37.0&lt;/td&gt;
      &lt;td&gt;48.0&lt;/td&gt;
      &lt;td&gt;90.0&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# for each gender
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"sex"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;count&lt;/th&gt;
      &lt;th&gt;mean&lt;/th&gt;
      &lt;th&gt;std&lt;/th&gt;
      &lt;th&gt;min&lt;/th&gt;
      &lt;th&gt;25%&lt;/th&gt;
      &lt;th&gt;50%&lt;/th&gt;
      &lt;th&gt;75%&lt;/th&gt;
      &lt;th&gt;max&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;sex&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;Female&lt;/th&gt;
      &lt;td&gt;10771.0&lt;/td&gt;
      &lt;td&gt;36.858230&lt;/td&gt;
      &lt;td&gt;14.013697&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;25.0&lt;/td&gt;
      &lt;td&gt;35.0&lt;/td&gt;
      &lt;td&gt;46.0&lt;/td&gt;
      &lt;td&gt;90.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Male&lt;/th&gt;
      &lt;td&gt;21790.0&lt;/td&gt;
      &lt;td&gt;39.433547&lt;/td&gt;
      &lt;td&gt;13.370630&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;29.0&lt;/td&gt;
      &lt;td&gt;38.0&lt;/td&gt;
      &lt;td&gt;48.0&lt;/td&gt;
      &lt;td&gt;90.0&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# for each race and gender
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"race"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"sex"&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="s"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;count&lt;/th&gt;
      &lt;th&gt;mean&lt;/th&gt;
      &lt;th&gt;std&lt;/th&gt;
      &lt;th&gt;min&lt;/th&gt;
      &lt;th&gt;25%&lt;/th&gt;
      &lt;th&gt;50%&lt;/th&gt;
      &lt;th&gt;75%&lt;/th&gt;
      &lt;th&gt;max&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;race&lt;/th&gt;
      &lt;th&gt;sex&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th rowspan="2"&gt;Amer-Indian-Eskimo&lt;/th&gt;
      &lt;th&gt;Female&lt;/th&gt;
      &lt;td&gt;119.0&lt;/td&gt;
      &lt;td&gt;37.117647&lt;/td&gt;
      &lt;td&gt;13.114991&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;27.0&lt;/td&gt;
      &lt;td&gt;36.0&lt;/td&gt;
      &lt;td&gt;46.00&lt;/td&gt;
      &lt;td&gt;80.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Male&lt;/th&gt;
      &lt;td&gt;192.0&lt;/td&gt;
      &lt;td&gt;37.208333&lt;/td&gt;
      &lt;td&gt;12.049563&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;28.0&lt;/td&gt;
      &lt;td&gt;35.0&lt;/td&gt;
      &lt;td&gt;45.00&lt;/td&gt;
      &lt;td&gt;82.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th rowspan="2"&gt;Asian-Pac-Islander&lt;/th&gt;
      &lt;th&gt;Female&lt;/th&gt;
      &lt;td&gt;346.0&lt;/td&gt;
      &lt;td&gt;35.089595&lt;/td&gt;
      &lt;td&gt;12.300845&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;25.0&lt;/td&gt;
      &lt;td&gt;33.0&lt;/td&gt;
      &lt;td&gt;43.75&lt;/td&gt;
      &lt;td&gt;75.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Male&lt;/th&gt;
      &lt;td&gt;693.0&lt;/td&gt;
      &lt;td&gt;39.073593&lt;/td&gt;
      &lt;td&gt;12.883944&lt;/td&gt;
      &lt;td&gt;18.0&lt;/td&gt;
      &lt;td&gt;29.0&lt;/td&gt;
      &lt;td&gt;37.0&lt;/td&gt;
      &lt;td&gt;46.00&lt;/td&gt;
      &lt;td&gt;90.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th rowspan="2"&gt;Black&lt;/th&gt;
      &lt;th&gt;Female&lt;/th&gt;
      &lt;td&gt;1555.0&lt;/td&gt;
      &lt;td&gt;37.854019&lt;/td&gt;
      &lt;td&gt;12.637197&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;28.0&lt;/td&gt;
      &lt;td&gt;37.0&lt;/td&gt;
      &lt;td&gt;46.00&lt;/td&gt;
      &lt;td&gt;90.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Male&lt;/th&gt;
      &lt;td&gt;1569.0&lt;/td&gt;
      &lt;td&gt;37.682600&lt;/td&gt;
      &lt;td&gt;12.882612&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;27.0&lt;/td&gt;
      &lt;td&gt;36.0&lt;/td&gt;
      &lt;td&gt;46.00&lt;/td&gt;
      &lt;td&gt;90.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th rowspan="2"&gt;Other&lt;/th&gt;
      &lt;th&gt;Female&lt;/th&gt;
      &lt;td&gt;109.0&lt;/td&gt;
      &lt;td&gt;31.678899&lt;/td&gt;
      &lt;td&gt;11.631599&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;23.0&lt;/td&gt;
      &lt;td&gt;29.0&lt;/td&gt;
      &lt;td&gt;39.00&lt;/td&gt;
      &lt;td&gt;74.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Male&lt;/th&gt;
      &lt;td&gt;162.0&lt;/td&gt;
      &lt;td&gt;34.654321&lt;/td&gt;
      &lt;td&gt;11.355531&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;26.0&lt;/td&gt;
      &lt;td&gt;32.0&lt;/td&gt;
      &lt;td&gt;42.00&lt;/td&gt;
      &lt;td&gt;77.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th rowspan="2"&gt;White&lt;/th&gt;
      &lt;th&gt;Female&lt;/th&gt;
      &lt;td&gt;8642.0&lt;/td&gt;
      &lt;td&gt;36.811618&lt;/td&gt;
      &lt;td&gt;14.329093&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;25.0&lt;/td&gt;
      &lt;td&gt;35.0&lt;/td&gt;
      &lt;td&gt;46.00&lt;/td&gt;
      &lt;td&gt;90.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Male&lt;/th&gt;
      &lt;td&gt;19174.0&lt;/td&gt;
      &lt;td&gt;39.652498&lt;/td&gt;
      &lt;td&gt;13.436029&lt;/td&gt;
      &lt;td&gt;17.0&lt;/td&gt;
      &lt;td&gt;29.0&lt;/td&gt;
      &lt;td&gt;38.0&lt;/td&gt;
      &lt;td&gt;49.00&lt;/td&gt;
      &lt;td&gt;90.0&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
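
&lt;p&gt;To pull a single number out of such a table instead of reading it off, the result of &lt;code&gt;describe()&lt;/code&gt; can be indexed with &lt;code&gt;.loc&lt;/code&gt;, using a (race, sex) tuple for the row and the statistic name for the column. The &lt;code&gt;toy&lt;/code&gt; dataframe below is made up for illustration.&lt;/p&gt;

```python
import pandas as pd

# Made-up rows that only mimic the column names of the Adult dataset
toy = pd.DataFrame({
    "race": ["Amer-Indian-Eskimo", "Amer-Indian-Eskimo",
             "Amer-Indian-Eskimo", "White", "White"],
    "sex": ["Male", "Male", "Female", "Male", "Female"],
    "age": [30, 45, 50, 60, 25],
})

# Same grouping as above: one row of statistics per (race, sex) pair
stats = toy.groupby(by=["race", "sex"])["age"].describe()

# Row label is a tuple because the index has two levels; column is the statistic
max_age = stats.loc[("Amer-Indian-Eskimo", "Male"), "max"]
print(max_age)  # 45.0
```

&lt;p&gt;In the full table above, the corresponding cell is the 82.0 in the &lt;code&gt;max&lt;/code&gt; column of the Amer-Indian-Eskimo / Male row, which answers the last part of the question.&lt;/p&gt;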

&lt;p&gt;&lt;strong&gt;8. Among whom is the proportion of those who earn a lot (&amp;gt;50K) greater: married or single men (marital-status feature)? Consider as married those who have a marital-status starting with Married (Married-civ-spouse, Married-spouse-absent or Married-AF-spouse), the rest are considered bachelors.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# add a new column "is-married"
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"is-married"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"marital-status"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Married"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# display the dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;age&lt;/th&gt;
      &lt;th&gt;workclass&lt;/th&gt;
      &lt;th&gt;fnlwgt&lt;/th&gt;
      &lt;th&gt;education&lt;/th&gt;
      &lt;th&gt;education-num&lt;/th&gt;
      &lt;th&gt;marital-status&lt;/th&gt;
      &lt;th&gt;occupation&lt;/th&gt;
      &lt;th&gt;relationship&lt;/th&gt;
      &lt;th&gt;race&lt;/th&gt;
      &lt;th&gt;sex&lt;/th&gt;
      &lt;th&gt;capital-gain&lt;/th&gt;
      &lt;th&gt;capital-loss&lt;/th&gt;
      &lt;th&gt;hours-per-week&lt;/th&gt;
      &lt;th&gt;native-country&lt;/th&gt;
      &lt;th&gt;salary&lt;/th&gt;
      &lt;th&gt;is-married&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;39&lt;/td&gt;
      &lt;td&gt;State-gov&lt;/td&gt;
      &lt;td&gt;77516&lt;/td&gt;
      &lt;td&gt;Bachelors&lt;/td&gt;
      &lt;td&gt;13&lt;/td&gt;
      &lt;td&gt;Never-married&lt;/td&gt;
      &lt;td&gt;Adm-clerical&lt;/td&gt;
      &lt;td&gt;Not-in-family&lt;/td&gt;
      &lt;td&gt;White&lt;/td&gt;
      &lt;td&gt;Male&lt;/td&gt;
      &lt;td&gt;2174&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;40&lt;/td&gt;
      &lt;td&gt;United-States&lt;/td&gt;
      &lt;td&gt;&amp;lt;=50K&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;50&lt;/td&gt;
      &lt;td&gt;Self-emp-not-inc&lt;/td&gt;
      &lt;td&gt;83311&lt;/td&gt;
      &lt;td&gt;Bachelors&lt;/td&gt;
      &lt;td&gt;13&lt;/td&gt;
      &lt;td&gt;Married-civ-spouse&lt;/td&gt;
      &lt;td&gt;Exec-managerial&lt;/td&gt;
      &lt;td&gt;Husband&lt;/td&gt;
      &lt;td&gt;White&lt;/td&gt;
      &lt;td&gt;Male&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;13&lt;/td&gt;
      &lt;td&gt;United-States&lt;/td&gt;
      &lt;td&gt;&amp;lt;=50K&lt;/td&gt;
      &lt;td&gt;True&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;38&lt;/td&gt;
      &lt;td&gt;Private&lt;/td&gt;
      &lt;td&gt;215646&lt;/td&gt;
      &lt;td&gt;HS-grad&lt;/td&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;Divorced&lt;/td&gt;
      &lt;td&gt;Handlers-cleaners&lt;/td&gt;
      &lt;td&gt;Not-in-family&lt;/td&gt;
      &lt;td&gt;White&lt;/td&gt;
      &lt;td&gt;Male&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;40&lt;/td&gt;
      &lt;td&gt;United-States&lt;/td&gt;
      &lt;td&gt;&amp;lt;=50K&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;53&lt;/td&gt;
      &lt;td&gt;Private&lt;/td&gt;
      &lt;td&gt;234721&lt;/td&gt;
      &lt;td&gt;11th&lt;/td&gt;
      &lt;td&gt;7&lt;/td&gt;
      &lt;td&gt;Married-civ-spouse&lt;/td&gt;
      &lt;td&gt;Handlers-cleaners&lt;/td&gt;
      &lt;td&gt;Husband&lt;/td&gt;
      &lt;td&gt;Black&lt;/td&gt;
      &lt;td&gt;Male&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;40&lt;/td&gt;
      &lt;td&gt;United-States&lt;/td&gt;
      &lt;td&gt;&amp;lt;=50K&lt;/td&gt;
      &lt;td&gt;True&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;28&lt;/td&gt;
      &lt;td&gt;Private&lt;/td&gt;
      &lt;td&gt;338409&lt;/td&gt;
      &lt;td&gt;Bachelors&lt;/td&gt;
      &lt;td&gt;13&lt;/td&gt;
      &lt;td&gt;Married-civ-spouse&lt;/td&gt;
      &lt;td&gt;Prof-specialty&lt;/td&gt;
      &lt;td&gt;Wife&lt;/td&gt;
      &lt;td&gt;Black&lt;/td&gt;
      &lt;td&gt;Female&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;40&lt;/td&gt;
      &lt;td&gt;Cuba&lt;/td&gt;
      &lt;td&gt;&amp;lt;=50K&lt;/td&gt;
      &lt;td&gt;True&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"is-married"&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="s"&gt;"salary"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;is-married  salary
False       &amp;lt;=50K     0.935546
            &amp;gt;50K      0.064454
True        &amp;lt;=50K     0.563080
            &amp;gt;50K      0.436920
Name: salary, dtype: float64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Among married people, about 44% earn more than 50K, compared with only about 6% of people who are not married.&lt;/p&gt;
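
&lt;p&gt;The same comparison can also be laid out as a two-way table. The sketch below is only illustrative: it uses a small made-up dataframe, and the labels &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;high&lt;/code&gt; are hypothetical stand-ins for the two salary classes in the real dataset. It relies on &lt;code&gt;pd.crosstab()&lt;/code&gt; with &lt;code&gt;normalize="index"&lt;/code&gt;, which turns each row of counts into proportions.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical toy data; "low" and "high" stand in for the two salary classes.
toy = pd.DataFrame({
    "is-married": [True, True, True, False, False, False, False, True],
    "salary": ["high", "low", "high", "low", "low", "low", "high", "low"],
})

# normalize="index" divides every count by its row total,
# so each row of the result sums to 1.
shares = pd.crosstab(toy["is-married"], toy["salary"], normalize="index")
print(shares)
```

&lt;p&gt;Each row of &lt;code&gt;shares&lt;/code&gt; then reads as "of the people in this group, what fraction falls in each salary class".&lt;/p&gt;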

&lt;p&gt;&lt;strong&gt;9. What is the maximum number of hours a person works per week (hours-per-week feature)? How many people work such a number of hours, and what is the percentage of those who earn a lot (&amp;gt;50K) among them?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# max number of hours a person works per week
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"hours-per-week"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;99
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# how many people work the above maximum number of hours
&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"hours-per-week"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"hours-per-week"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;85
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# percentage of people in the above population that earn more than 50K
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"hours-per-week"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"hours-per-week"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()][&lt;/span&gt;&lt;span class="s"&gt;'salary'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;=50K    0.705882
&amp;gt;50K     0.294118
Name: salary, dtype: float64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only about 29% of the 85 people who work 99 hours per week earn more than 50K.&lt;/p&gt;
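
&lt;p&gt;If only the single percentage is needed, one option is to take the mean of a boolean mask, since &lt;code&gt;True&lt;/code&gt; counts as 1 and &lt;code&gt;False&lt;/code&gt; as 0. The sketch below is a minimal illustration on made-up data; the labels &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;high&lt;/code&gt; and the variable names are assumptions, not part of the original dataset.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical toy data; "low" and "high" stand in for the two salary classes.
toy = pd.DataFrame({
    "hours-per-week": [40, 99, 99, 60, 99, 99],
    "salary": ["low", "high", "low", "low", "low", "high"],
})

# Keep only the rows with the maximum number of weekly hours.
max_hours = toy["hours-per-week"].max()
longest = toy[toy["hours-per-week"] == max_hours]

# The mean of a boolean Series is the share of True values.
share_high = (longest["salary"] == "high").mean()
pct_high = round(100 * share_high, 1)
print(max_hours, len(longest), pct_high)
```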

&lt;p&gt;&lt;strong&gt;10. Count the average time of work (hours-per-week) for those who earn a little and a lot (salary) for each country (native-country). What will these be for Japan?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# group the data on native country and salary and find the average work time for each group
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;option_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'display.max_rows'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"native-country"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"salary"&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="s"&gt;"hours-per-week"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;native-country              salary
?                           &amp;lt;=50K     40.164760
                            &amp;gt;50K      45.547945
Cambodia                    &amp;lt;=50K     41.416667
                            &amp;gt;50K      40.000000
Canada                      &amp;lt;=50K     37.914634
                            &amp;gt;50K      45.641026
China                       &amp;lt;=50K     37.381818
                            &amp;gt;50K      38.900000
Columbia                    &amp;lt;=50K     38.684211
                            &amp;gt;50K      50.000000
Cuba                        &amp;lt;=50K     37.985714
                            &amp;gt;50K      42.440000
Dominican-Republic          &amp;lt;=50K     42.338235
                            &amp;gt;50K      47.000000
Ecuador                     &amp;lt;=50K     38.041667
                            &amp;gt;50K      48.750000
El-Salvador                 &amp;lt;=50K     36.030928
                            &amp;gt;50K      45.000000
England                     &amp;lt;=50K     40.483333
                            &amp;gt;50K      44.533333
France                      &amp;lt;=50K     41.058824
                            &amp;gt;50K      50.750000
Germany                     &amp;lt;=50K     39.139785
                            &amp;gt;50K      44.977273
Greece                      &amp;lt;=50K     41.809524
                            &amp;gt;50K      50.625000
Guatemala                   &amp;lt;=50K     39.360656
                            &amp;gt;50K      36.666667
Haiti                       &amp;lt;=50K     36.325000
                            &amp;gt;50K      42.750000
Holand-Netherlands          &amp;lt;=50K     40.000000
Honduras                    &amp;lt;=50K     34.333333
                            &amp;gt;50K      60.000000
Hong                        &amp;lt;=50K     39.142857
                            &amp;gt;50K      45.000000
Hungary                     &amp;lt;=50K     31.300000
                            &amp;gt;50K      50.000000
India                       &amp;lt;=50K     38.233333
                            &amp;gt;50K      46.475000
Iran                        &amp;lt;=50K     41.440000
                            &amp;gt;50K      47.500000
Ireland                     &amp;lt;=50K     40.947368
                            &amp;gt;50K      48.000000
Italy                       &amp;lt;=50K     39.625000
                            &amp;gt;50K      45.400000
Jamaica                     &amp;lt;=50K     38.239437
                            &amp;gt;50K      41.100000
Japan                       &amp;lt;=50K     41.000000
                            &amp;gt;50K      47.958333
Laos                        &amp;lt;=50K     40.375000
                            &amp;gt;50K      40.000000
Mexico                      &amp;lt;=50K     40.003279
                            &amp;gt;50K      46.575758
Nicaragua                   &amp;lt;=50K     36.093750
                            &amp;gt;50K      37.500000
Outlying-US(Guam-USVI-etc)  &amp;lt;=50K     41.857143
Peru                        &amp;lt;=50K     35.068966
                            &amp;gt;50K      40.000000
Philippines                 &amp;lt;=50K     38.065693
                            &amp;gt;50K      43.032787
Poland                      &amp;lt;=50K     38.166667
                            &amp;gt;50K      39.000000
Portugal                    &amp;lt;=50K     41.939394
                            &amp;gt;50K      41.500000
Puerto-Rico                 &amp;lt;=50K     38.470588
                            &amp;gt;50K      39.416667
Scotland                    &amp;lt;=50K     39.444444
                            &amp;gt;50K      46.666667
South                       &amp;lt;=50K     40.156250
                            &amp;gt;50K      51.437500
Taiwan                      &amp;lt;=50K     33.774194
                            &amp;gt;50K      46.800000
Thailand                    &amp;lt;=50K     42.866667
                            &amp;gt;50K      58.333333
Trinadad&amp;amp;Tobago             &amp;lt;=50K     37.058824
                            &amp;gt;50K      40.000000
United-States               &amp;lt;=50K     38.799127
                            &amp;gt;50K      45.505369
Vietnam                     &amp;lt;=50K     37.193548
                            &amp;gt;50K      39.200000
Yugoslavia                  &amp;lt;=50K     41.600000
                            &amp;gt;50K      49.500000
Name: hours-per-week, dtype: float64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# for Japan
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"native-country"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="s"&gt;"Japan"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"salary"&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="s"&gt;"hours-per-week"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;salary
&amp;lt;=50K    41.000000
&amp;gt;50K     47.958333
Name: hours-per-week, dtype: float64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
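
&lt;p&gt;An alternative to filtering the dataframe first is to group once and then pick a country out of the result with &lt;code&gt;.loc&lt;/code&gt;, because the grouped Series is indexed by country and salary. The following is a minimal sketch on made-up rows; the values and the &lt;code&gt;low&lt;/code&gt;/&lt;code&gt;high&lt;/code&gt; labels are hypothetical.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical toy data; "low" and "high" stand in for the two salary classes.
toy = pd.DataFrame({
    "native-country": ["Japan", "Japan", "Japan", "Cuba", "Cuba"],
    "salary": ["low", "high", "high", "low", "high"],
    "hours-per-week": [40, 50, 46, 38, 42],
})

# Mean weekly hours for every (country, salary) pair.
mean_hours = toy.groupby(["native-country", "salary"])["hours-per-week"].mean()

# Selecting the first index level returns a Series indexed by salary.
japan = mean_hours.loc["Japan"]
print(japan)
```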

</description>
      <category>python</category>
      <category>beginners</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Pandas - Visualizing Dataframe Data - 7 Days of Pandas</title>
      <dc:creator>Piyush Raj</dc:creator>
      <pubDate>Mon, 26 Dec 2022 10:09:11 +0000</pubDate>
      <link>https://forem.com/piyushraj/pandas-visualizing-dataframe-data-7-days-of-pandas-1l0p</link>
      <guid>https://forem.com/piyushraj/pandas-visualizing-dataframe-data-7-days-of-pandas-1l0p</guid>
      <description>&lt;p&gt;Welcome to the sixth article in the "7 Days of Pandas" series where we cover the &lt;code&gt;pandas&lt;/code&gt; library in Python which is used for data manipulation.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-read-and-write-data-from-csv-files-7-days-of-pandas-1o4f"&gt;first article&lt;/a&gt; of the series, we looked at how to read and write CSV files with Pandas.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-basic-data-manipulation-7-days-of-pandas-4c47"&gt;second article&lt;/a&gt;, we looked at how to perform basic data manipulation.   &lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-basic-exploratory-data-analysis-7-days-of-pandas-3816"&gt;third article&lt;/a&gt;, we looked at how to perform EDA (exploratory data analysis) with Pandas.  &lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-handling-missing-values-7-days-of-pandas-a16"&gt;fourth article&lt;/a&gt;, we looked at how to handle missing values in a dataframe.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-aggregating-and-grouping-data-7-days-of-pandas-p60"&gt;fifth article&lt;/a&gt;, we looked at how to aggregate and group data in Pandas.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will look at how to plot data in a pandas dataframe with the help of some examples.&lt;/p&gt;

&lt;p&gt;Data visualizations are a great way to present data and can help us find insights that are not obvious when the data is in tabular form. For example, if you have salary data for the employees in an office, a bar chart makes the salaries much easier to compare than a table of numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to visualize data in pandas dataframes?
&lt;/h2&gt;

&lt;p&gt;You can use the pandas dataframe &lt;a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html" rel="noopener noreferrer"&gt;&lt;code&gt;plot()&lt;/code&gt; function&lt;/a&gt; to create a plot from the dataframe values. It creates a matplotlib plot. You can specify the x and y values of the plot with &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; parameters respectively and the type of plot you want to create with the &lt;code&gt;kind&lt;/code&gt; parameter.&lt;/p&gt;
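
&lt;p&gt;Because &lt;code&gt;plot()&lt;/code&gt; returns a matplotlib &lt;code&gt;Axes&lt;/code&gt; object, the result can be adjusted further with regular matplotlib calls. Here is a minimal sketch, assuming matplotlib is installed; the tiny dataframe, the title text and the &lt;code&gt;Agg&lt;/code&gt; backend choice (which avoids opening a window) are illustrative assumptions.&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import pandas as pd

# Small illustrative dataframe (hypothetical values).
toy = pd.DataFrame({"Age": [26, 28, 27], "Salary": [60000, 70000, 82000]})

# x and y pick the columns, kind picks the plot type.
ax = toy.plot(x="Age", y="Salary", kind="line", title="Salary by Age")

# The returned Axes accepts the usual matplotlib customizations.
ax.set_ylabel("Salary")
```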

&lt;p&gt;Let's look at some common types of plots that you can create from pandas dataframe data.&lt;/p&gt;

&lt;p&gt;Before we begin, let's first import pandas and create a sample dataframe that we will be using throughout this tutorial.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# employee data
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Shaym&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Noor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Esha&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sam&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;James&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Lily&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gender&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;26&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;27&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Department&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Marketing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Marketing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;70000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;82000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;55000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;58000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;55000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;65000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# create pandas dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# display the dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;Gender&lt;/th&gt;
      &lt;th&gt;Age&lt;/th&gt;
      &lt;th&gt;Department&lt;/th&gt;
      &lt;th&gt;Salary&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;Tim&lt;/td&gt;
      &lt;td&gt;M&lt;/td&gt;
      &lt;td&gt;26&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;60000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;Shaym&lt;/td&gt;
      &lt;td&gt;M&lt;/td&gt;
      &lt;td&gt;28&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;70000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;Noor&lt;/td&gt;
      &lt;td&gt;F&lt;/td&gt;
      &lt;td&gt;27&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;82000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;Esha&lt;/td&gt;
      &lt;td&gt;F&lt;/td&gt;
      &lt;td&gt;32&lt;/td&gt;
      &lt;td&gt;HR&lt;/td&gt;
      &lt;td&gt;55000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;Sam&lt;/td&gt;
      &lt;td&gt;M&lt;/td&gt;
      &lt;td&gt;24&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;58000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;5&lt;/th&gt;
      &lt;td&gt;James&lt;/td&gt;
      &lt;td&gt;M&lt;/td&gt;
      &lt;td&gt;31&lt;/td&gt;
      &lt;td&gt;HR&lt;/td&gt;
      &lt;td&gt;55000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;td&gt;Lily&lt;/td&gt;
      &lt;td&gt;F&lt;/td&gt;
      &lt;td&gt;33&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;65000&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Scatter Plot
&lt;/h2&gt;

&lt;p&gt;To create a &lt;a href="https://datascienceparichay.com/article/scatter-plot-from-pandas-dataframe/" rel="noopener noreferrer"&gt;scatter plot with dataframe data&lt;/a&gt;, pass "scatter" to the &lt;code&gt;kind&lt;/code&gt; parameter of the &lt;code&gt;plot()&lt;/code&gt; function. For example, let's create a scatter plot of the "Age" vs "Salary" data in the above dataframe.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scatter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6p6osp8ijwf53tm6nvm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6p6osp8ijwf53tm6nvm.png" alt="scatter plot" width="589" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also customize the plot with additional parameters to the &lt;code&gt;plot()&lt;/code&gt; function. For example, let's add a title to the plot and change the color of the points.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scatter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Salary v/s Age&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7teztpg0c8b47bklez12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7teztpg0c8b47bklez12.png" alt="scatter plot with title and red scatter points" width="589" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Bar Plot
&lt;/h2&gt;

&lt;p&gt;To create a bar plot, pass "bar" as an argument to the &lt;code&gt;kind&lt;/code&gt; parameter. Let's create a bar plot of the "Salary" column in the above dataframe.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2v2vwslhktil7ajs1lbf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2v2vwslhktil7ajs1lbf.png" alt="bar plot" width="570" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also customize the plot with additional parameters to the &lt;code&gt;plot()&lt;/code&gt; function. For example, let's rotate the xtick labels slightly and change the color of the bars.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;teal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fidxwh5xekx751jmis7d6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fidxwh5xekx751jmis7d6.png" alt="bar plot formatted with teal colored bars and rotated xtick labels" width="570" height="454"&gt;&lt;/a&gt;&lt;/p&gt;
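
&lt;p&gt;Bar plots also combine naturally with grouping. As a sketch, the snippet below rebuilds just the "Department" and "Salary" columns of the sample dataframe, computes the mean salary per department, and plots the resulting Series as bars. The variable names and the &lt;code&gt;Agg&lt;/code&gt; backend are assumptions made for this example.&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import pandas as pd

# Department and Salary columns from the sample employee data.
df = pd.DataFrame({
    "Department": ["Marketing", "Product", "Product", "HR", "Product", "HR", "Marketing"],
    "Salary": [60000, 70000, 82000, 55000, 58000, 55000, 65000],
})

# Mean salary per department; the department names become the index.
mean_salary = df.groupby("Department")["Salary"].mean()

# Plotting a Series uses its index for the x-axis labels.
ax = mean_salary.plot(kind="bar", rot=0, title="Mean Salary by Department")
```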

&lt;h2&gt;
  
  
  Histogram
&lt;/h2&gt;

&lt;p&gt;A histogram is used to look at the distribution of a continuous variable. To plot a histogram on pandas dataframe data, pass "hist" to the &lt;code&gt;kind&lt;/code&gt; parameter.&lt;/p&gt;

&lt;p&gt;For example, let's plot a histogram of the values in the "Age" column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hist&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqly0w4l68dycew0o5t4n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqly0w4l68dycew0o5t4n.png" alt="histogram" width="567" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also call the &lt;code&gt;plot()&lt;/code&gt; method directly on a pandas series.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hist&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fklbvuwu0xsubaatulxln.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fklbvuwu0xsubaatulxln.png" alt="histogram" width="567" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We get the same result.&lt;/p&gt;
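
&lt;p&gt;If you want to check the numbers behind a histogram without drawing it, the sketch below computes equal-width bin counts with NumPy's &lt;code&gt;histogram()&lt;/code&gt; function. The ages used here are hypothetical values chosen for illustration, not necessarily the data plotted above.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical ages, used only for illustration
ages = pd.Series([26, 28, 27, 32, 24, 31, 33], name="Age")

# Split the value range into 3 equal-width bins, mirroring bins=3 in the plot
counts, edges = np.histogram(ages, bins=3)

print(counts)  # number of values falling in each bin
print(edges)   # the 4 boundaries that delimit the 3 bins
```

&lt;p&gt;The counts always add up to the number of non-missing values, which is a quick sanity check on any histogram.&lt;/p&gt;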

&lt;p&gt;You can similarly plot other types of plots (for example, &lt;a href="https://datascienceparichay.com/article/plot-pandas-series-as-a-line-plot/" rel="noopener noreferrer"&gt;line plot&lt;/a&gt;, &lt;a href="https://datascienceparichay.com/article/create-a-pie-chart-of-pandas-series-values/" rel="noopener noreferrer"&gt;pie chart&lt;/a&gt;, etc.) with the &lt;code&gt;plot()&lt;/code&gt; function using the appropriate parameters.&lt;/p&gt;

</description>
      <category>learning</category>
      <category>blockchain</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Pandas - Aggregating and Grouping Data - 7 Days of Pandas</title>
      <dc:creator>Piyush Raj</dc:creator>
      <pubDate>Sun, 25 Dec 2022 10:08:00 +0000</pubDate>
      <link>https://forem.com/piyushraj/pandas-aggregating-and-grouping-data-7-days-of-pandas-p60</link>
      <guid>https://forem.com/piyushraj/pandas-aggregating-and-grouping-data-7-days-of-pandas-p60</guid>
      <description>&lt;p&gt;Welcome to the fifth article in the "7 Days of Pandas" series where we cover the &lt;code&gt;pandas&lt;/code&gt; library in Python which is used for data manipulation.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-read-and-write-data-from-csv-files-7-days-of-pandas-1o4f"&gt;first article&lt;/a&gt; of the series, we looked at how to read and write CSV files with Pandas.  &lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-basic-data-manipulation-7-days-of-pandas-4c47"&gt;second article&lt;/a&gt;, we looked at how to perform basic data manipulation.   &lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-basic-exploratory-data-analysis-7-days-of-pandas-3816"&gt;third article&lt;/a&gt;, we looked at how to perform EDA (exploratory data analysis) with Pandas.  &lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-handling-missing-values-7-days-of-pandas-a16"&gt;fourth article&lt;/a&gt;, we looked at how to handle missing values in a dataframe.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will look at how to aggregate and group data in Pandas.&lt;/p&gt;

&lt;p&gt;Aggregating and grouping data is a common task when working with datasets, and pandas provides a range of functions and methods to help you do this efficiently. &lt;/p&gt;

&lt;p&gt;In this tutorial, we will cover the following topics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Applying aggregate functions to a pandas dataframe.&lt;/li&gt;
&lt;li&gt;Grouping data in a pandas dataframe (and applying aggregate functions to the grouped data).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Before we begin, let's first import pandas and create a sample dataframe that we will be using throughout this tutorial.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# employee data
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Shaym&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Noor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Esha&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sam&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;James&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Lily&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gender&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;26&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;27&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Department&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Marketing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Marketing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;70000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;82000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;55000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;58000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;55000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;65000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# create pandas dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# display the dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;Gender&lt;/th&gt;
      &lt;th&gt;Age&lt;/th&gt;
      &lt;th&gt;Department&lt;/th&gt;
      &lt;th&gt;Salary&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;Tim&lt;/td&gt;
      &lt;td&gt;M&lt;/td&gt;
      &lt;td&gt;26&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;60000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;Shaym&lt;/td&gt;
      &lt;td&gt;M&lt;/td&gt;
      &lt;td&gt;28&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;70000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;Noor&lt;/td&gt;
      &lt;td&gt;F&lt;/td&gt;
      &lt;td&gt;27&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;82000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;Esha&lt;/td&gt;
      &lt;td&gt;F&lt;/td&gt;
      &lt;td&gt;32&lt;/td&gt;
      &lt;td&gt;HR&lt;/td&gt;
      &lt;td&gt;55000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;Sam&lt;/td&gt;
      &lt;td&gt;M&lt;/td&gt;
      &lt;td&gt;24&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;58000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;5&lt;/th&gt;
      &lt;td&gt;James&lt;/td&gt;
      &lt;td&gt;M&lt;/td&gt;
      &lt;td&gt;31&lt;/td&gt;
      &lt;td&gt;HR&lt;/td&gt;
      &lt;td&gt;55000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;td&gt;Lily&lt;/td&gt;
      &lt;td&gt;F&lt;/td&gt;
      &lt;td&gt;33&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;65000&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Applying aggregate functions
&lt;/h2&gt;

&lt;p&gt;Pandas comes with a number of aggregate functions that you can apply to the entire dataframe or one or more columns in the dataframe.&lt;/p&gt;

&lt;p&gt;For example, you can apply the &lt;code&gt;sum()&lt;/code&gt; function to get the sum of values in each column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name                         TimShaymNoorEshaSamJamesLily
Gender                                            MMFFMMF
Age                                                   201
Department    MarketingProductProductHRProductHRMarketing
Salary                                             445000
dtype: object
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that for columns of &lt;code&gt;object&lt;/code&gt; type (here, the text columns), &lt;code&gt;sum()&lt;/code&gt; concatenates the strings instead of adding numbers.&lt;/p&gt;

&lt;p&gt;You can select which columns to apply the aggregate functions to. &lt;/p&gt;

&lt;p&gt;For example, let's get the mean value of the "Age" and the "Salary" columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Age          28.714286
Salary    63571.428571
dtype: float64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
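
&lt;p&gt;As a minimal sketch of two related options, the example below restricts &lt;code&gt;sum()&lt;/code&gt; to numeric columns with &lt;code&gt;numeric_only=True&lt;/code&gt; and applies several aggregate functions at once with the dataframe &lt;code&gt;agg()&lt;/code&gt; method. It uses a trimmed-down, three-row copy of the sample data, so the numbers differ from the outputs above.&lt;/p&gt;

```python
import pandas as pd

# A trimmed-down copy of the sample data, for illustration only
df = pd.DataFrame({
    "Name": ["Tim", "Shaym", "Noor"],
    "Age": [26, 28, 27],
    "Salary": [60000, 70000, 82000],
})

# Restrict the sum to numeric columns so text columns are not concatenated
numeric_totals = df.sum(numeric_only=True)

# Apply several aggregate functions to the selected columns in one call
summary = df[["Age", "Salary"]].agg(["min", "max", "mean"])

print(numeric_totals)
print(summary)
```

&lt;p&gt;The result of &lt;code&gt;agg()&lt;/code&gt; with a list of functions is a dataframe with one row per function and one column per selected column.&lt;/p&gt;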
&lt;h2&gt;
  
  
  Grouping data
&lt;/h2&gt;

&lt;p&gt;To group a pandas DataFrame by one or more columns, you can use the pandas dataframe &lt;code&gt;groupby()&lt;/code&gt; method. This method takes one or more column names as arguments and returns a groupby object that can be used to apply various operations to the grouped data.&lt;/p&gt;

&lt;p&gt;Let's group the above data on the "Gender" column.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gender&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;pandas.core.groupby.generic.DataFrameGroupBy object at 0x10684b970&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We get a groupby object. You can now use this object to &lt;a href="https://datascienceparichay.com/article/pandas-groupby-mean/" rel="noopener noreferrer"&gt;apply aggregations to the grouped data&lt;/a&gt;. For example, let's get the average "Age" value for each group.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gender&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Gender
F    30.666667
M    27.250000
Name: Age, dtype: float64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We get the mean value of the "Age" column for each group (here, "Gender") in the data.&lt;/p&gt;

&lt;p&gt;You can also group the data on more than one column. For example, let's group the data on "Gender" and "Department" and get the average "Age" in each group.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gender&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Department&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Gender  Department
F       HR            32.0
        Marketing     33.0
        Product       27.0
M       HR            31.0
        Marketing     26.0
        Product       26.0
Name: Age, dtype: float64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
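
&lt;p&gt;The result above is a series indexed by the grouping columns. If you would rather get a flat dataframe with the grouping keys as regular columns, one option is to pass &lt;code&gt;as_index=False&lt;/code&gt; to &lt;code&gt;groupby()&lt;/code&gt;. The sketch below rebuilds only the columns it needs from the sample data; the variable name &lt;code&gt;flat&lt;/code&gt; is just illustrative.&lt;/p&gt;

```python
import pandas as pd

# Only the columns needed for this example, copied from the sample data
df = pd.DataFrame({
    "Gender": ["M", "M", "F", "F", "M", "M", "F"],
    "Age": [26, 28, 27, 32, 24, 31, 33],
    "Department": ["Marketing", "Product", "Product", "HR", "Product", "HR", "Marketing"],
})

# as_index=False keeps the grouping keys as ordinary columns
flat = df.groupby(["Gender", "Department"], as_index=False)["Age"].mean()

print(flat)
```

&lt;p&gt;A flat result like this is often easier to merge with other dataframes or to write out to a CSV file.&lt;/p&gt;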

&lt;p&gt;You can also apply multiple aggregate functions to the grouped data using the &lt;code&gt;.agg()&lt;/code&gt; method. Select the numeric columns first, because recent pandas versions (2.0 and later) raise an error when asked for the mean of a text column such as "Name".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gender&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Department&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])[[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th colspan="2"&gt;Age&lt;/th&gt;
      &lt;th colspan="2"&gt;Salary&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;mean&lt;/th&gt;
      &lt;th&gt;count&lt;/th&gt;
      &lt;th&gt;mean&lt;/th&gt;
      &lt;th&gt;count&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Gender&lt;/th&gt;
      &lt;th&gt;Department&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th rowspan="3"&gt;F&lt;/th&gt;
      &lt;th&gt;HR&lt;/th&gt;
      &lt;td&gt;32.0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;55000.0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Marketing&lt;/th&gt;
      &lt;td&gt;33.0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;65000.0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Product&lt;/th&gt;
      &lt;td&gt;27.0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;82000.0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th rowspan="3"&gt;M&lt;/th&gt;
      &lt;th&gt;HR&lt;/th&gt;
      &lt;td&gt;31.0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;55000.0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Marketing&lt;/th&gt;
      &lt;td&gt;26.0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;60000.0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Product&lt;/th&gt;
      &lt;td&gt;26.0&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;64000.0&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
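
&lt;p&gt;When you want a different function for each column, or more readable output column names, pandas also supports named aggregation in &lt;code&gt;agg()&lt;/code&gt;. The following is a small sketch on the same sample data; the output names &lt;code&gt;mean_age&lt;/code&gt;, &lt;code&gt;max_salary&lt;/code&gt; and &lt;code&gt;headcount&lt;/code&gt; are made up for this example.&lt;/p&gt;

```python
import pandas as pd

# The columns of the sample data that this example uses
df = pd.DataFrame({
    "Name": ["Tim", "Shaym", "Noor", "Esha", "Sam", "James", "Lily"],
    "Gender": ["M", "M", "F", "F", "M", "M", "F"],
    "Age": [26, 28, 27, 32, 24, 31, 33],
    "Salary": [60000, 70000, 82000, 55000, 58000, 55000, 65000],
})

# Each keyword names an output column; the tuple is (input column, function)
summary = df.groupby("Gender").agg(
    mean_age=("Age", "mean"),
    max_salary=("Salary", "max"),
    headcount=("Name", "count"),
)

print(summary)
```

&lt;p&gt;Named aggregation returns flat column names, which avoids the two-level column header produced by passing a list of functions.&lt;/p&gt;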

</description>
      <category>javascript</category>
      <category>learning</category>
    </item>
    <item>
      <title>Pandas - Handling Missing Values - 7 Days of Pandas</title>
      <dc:creator>Piyush Raj</dc:creator>
      <pubDate>Sat, 24 Dec 2022 15:54:37 +0000</pubDate>
      <link>https://forem.com/piyushraj/pandas-handling-missing-values-7-days-of-pandas-a16</link>
      <guid>https://forem.com/piyushraj/pandas-handling-missing-values-7-days-of-pandas-a16</guid>
      <description>&lt;p&gt;Welcome to the fourth article in the "7 Days of Pandas" series where we cover the &lt;code&gt;pandas&lt;/code&gt; library in Python which is used for data manipulation.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-read-and-write-data-from-csv-files-7-days-of-pandas-1o4f"&gt;first article&lt;/a&gt; of the series, we looked at how to read and write CSV files with Pandas. &lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-basic-data-manipulation-7-days-of-pandas-4c47"&gt;second article&lt;/a&gt;, we looked at how to perform basic data manipulation. &lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-basic-exploratory-data-analysis-7-days-of-pandas-3816"&gt;third article&lt;/a&gt;, we looked at how to perform EDA (exploratory data analysis) with Pandas.  &lt;/p&gt;

&lt;p&gt;In this tutorial, we will look at how to handle missing values in data.&lt;/p&gt;

&lt;p&gt;When working with data, it's not uncommon to find missing values. They can occur for a variety of reasons, for example, errors in data capture, encoding issues, etc. Dealing with missing values is a major step in data preprocessing, and it's important to do so before you analyze the data further.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will cover the following topics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identifying missing values.&lt;/li&gt;
&lt;li&gt;Handling missing values.

&lt;ul&gt;
&lt;li&gt;Filling missing values.&lt;/li&gt;
&lt;li&gt;Removing missing values.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Before we begin, let's first import pandas and create a sample dataframe that we will be using throughout this tutorial.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# employee data
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Tim"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Shaym"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Noor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Esha"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Sam"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"James"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Lily"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;"Age"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;26&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;27&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;"Department"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Marketing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"HR"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Marketing"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;"Salary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;82000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;58000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;55000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;65000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# create pandas dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# display the dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;Age&lt;/th&gt;
      &lt;th&gt;Department&lt;/th&gt;
      &lt;th&gt;Salary&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;Tim&lt;/td&gt;
      &lt;td&gt;26.0&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;60000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;Shaym&lt;/td&gt;
      &lt;td&gt;28.0&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;Noor&lt;/td&gt;
      &lt;td&gt;27.0&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;82000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;Esha&lt;/td&gt;
      &lt;td&gt;32.0&lt;/td&gt;
      &lt;td&gt;HR&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;Sam&lt;/td&gt;
      &lt;td&gt;24.0&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;58000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;5&lt;/th&gt;
      &lt;td&gt;James&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;55000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;td&gt;Lily&lt;/td&gt;
      &lt;td&gt;33.0&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;65000.0&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Identifying missing values
&lt;/h2&gt;

&lt;p&gt;To identify missing values in a pandas DataFrame, you can use the pandas &lt;code&gt;isna()&lt;/code&gt; method, which returns a boolean mask indicating the presence of missing values. You can then use this mask to select the rows or columns with missing values.&lt;/p&gt;

&lt;p&gt;For example, let's check which values in the above dataframe are missing using the &lt;code&gt;isna()&lt;/code&gt; function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;Age&lt;/th&gt;
      &lt;th&gt;Department&lt;/th&gt;
      &lt;th&gt;Salary&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;True&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;True&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;5&lt;/th&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;True&lt;/td&gt;
      &lt;td&gt;True&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The result is a boolean dataframe with the same shape as the original, where &lt;code&gt;True&lt;/code&gt; marks a missing value.&lt;/p&gt;

&lt;p&gt;To check which columns in the dataframe have missing values, apply the &lt;code&gt;any()&lt;/code&gt; function to the resulting boolean dataframe with &lt;code&gt;axis=0&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isna&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name          False
Age            True
Department     True
Salary         True
dtype: bool
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We see that only the "Name" column doesn't have any missing values.&lt;/p&gt;
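&lt;p&gt;The same boolean mask can also be used to count the missing values per column and to pull out the affected rows. The sketch below assumes the sample dataframe matches the tables shown in this article; the variable names &lt;code&gt;missing_per_column&lt;/code&gt; and &lt;code&gt;rows_with_missing&lt;/code&gt; are only illustrative.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# sample data reconstructed from the tables in this article (an assumption)
df = pd.DataFrame({
    "Name": ["Tim", "Shaym", "Noor", "Esha", "Sam", "James", "Lily"],
    "Age": [26, 28, 27, 32, 24, np.nan, 33],
    "Department": ["Marketing", "Product", "Product", "HR", "Product", np.nan, "Marketing"],
    "Salary": [60000, np.nan, 82000, np.nan, 58000, 55000, 65000],
})

# count the missing values in each column
missing_per_column = df.isna().sum()

# keep only the rows that contain at least one missing value
rows_with_missing = df[df.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

&lt;p&gt;Using &lt;code&gt;axis=1&lt;/code&gt; checks across the columns of each row, so the result can be used directly as a row filter.&lt;/p&gt;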

&lt;h2&gt;
  
  
  Handling missing values
&lt;/h2&gt;

&lt;p&gt;Handling missing values is an important step in the data preparation pipeline. Generally, there are two approaches to handling missing values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fill the missing value with some appropriate value (for example, a constant or mean, median, etc. for continuous variables, and mode for categorical fields).&lt;/li&gt;
&lt;li&gt;Remove the missing values (remove the records with missing values).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's now look at how to do both of them in pandas.&lt;/p&gt;

&lt;h3&gt;
  
  
  Filling missing values
&lt;/h3&gt;

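&lt;p&gt;To fill missing values in a pandas DataFrame, you can use the pandas &lt;code&gt;fillna()&lt;/code&gt; method. You can pass a single value to fill every missing entry, or a dictionary that maps column names to the fill value for each column. For forward or backward filling, pandas also provides the &lt;code&gt;ffill()&lt;/code&gt; and &lt;code&gt;bfill()&lt;/code&gt; methods.&lt;/p&gt;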

&lt;p&gt;For example, let's see what we get if we fill the missing values with 0.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;Age&lt;/th&gt;
      &lt;th&gt;Department&lt;/th&gt;
      &lt;th&gt;Salary&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;Tim&lt;/td&gt;
      &lt;td&gt;26.0&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;60000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;Shaym&lt;/td&gt;
      &lt;td&gt;28.0&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;0.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;Noor&lt;/td&gt;
      &lt;td&gt;27.0&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;82000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;Esha&lt;/td&gt;
      &lt;td&gt;32.0&lt;/td&gt;
      &lt;td&gt;HR&lt;/td&gt;
      &lt;td&gt;0.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;Sam&lt;/td&gt;
      &lt;td&gt;24.0&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;58000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;5&lt;/th&gt;
      &lt;td&gt;James&lt;/td&gt;
      &lt;td&gt;0.0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;55000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;td&gt;Lily&lt;/td&gt;
      &lt;td&gt;33.0&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;65000.0&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;It returns a dataframe with the missing values filled with the constant value. Note that the &lt;code&gt;fillna()&lt;/code&gt; function didn't modify the original dataframe in-place. It returned the resulting dataframe after filling the missing values.&lt;/p&gt;
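&lt;p&gt;To keep the filled values, assign the result back to a variable. The following is a minimal sketch on a small hypothetical dataframe (not the article's data) that shows the original staying unchanged until it is reassigned.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# small hypothetical dataframe with two missing values
df = pd.DataFrame({
    "Age": [26, np.nan, 33],
    "Salary": [60000, np.nan, 65000],
})

original_missing = int(df.isna().sum().sum())

# fillna() returns a new dataframe
filled = df.fillna(0)

# the original dataframe still has its missing values
still_missing = int(df.isna().sum().sum())

# reassign to keep the filled result
df = df.fillna(0)
remaining = int(df.isna().sum().sum())

print(original_missing, still_missing, remaining)
```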

&lt;p&gt;You can also specify different values for different columns when filling missing values.&lt;br&gt;
For example, let's fill the missing values in the "Age" and "Salary" columns with their respective means and the missing value in the "Department" column with its mode (the most frequent value).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;'Age'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Age'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s"&gt;'Salary'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Salary'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s"&gt;'Department'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Department'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;Age&lt;/th&gt;
      &lt;th&gt;Department&lt;/th&gt;
      &lt;th&gt;Salary&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;Tim&lt;/td&gt;
      &lt;td&gt;26.000000&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;60000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;Shaym&lt;/td&gt;
      &lt;td&gt;28.000000&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;64000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;Noor&lt;/td&gt;
      &lt;td&gt;27.000000&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;82000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;Esha&lt;/td&gt;
      &lt;td&gt;32.000000&lt;/td&gt;
      &lt;td&gt;HR&lt;/td&gt;
      &lt;td&gt;64000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;Sam&lt;/td&gt;
      &lt;td&gt;24.000000&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;58000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;5&lt;/th&gt;
      &lt;td&gt;James&lt;/td&gt;
      &lt;td&gt;28.333333&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;55000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;td&gt;Lily&lt;/td&gt;
      &lt;td&gt;33.000000&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;65000.0&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Dropping rows with missing values
&lt;/h3&gt;

&lt;p&gt;Another strategy for handling missing values is to remove the rows that contain them. This is used when the proportion of missing values is comparatively small and we can afford to discard that data.&lt;/p&gt;

&lt;p&gt;Use the pandas &lt;code&gt;dropna()&lt;/code&gt; function to remove rows with missing values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;Age&lt;/th&gt;
      &lt;th&gt;Department&lt;/th&gt;
      &lt;th&gt;Salary&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;Tim&lt;/td&gt;
      &lt;td&gt;26.0&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;60000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;Noor&lt;/td&gt;
      &lt;td&gt;27.0&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;82000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;Sam&lt;/td&gt;
      &lt;td&gt;24.0&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;58000.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;td&gt;Lily&lt;/td&gt;
      &lt;td&gt;33.0&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;65000.0&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is how the dataframe looks after removing rows with any missing values.&lt;/p&gt;
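&lt;p&gt;By default, &lt;code&gt;dropna()&lt;/code&gt; removes a row if any of its values is missing. The &lt;code&gt;subset&lt;/code&gt;, &lt;code&gt;how&lt;/code&gt;, and &lt;code&gt;thresh&lt;/code&gt; parameters give finer control. The sketch below assumes the sample dataframe matches the tables shown in this article; the result variable names are only illustrative.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# sample data reconstructed from the tables in this article (an assumption)
df = pd.DataFrame({
    "Name": ["Tim", "Shaym", "Noor", "Esha", "Sam", "James", "Lily"],
    "Age": [26, 28, 27, 32, 24, np.nan, 33],
    "Department": ["Marketing", "Product", "Product", "HR", "Product", np.nan, "Marketing"],
    "Salary": [60000, np.nan, 82000, np.nan, 58000, 55000, 65000],
})

# drop a row only if its "Salary" value is missing
salary_known = df.dropna(subset=["Salary"])

# drop a row only if all of its values are missing
not_empty = df.dropna(how="all")

# keep rows that have at least 3 non-missing values
mostly_complete = df.dropna(thresh=3)

print(salary_known)
print(mostly_complete)
```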

</description>
      <category>beginners</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Pandas - Basic Exploratory Data Analysis - 7 Days of Pandas</title>
      <dc:creator>Piyush Raj</dc:creator>
      <pubDate>Fri, 23 Dec 2022 07:29:36 +0000</pubDate>
      <link>https://forem.com/piyushraj/pandas-basic-exploratory-data-analysis-7-days-of-pandas-3816</link>
      <guid>https://forem.com/piyushraj/pandas-basic-exploratory-data-analysis-7-days-of-pandas-3816</guid>
      <description>&lt;p&gt;Welcome to the third article in the "7 Days of Pandas" series where we cover the &lt;code&gt;pandas&lt;/code&gt; library in Python which is used for data manipulation.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-read-and-write-data-from-csv-files-7-days-of-pandas-1o4f"&gt;first article&lt;/a&gt; of the series, we looked at how to read and write CSV files with Pandas. &lt;br&gt;
In the &lt;a href="https://dev.to/piyushraj/pandas-basic-data-manipulation-7-days-of-pandas-4c47"&gt;second article&lt;/a&gt;, we looked at how to perform basic data manipulation.&lt;br&gt;
In this tutorial, we will look at some of the common operations that we perform on a dataframe during the exploratory data analysis (EDA) phase.&lt;/p&gt;

&lt;p&gt;Exploratory Data Analysis (EDA) helps us better understand the data at hand. In this phase, we use descriptive statistics and visualizations to derive insights from the data.&lt;/p&gt;

&lt;p&gt;The pandas library comes with a number of useful functions that help us explore the data. In this tutorial, we will cover the following topics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get the first and the last N rows of a dataframe.&lt;/li&gt;
&lt;li&gt;Using the &lt;code&gt;info()&lt;/code&gt; function.&lt;/li&gt;
&lt;li&gt;Get descriptive statistics with the &lt;code&gt;describe()&lt;/code&gt; function.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Before we begin, let's first import pandas and create a sample dataframe that we will be using throughout this tutorial.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# employee data
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Tim"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Shaym"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Noor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Esha"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Sam"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"James"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Lily"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;"Age"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;26&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;27&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;"Department"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Marketing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"HR"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"HR"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Marketing"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;"Salary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;70000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;82000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;55000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;58000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;55000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;65000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# create pandas dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# display the dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;Age&lt;/th&gt;
      &lt;th&gt;Department&lt;/th&gt;
      &lt;th&gt;Salary&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;Tim&lt;/td&gt;
      &lt;td&gt;26&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;60000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;Shaym&lt;/td&gt;
      &lt;td&gt;28&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;70000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;Noor&lt;/td&gt;
      &lt;td&gt;27&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;82000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;Esha&lt;/td&gt;
      &lt;td&gt;32&lt;/td&gt;
      &lt;td&gt;HR&lt;/td&gt;
      &lt;td&gt;55000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;Sam&lt;/td&gt;
      &lt;td&gt;24&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;58000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;5&lt;/th&gt;
      &lt;td&gt;James&lt;/td&gt;
      &lt;td&gt;31&lt;/td&gt;
      &lt;td&gt;HR&lt;/td&gt;
      &lt;td&gt;55000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;td&gt;Lily&lt;/td&gt;
      &lt;td&gt;33&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;65000&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We have a dataframe with information about some employees in an office.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get the first and the last N rows of a dataframe
&lt;/h2&gt;

&lt;p&gt;After loading or creating a dataframe, a good first step is to look at the first few rows to check whether the data is as expected and whether there are any obvious issues with it (for example, missing fields).&lt;/p&gt;

&lt;p&gt;You can use the pandas dataframe &lt;code&gt;head()&lt;/code&gt; function to get the first n rows of the dataframe. Pass the number of rows you want from the top as an argument. By default, n is 5.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# get the first five rows
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;Age&lt;/th&gt;
      &lt;th&gt;Department&lt;/th&gt;
      &lt;th&gt;Salary&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;Tim&lt;/td&gt;
      &lt;td&gt;26&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;60000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;Shaym&lt;/td&gt;
      &lt;td&gt;28&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;70000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;Noor&lt;/td&gt;
      &lt;td&gt;27&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;82000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;Esha&lt;/td&gt;
      &lt;td&gt;32&lt;/td&gt;
      &lt;td&gt;HR&lt;/td&gt;
      &lt;td&gt;55000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;Sam&lt;/td&gt;
      &lt;td&gt;24&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;58000&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can similarly get the last n rows of the dataframe, using the pandas dataframe &lt;code&gt;tail()&lt;/code&gt; function. Pass the number of rows you want from the bottom as an argument. By default, n is 5.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# get the last five rows
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;Age&lt;/th&gt;
      &lt;th&gt;Department&lt;/th&gt;
      &lt;th&gt;Salary&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;Noor&lt;/td&gt;
      &lt;td&gt;27&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;82000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;Esha&lt;/td&gt;
      &lt;td&gt;32&lt;/td&gt;
      &lt;td&gt;HR&lt;/td&gt;
      &lt;td&gt;55000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;Sam&lt;/td&gt;
      &lt;td&gt;24&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
      &lt;td&gt;58000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;5&lt;/th&gt;
      &lt;td&gt;James&lt;/td&gt;
      &lt;td&gt;31&lt;/td&gt;
      &lt;td&gt;HR&lt;/td&gt;
      &lt;td&gt;55000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;td&gt;Lily&lt;/td&gt;
      &lt;td&gt;33&lt;/td&gt;
      &lt;td&gt;Marketing&lt;/td&gt;
      &lt;td&gt;65000&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
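&lt;p&gt;Both functions accept any number of rows, and calling them without an argument uses the default of 5. Here is a short sketch using the sample dataframe from this tutorial; the result variable names are only illustrative.&lt;/p&gt;

```python
import pandas as pd

# the sample employee data used in this tutorial
df = pd.DataFrame({
    "Name": ["Tim", "Shaym", "Noor", "Esha", "Sam", "James", "Lily"],
    "Age": [26, 28, 27, 32, 24, 31, 33],
    "Department": ["Marketing", "Product", "Product", "HR", "Product", "HR", "Marketing"],
    "Salary": [60000, 70000, 82000, 55000, 58000, 55000, 65000],
})

# get the first two rows
first_two = df.head(2)

# get the last two rows
last_two = df.tail(2)

# no argument: falls back to the default of five rows
default_head = df.head()

print(first_two)
print(last_two)
```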

&lt;h2&gt;
  
  
  Use the &lt;code&gt;info()&lt;/code&gt; function
&lt;/h2&gt;

&lt;p&gt;You can use the &lt;a href="https://datascienceparichay.com/article/pandas-get-dataframe-summary-with-info/"&gt;pandas dataframe &lt;code&gt;info()&lt;/code&gt; function&lt;/a&gt; to get a concise summary of the dataframe. It gives information such as the column dtypes, count of non-null values in each column, the memory usage of the dataframe, etc.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# summary of the dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;class 'pandas.core.frame.DataFrame'&amp;gt;
RangeIndex: 7 entries, 0 to 6
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Name        7 non-null      object
 1   Age         7 non-null      int64 
 2   Department  7 non-null      object
 3   Salary      7 non-null      int64 
dtypes: int64(2), object(2)
memory usage: 352.0+ bytes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Get descriptive statistics with the &lt;code&gt;describe()&lt;/code&gt; function
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://datascienceparichay.com/article/pandas-dataframe-describe-function/"&gt;pandas dataframe &lt;code&gt;describe()&lt;/code&gt; function&lt;/a&gt; returns some descriptive statistics for a dataframe. For example, for numerical columns, it returns the count, mean, standard deviation, min, max, percentile values, etc.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# get dataframe's descriptive statistics
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Age&lt;/th&gt;
      &lt;th&gt;Salary&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;count&lt;/th&gt;
      &lt;td&gt;7.000000&lt;/td&gt;
      &lt;td&gt;7.000000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;mean&lt;/th&gt;
      &lt;td&gt;28.714286&lt;/td&gt;
      &lt;td&gt;63571.428571&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;std&lt;/th&gt;
      &lt;td&gt;3.352327&lt;/td&gt;
      &lt;td&gt;9778.499252&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;min&lt;/th&gt;
      &lt;td&gt;24.000000&lt;/td&gt;
      &lt;td&gt;55000.000000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;25%&lt;/th&gt;
      &lt;td&gt;26.500000&lt;/td&gt;
      &lt;td&gt;56500.000000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;50%&lt;/th&gt;
      &lt;td&gt;28.000000&lt;/td&gt;
      &lt;td&gt;60000.000000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;75%&lt;/th&gt;
      &lt;td&gt;31.500000&lt;/td&gt;
      &lt;td&gt;67500.000000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;max&lt;/th&gt;
      &lt;td&gt;33.000000&lt;/td&gt;
      &lt;td&gt;82000.000000&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note that the pandas dataframe &lt;code&gt;describe()&lt;/code&gt; function includes only the numeric columns by default when generating the dataframe’s description.&lt;/p&gt;

&lt;p&gt;You can, however, use the &lt;code&gt;include&lt;/code&gt; parameter to specify other column types (or all the columns) to include in the statistics.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# get descriptive statistics for object type the columns
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'object'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;Department&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;count&lt;/th&gt;
      &lt;td&gt;7&lt;/td&gt;
      &lt;td&gt;7&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;unique&lt;/th&gt;
      &lt;td&gt;7&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;top&lt;/th&gt;
      &lt;td&gt;Tim&lt;/td&gt;
      &lt;td&gt;Product&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;freq&lt;/th&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For &lt;code&gt;object&lt;/code&gt; type columns, we get the information about the count, number of unique values, top (the most frequent value), and freq (the count of the most frequent value in the column).&lt;/p&gt;

&lt;p&gt;These descriptive statistics give us valuable insights into the distribution of the data in different columns.&lt;/p&gt;
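&lt;p&gt;To summarize every column in one call, you can pass &lt;code&gt;include="all"&lt;/code&gt;. For a single categorical column, &lt;code&gt;value_counts()&lt;/code&gt; gives the full frequency breakdown behind the "top" and "freq" rows. The sketch below uses the sample dataframe from this tutorial; the variable names &lt;code&gt;summary&lt;/code&gt; and &lt;code&gt;dept_counts&lt;/code&gt; are only illustrative.&lt;/p&gt;

```python
import pandas as pd

# the sample employee data used in this tutorial
df = pd.DataFrame({
    "Name": ["Tim", "Shaym", "Noor", "Esha", "Sam", "James", "Lily"],
    "Age": [26, 28, 27, 32, 24, 31, 33],
    "Department": ["Marketing", "Product", "Product", "HR", "Product", "HR", "Marketing"],
    "Salary": [60000, 70000, 82000, 55000, 58000, 55000, 65000],
})

# descriptive statistics for all the columns, numeric or not
summary = df.describe(include="all")

# count of each distinct value in the "Department" column
dept_counts = df["Department"].value_counts()

print(summary)
print(dept_counts)
```

&lt;p&gt;In the combined summary, statistics that don't apply to a column (for example, the mean of "Name") show up as NaN.&lt;/p&gt;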

</description>
      <category>python</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Pandas - Basic Data Manipulation - 7 Days of Pandas</title>
      <dc:creator>Piyush Raj</dc:creator>
      <pubDate>Thu, 22 Dec 2022 06:10:10 +0000</pubDate>
      <link>https://forem.com/piyushraj/pandas-basic-data-manipulation-7-days-of-pandas-4c47</link>
      <guid>https://forem.com/piyushraj/pandas-basic-data-manipulation-7-days-of-pandas-4c47</guid>
      <description>&lt;p&gt;Welcome to the second article in the "7 Days of Pandas" series where we cover the &lt;code&gt;pandas&lt;/code&gt; library in Python which is used for data manipulation.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/piyushraj/pandas-read-and-write-data-from-csv-files-7-days-of-pandas-1o4f"&gt;first article&lt;/a&gt; of the series, we looked at how to read and write CSV files with Pandas. In this tutorial, we will look at some of the most common operations that we perform on a dataframe in Pandas.&lt;/p&gt;

&lt;p&gt;Pandas is a powerful Python library that is widely used for data manipulation and analysis. It provides a range of functions and methods that allow you to easily manipulate and transform data in a variety of formats. In this tutorial, we will cover the following topics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Selecting rows and columns&lt;/li&gt;
&lt;li&gt;Filtering data&lt;/li&gt;
&lt;li&gt;Sorting data&lt;/li&gt;
&lt;li&gt;Adding and deleting columns&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Before we begin, let's first import pandas and read in a sample data file. We will use the &lt;a href="https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html"&gt;&lt;code&gt;pandas.read_csv()&lt;/code&gt;&lt;/a&gt; function to read in a CSV file and store it in a DataFrame object.&lt;/p&gt;

&lt;p&gt;We'll assume that a CSV file "sample_data.csv" exists in the current working directory that we read into a dataframe.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sample_data.csv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we have a DataFrame, let's dive into the first topic: selecting rows and columns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Selecting Rows and Columns
&lt;/h2&gt;

&lt;p&gt;There are several ways to select specific rows and columns from a pandas DataFrame. One way is to use the &lt;code&gt;loc&lt;/code&gt; attribute, which allows you to select rows and columns based on their labels. For example, with the default integer index, the first row has the label 0, so you can select it with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# select the first row
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To select a specific column, you can pass the column name as a string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# select column by its name
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="s"&gt;"column_name"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also use the &lt;code&gt;iloc&lt;/code&gt; attribute to select rows and columns based on their integer indices. For example, to select the first row using &lt;code&gt;iloc&lt;/code&gt;, you can use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# select the first row
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To select a specific column, you can pass the column index as an integer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# select column by column index
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
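
&lt;p&gt;The difference between label-based and position-based selection is easiest to see when the index is not the default 0, 1, 2, ... The sketch below uses a small made-up DataFrame (the column names and index labels are assumptions for illustration and are not taken from &lt;code&gt;sample_data.csv&lt;/code&gt;):&lt;/p&gt;

```python
import pandas as pd

# hypothetical data with string labels as the index
demo = pd.DataFrame(
    {"name": ["a", "b", "c"], "score": [10, 20, 30]},
    index=["x", "y", "z"],
)

# loc selects by label
row_by_label = demo.loc["y"]
# iloc selects by integer position
row_by_position = demo.iloc[1]

# both expressions point at the same row here
print(row_by_label.equals(row_by_position))

# loc can also select a subset of rows and columns at once
subset = demo.loc[["x", "z"], ["score"]]
print(subset)
```

&lt;p&gt;With this index, &lt;code&gt;demo.loc[0]&lt;/code&gt; would raise a &lt;code&gt;KeyError&lt;/code&gt; because no row is labelled 0, while &lt;code&gt;demo.iloc[0]&lt;/code&gt; still returns the first row.&lt;/p&gt;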



&lt;h2&gt;
  
  
  Filtering Data
&lt;/h2&gt;

&lt;p&gt;In addition to selecting rows and columns, you can also use pandas to filter your data based on specific conditions.&lt;/p&gt;

&lt;p&gt;You can use boolean indexing to filter a DataFrame based on the values in one or more columns. The idea is to write a boolean expression that evaluates to a boolean Series (one True/False value per row), which is then used to keep only the matching rows of the original data.&lt;/p&gt;

&lt;p&gt;To do this, you pass a boolean expression to the DataFrame's indexing operator, &lt;code&gt;[]&lt;/code&gt;. For example, to filter the DataFrame to only include rows where the value in the "column_name" column is greater than 5, you can use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# filter dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"column_name"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also &lt;a href="https://datascienceparichay.com/article/pandas-filter-dataframe-for-multiple-conditions/"&gt;filter the dataframe on multiple conditions&lt;/a&gt; by combining them with the element-wise operators &lt;code&gt;&amp;amp;&lt;/code&gt; (and) and &lt;code&gt;|&lt;/code&gt; (or). Wrap each condition in parentheses, because these operators bind more tightly than comparison operators. For example, to filter the DataFrame to only include rows where the value in the "column_name" column is greater than 5 and the value in the "other_column" column is less than 10, you can use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# filter dataframe on multiple conditions
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"column_name"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"other_column"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, you can use the pandas &lt;code&gt;query()&lt;/code&gt; method, which takes the filter condition as a string.&lt;/p&gt;
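
&lt;p&gt;As a minimal sketch of &lt;code&gt;query()&lt;/code&gt;, the example below builds a small made-up DataFrame (the values are assumptions; the column names reuse the placeholders from above) and filters it with a condition string. A local Python variable can be referenced inside the string with the &lt;code&gt;@&lt;/code&gt; prefix:&lt;/p&gt;

```python
import pandas as pd

# hypothetical data using the placeholder column names from this article
demo = pd.DataFrame(
    {"column_name": [3, 6, 8, 9], "other_column": [4, 12, 7, 2]}
)

# keep rows where column_name is above 5 and other_column is below 10
filtered = demo.query("column_name > 5 and 10 > other_column")
print(filtered)

# reference a local variable inside the query string with @
threshold = 5
above_threshold = demo.query("column_name > @threshold")
print(above_threshold)
```

&lt;p&gt;Inside the query string you can write &lt;code&gt;and&lt;/code&gt; and &lt;code&gt;or&lt;/code&gt; directly, which some readers find easier to scan than nested parentheses.&lt;/p&gt;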

&lt;h2&gt;
  
  
  Sorting Data
&lt;/h2&gt;

&lt;p&gt;To &lt;a href="https://datascienceparichay.com/article/pandas-sort-a-dataframe/"&gt;sort a pandas DataFrame&lt;/a&gt;, you can use the DataFrame &lt;code&gt;sort_values()&lt;/code&gt; method. It lets you specify one or more columns to sort by, as well as the sort order (ascending or descending). By default, it returns a new sorted DataFrame and leaves the original unchanged.&lt;/p&gt;

&lt;p&gt;For example, to sort the DataFrame by the "column_name" column in ascending order, you can use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# sort dataframe by "column_name" in ascending order
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"column_name"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To sort in descending order, you can set the &lt;code&gt;ascending&lt;/code&gt; parameter to &lt;code&gt;False&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# sort dataframe by "column_name" in descending order
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"column_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also sort by multiple columns by passing a list of column names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# sort dataframe by multiple columns
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;"column_name_1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"column_name_2"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
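
&lt;p&gt;When sorting by several columns, the &lt;code&gt;ascending&lt;/code&gt; parameter also accepts a list with one value per column. The following sketch uses made-up values (the data is an assumption for illustration) to sort the first column in ascending order and the second in descending order:&lt;/p&gt;

```python
import pandas as pd

# hypothetical data using the placeholder column names from this article
demo = pd.DataFrame(
    {"column_name_1": [2, 1, 2, 1], "column_name_2": [5, 6, 7, 8]}
)

# ascending on the first column, descending on the second
result = demo.sort_values(
    ["column_name_1", "column_name_2"], ascending=[True, False]
)
print(result)

# sort_values() returns a new DataFrame; demo keeps its original order
print(demo["column_name_2"].tolist())
```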



&lt;h2&gt;
  
  
  Adding and Deleting Columns
&lt;/h2&gt;

&lt;p&gt;To add a new column to a pandas DataFrame, you can simply assign a value to a column name that doesn't exist yet. For example, to add a new column called "new_column" with a default value of 0 for all rows, you can use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# create a new column with all values as 0
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"new_column"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also assign different values to each row using a list or another Series object.&lt;/p&gt;
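
&lt;p&gt;As a short sketch of both options, the example below uses a made-up DataFrame (the column names &lt;code&gt;price&lt;/code&gt;, &lt;code&gt;quantity&lt;/code&gt; and &lt;code&gt;total&lt;/code&gt; are assumptions for illustration):&lt;/p&gt;

```python
import pandas as pd

# hypothetical data
demo = pd.DataFrame({"price": [10.0, 20.0, 30.0]})

# one value per row from a list (its length must match the number of rows)
demo["quantity"] = [1, 2, 3]

# a new column computed from existing columns (the right-hand side is a Series)
demo["total"] = demo["price"] * demo["quantity"]
print(demo)
```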

&lt;p&gt;There are &lt;a href="https://datascienceparichay.com/article/pandas-add-column-to-dataframe/"&gt;other methods to add a column&lt;/a&gt; as well.&lt;/p&gt;

&lt;p&gt;To delete a column from a DataFrame, you can use the &lt;code&gt;drop()&lt;/code&gt; method, passing the column name and setting the &lt;code&gt;axis&lt;/code&gt; parameter to 1 (columns). Since &lt;code&gt;drop()&lt;/code&gt; returns a new DataFrame, assign the result back to keep the change. For example, to delete the "new_column" from the DataFrame, you can use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# remove the column "new_column" from the dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"new_column"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
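
&lt;p&gt;The &lt;code&gt;drop()&lt;/code&gt; method also accepts a &lt;code&gt;columns&lt;/code&gt; keyword, which avoids having to remember the axis number and can take several names at once. A minimal sketch with made-up column names:&lt;/p&gt;

```python
import pandas as pd

# hypothetical data
demo = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# columns= is an alternative to passing axis=1
trimmed = demo.drop(columns=["b", "c"])
print(trimmed.columns.tolist())

# drop() returns a new DataFrame, so demo still has all three columns
print(demo.columns.tolist())
```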



&lt;p&gt;That concludes this tutorial on basic data manipulation with pandas. We hope that you found it useful.&lt;/p&gt;

&lt;p&gt;In the coming articles, we'll look at other useful operations in Pandas.&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Pandas - Read and Write Data From CSV files - 7 Days of Pandas</title>
      <dc:creator>Piyush Raj</dc:creator>
      <pubDate>Wed, 21 Dec 2022 12:11:41 +0000</pubDate>
      <link>https://forem.com/piyushraj/pandas-read-and-write-data-from-csv-files-7-days-of-pandas-1o4f</link>
      <guid>https://forem.com/piyushraj/pandas-read-and-write-data-from-csv-files-7-days-of-pandas-1o4f</guid>
      <description>&lt;p&gt;Welcome to the 7 days of Pandas challenge!&lt;/p&gt;

&lt;p&gt;In this series, we'll cover the basics and the commonly used operations in the &lt;code&gt;pandas&lt;/code&gt; library in Python, which is primarily used for data manipulation.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;pandas&lt;/code&gt;, the main object we use is a &lt;code&gt;DataFrame&lt;/code&gt;, which stores data in tabular form and lets us perform operations on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Day 1 - Read and Write Data from CSV files using &lt;code&gt;pandas&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;In this article, we will cover how to load data from a CSV file into a dataframe and then write a dataframe to a CSV file using the &lt;code&gt;pandas&lt;/code&gt; module.&lt;/p&gt;

&lt;h3&gt;
  
  
  Read data from CSV file in &lt;code&gt;pandas&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;You can use the &lt;code&gt;pandas.read_csv()&lt;/code&gt; function to read data from a CSV file into a dataframe. The following is the syntax -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="c1"&gt;# read data from csv file
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PATH_TO_FILE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pass the path to the CSV file as an argument to the &lt;code&gt;pandas.read_csv()&lt;/code&gt; function. It reads the data from the CSV file and returns the resulting dataframe with that data.&lt;/p&gt;

&lt;p&gt;Let's look at an example.&lt;/p&gt;

&lt;p&gt;We'll read the data from a file called "Pokemon.csv" saved in the current working directory as a dataframe.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# import the pandas module
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# read data from csv file
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Pokemon.csv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# display the first five rows of the dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--l18WCiZZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6ia4tsq716d4srgew0hp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--l18WCiZZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6ia4tsq716d4srgew0hp.png" alt="first five rows of the dataframe" width="880" height="233"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that the data from the CSV file was loaded in the dataframe. Now, you can go ahead and analyze/manipulate the data as per your requirements.&lt;/p&gt;
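
&lt;p&gt;&lt;code&gt;read_csv()&lt;/code&gt; also accepts file-like objects and has optional parameters to limit what gets loaded. The sketch below reads from an in-memory string instead of a file so that it is self-contained; the CSV text is made up for illustration and is not the actual contents of "Pokemon.csv":&lt;/p&gt;

```python
import io

import pandas as pd

# made-up CSV text standing in for a file on disk
csv_text = "name,type,hp\nBulbasaur,Grass,45\nCharmander,Fire,39\nSquirtle,Water,44\n"

# read only two of the columns and only the first two data rows
demo = pd.read_csv(io.StringIO(csv_text), usecols=["name", "hp"], nrows=2)
print(demo)
```

&lt;p&gt;Limiting columns and rows this way can be handy for taking a quick look at a large file before loading it in full.&lt;/p&gt;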

&lt;h3&gt;
  
  
  Write data to a CSV file using &lt;code&gt;pandas&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;You can also use the &lt;code&gt;pandas&lt;/code&gt; module to save a dataframe as a CSV file. For example, after working with and changing the data in a dataframe, you may want to save it for later use.&lt;/p&gt;

&lt;p&gt;Use the &lt;code&gt;pandas.DataFrame.to_csv()&lt;/code&gt; method to save a pandas dataframe as a CSV file. The following is the syntax -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# save dataframe to a csv file
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PATH_TO_NEW_FILE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pass the path (or just the file name, if you want to save the CSV file in the current working directory) as an argument to the &lt;code&gt;pandas.DataFrame.to_csv()&lt;/code&gt; method. &lt;/p&gt;

&lt;p&gt;Note that, if you do not want the dataframe index as an additional column in the resulting CSV file, pass &lt;code&gt;index=False&lt;/code&gt; as an argument.&lt;/p&gt;

&lt;p&gt;Let's look at an example. &lt;/p&gt;

&lt;p&gt;Let's write the above dataframe &lt;code&gt;df&lt;/code&gt; to a new CSV file called "Pokemon2.csv".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# write dataframe to a csv file
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Pokemon2.csv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you open the CSV file, it looks something like this -&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cGz5OiWu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/atjblhs0b0gvet9s6bs2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cGz5OiWu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/atjblhs0b0gvet9s6bs2.png" alt="the resulting csv file" width="880" height="543"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that the data was successfully written to the CSV file.&lt;/p&gt;
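
&lt;p&gt;To see the effect of &lt;code&gt;index=False&lt;/code&gt; without opening a file, you can call &lt;code&gt;to_csv()&lt;/code&gt; with no path, in which case it returns the CSV text as a string. A small sketch with a made-up DataFrame:&lt;/p&gt;

```python
import pandas as pd

# hypothetical data
demo = pd.DataFrame({"name": ["Bulbasaur", "Charmander"], "hp": [45, 39]})

# with no path, to_csv() returns the CSV text instead of writing a file
with_index = demo.to_csv()
without_index = demo.to_csv(index=False)

# the first version has an extra unnamed leading column holding the index
print(with_index)
print(without_index)
```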

&lt;p&gt;That'll be it for this article. In the coming articles, we will dive deep into using pandas and some of its most powerful and useful functionalities.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://datascienceparichay.com/article/read-csv-files-using-pandas-with-examples/"&gt;Read CSV files using Pandas – With Examples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html"&gt;&lt;code&gt;pandas.read_csv()&lt;/code&gt; docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv"&gt;&lt;code&gt;pandas.DataFrame.to_csv()&lt;/code&gt; docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>pandas</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Python - If Else in List Comprehension</title>
      <dc:creator>Piyush Raj</dc:creator>
      <pubDate>Mon, 12 Sep 2022 12:27:44 +0000</pubDate>
      <link>https://forem.com/piyushraj/python-if-else-in-list-comprehension-4f0j</link>
      <guid>https://forem.com/piyushraj/python-if-else-in-list-comprehension-4f0j</guid>
      <description>&lt;p&gt;In this tutorial, we will look at how to create a list using a list comprehension that uses an if else logic to decide the final list values.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://datascienceparichay.com/article/python-list-comprehension-with-examples/"&gt;List comprehensions&lt;/a&gt; offer a concise way to create lists in Python and are particularly useful when creating lists using other iterables, filtering lists, etc.&lt;/p&gt;

&lt;p&gt;Let's say you have a list, say &lt;code&gt;ls&lt;/code&gt; of integers 1 to 10 (both inclusive) and you want to create a new list, say &lt;code&gt;new_ls&lt;/code&gt; that contains the string "odd" or "even" depending on whether the corresponding value in &lt;code&gt;ls&lt;/code&gt; is odd or even. &lt;/p&gt;

&lt;p&gt;Using a list comprehension can be a valid approach for such cases. &lt;/p&gt;

&lt;h2&gt;
  
  
  How to use &lt;code&gt;if else&lt;/code&gt; logic inside a list comprehension in Python?
&lt;/h2&gt;

&lt;p&gt;Use the following syntax to incorporate &lt;code&gt;if else&lt;/code&gt; logic inside a list comprehension.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;new_ls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;condition&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;other_expression&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;member&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;iterable&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the &lt;code&gt;condition&lt;/code&gt; is checked for each &lt;code&gt;member&lt;/code&gt; of the &lt;code&gt;iterable&lt;/code&gt;. If it evaluates to &lt;code&gt;True&lt;/code&gt;, the value of &lt;code&gt;expression&lt;/code&gt; is added to the list; otherwise, the value of &lt;code&gt;other_expression&lt;/code&gt; is added. &lt;/p&gt;

&lt;p&gt;Let's now look at an example.&lt;/p&gt;

&lt;p&gt;Let's create a list with the string value "odd" or "even" for each corresponding value in the &lt;code&gt;ls&lt;/code&gt; list.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# list of integers from 1 to 10
&lt;/span&gt;&lt;span class="n"&gt;ls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# create list using list comprehension
&lt;/span&gt;&lt;span class="n"&gt;new_ls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"odd"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="s"&gt;"even"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ls&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_ls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;['odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get a list with "odd" and "even" values. For example, for the value &lt;code&gt;4&lt;/code&gt; in the list &lt;code&gt;ls&lt;/code&gt;, the corresponding value in the list &lt;code&gt;new_ls&lt;/code&gt; is "even".&lt;/p&gt;

&lt;h2&gt;
  
  
  Filter values with &lt;code&gt;if&lt;/code&gt; in list comprehension
&lt;/h2&gt;

&lt;p&gt;Note that you can also use just the &lt;code&gt;if&lt;/code&gt; clause (without the &lt;code&gt;else&lt;/code&gt; part) inside the list comprehension. This is commonly used when filtering lists. &lt;/p&gt;

&lt;p&gt;Use the following syntax to use just the &lt;code&gt;if&lt;/code&gt; construct inside a list comprehension.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;new_ls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;member&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;iterable&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the &lt;code&gt;condition&lt;/code&gt; is checked for each &lt;code&gt;member&lt;/code&gt; of the &lt;code&gt;iterable&lt;/code&gt;. If it evaluates to &lt;code&gt;True&lt;/code&gt;, the value of &lt;code&gt;expression&lt;/code&gt; is added to the list; otherwise, that member is skipped.&lt;/p&gt;

&lt;p&gt;For example, let's filter the above list of integers &lt;code&gt;ls&lt;/code&gt; to create a new list with only odd integers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# list of integers from 1 to 10
&lt;/span&gt;&lt;span class="n"&gt;ls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# create list of odd values in ls using list comprehension
&lt;/span&gt;&lt;span class="n"&gt;odd_ls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ls&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;odd_ls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1, 3, 5, 7, 9]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get a list of only the odd numbers in &lt;code&gt;ls&lt;/code&gt;.&lt;/p&gt;
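
&lt;p&gt;The two forms can also be combined in a single comprehension: the trailing &lt;code&gt;if&lt;/code&gt; decides which members are kept, and the &lt;code&gt;if else&lt;/code&gt; expression decides what value each kept member produces. A minimal sketch (the cutoff of 4 is an arbitrary choice for illustration):&lt;/p&gt;

```python
# integers from 1 to 10
ls = list(range(1, 11))

# the trailing if keeps only values above 4;
# the if/else expression then picks the label for each kept value
labels = ["odd" if num % 2 != 0 else "even" for num in ls if num > 4]
print(labels)
```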

</description>
      <category>python</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Sort a List in Python?</title>
      <dc:creator>Piyush Raj</dc:creator>
      <pubDate>Wed, 17 Aug 2022 14:12:50 +0000</pubDate>
      <link>https://forem.com/piyushraj/how-to-sort-a-list-in-python-36m8</link>
      <guid>https://forem.com/piyushraj/how-to-sort-a-list-in-python-36m8</guid>
      <description>&lt;p&gt;Lists are a common data structure used to store sequences and/or collection of data in Python. Since lists are an ordered collection, it can be handy to know how to sort a list in ascending or descending order. &lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Why sort lists or any data for that matter?&lt;/li&gt;
&lt;li&gt;The list &lt;code&gt;sort()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;The built-in &lt;code&gt;sorted()&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why sort lists or any data for that matter?
&lt;/h2&gt;

&lt;p&gt;In general, data presented in sorted order is more intuitive to look at and draw inferences from. Additionally, some problems become easier to solve when the data is already sorted. For example, searching for an element takes linear time in an unsorted list, but binary search on a sorted list takes only logarithmic time.&lt;/p&gt;
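
&lt;p&gt;The logarithmic-time search mentioned above can be sketched with the &lt;code&gt;bisect&lt;/code&gt; module from the Python standard library. The helper name &lt;code&gt;contains&lt;/code&gt; is a hypothetical name chosen for this example:&lt;/p&gt;

```python
from bisect import bisect_left


def contains(sorted_ls, target):
    """Binary search for target in a list sorted in ascending order."""
    # bisect_left returns the leftmost position where target could be inserted
    pos = bisect_left(sorted_ls, target)
    # target is present only if that position holds an equal element
    return pos != len(sorted_ls) and sorted_ls[pos] == target


# the search is only valid on sorted data
ls = sorted([3, 5, 1, 2, 4, 7, 6])
print(contains(ls, 4))
print(contains(ls, 8))
```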

&lt;h2&gt;
  
  
  The list &lt;code&gt;sort()&lt;/code&gt; function
&lt;/h2&gt;

&lt;p&gt;To sort the list in-place, use the list &lt;code&gt;sort()&lt;/code&gt; function. It has the following syntax -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ls.sort(reverse=False, key=None)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It sorts the list in-place and does not return any value. &lt;/p&gt;

&lt;p&gt;The optional parameter &lt;code&gt;reverse&lt;/code&gt; specifies whether to sort the list in descending order (it is &lt;code&gt;False&lt;/code&gt; by default), whereas the optional &lt;code&gt;key&lt;/code&gt; parameter allows you to pass a custom function that determines the sorting order.&lt;/p&gt;

&lt;p&gt;Here's an example -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# create a list
ls = [3, 5, 1, 2, 4, 7, 6]
# sort the list
ls.sort()
# display the list
print(ls)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1, 2, 3, 4, 5, 6, 7]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The list got sorted in-place.&lt;/p&gt;

&lt;p&gt;Let's now sort the above list in descending order. For this, pass &lt;code&gt;reverse=True&lt;/code&gt; to the &lt;code&gt;sort()&lt;/code&gt; function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# create a list
ls = [3, 5, 1, 2, 4, 7, 6]
# sort the list in descending order
ls.sort(reverse=True)
# display the list
print(ls)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[7, 6, 5, 4, 3, 2, 1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The list is now sorted in descending order.&lt;/p&gt;
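
&lt;p&gt;The &lt;code&gt;key&lt;/code&gt; parameter described earlier can be sketched in the same way. Here the built-in &lt;code&gt;len&lt;/code&gt; function is passed as the key so that a made-up list of words is sorted by length rather than alphabetically:&lt;/p&gt;

```python
# hypothetical list of words
words = ["banana", "kiwi", "apple", "fig"]

# sort in-place by word length, shortest first
words.sort(key=len)
shortest_first = words.copy()
print(shortest_first)

# sort in-place by word length, longest first
words.sort(key=len, reverse=True)
print(words)
```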

&lt;h2&gt;
  
  
  The built-in &lt;code&gt;sorted()&lt;/code&gt; function
&lt;/h2&gt;

&lt;p&gt;If you do not want to modify the original list, you can use the Python built-in &lt;code&gt;sorted()&lt;/code&gt; function. Its syntax is very similar to the list &lt;code&gt;sort()&lt;/code&gt; function -&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sorted(iterable, reverse=False, key=None)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It accepts any iterable and returns a new sorted list, leaving the original unchanged.&lt;/p&gt;

&lt;p&gt;Let's look at this method in action.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# create a list
ls = [3, 5, 1, 2, 4, 7, 6]
# sort the list
res_ls = sorted(ls)
# display the original list
print("Original list - ", ls)
# display the resulting list
print("Resulting list - ", res_ls)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Original list -  [3, 5, 1, 2, 4, 7, 6]
Resulting list -  [1, 2, 3, 4, 5, 6, 7]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The original list is unaffected and the returned list is sorted.&lt;/p&gt;
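
&lt;p&gt;One practical consequence of the difference between the two is their return values. The sketch below shows that &lt;code&gt;sorted()&lt;/code&gt; returns a list for any iterable, while assigning the result of &lt;code&gt;sort()&lt;/code&gt; gives &lt;code&gt;None&lt;/code&gt;:&lt;/p&gt;

```python
ls = [3, 5, 1, 2, 4, 7, 6]

# sorted() accepts any iterable (a tuple here) and always returns a list
from_tuple = sorted((3, 1, 2), reverse=True)
print(from_tuple)

# sort() works in-place and returns None, so do not assign its result
returned = ls.sort()
print(returned)
print(ls)
```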

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We looked at two methods to sort a list in Python (both with similar parameters). The key takeaways from this tutorial are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To sort the list in-place use the list &lt;code&gt;sort()&lt;/code&gt; function.&lt;/li&gt;
&lt;li&gt;To keep the original list unaltered use the Python built-in &lt;code&gt;sorted()&lt;/code&gt; function which returns a sorted copy of the original list.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both the &lt;code&gt;sort()&lt;/code&gt; and the &lt;code&gt;sorted()&lt;/code&gt; functions take the optional parameters &lt;code&gt;reverse&lt;/code&gt; and &lt;code&gt;key&lt;/code&gt;. The &lt;code&gt;key&lt;/code&gt; parameter can be very useful if you want to apply custom sorting logic to a list. Refer to the tutorial &lt;a href="https://datascienceparichay.com/article/python-list-sort-with-examples/"&gt;Python List Sort - With Examples&lt;/a&gt; for examples of how to perform such sort operations using the &lt;code&gt;key&lt;/code&gt; parameter.&lt;/p&gt;

</description>
      <category>python</category>
      <category>tutorial</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
