<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Alex Doytchinov</title>
    <description>The latest articles on Forem by Alex Doytchinov (@alexdoytch).</description>
    <link>https://forem.com/alexdoytch</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F444641%2F238b44ed-9481-4ea3-b24b-2fe6c26f77de.jpeg</url>
      <title>Forem: Alex Doytchinov</title>
      <link>https://forem.com/alexdoytch</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/alexdoytch"/>
    <language>en</language>
    <item>
      <title>Intro to Spyder IDE Using Python Pandas</title>
      <dc:creator>Alex Doytchinov</dc:creator>
      <pubDate>Sun, 02 Aug 2020 14:21:30 +0000</pubDate>
      <link>https://forem.com/bitproject/intro-to-spyder-ide-using-python-pandas-2onk</link>
      <guid>https://forem.com/bitproject/intro-to-spyder-ide-using-python-pandas-2onk</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F1080%2F1%2AfUO28EIHi1bkZPhjZ451tQ.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F1080%2F1%2AfUO28EIHi1bkZPhjZ451tQ.jpeg" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In large datasets, it can be very difficult to extract the necessary information by hand. Python helps automate or heavily reduce the work necessary to present the relevant data set. Python is heavy on English-language syntax to promote a low learning curve and to enhance the average end-user experience&lt;/p&gt;

&lt;p&gt;Python libraries help save time by giving you pre-written code! We recycle previously created functions to save us time for setting up our powerful data analysis tools. Without them, many programs would be significantly larger and repetitive, and saves end-users time to complete assignments.&lt;/p&gt;

&lt;p&gt;Pandas is a premier data science tool. It reads in large data sets such as .csv files or SQL databases and can help extract data based on a meaningful range of values and/or indices. It also has sets of statistical commands to get averages, sums, medians, etc. It also has data cleaning functions useful to remove incomplete entries (NULL values), and is able to join data sets together for more comprehensive presentation. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcms.qz.com%2Fwp-content%2Fuploads%2F2017%2F11%2Fwes_mckinney_pandas.png%3Fw%3D1400%26strip%3Dall%26quality%3D75" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcms.qz.com%2Fwp-content%2Fuploads%2F2017%2F11%2Fwes_mckinney_pandas.png%3Fw%3D1400%26strip%3Dall%26quality%3D75" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pandas has become one of the most popular tools in all of computer science, account for almost 1% of all Stack Overflow questions since 2017. The creator of Pandas, Wes McKinney, crated the tool to help all forms of analysts. He says, “I tell them that it enables people to analyze and work with data who are not expert computer scientists...You still have to write code, but it’s making the code intuitive and accessible. It helps people move beyond just using Excel for data analysis."   &lt;/p&gt;

&lt;p&gt;Today, we will walk through some of the fundamentals of Pandas and use a variety of commands. This walktrough will use the Spyder IDE from &lt;a href="https://www.anaconda.com/products/individual" rel="noopener noreferrer"&gt;Anaconda&lt;/a&gt;. Once installed, go to your application dashboard, then open the new Anaconda3 folder and click on Spyder. Before we can do any work, we have to install pandas through our console. Some editors have basic packages pre-installed, but always check beforehand.&lt;/p&gt;

&lt;p&gt;A common way to install packages is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

pip install &amp;lt;package_name&amp;gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In Spyder: click on the Console in the bottom right corner and enter the following command below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

pip install pandas


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Finally, we will need &lt;a href="https://www.kaggle.com/benroshan/factors-affecting-campus-placement/data" rel="noopener noreferrer"&gt;data to work with&lt;/a&gt;. Once downloaded, click show in folder so that you know its directory. If in a .zip, go into the .zip and copy the csv, go back to the original directory and paste.&lt;/p&gt;

&lt;h3&gt;
  
  
  To Do List
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Install and Create Datasets (Done!)
&lt;/h4&gt;

&lt;h4&gt;
  
  
  2. Using Commands to Extract Data
&lt;/h4&gt;

&lt;h4&gt;
  
  
  3. Alter Our Datasets
&lt;/h4&gt;

&lt;h4&gt;
  
  
  4. Merge and Present Information
&lt;/h4&gt;

&lt;h2&gt;
  
  
  Extracting Our Data
&lt;/h2&gt;

&lt;p&gt;Congrats! We have our library installed and our data set to work with. Now, go into Spyder and in the left terminal hit enter, then type the following to have access to  pandas library:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

import pandas as pd


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The “import pandas” portion now includes the entire library for us to use. The “as pd” portion is to use pd as a shortcut when calling the library functions. So now we don’t type out pandas.(function_name) everytime, but instead pd.(function_name). Yay Efficiency! Our first line of code will be to get our data set into a DataFrame object (so find your data!) Enter the file location of your .csv file into the pd.read_csv(“file”) function, then click the green play button to run the program. On Spyder, there’s a tab called &lt;em&gt;Variable Explorer&lt;/em&gt; in the middle right. Click on it, and you should see a new DataFrame object of size (215,15).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2F4CZa2uy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2F4CZa2uy.png" alt="Alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A DataFrame is just like an Excel table that we can directly edit using our code.&lt;br&gt;
When creating a DataFrame that is not necessarily imported with read_csv, you can directly build a Dataframe in the following format:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;alt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col B&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Result: &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;alt&lt;/td&gt;
&lt;td&gt;DataFrame&lt;/td&gt;
&lt;td&gt;(2,2)&lt;/td&gt;
&lt;td&gt;Column names: A,B&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We have our data with us, but it’s very large, so we might be confused as to what to do with it right now... Here are some examples of the commands to use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;data.head():&lt;/strong&gt; returns top 5 rows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;data.tail():&lt;/strong&gt; returns bottom 5 rows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;data.head(x)/tail(x):&lt;/strong&gt; returns x top/bottom rows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;data.sort_values(col1):&lt;/strong&gt; sorts data ascending based on col1&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;data.sort_values(col1, ascending=False):&lt;/strong&gt; will do the same thing, but sorted in Descending order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FtPx4OEY.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FtPx4OEY.png" alt="Alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Difference Between head() and head(x):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;data.head():&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;data&lt;/td&gt;
&lt;td&gt;DataFrame&lt;/td&gt;
&lt;td&gt;(5,15)&lt;/td&gt;
&lt;td&gt;Column names: sl_no, gender, ssc_p, ssc_b, hsc_p, hsc_b, hsc_s, degree...&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;data.head(10):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;data&lt;/td&gt;
&lt;td&gt;DataFrame&lt;/td&gt;
&lt;td&gt;(10,15)&lt;/td&gt;
&lt;td&gt;Column names: sl_no, gender, ssc_p, ssc_b, hsc_p, hsc_b, hsc_s, degree...&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Getting Top and Bottom 6 Salaries
&lt;/h2&gt;

&lt;p&gt;On the Kaggle page used to download our csv, there is a part of the page providing definitions to our data’s column headers, including salary. Use it, and the newly learned commands to:&lt;br&gt;
Get the 6 highest and lowest salary entries from our dataset.&lt;/p&gt;

&lt;p&gt;Tip: calling data.head() returns a text answer, but for a new dataframe, write new_data = data.head()&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;bottom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;So you would get 200,000 for the 6 bottom salaries, but the top 6 salaries were NAN. That doesn’t tell us what we really want. We can filter out NULL data (like here where there was no offer made):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Dropna() gets rid of null values, and arguments subset means we can remove NULL values in a certain column (here we use salary). Inplace is to choose between overwriting the same object or create a new one (true = overwrite). The following code would return non-null values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subset&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;bottom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Altering Our Data
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdata-flair.training%2Fblogs%2Fwp-content%2Fuploads%2Fsites%2F2%2F2019%2F06%2FAggregation-and-Grouping-in-Pandas.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdata-flair.training%2Fblogs%2Fwp-content%2Fuploads%2Fsites%2F2%2F2019%2F06%2FAggregation-and-Grouping-in-Pandas.jpg" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Say I want the average value of a certain set, or maybe I just want the highest percentage from each column. Pandas is equipped with various commands such as the ones above that can help us query the datasaet for relevant information.&lt;/p&gt;

&lt;p&gt;data.max() and data.min() will return the highest/lowest value of every column, &lt;br&gt;
data[‘col1’].max() returns highest value in derived column. Similar implementation for .mean(), .median(), .sum(), .count(), .std() [standard deviation], and more. &lt;/p&gt;

&lt;p&gt;For our data set:&lt;/p&gt;

&lt;p&gt;data.max() = 15 values&lt;/p&gt;

&lt;p&gt;data[‘salary’].max() = 1 value&lt;/p&gt;

&lt;p&gt;data.mean() = 7 values, since the other 8 columns are non-numerical&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FkOaNxs3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FkOaNxs3.png" alt="Alt text"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Get the Median, Sum, and Max
&lt;/h2&gt;

&lt;p&gt;Let’s get the median and sum of the top 100 salaries, as well as the max mba_p from those 100 candidates&lt;/p&gt;

&lt;p&gt;Tip: How are the salaries sorted right now?&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;hundred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sum_hun&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hundred&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;med_hun&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hundred&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;max_mba&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hundred&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mba_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;First, we sort the data in descending order, and grab a new DataFrame with .head(). We then create our sum, median and max values by calling their respective commands on the new "hundred" DataFrame.&lt;/p&gt;

&lt;h2&gt;
  
  
  Merging New Data
&lt;/h2&gt;

&lt;p&gt;What if I want to see how all the percentages add up and take the mean? Maybe that can be done with the tools already given, but now I want to add it to our original set as a new column.&lt;/p&gt;

&lt;p&gt;There are two ways to add data: &lt;strong&gt;.append()&lt;/strong&gt; and &lt;strong&gt;.concat().&lt;/strong&gt; Append is to add a dataframe with identical row indexes (same column values), while .concat() adds identical column indexes (same row values).&lt;/p&gt;

&lt;p&gt;Code Format:&lt;br&gt;
data1 = To be added&lt;br&gt;
data2 = Data we want to update&lt;br&gt;
data1 = data1.append(data2)&lt;br&gt;
data.concat([data1,data2])&lt;br&gt;
Add two columns with:&lt;br&gt;
New_data = data[col1] + data[col2]&lt;br&gt;
Note: This creates a Series object, not a DataFrame&lt;br&gt;
Add a series to DataFrame by creating new column:&lt;br&gt;
data[“New Col”] = series&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;one&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;B&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
&lt;span class="n"&gt;two&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;B&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
&lt;span class="n"&gt;one&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;one&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;two&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;ignore_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;2 size(1,2) DataFrames merge into a (2,2) new object. Ignore_index works like inplace earlier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FUTJP4fJ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FUTJP4fJ.png" alt="Alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Aggregate Old Percentages to Create a New One
&lt;/h2&gt;

&lt;p&gt;Let’s take the 5 percentages in our original data set, sum them into one column (and divide by 5 with data /= 5), return the average of the column and add this column to our original data set. &lt;/p&gt;

&lt;p&gt;Tip: You can add columns together!&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;new_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ssc_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hsc_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;new_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_data&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;degree_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;new_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_data&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;etest_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;new_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_data&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mba_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;new_data&lt;/span&gt; &lt;span class="o"&gt;/=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_data&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We create a Series object by adding the two columns from data together. We continue adding the relevant percentages in a similar fashion, and then divide by the number of percentages we are using. Now that we have our master average, there is a new column we can add to our original data set, which we can do so by directly naming a new column for our DataFrame.&lt;/p&gt;

&lt;h2&gt;
  
  
  Merging Two Sets of Relevant Data
&lt;/h2&gt;

&lt;p&gt;Get the 5 highest mba_p and the 5 highest etest_p candidates and merge their dataset to 10 “premier candidates”&lt;/p&gt;

&lt;p&gt;Tip: The two Dataframes share the same column indexes &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;mba&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mba_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;etest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;etest_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mba_top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mba&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;etest_top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;etest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;mba_top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;etest_top&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;mba_top&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We first have to create two new, sorted DataFrames. We grab the top 5 from both of the new datasets into another pair of DataFrames, and we use the .concat() command to make one of our pair the final dataset. &lt;/p&gt;

&lt;h2&gt;
  
  
  Congratulations!
&lt;/h2&gt;

&lt;p&gt;With this tuorial, we learned to sort data and isolate information, as well as ignore irrelevant or incomplete data. We could then create new subsets of data for future analysis and readability. Finally, we gained the ability to add the data to create a more comprehensive data set.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>beginners</category>
      <category>productivity</category>
      <category>python</category>
    </item>
  </channel>
</rss>
