<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: pexpeter</title>
    <description>The latest articles on Forem by pexpeter (@pexpeter).</description>
    <link>https://forem.com/pexpeter</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F815880%2F2302bd35-b981-47a4-b2f8-9978fbab2fd4.jpg</url>
      <title>Forem: pexpeter</title>
      <link>https://forem.com/pexpeter</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/pexpeter"/>
    <language>en</language>
    <item>
      <title>Ultimate Guide to Exploratory Data Analysis</title>
      <dc:creator>pexpeter</dc:creator>
      <pubDate>Tue, 28 Feb 2023 20:24:56 +0000</pubDate>
      <link>https://forem.com/pexpeter/ultimate-guide-to-exploratory-data-analysis-53ko</link>
      <guid>https://forem.com/pexpeter/ultimate-guide-to-exploratory-data-analysis-53ko</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Definition&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Exploratory Data Analysis is an approach to analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods, according to Wikipedia.&lt;/p&gt;

&lt;p&gt;It helps us identify potential issues with a dataset, such as missing data and outliers, understand the nature and type of the variables and the relationships between them, and communicate our findings effectively. This supports decision-making in a company because the advice is grounded in data.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Exploratory Data Analysis Process&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The EDA process is similar across data science programming languages such as &lt;code&gt;R&lt;/code&gt; and &lt;code&gt;Python&lt;/code&gt;. The process involves three major steps:&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Input/ Reading
&lt;/h3&gt;

&lt;p&gt;This involves assigning your data to a programming language object so as to store the data in memory. Each data source has its own way of data input/handling. Example in &lt;code&gt;R&lt;/code&gt; and &lt;code&gt;Python&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'filepath/data.csv'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;read.csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'filepath/data.csv'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The method to use is dictated by the data size and format in which it's stored.&lt;/p&gt;
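&lt;p&gt;As a rough sketch of how data size can change the reading method, the example below reads the same CSV content once in full and once in chunks. The CSV text, the column names and the chunk size are assumptions made up for illustration; with a real file you would pass its path instead of the in-memory buffer.&lt;/p&gt;

```python
import io
import pandas as pd

# Hypothetical CSV content held in memory instead of a file on disk
csv_text = "id,score\n1,10\n2,20\n3,30\n4,40\n"

# Small data: read everything at once
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)

# Larger data: read 2 rows at a time and combine the partial results
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk["score"].sum()
print(total)
```

&lt;p&gt;Reading in chunks keeps only part of the data in memory at a time, which can help when a file is too large to load in one go.&lt;/p&gt;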

&lt;h3&gt;
  
  
  Data Cleaning and Analysis
&lt;/h3&gt;

&lt;p&gt;This involves getting a deeper understanding of the data you imported. It will involve:&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;em&gt;Identifying missing data points/ data&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;During data collection, some respondents skip questions, or pick an option without giving the required input. This leaves missing data that could otherwise have given better insight. EDA helps identify which values are missing and how much of the data is missing. Missing values are most commonly represented as &lt;code&gt;null&lt;/code&gt; or &lt;code&gt;NaN&lt;/code&gt;.&lt;/p&gt;
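&lt;p&gt;A minimal sketch of how missing values might be counted with pandas is shown here. The table and its column names (&lt;code&gt;age&lt;/code&gt;, &lt;code&gt;income&lt;/code&gt;) are made up for illustration.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical survey data; np.nan marks a skipped question
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [50000, 62000, np.nan, np.nan],
})

# Number of missing values in each column
missing_count = df.isna().sum()

# Share of missing values in each column (0.0 to 1.0)
missing_share = df.isna().mean()

print(missing_count)
print(missing_share)
```

&lt;p&gt;The share of missing values is often more useful than the raw count when deciding whether a column can still be relied on.&lt;/p&gt;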

&lt;h4&gt;
  
  
  &lt;em&gt;Identifying outliers&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;Outliers are data points that differ significantly from the other data points in the dataset. We tend to identify them through quantiles or percentiles. A simple rule of thumb flags values outside the &lt;code&gt;(0.1, 0.9)&lt;/code&gt; quantile range as outliers. A common action is to drop them, although whether that is appropriate depends on the data and the question being asked. Example in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# compute the lower and upper bounds (10th and 90th percentiles)
&lt;/span&gt;&lt;span class="n"&gt;low&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;high&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'column'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# boolean mask: True where the value lies within the bounds (inclusive)
&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'column'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;between&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;low&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# keep only the rows within the bounds, dropping the outliers
&lt;/span&gt;&lt;span class="n"&gt;df_filtered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;em&gt;Distribution of the data&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;We examine the central tendency, spread, and shape of the data &lt;code&gt;(mean, standard deviation, variance, etc.)&lt;/code&gt; to get insights into its distribution. This helps in deciding which statistical test or analysis to use, and in checking for skewness &lt;code&gt;(an asymmetric distribution)&lt;/code&gt;.&lt;/p&gt;
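&lt;p&gt;The summary statistics mentioned here could be computed along the following lines with pandas. The values are made up, with one large value added on purpose so that the skew is visible.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical right-skewed values (one large value pulls the mean up)
values = pd.Series([1, 2, 2, 3, 3, 3, 4, 20])

print(values.mean())      # central tendency
print(values.median())    # central tendency, robust to extreme values
print(values.std())       # spread (sample standard deviation)
print(values.var())       # spread (sample variance)
print(values.skew())      # shape: positive means a longer right tail
print(values.describe())  # count, mean, std, min, quartiles, max in one call
```

&lt;p&gt;A mean well above the median, as in this sample, is a quick hint that the data is skewed to the right.&lt;/p&gt;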

&lt;h4&gt;
  
  
  &lt;em&gt;Statistical Analysis&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;The method used for statistical analysis depends on the data distribution and the data types. It involves computing correlations between variables, which aids in understanding concepts such as multicollinearity between the variables.&lt;/p&gt;

&lt;p&gt;We also perform statistical tests on the data to help us test the hypotheses or address the objectives we set.&lt;/p&gt;

&lt;p&gt;We can also perform regression analysis on the data set to get more insights into our response variable and independent variables.&lt;/p&gt;
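&lt;p&gt;As a small sketch of the correlation step, the example below computes a pairwise correlation matrix with pandas. The columns &lt;code&gt;x&lt;/code&gt;, &lt;code&gt;y&lt;/code&gt; and &lt;code&gt;z&lt;/code&gt; are assumptions chosen so the result is easy to read.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical data: y is exactly 2 * x, z moves in the opposite direction
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 4, 3, 2, 1],
})

# Pairwise Pearson correlation between all numeric columns
corr_matrix = df.corr()
print(corr_matrix)
```

&lt;p&gt;Values close to 1 or -1 between two independent variables suggest they carry nearly the same information, which is one sign of multicollinearity.&lt;/p&gt;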

&lt;h4&gt;
  
  
  &lt;em&gt;Data Visualization&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;This is considered the last step of Exploratory Data Analysis.&lt;/p&gt;

&lt;p&gt;We visualize our data to identify patterns and trends &lt;code&gt;(time series)&lt;/code&gt; that would be difficult to spot in raw data.&lt;/p&gt;

&lt;p&gt;It also involves communicating our results in a simple language understandable by the policymakers. We compile our results in pictorial format to highlight major insights and show the data-driven recommendations.&lt;/p&gt;

&lt;p&gt;The most common data visualization packages are &lt;code&gt;ggplot2 (R)&lt;/code&gt;, &lt;code&gt;matplotlib (Python)&lt;/code&gt;, &lt;code&gt;seaborn (Python)&lt;/code&gt;, and &lt;code&gt;plotly (R and Python)&lt;/code&gt;.&lt;/p&gt;
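&lt;p&gt;A minimal matplotlib sketch of a trend plot is given here. The monthly figures, labels and output file name are made up for illustration, and the non-interactive &lt;code&gt;Agg&lt;/code&gt; backend is selected only so the script runs without a display.&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 150, 170, 165]

# Line chart with a marker on every data point
fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")

# Write the chart to an image file (the file name is an assumption)
fig.savefig("monthly_sales.png")
```

&lt;p&gt;In a notebook you would normally skip the backend line and let the chart render inline.&lt;/p&gt;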

&lt;h2&gt;
  
  
  &lt;strong&gt;Importance&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The significance of EDA falls into three main areas:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;em&gt;Data Quality and Better Understanding&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;It helps in having a better understanding of the data you are working on to identify trends, patterns, or outliers, which are considered anomalies. This aids in planning how to analyze the data and interpret it.&lt;/p&gt;

&lt;p&gt;Identifying missing values helps in ensuring we use reliable data.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;em&gt;Communication&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;The use of visualization and summaries aids in presenting our results. This makes the results more understandable to most people.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;em&gt;Decision Making&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;EDA can aid in making decisions backed up by data. This gives policymakers a chance to make informed decisions that tend to be more effective and achievable.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;EDA is a critical step in data processing for getting insights about data sets. These insights aid in making data-driven decisions, which tend to be effective.&lt;/p&gt;

&lt;p&gt;It's essential for a data analyst to master EDA in order to help policymakers understand past and current events and the best practices for future ones.&lt;/p&gt;

&lt;p&gt;I hope this ultimate guide serves as a valuable resource for anyone looking to improve their EDA skills.&lt;/p&gt;

&lt;p&gt;You can check a sample EDA procedure in this &lt;a href=""&gt;GitHub repository&lt;/a&gt; and never feel shy about asking for guidance. &lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
    <item>
      <title>Python 101: Introduction to Python for Data Science</title>
      <dc:creator>pexpeter</dc:creator>
      <pubDate>Sat, 18 Feb 2023 15:14:19 +0000</pubDate>
      <link>https://forem.com/pexpeter/python-101-introduction-to-python-for-data-science-3gm6</link>
      <guid>https://forem.com/pexpeter/python-101-introduction-to-python-for-data-science-3gm6</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X2mAs28n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.python.org/static/img/python-logo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X2mAs28n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.python.org/static/img/python-logo.png" alt="Python logo" width="290" height="82"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Python Definition&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Python is an interpreted, high-level, general-purpose programming language. It was first released in 1991 by Guido van Rossum and has since become one of the most popular programming languages in the world. Its syntax has made it popular, as it's easy to learn and use.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Advantages of Python for Data Analysis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Several programming languages are popular for data analysis, including R, Stata, and SAS.&lt;/p&gt;

&lt;p&gt;Python stands out among them for the following reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ease of use: its syntax makes it easy to learn, write, and maintain code, even for beginners.&lt;/li&gt;
&lt;li&gt;Range of libraries: Python has a large number of libraries that provide a range of functionalities for data analysis, such as NumPy, Pandas, and Matplotlib.&lt;/li&gt;
&lt;li&gt;Open-source: Python is open-source, which means that it is freely available and can be used and modified by anyone.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Installing Python&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The Python installer can be downloaded from the official website for different operating systems. I will mainly use Windows for this article.&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://www.python.org/" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--AN1dv4Zi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.python.org/static/opengraph-icon-200x200.png" height="200" class="m-0" width="200"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://www.python.org/" rel="noopener noreferrer" class="c-link"&gt;
          Welcome to Python.org
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          The official home of the Python Programming Language
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s--8To6tz8m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.python.org/static/favicon.ico" width="48" height="48"&gt;
        python.org
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;After installing Python, you have to choose an IDE (Integrated Development Environment), which is a software application that provides a comprehensive environment for software development.&lt;/p&gt;

&lt;p&gt;Common IDEs for data science are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jetbrains.com/pycharm/"&gt;PyCharm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.spyder-ide.org/"&gt;Spyder&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.visualstudio.com/"&gt;Visual Studio Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jupyter.org/"&gt;Jupyter Notebooks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;IDLE (comes with the Python installation).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;First Code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Once your coding environment is set up, you are ready to code. Python for data science relies on several basic libraries that simplify your work. They can be installed with &lt;code&gt;pip&lt;/code&gt;, Python's package manager, by running the following in your command prompt (cmd):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NumPy
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Pandas
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Matplotlib
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;matplotlib&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Scikit-Learn
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;scikit&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;learn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The libraries are installed for ease of use when dealing with tabular data (&lt;code&gt;Pandas&lt;/code&gt;), arrays (&lt;code&gt;NumPy&lt;/code&gt;), visualizations (&lt;code&gt;Matplotlib&lt;/code&gt;) and machine learning (&lt;code&gt;Scikit-Learn&lt;/code&gt;). As you advance, you will come to know more libraries that come in handy in your data science projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Using Python Libraries for Data Science&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Installed libraries are not usable in a script until they, or modules within them, are imported. This is done with the &lt;code&gt;import&lt;/code&gt; and &lt;code&gt;from library import module&lt;/code&gt; statements. Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#we use `as` as an alias so as to simplify our code
#pandas library
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="c1"&gt;#numpy library
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="c1"&gt;#matplotlib library
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="c1"&gt;#scikit learn library
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_pipeline&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you may have noted, &lt;code&gt;from&lt;/code&gt; is used to import a specific function or module from a library, depending on the project you are working on.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Python Syntax&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Operators in Python for Data Science
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Arithmetic operators: used for performing arithmetic operations such as addition(&lt;code&gt;+&lt;/code&gt;), subtraction(&lt;code&gt;-&lt;/code&gt;), multiplication(&lt;code&gt;*&lt;/code&gt;), division(&lt;code&gt;/&lt;/code&gt;), and modulus(&lt;code&gt;%&lt;/code&gt;). &lt;/li&gt;
&lt;li&gt;Comparison operators: used for comparing two values and returning a Boolean value (True or False). They include equal to(&lt;code&gt;==&lt;/code&gt;), not equal to(&lt;code&gt;!=&lt;/code&gt;), greater than(&lt;code&gt;&amp;gt;&lt;/code&gt;), less than(&lt;code&gt;&amp;lt;&lt;/code&gt;), greater than or equal to(&lt;code&gt;&amp;gt;=&lt;/code&gt;) and less than or equal to(&lt;code&gt;&amp;lt;=&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Logical operators: used for combining Boolean values and returning a Boolean result. These include logical AND (&lt;code&gt;and&lt;/code&gt;), logical OR (&lt;code&gt;or&lt;/code&gt;), and logical NOT (&lt;code&gt;not&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Assignment operators: used for assigning a value to a variable and performing an operation on the variable at the same time. These include:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;    &lt;span class="c1"&gt;# equivalent to a = a + 3
&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;    &lt;span class="c1"&gt;# equivalent to a = a - 2
&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;    &lt;span class="c1"&gt;# equivalent to a = a * 4
&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;/=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;    &lt;span class="c1"&gt;# equivalent to a = a / 2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
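&lt;p&gt;The arithmetic, comparison and logical operators from the list above can be tried out with a short sketch like this one. The variable names and values are arbitrary.&lt;/p&gt;

```python
a = 7
b = 2

# Arithmetic operators
print(a + b)   # 9
print(a - b)   # 5
print(a * b)   # 14
print(a / b)   # 3.5 (true division returns a float)
print(a % b)   # 1 (remainder of 7 divided by 2)

# Comparison operators return a Boolean value
is_equal = (a == b)        # False
is_different = (a != b)    # True

# Logical operators combine Boolean values
both = is_equal and is_different     # False: both sides must be True
either = is_equal or is_different    # True: one True side is enough
negated = not is_equal               # True: flips False to True
print(both, either, negated)
```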



&lt;h3&gt;
  
  
  &lt;strong&gt;Python Data Structures&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Python comes with built-in data structures that enable data scientists to store and manipulate data sets. They are the foundation that makes it easy to work with the data science libraries.&lt;br&gt;
The most common data structures are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lists:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A list is a collection of ordered elements, which can be of any data type. Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mylist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Tuples:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A tuple is a collection of ordered elements, similar to a list. However, tuples are immutable, which means that once a tuple is created, its elements cannot be modified. Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mytuple&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Dictionaries:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A dictionary is a collection of key-value pairs, where each key is associated with a value. Dictionaries are mutable, which means that you can add, remove, or modify key-value pairs; since Python 3.7 they also preserve insertion order. Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mydict&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"a"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"b"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"c"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are the most commonly used data structures but others include sets and arrays.&lt;/p&gt;
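&lt;p&gt;To make the difference between mutable and immutable structures concrete, here is a short sketch that modifies a list and a dictionary, shows that a tuple rejects modification, and adds a set for comparison. The values are arbitrary.&lt;/p&gt;

```python
# List: ordered and mutable
mylist = [1, 2, 3, 4]
mylist.append(5)        # add an element at the end
mylist[0] = 10          # replace the first element
print(mylist)           # [10, 2, 3, 4, 5]

# Tuple: ordered but immutable
mytuple = (1, 2, 3, 4, 5)
try:
    mytuple[0] = 10     # attempting to modify raises TypeError
except TypeError as error:
    print("tuples cannot be modified:", error)

# Dictionary: key-value pairs, mutable
mydict = {"a": 2, "b": 3, "c": 4}
mydict["d"] = 5         # add a new key-value pair
mydict["a"] = 20        # modify an existing value
del mydict["b"]         # remove a key-value pair
print(mydict)           # {'a': 20, 'c': 4, 'd': 5}

# Set: unordered collection of unique elements
myset = {1, 2, 2, 3}
print(len(myset))       # 3, because the duplicate 2 is stored once
```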

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Data science involves many kinds of projects, from data collection to machine learning. The kind of project dictates which libraries to use and what code to write. The most common data sources are APIs, flat files (Excel, CSV), structured databases (SQL), unstructured databases (MongoDB), and sometimes a mix of these. &lt;/p&gt;

&lt;p&gt;Python offers easy integration with these data sources, e.g. &lt;code&gt;pymongo&lt;/code&gt; for MongoDB databases, &lt;code&gt;sqlite3&lt;/code&gt; for SQLite databases, and &lt;code&gt;pandas&lt;/code&gt; for flat files (Excel, CSV, etc.). &lt;/p&gt;

&lt;p&gt;Examples:&lt;br&gt;
-&lt;strong&gt;&lt;a href="https://pymongo.readthedocs.io/"&gt;Pymongo&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pymongo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MongoClient&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"localhost"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;27017&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;-&lt;strong&gt;&lt;a href="https://docs.python.org/3/library/sqlite3.html"&gt;sqlite3&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;sqlite3&lt;/span&gt;
&lt;span class="c1"&gt;# Jupyter magics from the ipython-sql extension (these run in a notebook only)
&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;load_ext&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;
&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="n"&gt;sqlite&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;///&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
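&lt;p&gt;Outside a notebook, the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module can be used directly. The sketch below is one possible way to do that; the in-memory database, the &lt;code&gt;sales&lt;/code&gt; table and its rows are assumptions made up for illustration.&lt;/p&gt;

```python
import sqlite3

# In-memory database so the example leaves no file behind
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, for illustration only
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5), ("north", 30.0)],
)
conn.commit()

# Aggregate with SQL and fetch the result as a list of tuples
cur.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)
rows = cur.fetchall()
print(rows)  # [('north', 150.0), ('south', 80.5)]

conn.close()
```

&lt;p&gt;The &lt;code&gt;?&lt;/code&gt; placeholders let the driver handle quoting of the values, which is safer than building the SQL string by hand.&lt;/p&gt;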



&lt;p&gt;-&lt;strong&gt;&lt;a href="https://pandas.pydata.org/"&gt;pandas&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
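&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="c1"&gt;# filepath is a variable holding the path to your CSV file
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;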
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The kind of data also will determine the type of code and libraries to install and use. &lt;/p&gt;

&lt;p&gt;It's always advisable to come up with a clear plan for handling your project, to avoid choosing the wrong methods or libraries. &lt;/p&gt;

&lt;p&gt;Data science with Python is fun and, with dedication, easy to learn. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thank you and for any clarification feel free to reach out&lt;/em&gt;.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>datascience</category>
      <category>python</category>
      <category>dataanalysis</category>
    </item>
  </channel>
</rss>
