<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sagar Kapoor</title>
    <description>The latest articles on Forem by Sagar Kapoor (@sagarkapoor).</description>
    <link>https://forem.com/sagarkapoor</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F964295%2Fdc11ddbb-8735-4831-9f1c-fa94db85cc70.jpeg</url>
      <title>Forem: Sagar Kapoor</title>
      <link>https://forem.com/sagarkapoor</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sagarkapoor"/>
    <language>en</language>
    <item>
      <title>Getting Started With Git</title>
      <dc:creator>Sagar Kapoor</dc:creator>
      <pubDate>Thu, 08 Dec 2022 10:19:35 +0000</pubDate>
      <link>https://forem.com/sagarkapoor/getting-started-with-git-40b1</link>
      <guid>https://forem.com/sagarkapoor/getting-started-with-git-40b1</guid>
      <description>&lt;h3&gt;
  
  
  What is Git?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://git-scm.com/" rel="noopener noreferrer"&gt;&lt;b&gt;Git&lt;/b&gt;&lt;/a&gt; is a &lt;strong&gt;&lt;em&gt;distributed&lt;/em&gt;&lt;/strong&gt; &lt;strong&gt;version control system&lt;/strong&gt;. Version control systems (VCS) are software tools that help software teams manage changes to source code over time. There are many VCS available and Git is one of them, having been developed by &lt;a href="https://en.wikipedia.org/wiki/Linus_Torvalds" rel="noopener noreferrer"&gt;Linus Torvalds&lt;/a&gt; in 2005. A VCS helps you do the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep a complete history of changes to your source code as a log.&lt;/li&gt;
&lt;li&gt;Branch and merge work on the source code into separate streams, so that team members can work in parallel.&lt;/li&gt;
&lt;li&gt;Trace each change made to the source code. (Developers often use this feature to roll back an update to their code if such a need arises.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This process is called source code management. I learnt Git from the &lt;a href="https://www.atlassian.com/git/tutorials/what-is-version-control" rel="noopener noreferrer"&gt;Atlassian guide&lt;/a&gt;, which I highly recommend for beginners since it is well written and easy to understand.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing Git
&lt;/h3&gt;

&lt;p&gt;If you are using any system other than Windows, you can follow &lt;a href="https://www.atlassian.com/git/tutorials/install-git" rel="noopener noreferrer"&gt;this guide&lt;/a&gt; to install Git on your machine. For Windows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Download the latest &lt;a href="https://git-for-windows.github.io/" rel="noopener noreferrer"&gt;Git for Windows installer&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;When you've successfully started the installer, you should see the Git Setup wizard screen. Follow the &lt;strong&gt;Next&lt;/strong&gt; and &lt;strong&gt;Finish&lt;/strong&gt; prompts to complete the installation. You should be fine with the default settings if you want to immediately get started with Git.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Setting up a repository
&lt;/h3&gt;

&lt;p&gt;A repository is created in your project's &lt;em&gt;directory&lt;/em&gt; (a folder like this --&amp;gt; 📁 on your PC; this is just a pictorial representation, and folder icons come in many types). It is the place where the different versions of your project are stored.&lt;br&gt;&lt;br&gt;
In short, it is the history book 📓 of your project. There are two ways to get started with Git for your project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You can start a Git repository in your project's directory. &lt;/li&gt;
&lt;li&gt;You can clone an existing project. &lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Git Commands
&lt;/h3&gt;

&lt;p&gt;For this step you will need &lt;a href="https://docs.microsoft.com/en-us/powershell/scripting/overview?view=powershell-7.1" rel="noopener noreferrer"&gt;&lt;b&gt;PowerShell&lt;/b&gt;&lt;/a&gt; and a working knowledge of the &lt;a href="https://en.wikipedia.org/wiki/Command-line_interface" rel="noopener noreferrer"&gt;CLI&lt;/a&gt;. Working with &lt;code&gt;cmd.exe&lt;/code&gt; is also fine, but I chose PowerShell; my reasons are explained &lt;a href="https://www.howtogeek.com/163127/how-powershell-differs-from-the-windows-command-prompt/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. You can also use &lt;a href="https://www.atlassian.com/git/tutorials/git-bash" rel="noopener noreferrer"&gt;Git Bash&lt;/a&gt;, a Windows app that emulates the Git command-line experience, to execute the commands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command is used to configure how you are identified in Git. There are three levels of Git configuration: system, global and local (per repository). If you are just starting out, you will need these two settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;git config --global user.name &amp;lt;your name here&amp;gt;&lt;/code&gt; (wrap the name in double quotes if it contains a space). This will set the name recorded in your commits, so that you can tell who made each change to your project.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;git config --global user.email &amp;lt;your email here&amp;gt;&lt;/code&gt; This will add your email information to the commits. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most of your commits, the &lt;code&gt;global&lt;/code&gt; settings are what you want. But if you want a specific name and email associated with one repository, you can run &lt;code&gt;git config --local user.name &amp;lt;name&amp;gt;&lt;/code&gt; or &lt;code&gt;git config --local user.email &amp;lt;email&amp;gt;&lt;/code&gt; after opening PowerShell in that repository's directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command will initialise a repository in the directory where you execute it. Use &lt;code&gt;cd &amp;lt;directory name&amp;gt;&lt;/code&gt; (replace the &lt;code&gt;&amp;lt; &amp;gt;&lt;/code&gt; placeholder with the &lt;a href="https://docs.microsoft.com/en-us/dotnet/standard/io/file-path-formats" rel="noopener noreferrer"&gt;path&lt;/a&gt;) to navigate to your project's directory first, then execute the command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd &amp;lt;directory name&amp;gt;
git init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or you can use &lt;code&gt;git init &amp;lt;project directory&amp;gt;&lt;/code&gt; to initialise a Git repository in an existing project without navigating to it first.&lt;br&gt;&lt;br&gt;
&lt;code&gt;git init&lt;/code&gt; is a one-time operation in a project directory. Think of it as the register that records the entry of everyone in an office: you only need one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the name suggests, this command is used to clone an existing project repository from a remote hosting service like &lt;a href="https://github.com/" rel="noopener noreferrer"&gt;&lt;b&gt;GitHub&lt;/b&gt;&lt;/a&gt; and obtain a local development copy. &lt;strong&gt;Git&lt;/strong&gt; is the distributed VCS (DVCS) and &lt;strong&gt;GitHub&lt;/strong&gt; is a service that hosts repositories for projects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone &amp;lt;repo url&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also clone using the repository's Git &lt;a href="https://www.atlassian.com/git/tutorials/git-ssh" rel="noopener noreferrer"&gt;&lt;b&gt;SSH&lt;/b&gt;&lt;/a&gt; URL. Follow the link if you would like to know more about it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add &amp;lt;file name&amp;gt;
git commit -m "add a message here"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These commands are used to add file changes to your project's repository and then commit them with a message attached. This way, you will know later why the commit was made. The &lt;code&gt;add&lt;/code&gt; command stages your changes, getting them ready to be committed, and &lt;code&gt;commit&lt;/code&gt; records them in your local repository. &lt;/p&gt;

&lt;p&gt;These are the basic commands that you need to know if you want to interact with the code of countless programmers out there in places such as &lt;a href="https://github.com/" rel="noopener noreferrer"&gt;&lt;b&gt;GitHub&lt;/b&gt;&lt;/a&gt;!  &lt;/p&gt;

&lt;p&gt;Get git! 🚀&lt;/p&gt;

</description>
      <category>watercooler</category>
    </item>
    <item>
      <title>How to Scrape a site with Python</title>
      <dc:creator>Sagar Kapoor</dc:creator>
      <pubDate>Sat, 19 Nov 2022 18:37:26 +0000</pubDate>
      <link>https://forem.com/sagarkapoor/how-to-scrape-a-site-with-python-2fa9</link>
      <guid>https://forem.com/sagarkapoor/how-to-scrape-a-site-with-python-2fa9</guid>
      <description>&lt;h3&gt;
  
  
  Why you should learn to scrape
&lt;/h3&gt;

&lt;p&gt;This guide will help you in getting started with scraping in no time. Reasons why you would want to scrape in the first place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automate the collection of data from a website.&lt;/li&gt;
&lt;li&gt;Chill while you gloat to yourself that you made a Python script.&lt;/li&gt;
&lt;li&gt;Use the data that is automatically collected to run analysis over it and make conclusions.&lt;/li&gt;
&lt;li&gt;Compare characteristics of a product (especially price) from different websites without manually clicking each web page on your own.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Things that you will need
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Python installed on your machine (&lt;em&gt;By "machine", coders mean the device that you are working on; Python can be made to work on mobiles too but stick to a computer or laptop for scraping&lt;/em&gt;) &lt;a href="https://www.python.org/downloads/" rel="noopener noreferrer"&gt;&lt;strong&gt;Follow this link for downloading Python&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install additional modules for Python. Modules are add-on libraries for your Python program that save you the trouble of writing all the code yourself. &lt;em&gt;That is the simplest definition of modules.&lt;/em&gt; You will need the &lt;strong&gt;bs4,&lt;/strong&gt; &lt;strong&gt;requests&lt;/strong&gt; and &lt;strong&gt;csv&lt;/strong&gt; modules for this scraping task. The &lt;code&gt;csv&lt;/code&gt; module ships with Python, so only the first two need to be installed. Follow these steps to get them:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Command Prompt. (If you are not able to find it, open your Windows search bar and type in &lt;code&gt;cmd&lt;/code&gt;.)&lt;/li&gt;
&lt;li&gt;Copy and paste these commands into the terminal, one at a time (&lt;code&gt;beautifulsoup4&lt;/code&gt; is the package that provides &lt;code&gt;bs4&lt;/code&gt;):
&lt;code&gt;python -m pip install beautifulsoup4&lt;/code&gt;
&lt;code&gt;python -m pip install requests&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;These commands will install the required modules that I mentioned earlier. Their uses will become quite clear when we go through the code to scrape an example site.&lt;/p&gt;
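&lt;p&gt;To check that the modules are available before moving on, a small script along these lines can help. It is only a sketch: the helper name &lt;code&gt;is_installed&lt;/code&gt; is made up for this illustration, and &lt;code&gt;csv&lt;/code&gt; should always be reported as found because it ships with Python's standard library.&lt;/p&gt;

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the module (hypothetical helper)."""
    return importlib.util.find_spec(module_name) is not None


# bs4 and requests come from pip; csv is part of the standard library.
for name in ("bs4", "requests", "csv"):
    status = "found" if is_installed(name) else "missing"
    print(name + ": " + status)
```

&lt;p&gt;If a module is reported as missing, run the matching &lt;code&gt;pip install&lt;/code&gt; command again and re-run the check.&lt;/p&gt;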

&lt;ol start="3"&gt;
&lt;li&gt;You will need a code editor to write your code. When saving your code in a file, always remember to give it the proper extension, which in this case is &lt;code&gt;.py&lt;/code&gt;. So, for example, if you make a scraping script with the name &lt;code&gt;scraper&lt;/code&gt;, its proper name should be &lt;code&gt;scraper.py&lt;/code&gt;; otherwise your editor and operating system will not recognise it as a Python script.
If this is your first coding experience, I would suggest you use &lt;a href="https://www.sublimetext.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Sublime Text&lt;/strong&gt;&lt;/a&gt; or &lt;a href="https://code.visualstudio.com/download" rel="noopener noreferrer"&gt;&lt;strong&gt;VS Code&lt;/strong&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is about all that you will need to make your scraping script!&lt;/p&gt;

&lt;h3&gt;
  
  
  The first steps
&lt;/h3&gt;

&lt;p&gt;We will be using an example site, &lt;a href="http://quotes.toscrape.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Quotes to Scrape&lt;/strong&gt;&lt;/a&gt;, for our scraping purposes. &lt;strong&gt;Our task: extract all the quotes into a CSV file in one column, along with the author's name in the adjacent column.&lt;/strong&gt; Hence, each row will have the quote in one column and the author's name in the next. The structure will be like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Quote&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Author&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”&lt;/td&gt;
&lt;td&gt;Albert Einstein&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The code that we will use can be found in this &lt;a href="https://github.com/Sagar-Kap/Quotes/blob/master/Quotes%40.py" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub repository&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So let's get started:&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1: Locating the HTML elements
&lt;/h4&gt;

&lt;p&gt;Open the site, right-click on the web page and click on &lt;strong&gt;Inspect&lt;/strong&gt;. Alternatively, you could just hit &lt;code&gt;F12&lt;/code&gt; on your keyboard to inspect the HTML code.&lt;br&gt;&lt;br&gt;
&lt;em&gt;Do not be afraid&lt;/em&gt; of the console that pops up; it will look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87g9s3554yiczgtxyisi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87g9s3554yiczgtxyisi.png" alt="An image of the Google Developer Console" width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open the &lt;code&gt;Elements&lt;/code&gt; tab of the console if it is not already showing by default. This console is called the &lt;strong&gt;Developer Console&lt;/strong&gt;, and it lets you look at the site's structure through its HTML tags. What we require from this page is the quotes and their authors, so we will take a look at the HTML document now in front of us.&lt;/p&gt;

&lt;p&gt;Right-click on any of the quotes and select &lt;code&gt;Inspect&lt;/code&gt;. You will notice that the quote has been highlighted, and you have been directed to the specific HTML tag in the Developer Console. Click on the expand button of the tag that you are concerned with. This will expand the HTML tag so that you can observe the code better, which in this case happens to be the following tag:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork"&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This is a &lt;code&gt;div&lt;/code&gt; tag with a class named &lt;code&gt;quote&lt;/code&gt;. We will pass these values to the Python script, which will then use them to give us the required results. For now, remember that these tags hold the values that you are looking for.&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 2: Fetching the page through Python!
&lt;/h4&gt;

&lt;p&gt;You read that right: the program that you will write will send a request to this website over your internet connection, and you will not have to do a single thing on your own! Python will take care of it for you. To summarise, the program will do the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send a request to &lt;code&gt;get&lt;/code&gt; the URL that you will pass to the program using the &lt;code&gt;requests&lt;/code&gt; library.&lt;/li&gt;
&lt;li&gt;Parse the returned HTML into a soup object using the &lt;code&gt;BeautifulSoup 4&lt;/code&gt; library.&lt;/li&gt;
&lt;li&gt;Search for the HTML tags that have a quote nested inside them as text, which can be identified by a specific &lt;a href="https://www.w3schools.com/html/html_attributes.asp" rel="noopener noreferrer"&gt;&lt;strong&gt;attribute&lt;/strong&gt;&lt;/a&gt;, and extract them into individual containers.&lt;/li&gt;
&lt;li&gt;Create a &lt;code&gt;list&lt;/code&gt; that stores these containers as its elements.&lt;/li&gt;
&lt;li&gt;Run a for-loop in this list and extract the required information from each container and store them in a CSV file. You can print the results on the console as well to check if you are going wrong anywhere.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To start with your program, create a new file in a folder (also called a directory, or &lt;code&gt;dir&lt;/code&gt; for short). Write down the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These lines direct the Python interpreter to import the libraries that you need. You will notice that &lt;code&gt;BeautifulSoup&lt;/code&gt; has been imported from &lt;code&gt;bs4&lt;/code&gt;, since we require only this specific feature, and is referred to as &lt;code&gt;soup&lt;/code&gt;. This makes it easier for us to call it just by typing &lt;code&gt;soup&lt;/code&gt;; the alias is only a convenience and not a requirement. The &lt;code&gt;csv&lt;/code&gt; module works with CSV (comma-separated values) files, which can be opened in spreadsheet programs like MS Excel.&lt;/p&gt;

&lt;p&gt;Next, declare a couple of variables, &lt;code&gt;page_num&lt;/code&gt; and &lt;code&gt;URL&lt;/code&gt;. Their use will become clear later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://quotes.toscrape.com/page/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;This function performs a GET request to URL
    passed as a parameter within its execution&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;make_soup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Function returns a soup object stored in the variable&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;get_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above code snippet, &lt;code&gt;get_url()&lt;/code&gt; and &lt;code&gt;make_soup()&lt;/code&gt; are functions. Functions are blocks of code that you can call later on to do a specific task. Their syntax is like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nf"&gt;function&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;#Block of code here
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;get_url()&lt;/code&gt; function will make a connection to the address passed to it as the &lt;code&gt;url&lt;/code&gt; parameter, which in this case is a page of &lt;a href="http://quotes.toscrape.com/" rel="noopener noreferrer"&gt;&lt;code&gt;http://quotes.toscrape.com/&lt;/code&gt;&lt;/a&gt;. If you navigate to the last page of this site by clicking the &lt;strong&gt;next&lt;/strong&gt; button, you will find that the total number of pages is 10.&lt;/p&gt;

&lt;blockquote&gt;
&lt;h4&gt;
  
  
  So, how does the browser move on to the next page?
&lt;/h4&gt;

&lt;p&gt;It is simple. The page number is appended to the site's URL and increases by 1 with each new page! Just examine the URL carefully when you navigate to the next page. You will notice that the original URL changes like this: &lt;a href="http://quotes.toscrape.com/page/2/" rel="noopener noreferrer"&gt;&lt;code&gt;http://quotes.toscrape.com/page/2/&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Do note the &lt;code&gt;page/2&lt;/code&gt; at the end of the URL now. This is how the URL changes with each new page as someone navigates through the website. For every website that you scrape, you will have to navigate through the pages to figure out these two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How many pages of the site do you want to scrape?&lt;/li&gt;
&lt;li&gt;What are the extra elements that get added when you navigate to the next page?&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
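&lt;p&gt;The URL pattern described above can be sketched in a few lines. This is only an illustration: the helper &lt;code&gt;build_page_url()&lt;/code&gt; and the names &lt;code&gt;BASE_URL&lt;/code&gt; and &lt;code&gt;LAST_PAGE&lt;/code&gt; are assumptions made for this example and are not part of the original script. No request is sent; the code only builds the addresses.&lt;/p&gt;

```python
# Assumed names for this sketch; the page count of 10 comes from the article.
BASE_URL = "https://quotes.toscrape.com/page/"
LAST_PAGE = 10


def build_page_url(page_num):
    """Return the URL of the given page number (hypothetical helper)."""
    return BASE_URL + str(page_num) + "/"


# One URL per page, from page 1 up to and including the last page.
page_urls = [build_page_url(n) for n in range(1, LAST_PAGE + 1)]

print(page_urls[0])
print(page_urls[-1])
```

&lt;p&gt;Each of these URLs could then be passed to a function that fetches and parses one page.&lt;/p&gt;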

&lt;h4&gt;
  
  
  Step 3: Getting the quotes and their author
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_container_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Function finds out all HTML containers with the quotes
    and authors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; names and returns an array&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;text_boxes_array&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_soup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;findAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text_boxes_array&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_quote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Function finds the quote from the
    selected HTML container passed to it as an argument&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;quote&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;span&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;quote&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_author_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Function returns the author&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s name from the HTML container passed to it
    as an argument&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;author&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;get_container_array()&lt;/code&gt; function returns a list of all the HTML elements that contain the quotes and their authors. The site's URL is passed to it as an argument. The &lt;code&gt;get_quote()&lt;/code&gt; function extracts the quote from a given container; similarly, &lt;code&gt;get_author_name()&lt;/code&gt; extracts the author. Do take note that I had to inspect the HTML tags that hold this information to find their names and classes. You can inspect them from the Developer Console as I have explained before.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 4: Getting the info stored in a CSV file
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fill_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Function compiles the quotes and
    author&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s name into a csv file for all web pages, quotes.csv&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quotes.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;thewriter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;thewriter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writerow&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Quote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;serial_num&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;get_container_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;serial_num&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;get_quote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;get_author_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;thewriter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writerow&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nf"&gt;get_quote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;get_author_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
            &lt;span class="n"&gt;serial_num&lt;/span&gt;&lt;span class="o"&gt;+=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function &lt;code&gt;fill_csv()&lt;/code&gt; creates a &lt;code&gt;csv&lt;/code&gt; file named &lt;strong&gt;quotes.csv&lt;/strong&gt; and writes the first row with the column headers &lt;strong&gt;Quote&lt;/strong&gt; and &lt;strong&gt;Author&lt;/strong&gt;. It then runs a &lt;a href="https://www.w3schools.com/python/python_for_loops.asp" rel="noopener noreferrer"&gt;for loop&lt;/a&gt; over the array returned by &lt;code&gt;get_container_array()&lt;/code&gt;, which is built only while &lt;code&gt;fill_csv()&lt;/code&gt; is executing. For each container, it calls &lt;code&gt;get_quote()&lt;/code&gt; and &lt;code&gt;get_author_name()&lt;/code&gt; to extract the quote and its author's name, prints them to the console, and passes them to &lt;code&gt;thewriter.writerow()&lt;/code&gt;, which writes them as a new row in the &lt;strong&gt;quotes.csv&lt;/strong&gt; file. Because the file is opened in &lt;code&gt;"w"&lt;/code&gt; mode, every call to &lt;code&gt;fill_csv()&lt;/code&gt; replaces the previous contents of the file.&lt;/p&gt;
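
&lt;p&gt;To see the header-then-rows pattern on its own, without any network request, here is a minimal sketch. The names &lt;code&gt;SAMPLE_ROWS&lt;/code&gt;, &lt;code&gt;write_rows()&lt;/code&gt; and &lt;code&gt;read_rows()&lt;/code&gt; are made up for this illustration and are not part of the scraper; the sample rows simply stand in for scraped quotes.&lt;/p&gt;

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for scraped (quote, author) pairs.
SAMPLE_ROWS = [
    ("A first sample quote.", "Author One"),
    ("A second sample quote.", "Author Two"),
]


def write_rows(path, rows):
    """Write a header row, then one row per (quote, author) pair."""
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as file:
        thewriter = csv.writer(file)
        thewriter.writerow(["Quote", "Author"])
        for quote, author in rows:
            thewriter.writerow([quote, author])


def read_rows(path):
    """Read the file back as a list of rows, header included."""
    with open(path, newline="", encoding="utf-8") as file:
        return list(csv.reader(file))


if __name__ == "__main__":
    # Write to a temporary folder so no existing quotes.csv is touched.
    out_path = os.path.join(tempfile.mkdtemp(), "quotes.csv")
    write_rows(out_path, SAMPLE_ROWS)
    print(read_rows(out_path))
```

&lt;p&gt;Calling &lt;code&gt;write_rows()&lt;/code&gt; a second time on the same path replaces the file, which is the same &lt;code&gt;"w"&lt;/code&gt; mode behaviour that &lt;code&gt;fill_csv()&lt;/code&gt; has.&lt;/p&gt;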

&lt;h4&gt;
  
  
  Step 5 : Running the program on multiple pages:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;multi_page&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Function will loop through each page
    of site and invoke scraping func fill_csv() on each&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;Page_Num&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;page_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://quotes.toscrape.com/page/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Page_Num&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;Page_Num&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;fill_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;Page_Num&lt;/span&gt;&lt;span class="o"&gt;+=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="nf"&gt;multi_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



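&lt;p&gt;The function &lt;code&gt;multi_page()&lt;/code&gt; is meant to run the scraper on the whole site by calling &lt;code&gt;fill_csv()&lt;/code&gt; in a loop of 10 iterations, one for each page. The last line of code, &lt;code&gt;multi_page()&lt;/code&gt;, is the final piece of the puzzle! This is when you call the whole program into action. Keep in mind that &lt;code&gt;fill_csv()&lt;/code&gt; opens &lt;strong&gt;quotes.csv&lt;/strong&gt; in &lt;code&gt;"w"&lt;/code&gt; mode, so every call replaces the file; to keep the quotes from every page, the file has to be opened once for the whole run (or opened in append mode).&lt;/p&gt;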
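
&lt;p&gt;One possible way to collect all pages into a single file is sketched below. It is only an illustration under a few assumptions: &lt;code&gt;BASE_URL&lt;/code&gt;, &lt;code&gt;build_page_urls()&lt;/code&gt; and &lt;code&gt;scrape_all()&lt;/code&gt; are hypothetical names, and &lt;code&gt;fetch_rows&lt;/code&gt; is a hypothetical function you pass in that takes a page URL and returns a list of (quote, author) pairs.&lt;/p&gt;

```python
import csv

# Same base address the article uses for the paginated site.
BASE_URL = "https://quotes.toscrape.com/page/"


def build_page_urls(last_page=10):
    """Return one URL per page, from page 1 up to last_page."""
    return [BASE_URL + str(page_num) + "/" for page_num in range(1, last_page + 1)]


def scrape_all(fetch_rows, out_path, last_page=10):
    """Open the CSV once, then write the rows of every page into it.

    fetch_rows is a hypothetical callable: it takes a page URL and
    returns a list of (quote, author) pairs found on that page.
    """
    total = 0
    with open(out_path, "w", newline="", encoding="utf-8") as file:
        thewriter = csv.writer(file)
        thewriter.writerow(["Quote", "Author"])  # header written only once
        for page_url in build_page_urls(last_page):
            for quote, author in fetch_rows(page_url):
                thewriter.writerow([quote, author])
                total += 1
    return total
```

&lt;p&gt;With the functions from this article, &lt;code&gt;fetch_rows&lt;/code&gt; could be a small function that loops over &lt;code&gt;get_container_array(url)&lt;/code&gt; and returns the pair &lt;code&gt;(get_quote(container), get_author_name(container))&lt;/code&gt; for each container. Passing it in as an argument also makes it easy to try the loop with fake data before touching the real site.&lt;/p&gt;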

&lt;h3&gt;
  
  
  What next?
&lt;/h3&gt;

&lt;p&gt;You can play around with the code. If you run into any issues, do not feel discouraged: examine the editor’s console and try to identify the error. You will find plenty of resources online to help you debug the issue. I hope this little project gives you some incentive to get started with Python straight away and to set off on a journey of deep learning and frustrations and motivations... Ok, you get the drift.&lt;/p&gt;

&lt;h3&gt;
  
  
  The whole code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;

&lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://quotes.toscrape.com/page/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;This function performs a GET request to URL
    passed as a parameter within its execution&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;make_soup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Function returns a soup object stored in the variable&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;get_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_container_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Function finds out all HTML containers with the quotes
    and authors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; names and returns an array&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;text_boxes_array&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_soup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;findAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text_boxes_array&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_quote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Function finds the quote from the
    selected HTML container passed to it as an argument&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;quote&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;span&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;quote&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_author_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Function returns the author&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s name from the HTML container passed to it
    as an argument&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;author&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fill_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Function compiles the quotes and
    author&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s name into a csv file for all web pages, quotes.csv&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quotes.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;thewriter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;thewriter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writerow&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Quote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;serial_num&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;get_container_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;serial_num&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;get_quote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;get_author_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;thewriter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writerow&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nf"&gt;get_quote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;get_author_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
            &lt;span class="n"&gt;serial_num&lt;/span&gt;&lt;span class="o"&gt;+=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;multi_page&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Function will loop through each page
    of site and invoke scraping func fill_csv() on each&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;Page_Num&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;page_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://quotes.toscrape.com/page/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Page_Num&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;Page_Num&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;fill_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;Page_Num&lt;/span&gt;&lt;span class="o"&gt;+=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="nf"&gt;multi_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>beginners</category>
      <category>ux</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
