<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Federico Trotta</title>
    <description>The latest articles on Forem by Federico Trotta (@federicotrotta).</description>
    <link>https://forem.com/federicotrotta</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1395272%2F24a440c7-0b73-466f-94e2-0bfc644bea2a.png</url>
      <title>Forem: Federico Trotta</title>
      <link>https://forem.com/federicotrotta</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/federicotrotta"/>
    <language>en</language>
    <item>
      <title>How to Use Proxies in Python</title>
      <dc:creator>Federico Trotta</dc:creator>
      <pubDate>Thu, 14 Nov 2024 19:59:56 +0000</pubDate>
      <link>https://forem.com/federicotrotta/how-to-use-a-proxy-in-python-1278</link>
      <guid>https://forem.com/federicotrotta/how-to-use-a-proxy-in-python-1278</guid>
      <description>&lt;p&gt;If you've been working with Python for a while, especially for web scraping, you've probably run into situations where a website blocks you while you try to retrieve the data you want. In such cases, knowing how to use a proxy is a handy skill to have.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore what proxies are, why they're useful, and how you can use them with the &lt;code&gt;requests&lt;/code&gt; library in Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Proxy?
&lt;/h2&gt;

&lt;p&gt;Let’s start from the beginning by defining what a proxy is.&lt;/p&gt;

&lt;p&gt;You can think of a proxy server as a “middleman” between your computer and the internet. When you send a request to a website, the request goes through the proxy server first. The proxy then forwards your request to the website, receives the response, and sends it back to you. This process masks your IP address, making it appear as if the request is coming from the proxy server instead of your own device.&lt;/p&gt;

&lt;p&gt;This has several practical uses. For example, a proxy can help you bypass IP-based restrictions or maintain anonymity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use a proxy in web scraping?
&lt;/h2&gt;

&lt;p&gt;So, why might proxies be helpful while scraping data? We already touched on one reason: they can help you bypass some restrictions.&lt;/p&gt;

&lt;p&gt;In the particular case of web scraping, they are useful for the following reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Avoiding IP blocking&lt;/strong&gt;: websites often monitor for suspicious activity, like a single IP making numerous requests in a short time.
Using proxies helps distribute your requests across multiple IPs, which reduces the risk of being blocked.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bypassing geo-restrictions&lt;/strong&gt;: some content is only accessible from certain locations and proxies can help you appear as if you're accessing the site from a different country.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhancing privacy&lt;/strong&gt;: proxies are useful to keep your scraping activities anonymous by hiding your real IP address.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to use a proxy in Python using &lt;code&gt;requests&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;requests&lt;/code&gt; library is a popular choice for making HTTP requests in Python and incorporating proxies into your requests is straightforward.&lt;/p&gt;

&lt;p&gt;Let’s see how!&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting Valid Proxies
&lt;/h3&gt;

&lt;p&gt;First things first: you have to get valid proxies before actually using them. To do so, you have two options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free proxies&lt;/strong&gt;: you can get proxies for free from websites like &lt;a href="https://free-proxy-list.net/" rel="noopener noreferrer"&gt;Free Proxy List&lt;/a&gt;. They're easily accessible, but they can be unreliable or slow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paid proxies&lt;/strong&gt;: services like &lt;a href="https://brightdata.com/" rel="noopener noreferrer"&gt;Bright Data&lt;/a&gt; or &lt;a href="https://www.scraperapi.com/" rel="noopener noreferrer"&gt;ScraperAPI&lt;/a&gt; provide reliable proxies with better performance and support, but you have to pay.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Using Proxies with &lt;code&gt;requests&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Now that you have your list of proxies, you can start using them. For example, you can create a dictionary like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;proxies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://proxy_ip:proxy_port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://proxy_ip:proxy_port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can make a request using the proxies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;proxies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://your_proxy_ip:proxy_port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://your_proxy_ip:proxy_port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://httpbin.org/ip&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To see the outcome of your request, you can print the response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Should return 200 if successful
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="c1"&gt;# Prints the content of the response
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that, if everything went smoothly, the response should display the IP address of the proxy server, not yours.&lt;/p&gt;

&lt;h3&gt;
  
  
  Proxy Authentication Using &lt;code&gt;requests&lt;/code&gt;: Username and Password
&lt;/h3&gt;

&lt;p&gt;If your proxy requires authentication, you can handle it in a couple of ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Method 1: including credentials in the proxy URL&lt;/strong&gt;&lt;br&gt;
You can embed the username and password directly in the proxy URL like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;proxies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://username:password@proxy_ip:proxy_port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://username:password@proxy_ip:proxy_port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
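
&lt;p&gt;One detail worth keeping in mind with this method: if the username or password contains characters such as &lt;code&gt;@&lt;/code&gt; or &lt;code&gt;:&lt;/code&gt;, they need to be percent-encoded before being placed in the URL. The following is a minimal sketch, not a definitive implementation. The helper &lt;code&gt;build_proxies&lt;/code&gt; and the environment variable names &lt;code&gt;PROXY_USER&lt;/code&gt; and &lt;code&gt;PROXY_PASS&lt;/code&gt; are hypothetical, and the sketch assumes the proxy itself is reached over plain HTTP:&lt;/p&gt;

```python
import os
from urllib.parse import quote


def build_proxies(host, port, username, password):
    """Return a requests-style proxies dict with percent-encoded credentials."""
    # safe='' also encodes characters such as '@', ':' and '/' that would break the URL
    user = quote(username, safe='')
    secret = quote(password, safe='')
    # Assumption: the proxy is reached over plain HTTP, even for HTTPS targets
    proxy_url = f'http://{user}:{secret}@{host}:{port}'
    return {'http': proxy_url, 'https': proxy_url}


# PROXY_USER and PROXY_PASS are hypothetical environment variable names;
# the fallback values are placeholders for illustration only
proxies = build_proxies(
    'proxy_ip',
    'proxy_port',
    os.environ.get('PROXY_USER', 'username'),
    os.environ.get('PROXY_PASS', 'p@ss:word'),
)
print(proxies['http'])
```

&lt;p&gt;Reading the credentials from the environment also keeps them out of the script itself.&lt;/p&gt;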



&lt;p&gt;&lt;strong&gt;Method 2: using &lt;code&gt;HTTPProxyAuth&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Alternatively, you can use the &lt;code&gt;HTTPProxyAuth&lt;/code&gt; class to handle authentication like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;requests.auth&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HTTPProxyAuth&lt;/span&gt;

&lt;span class="n"&gt;proxies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://proxy_ip:proxy_port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://proxy_ip:proxy_port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HTTPProxyAuth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://httpbin.org/ip&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to Use a Rotating Proxy with &lt;code&gt;requests&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Using a single proxy might not be sufficient if you're making numerous requests. In this case, you can use a rotating proxy: this changes the proxy IP address at regular intervals or per request.&lt;/p&gt;

&lt;p&gt;If you’d like to test this solution, you have two options: rotate proxies manually using a list, or use a proxy rotation service.&lt;/p&gt;

&lt;p&gt;Let’s see both approaches!&lt;/p&gt;

&lt;h3&gt;
  
  
  Using a List of Proxies
&lt;/h3&gt;

&lt;p&gt;If you have a list of proxies, you can rotate them manually like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;proxies_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://proxy1_ip:port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://proxy2_ip:port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://proxy3_ip:port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Add more proxies as needed
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_random_proxy&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;proxies_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_random_proxy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://httpbin.org/ip&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
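
&lt;p&gt;Random choice can pick the same proxy several times in a row, and a dead proxy will raise an exception. As a variation on the loop above, here is a rough sketch of round-robin rotation with a simple retry. The names &lt;code&gt;proxy_cycle&lt;/code&gt; and &lt;code&gt;fetch_with_rotation&lt;/code&gt; are hypothetical, and &lt;code&gt;send&lt;/code&gt; stands for any callable that behaves like &lt;code&gt;requests.get&lt;/code&gt;:&lt;/p&gt;

```python
import itertools

# Placeholder proxy URLs, in the same style as the list above
proxies_list = [
    'http://proxy1_ip:port',
    'http://proxy2_ip:port',
    'http://proxy3_ip:port',
]


def proxy_cycle(proxy_urls):
    """Yield a requests-style proxies dict for each URL, looping forever."""
    for proxy in itertools.cycle(proxy_urls):
        yield {'http': proxy, 'https': proxy}


def fetch_with_rotation(send, url, pool, max_attempts=3):
    """Call send(url, ...) through successive proxies until one succeeds."""
    last_error = None
    for _ in range(max_attempts):
        proxies = next(pool)
        try:
            return send(url, proxies=proxies, timeout=10)
        except Exception as error:
            # With requests, catching requests.exceptions.RequestException is more precise
            last_error = error
    raise RuntimeError(f'All {max_attempts} attempts failed') from last_error
```

&lt;p&gt;With &lt;code&gt;requests&lt;/code&gt; installed, a call could look like &lt;code&gt;fetch_with_rotation(requests.get, 'https://httpbin.org/ip', proxy_cycle(proxies_list))&lt;/code&gt;.&lt;/p&gt;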



&lt;h3&gt;
  
  
  Using a Proxy Rotation Service
&lt;/h3&gt;

&lt;p&gt;Services like &lt;a href="https://www.scraperapi.com/rotating-proxies-for-web-scraping/" rel="noopener noreferrer"&gt;ScraperAPI&lt;/a&gt; handle proxy rotation for you. You typically just need to plug the proxy URL they provide into the &lt;code&gt;proxies&lt;/code&gt; dictionary like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;proxies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://your_service_proxy_url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://your_service_proxy_url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://httpbin.org/ip&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;Using a proxy in Python is a valuable technique for web scraping, testing, and accessing geo-restricted content. As we’ve seen, integrating proxies into your HTTP requests is straightforward with the &lt;code&gt;requests&lt;/code&gt; library.&lt;/p&gt;

&lt;p&gt;A few parting tips when scraping data from the web:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Respect website policies&lt;/strong&gt;: always check the website's robots.txt file and terms of service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handle exceptions&lt;/strong&gt;: network operations can fail for various reasons, so make sure to handle exceptions and implement retries if necessary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure your credentials&lt;/strong&gt;: if you're using authenticated proxies, keep your credentials safe and avoid hardcoding them into your scripts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

</description>
      <category>python</category>
      <category>webscraping</category>
      <category>development</category>
    </item>
    <item>
      <title>How to Use Lambda Functions in Python</title>
      <dc:creator>Federico Trotta</dc:creator>
      <pubDate>Wed, 30 Oct 2024 16:03:10 +0000</pubDate>
      <link>https://forem.com/appsignal/how-to-use-lambda-functions-in-python-3llj</link>
      <guid>https://forem.com/appsignal/how-to-use-lambda-functions-in-python-3llj</guid>
      <description>&lt;p&gt;Lambda functions in Python are a powerful way to create small, anonymous functions on the fly. These functions are typically used for short, simple operations where the overhead of a full function definition would be unnecessary.&lt;/p&gt;

&lt;p&gt;While traditional functions are defined using the &lt;code&gt;def&lt;/code&gt; keyword, Lambda functions are defined using the &lt;code&gt;lambda&lt;/code&gt; keyword and are directly integrated into lines of code. In particular, they are often used as arguments for built-in functions. They enable developers to write clean and readable code by eliminating the need for temporary function definitions.&lt;/p&gt;

&lt;p&gt;In this article, we'll cover what Lambda functions do and their syntax. We'll also provide some examples and best practices for using them, and discuss their pros and cons.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Lambda functions have been a part of Python since version 2.0, so you'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Minimum Python version&lt;/strong&gt;: 2.0.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommended Python version&lt;/strong&gt;: 3.10 or later.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this tutorial, we'll see how to use Lambda functions with the library &lt;a href="https://pandas.pydata.org/" rel="noopener noreferrer"&gt;Pandas&lt;/a&gt;: a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation library. If you don't have it installed, run the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install pandas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Syntax and Basics of Lambda Functions for Python
&lt;/h2&gt;

&lt;p&gt;First, let's define the syntax developers must use to create Lambda functions.&lt;/p&gt;

&lt;p&gt;A Lambda function is defined using the &lt;code&gt;lambda&lt;/code&gt; keyword, followed by zero or more arguments, a colon, and a single expression:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lambda arguments: expression
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's imagine we want to create a Lambda function that adds up two numbers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;add&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This results in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



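&lt;p&gt;We've created an anonymous function that takes two arguments, &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt;. Unlike functions created with &lt;code&gt;def&lt;/code&gt;, Lambda functions don't have a name of their own: that's why we say they are "anonymous." Here, we stored the function in the variable &lt;code&gt;add&lt;/code&gt; only so that we could call it.&lt;/p&gt;

&lt;p&gt;Also, we don't use the &lt;code&gt;return&lt;/code&gt; statement, as we do in regular Python functions: the value of the expression is returned automatically. The function itself is an ordinary object, so it can be called directly, stored in a variable (as we did in this case), passed as an argument to another function, etc.&lt;/p&gt;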
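
&lt;p&gt;A quick side-by-side comparison may help here. The sketch below defines the same addition twice, once with &lt;code&gt;def&lt;/code&gt; and once with &lt;code&gt;lambda&lt;/code&gt;. The names &lt;code&gt;add_def&lt;/code&gt; and &lt;code&gt;add_lambda&lt;/code&gt; are illustrative only:&lt;/p&gt;

```python
# A regular function: named by the def statement, with an explicit return
def add_def(x, y):
    return x + y


# A Lambda function: the expression after the colon is the return value
add_lambda = lambda x, y: x + y

# Both behave the same way when called
print(add_def(3, 5))     # 8
print(add_lambda(3, 5))  # 8
```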

&lt;p&gt;Now let's see some common use cases for Lambda functions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Use Cases for Lambda Functions
&lt;/h2&gt;

&lt;p&gt;Lambda functions are particularly useful in situations where we need a simple, temporary function. In particular, they are commonly used as arguments for higher-order functions.&lt;/p&gt;

&lt;p&gt;Let's see some practical examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Lambda Functions with the &lt;code&gt;map()&lt;/code&gt; Function
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;map()&lt;/code&gt; is a built-in function that applies a given function to each item of an iterable and returns a map object with the results.&lt;/p&gt;

&lt;p&gt;For example, let's say we want to calculate the square of each number in a list. We could use a Lambda function like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Define the list of numbers
&lt;/span&gt;&lt;span class="n"&gt;numbers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Calculate square values and print results
&lt;/span&gt;&lt;span class="n"&gt;squared&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;squared&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This results in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1, 4, 9, 16]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We now have a list containing the squares of the initial numbers.&lt;/p&gt;

&lt;p&gt;As we can see, Lambda functions make it easy to apply on-the-fly functions that don't need to be reused later.&lt;/p&gt;
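
&lt;p&gt;Since &lt;code&gt;map()&lt;/code&gt; returns a map object rather than a list, it is worth knowing that this object is a lazy iterator that can be consumed only once. A small sketch to illustrate (the variable names are illustrative):&lt;/p&gt;

```python
numbers = [1, 2, 3, 4]

# map() does not compute anything yet: it returns a lazy iterator
squares = map(lambda x: x ** 2, numbers)

# The first conversion consumes the iterator
first_pass = list(squares)
# The second conversion finds it already exhausted
second_pass = list(squares)

print(first_pass)   # [1, 4, 9, 16]
print(second_pass)  # []
```

&lt;p&gt;This is why the example above wraps the call in &lt;code&gt;list()&lt;/code&gt; right away.&lt;/p&gt;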

&lt;h3&gt;
  
  
  Using Lambda Functions with the &lt;code&gt;filter()&lt;/code&gt; Function
&lt;/h3&gt;

&lt;p&gt;Now, suppose we have a list of numbers and want to keep only the even ones.&lt;/p&gt;

&lt;p&gt;We can use a Lambda function as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Create a list of numbers
&lt;/span&gt;&lt;span class="n"&gt;numbers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Filter for even numbers and print results
&lt;/span&gt;&lt;span class="n"&gt;even&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;even&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This results in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[2, 4]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using Lambda Functions with the &lt;code&gt;sorted()&lt;/code&gt; Function
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;sorted()&lt;/code&gt; function in Python returns a new sorted list from the elements of any iterable. Using Lambda functions as the &lt;code&gt;key&lt;/code&gt; argument, we can apply specific sorting criteria to these lists.&lt;/p&gt;

&lt;p&gt;For example, suppose we have a list of points in two dimensions: &lt;code&gt;(x,y)&lt;/code&gt;. We want to create a list that sorts the points by their &lt;code&gt;y&lt;/code&gt; values in ascending order.&lt;/p&gt;

&lt;p&gt;We can do it like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Creates a list of points
&lt;/span&gt;&lt;span class="n"&gt;points&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="c1"&gt;# Sort the points and print
&lt;/span&gt;&lt;span class="n"&gt;points_sorted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;points_sorted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[(5, -1), (3, 1), (1, 2)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
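
&lt;p&gt;The same &lt;code&gt;key&lt;/code&gt; mechanism extends to other orderings. As an illustrative sketch (the extra point &lt;code&gt;(0, 1)&lt;/code&gt; and the variable names are additions made for this example), we can sort in descending order with &lt;code&gt;reverse=True&lt;/code&gt;, or break ties on &lt;code&gt;y&lt;/code&gt; by returning a tuple from the Lambda function:&lt;/p&gt;

```python
# Points with two equal y values, to show how ties are handled
points = [(1, 2), (3, 1), (5, -1), (0, 1)]

# Sort by y in descending order
by_y_desc = sorted(points, key=lambda point: point[1], reverse=True)

# Sort by y first, then by x when the y values are equal
by_y_then_x = sorted(points, key=lambda point: (point[1], point[0]))

print(by_y_desc)    # [(1, 2), (3, 1), (0, 1), (5, -1)]
print(by_y_then_x)  # [(5, -1), (0, 1), (3, 1), (1, 2)]
```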



&lt;h3&gt;
  
  
  Using Lambda Functions in List Comprehensions
&lt;/h3&gt;

&lt;p&gt;Given their conciseness, Lambda functions can be embedded in list comprehensions for on-the-fly computations.&lt;/p&gt;

&lt;p&gt;Suppose we have a list of numbers. We want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Iterate over the whole list&lt;/li&gt;
&lt;li&gt;Calculate and print double the initial values.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's how we can do that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Create a list of numbers
&lt;/span&gt;&lt;span class="n"&gt;numbers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Calculate and print the double of each one
&lt;/span&gt;&lt;span class="n"&gt;squared&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;squared&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we obtain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1, 4, 9, 16]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
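
&lt;p&gt;As a side note, the lambda in this example is defined and called immediately for every element, so it behaves like writing the expression directly. Here is a minimal sketch comparing the three equivalent forms (the variable names are only illustrative):&lt;/p&gt;

```python
numbers = [1, 2, 3, 4]

# Lambda defined and called immediately inside the comprehension
squared_lambda = [(lambda x: x ** 2)(x) for x in numbers]

# Same result with the expression written directly
squared_plain = [x ** 2 for x in numbers]

# Same result by passing the lambda to map()
squared_map = list(map(lambda x: x ** 2, numbers))

print(squared_lambda == squared_plain == squared_map)
```

&lt;p&gt;In a comprehension, the plain expression is usually the more readable choice. The lambda earns its place when a function object has to be passed around, as with &lt;code&gt;map()&lt;/code&gt;.&lt;/p&gt;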



&lt;h2&gt;
  
  
  Advantages of Using Lambda Functions
&lt;/h2&gt;

&lt;p&gt;Given the examples we've explored, let's run through some advantages of using Lambda functions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Conciseness and readability where the logic is simple&lt;/strong&gt;: Lambda functions allow for concise code, reducing the need for standard function definitions. This improves readability in cases where function logic is simple.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced functional programming capabilities&lt;/strong&gt;: Lambda functions align well with functional programming principles, enabling functional constructs in Python code. In particular, they facilitate the use of higher-order functions and the application of functions as first-class objects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When and why to prefer Lambda functions&lt;/strong&gt;: Lambda functions are particularly advantageous when defining short, "throwaway" functions that don't need to be reused elsewhere in code. So they are ideal for inline use, such as arguments to higher-order functions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Limitations and Drawbacks
&lt;/h2&gt;

&lt;p&gt;Let's briefly discuss some limitations and drawbacks of Lambda functions in Python:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Readability challenges in complex expressions&lt;/strong&gt;: While Lambda functions are concise, they can become difficult to read and understand when used for complex expressions. This can lead to code that is harder to maintain and debug.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limitations in error handling and debugging&lt;/strong&gt;: As Lambda functions can only contain a single expression, they can't include statements, like the &lt;code&gt;try-except&lt;/code&gt; block for error handling. This limitation makes them unsuitable for complex operations that require these features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restricted functionality&lt;/strong&gt;: Since Lambda functions can only contain a single expression, they are less versatile than standard functions. This by-design restriction limits their use to simple operations and transformations.&lt;/li&gt;
&lt;/ul&gt;
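
&lt;p&gt;To illustrate the error-handling limitation, here is a minimal sketch of logic that needs a &lt;code&gt;try-except&lt;/code&gt; block and therefore has to live in a standard function. The name &lt;code&gt;safe_ratio&lt;/code&gt; and the sample data are assumptions made up for this example:&lt;/p&gt;

```python
def safe_ratio(pair):
    """Divide the first item by the second, returning None on division by zero."""
    numerator, denominator = pair
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None

pairs = [(10, 2), (3, 0), (9, 3)]

# The named function is passed exactly where a lambda would otherwise go
ratios = list(map(safe_ratio, pairs))
print(ratios)
```

&lt;p&gt;The call site stays just as short as it would with a lambda, while the function body is free to use statements.&lt;/p&gt;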

&lt;h2&gt;
  
  
  Best Practices for Using Lambda Functions
&lt;/h2&gt;

&lt;p&gt;Now that we've considered some pros and cons, let's define some best practices for using Lambda functions effectively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Keep them simple&lt;/strong&gt;: To maintain readability and simplicity, Lambda functions should be kept short and limited to straightforward operations. Functions with complex logic should be refactored into standard functions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid overuse&lt;/strong&gt;: While Lambda functions are convenient for numerous situations, overusing them can lead to code that is difficult to read and maintain. Use them judiciously and opt for standard functions when clarity is fundamental.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Combine Lambda functions with other Python features&lt;/strong&gt;: As we've seen, Lambda functions can be effectively combined with other Python features, such as list comprehensions and higher-order functions. This can result in more expressive and concise code when used appropriately.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Advanced Techniques with Lambda Functions
&lt;/h2&gt;

&lt;p&gt;In certain cases, more advanced Lambda function techniques can be of help.&lt;/p&gt;

&lt;p&gt;Let's see some examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nested Lambda Functions
&lt;/h3&gt;

&lt;p&gt;Lambda functions can be nested for complex operations.&lt;/p&gt;

&lt;p&gt;This technique is useful in scenarios where you need to have multiple small transformations in a sequence.&lt;/p&gt;

&lt;p&gt;For example, suppose you want to create a function that calculates the square of a number and then adds 1. Here's how you can use Lambda functions to do so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Create a nested lambda function
&lt;/span&gt;&lt;span class="n"&gt;nested_lambda&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="c1"&gt;# Print the result for the value 3
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;nested_lambda&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
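
&lt;p&gt;When several small transformations have to run in sequence, the nesting can also be generalized. The following is a sketch only: &lt;code&gt;compose&lt;/code&gt; is a hypothetical helper written for this example, not a built-in, and it relies on &lt;code&gt;functools.reduce&lt;/code&gt; from the standard library:&lt;/p&gt;

```python
from functools import reduce

# Hypothetical helper: chain single-argument functions from left to right
compose = lambda *funcs: lambda x: reduce(lambda acc, f: f(acc), funcs, x)

# Square first, then add 1 (same behavior as the nested lambda)
square_then_add_one = compose(lambda y: y ** 2, lambda y: y + 1)

print(square_then_add_one(3))
```

&lt;p&gt;Each step stays a tiny lambda, and the order of the steps reads left to right instead of inside out.&lt;/p&gt;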



&lt;h3&gt;
  
  
  Integration with Python Libraries for Advanced Functionality
&lt;/h3&gt;

&lt;p&gt;Many Python libraries leverage Lambda functions to simplify complex data processing tasks.&lt;/p&gt;

&lt;p&gt;For example, Lambda functions can be used with &lt;code&gt;Pandas&lt;/code&gt; and &lt;code&gt;NumPy&lt;/code&gt; to simplify data manipulation and transformation.&lt;/p&gt;

&lt;p&gt;Suppose we have a data frame with two columns. We want to create another column that is the sum of the other two. In this case, we can use Lambda functions as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Create the columns' data
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;B&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="c1"&gt;# Create data frame
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create row C as A+B and print the dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;C&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;B&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="n"&gt;A&lt;/span&gt;  &lt;span class="n"&gt;B&lt;/span&gt;  &lt;span class="n"&gt;C&lt;/span&gt;
&lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="mi"&gt;4&lt;/span&gt;  &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="mi"&gt;5&lt;/span&gt;  &lt;span class="mi"&gt;7&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="mi"&gt;3&lt;/span&gt;  &lt;span class="mi"&gt;6&lt;/span&gt;  &lt;span class="mi"&gt;9&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it for our whistle-stop tour of Lambda functions in Python!&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;In this article, we've seen how to use Lambda functions in Python, explored their pros and cons, covered some best practices, and touched on a couple of advanced use cases.&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;P.S. If you'd like to read Python posts as soon as they get off the press, &lt;a href="https://dev.to/python-wizardry"&gt;subscribe to our Python Wizardry newsletter and never miss a single post&lt;/a&gt;!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>lambda</category>
    </item>
    <item>
      <title>Accelerating Polars with RAPIDS cuDF</title>
      <dc:creator>Federico Trotta</dc:creator>
      <pubDate>Tue, 17 Sep 2024 15:51:31 +0000</pubDate>
      <link>https://forem.com/federicotrotta/accelerating-polars-with-rapids-cudf-3833</link>
      <guid>https://forem.com/federicotrotta/accelerating-polars-with-rapids-cudf-3833</guid>
      <description>&lt;p&gt;If you’re a data scientist who migrated from Pandas to Polars because of its performance, you may be happy to hear that Polars has been powered up even further thanks to NVIDIA’s cuDF.&lt;/p&gt;

&lt;p&gt;Did I get your attention? Well, read along!&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Polars
&lt;/h2&gt;

&lt;p&gt;In today’s analytics world, data frames are the backbone of most data work. Whether you're cleaning data, transforming it, or running complex analyses, data frames let you organize and manipulate data in a way that feels intuitive. This is mainly because data frames:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Are versatile&lt;/strong&gt;: Data frame APIs are less verbose than SQL for complex queries.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provide easy integration&lt;/strong&gt;: Data frames integrate well with existing software solutions (for example, with plotting and ML libraries).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provide a single format for data science and engineering&lt;/strong&gt;: Data frames support both data engineering and data science workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the last few years, Pandas has been king in this space, but with more data than ever and growing needs for performance, tools like &lt;a href="https://pola.rs/" rel="noopener noreferrer"&gt;Polars&lt;/a&gt; are stepping in to meet those demands without sacrificing the simplicity we’ve come to love from data frames.&lt;/p&gt;

&lt;p&gt;In case you didn't know, Polars is a Python library for data analysis that’s gaining popularity as a speedier alternative to Pandas. While Pandas is the go-to for most data scientists and engineers, it can get sluggish when handling really big datasets.&lt;/p&gt;

&lt;p&gt;Polars, on the other hand, is built with performance in mind and it’s optimized to handle massive datasets much faster, thanks to its use of parallelization and a more modern backend. So, if you've ever felt like Pandas was holding you back with long processing times, Polars might be the upgrade you’re looking for.&lt;/p&gt;

&lt;p&gt;However, if you already use Polars, you may have noticed that its superpowers may not be enough for very large datasets, especially in distributed systems:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp9eyxxyintxxbb7wz915.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp9eyxxyintxxbb7wz915.png" alt="comparison between systems" width="657" height="574"&gt;&lt;/a&gt;&lt;br&gt;
(Image from NVIDIA/Polars)&lt;/p&gt;

&lt;p&gt;So, let’s see the solution that has been implemented and what to expect from it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Accelerating Polars with RAPIDS
&lt;/h2&gt;

&lt;p&gt;When it comes to processing very large datasets, as in industries like quantitative finance and healthcare research, performance needs to be even higher because of the sheer amount of data.&lt;/p&gt;

&lt;p&gt;That’s why NVIDIA has accelerated Polars with the &lt;a href="https://rapids.ai/" rel="noopener noreferrer"&gt;RAPIDS cuDF&lt;/a&gt; library.&lt;/p&gt;

&lt;p&gt;Here’s what they’ve done:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The RAPIDS cuDF library accelerates Polars workflows by up to 13x using NVIDIA GPUs. In particular, it’s directly integrated into the Polars Lazy API, so you barely need to change your code.
&lt;/li&gt;
&lt;li&gt;It has been designed to make processing hundreds of millions of rows of data feel interactive with just a single GPU.
&lt;/li&gt;
&lt;li&gt;The library is fully compatible with the ecosystem of tools built for Polars, thus reducing overhead.
&lt;/li&gt;
&lt;li&gt;It gracefully falls back to the CPU for unsupported queries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In particular, moving to RAPIDS cuDF if you have already written Polars code is pretty straightforward, as you only need to pass &lt;code&gt;engine="gpu"&lt;/code&gt; when collecting a lazy query.&lt;/p&gt;

&lt;p&gt;For example, here’s a query written in plain Polars:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa401jml2o4ocpajthvgw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa401jml2o4ocpajthvgw.png" alt="Code in Polars by Federico Trotta" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here’s Polars accelerated with RAPIDS:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftw2jqprjz4uzherj8xe3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftw2jqprjz4uzherj8xe3.png" alt="Polars code accelerated by RAPIDS by Federico Trotta" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  What to expect?
&lt;/h3&gt;

&lt;p&gt;First of all, using Polars on a GPU should feel the same as using it on the CPU: just faster for many workflows.&lt;/p&gt;

&lt;p&gt;The GPU engine, in fact, fully utilizes the Polars optimizer to ensure efficient execution and minimal memory usage.&lt;/p&gt;

&lt;p&gt;Also, as the team was working on accelerating Polars, they benchmarked it with industry standards and found that, as the data scaled, the performance of Polars (accelerated) scaled too:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrhtrzno4ve17d9s6tn8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrhtrzno4ve17d9s6tn8.png" alt="Benchmark by NVIDIA" width="800" height="315"&gt;&lt;/a&gt;&lt;br&gt;
(The Benchmark made by NVIDIA)&lt;/p&gt;

&lt;p&gt;This is perfectly expected, as Polars is accelerated on GPUs (note that the benchmark was run on an NVIDIA H100).&lt;/p&gt;
&lt;h3&gt;
  
  
  How to use it?
&lt;/h3&gt;

&lt;p&gt;To accelerate Polars with cuDF, you first need to install it in an environment that allows you to use GPUs, for example in Google Colaboratory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;polars&lt;span class="se"&gt;\[&lt;/span&gt;gpu&lt;span class="se"&gt;\]&lt;/span&gt; &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-extra-index-url&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;https://pypi.nvidia.com]&lt;span class="o"&gt;(&lt;/span&gt;https://pypi.nvidia.com&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following example uses a 22 GB dataset (a link to try it yourself is at the end of the article).&lt;/p&gt;

&lt;p&gt;Here’s the time needed for an operation with “standard” Polars:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl487rh3c2az4cnchqso8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl487rh3c2az4cnchqso8.png" alt="Polars code by Federico Trotta" width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here’s the time needed for the same operation, with accelerated Polars:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdt84068tbdhhcpaaklit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdt84068tbdhhcpaaklit.png" alt="Accelerated Polars code by Federico Trotta" width="800" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, the same operation took:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;12 seconds with Polars.
&lt;/li&gt;
&lt;li&gt;0.34 seconds with accelerated Polars.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;With this new Polars GPU engine, you can potentially reach high performance with huge datasets, maintaining the same Polars code you are already using.&lt;/p&gt;

&lt;p&gt;So, why not give it a try? You can easily test it using a &lt;a href="https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/accelerated_data_processing_examples/polars_gpu_engine_demo.ipynb?utm_source=influencer&amp;amp;utm_medium=social&amp;amp;utm_campaign=organic" rel="noopener noreferrer"&gt;Colab notebook&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Want to read more? &lt;a href="https://pola.rs/posts/gpu-engine-release/" rel="noopener noreferrer"&gt;Here&lt;/a&gt; are all the details about that release directly on the Polars website.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>polars</category>
      <category>data</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Serverless Cost Optimization Three Key Strategies</title>
      <dc:creator>Federico Trotta</dc:creator>
      <pubDate>Tue, 30 Jul 2024 11:44:29 +0000</pubDate>
      <link>https://forem.com/federicotrotta/serverless-cost-optimization-three-key-strategies-442f</link>
      <guid>https://forem.com/federicotrotta/serverless-cost-optimization-three-key-strategies-442f</guid>
      <description>&lt;p&gt;Serverless computing has revolutionized the way developers build and deploy applications, offering significant benefits such as reduced operational complexity, automatic scaling, and a pay-as-you-go pricing model. &lt;/p&gt;

&lt;p&gt;However, while serverless architectures can help you save on costs, they are not free. So, managing their costs effectively requires careful planning and optimization.&lt;/p&gt;

&lt;p&gt;This article explores three key techniques for serverless cost optimization, helping you improve your serverless applications and avoid unnecessary expenses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Serverless computing: an introduction for developers
&lt;/h2&gt;

&lt;p&gt;Before presenting the strategies for serverless cost optimization, let's briefly introduce what serverless computing is and why you may need it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Introducing serverless computing
&lt;/h3&gt;

&lt;p&gt;Serverless computing is a cloud-native development model that allows developers to build and run applications without managing the infrastructure. In a serverless setup, in fact, cloud service providers automatically allocate and manage servers to execute code in response to events, such as HTTP requests, database changes, or message queue activities. This allows developers to focus only on writing and deploying code, rather than worrying about server provisioning, scaling, and maintenance.&lt;/p&gt;

&lt;p&gt;Serverless architecture is particularly appealing for reasons such as the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Payment model&lt;/strong&gt;. Serverless offers a true pay-as-you-go model, where you only pay for the compute time you consume. This can lead to significant cost savings, especially for applications with variable or unpredictable workloads. A typical use case today is training big AI models, like Deep Neural Networks or Large Language Models, which requires GPUs. To save on GPU costs, one option is &lt;a href="https://levelup.gitconnected.com/accelerating-ai-how-serverless-gpus-are-revolutionizing-model-training-af14dd978d64" rel="noopener noreferrer"&gt;using serverless&lt;/a&gt; so that you pay only while you train the models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automatic scaling&lt;/strong&gt;. Serverless provides automatic scaling, which means your application can handle varying loads seamlessly without manual intervention. When demand spikes, the serverless platform automatically scales out; when demand drops, it scales back in, ensuring optimal resource usage. This also helps save on costs, since you pay only for what you use, without the need to buy extensive hardware or to pay a fixed monthly fee to a cloud service.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Comparing serverless to other technologies
&lt;/h3&gt;

&lt;p&gt;Serverless advantages can be compared to other methodologies like traditional server-based (virtual machines or dedicated servers), Platform as a Service (PaaS), and containerization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Traditional server-based models&lt;/strong&gt;. This solution requires developers to manage the entire stack, from the physical or virtual server to the application code. This includes tasks like OS updates, patching, and capacity planning, which can be time-consuming and prone to errors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PaaS&lt;/strong&gt;. These solutions simplify some of the tasks needed with traditional server-based models by providing a managed environment for application deployment, but developers still need to handle aspects like scaling and environment configuration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Containerization&lt;/strong&gt;. Containers, often implemented with technologies like Docker and Kubernetes, offer another layer of abstraction by packaging applications and their dependencies into containers. This approach provides greater flexibility and scalability compared to traditional servers and PaaS. However, managing container orchestration, scaling, and networking can still be complex and resource-intensive.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Serverless, on the other hand, abstracts all infrastructure management tasks, allowing developers to deploy individual functions that execute in response to specific triggers. This model reduces operational issues, speeds up development cycles, and improves application resilience by leveraging the cloud provider’s infrastructure. It also integrates seamlessly with other cloud services, enabling the creation of highly scalable, event-driven applications with minimal effort.&lt;/p&gt;

&lt;p&gt;So, serverless solutions are preferable to the alternatives above in the following cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Variable or unpredictable workloads&lt;/strong&gt;. Serverless is ideal for applications with workloads that vary significantly or are difficult to predict, thanks to its automatic scaling feature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Event-driven applications&lt;/strong&gt;. Applications that are inherently event-driven, such as those responding to HTTP requests, processing files in object storage, reacting to database changes, or &lt;a href="https://dzone.com/articles/an-introduction-to-stream-processing" rel="noopener noreferrer"&gt;stream processing&lt;/a&gt;, are well-suited for serverless. The event-driven nature of serverless platforms, in fact, allows functions to execute in response to specific triggers, making it efficient and straightforward to build such applications.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rapid development and deployment situations&lt;/strong&gt;. When speed to market is crucial, serverless can accelerate development cycles. By eliminating the need to manage infrastructure, in fact, developers can focus only on writing and deploying code. This may be particularly beneficial for startups or projects requiring rapid iteration and deployment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although serverless computing can help you save time and money compared to the other approaches described, it still comes with its own costs. So, let's continue this article by providing three strategies for serverless cost optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Serverless cost optimization strategy 1: optimizing function execution time
&lt;/h2&gt;

&lt;p&gt;One of the most direct ways to reduce serverless costs is by minimizing function execution time, which refers to the duration from when a serverless function starts executing until it finishes.&lt;/p&gt;

&lt;p&gt;Serverless platforms such as AWS Lambda, Azure Functions, and Google Cloud Functions charge based on how long a function takes to execute. Billing is typically calculated in milliseconds and, combined with the memory allocated to the function, determines the overall cost.&lt;/p&gt;
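
&lt;p&gt;As a rough illustration of how duration and memory combine, here is a minimal sketch of a cost estimate. The unit price is a made-up value for illustration only, the function name &lt;code&gt;estimate_monthly_cost&lt;/code&gt; is hypothetical, and the model deliberately ignores per-request fees and free tiers, so check your provider's pricing page for real numbers:&lt;/p&gt;

```python
def estimate_monthly_cost(invocations, avg_duration_ms, memory_mb, price_per_gb_second):
    """Estimate compute cost as invocations x duration (s) x memory (GB) x unit price."""
    duration_seconds = avg_duration_ms / 1000
    memory_gb = memory_mb / 1024
    gb_seconds = invocations * duration_seconds * memory_gb
    return gb_seconds * price_per_gb_second

# Hypothetical unit price, for illustration only
PRICE_PER_GB_SECOND = 0.00002

# One million invocations at 512 MB, before and after halving the execution time
baseline = estimate_monthly_cost(1_000_000, 400, 512, PRICE_PER_GB_SECOND)
optimized = estimate_monthly_cost(1_000_000, 200, 512, PRICE_PER_GB_SECOND)

print(f"baseline:  {baseline:.2f}")
print(f"optimized: {optimized:.2f}")
```

&lt;p&gt;Under this simplified model, halving the average duration halves the compute cost, and doubling the memory only pays off if it cuts the duration by more than half.&lt;/p&gt;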

&lt;p&gt;Here are some best practices to optimize the function execution time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Write efficient code&lt;/strong&gt;. Ensure that your code is optimized for performance by avoiding unnecessary computations and using efficient algorithms. For example, prefer in-memory operations over database queries where possible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Asynchronous processing&lt;/strong&gt;. Utilize asynchronous processing to handle tasks that can be performed in parallel or do not require immediate completion. This can reduce the time your functions spend waiting, thus lowering execution time and costs. For instance, background tasks such as sending emails or processing logs can typically be handled asynchronously.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory Allocation&lt;/strong&gt;. Choose the appropriate memory allocation for your functions. Allocating more memory can sometimes speed up execution due to higher CPU availability, but over-allocating memory leads to higher costs. Use monitoring tools to analyze your functions' performance and adjust memory settings accordingly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
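
&lt;p&gt;The asynchronous processing point can be sketched with Python's standard &lt;code&gt;asyncio&lt;/code&gt; module. This is a minimal example under stated assumptions: &lt;code&gt;fetch_record&lt;/code&gt; and &lt;code&gt;handler&lt;/code&gt; are hypothetical names, and the sleep stands in for a real I/O-bound call such as a network request:&lt;/p&gt;

```python
import asyncio
import time

async def fetch_record(record_id):
    """Stand-in for an I/O-bound call (the sleep simulates network latency)."""
    await asyncio.sleep(0.1)
    return {"id": record_id, "status": "ok"}

async def handler(record_ids):
    # Start all calls together instead of awaiting them one by one
    return await asyncio.gather(*(fetch_record(rid) for rid in record_ids))

start = time.perf_counter()
results = asyncio.run(handler([1, 2, 3, 4, 5]))
elapsed = time.perf_counter() - start

print(len(results), "records")
print(f"elapsed: {elapsed:.2f}s")
```

&lt;p&gt;Because the five simulated calls wait at the same time, the total duration stays close to that of a single call rather than the sum of all five, which is what reduces the billed execution time.&lt;/p&gt;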

&lt;p&gt;Here's a simple example of making Python code more memory-efficient:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inefficient&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sum_of_squares_inefficient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Use list comprehension inside a sum function
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Example usage
&lt;/span&gt;&lt;span class="n"&gt;numbers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10001&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum_of_squares_inefficient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Efficient&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sum_of_squares_efficient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Use generator expression inside a sum function
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Example usage
&lt;/span&gt;&lt;span class="n"&gt;numbers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10001&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum_of_squares_efficient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The inefficient version uses a list comprehension inside the &lt;code&gt;sum()&lt;/code&gt; function. This creates an intermediate list in memory, which can be memory-intensive and slow, especially for large lists.&lt;/p&gt;

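&lt;p&gt;The efficient version uses a generator expression inside the &lt;code&gt;sum()&lt;/code&gt; function. This avoids creating an intermediate list and yields the elements one by one, so peak memory usage stays low regardless of the input size. Since serverless platforms typically bill by allocated memory and execution time, a lower memory footprint lets you choose a smaller memory setting. Note that a generator expression is not always faster than a list comprehension, so measure the execution time on your own workload.&lt;/p&gt;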
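
&lt;p&gt;One way to check the memory difference on your own data is Python's built-in &lt;code&gt;tracemalloc&lt;/code&gt; module. The sketch below is only an illustration: the helper &lt;code&gt;peak_memory&lt;/code&gt; and the function names are introduced here and are not part of the original example.&lt;/p&gt;

```python
import tracemalloc


def peak_memory(func, numbers):
    # Return the peak number of bytes allocated while func(numbers) runs
    tracemalloc.start()
    func(numbers)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def sum_of_squares_list(numbers):
    # Builds the whole intermediate list before summing
    return sum([x * x for x in numbers])


def sum_of_squares_generator(numbers):
    # Produces one square at a time while summing
    return sum(x * x for x in numbers)


numbers = range(1, 100001)
peak_list = peak_memory(sum_of_squares_list, numbers)
peak_generator = peak_memory(sum_of_squares_generator, numbers)
print(f"list comprehension peak: {peak_list} bytes")
print(f"generator expression peak: {peak_generator} bytes")
```

&lt;p&gt;Execution time is a separate question from memory and is best measured with a tool such as &lt;code&gt;timeit&lt;/code&gt; on a realistic input.&lt;/p&gt;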

&lt;h2&gt;
  
  
  Serverless cost optimization strategy 2: implementing auto-scaling and scheduled scaling
&lt;/h2&gt;

&lt;p&gt;Auto-scaling is a fundamental feature provided by serverless platforms that automatically adjusts the number of function instances based on demand. However, without proper configuration, auto-scaling can lead to cost overruns.&lt;/p&gt;

&lt;p&gt;So, here are some best practices to implement the auto-scaling feature in serverless and save on costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Demand-based auto-scaling&lt;/strong&gt;. Set up auto-scaling policies that align with your application's usage patterns. Configure thresholds for scaling up and down based on metrics such as CPU usage, memory usage, or custom application metrics. This ensures that you only use the resources you need, when you need them. Note that the best-known serverless providers support auto-scaling. For example, AWS Lambda can be scaled with Application Auto Scaling, Azure Functions can rely on metrics from Azure Monitor, and Google Cloud Functions can be configured for auto-scaling with the help of Google Cloud Monitoring.&lt;/p&gt;

&lt;p&gt;Also, consider using concurrency autoscaling. This refers to the automatic adjustment of the number of concurrent executions or instances of a serverless function based on the current demand. It helps the function handle incoming requests without being overwhelmed, while controlling costs by scaling down when demand is low. The best-known serverless providers support concurrency autoscaling natively, but you can also use dedicated packages in your serverless projects, such as the &lt;a href="https://www.serverless.com/plugins/serverless-provisioned-concurrency-autoscaling" rel="noopener noreferrer"&gt;serverless-provisioned-concurrency-autoscaling plugin&lt;/a&gt;, which is distributed as an NPM package.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scheduled scaling&lt;/strong&gt;. For applications with predictable traffic patterns, scheduled scaling can be highly effective. By scheduling scaling events to match peak and off-peak times, you can ensure that your application has sufficient resources during high-demand periods while saving costs during low-demand periods. For example, if you know that your application experiences high traffic during business hours, you can schedule additional instances to be available during those times.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
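
&lt;p&gt;To make scheduled scaling more concrete, here is a hedged sketch for AWS that prepares two scheduled actions for the provisioned concurrency of a Lambda function: one that scales up before business hours and one that scales down afterwards. The function name, the &lt;code&gt;prod&lt;/code&gt; alias, the capacities, and the schedule times (in UTC) are assumptions chosen for illustration. The helper only builds the arguments; &lt;code&gt;apply_scheduled_actions&lt;/code&gt; would pass them to the &lt;code&gt;put_scheduled_action&lt;/code&gt; call of the boto3 Application Auto Scaling client, which requires the scalable target to be registered first.&lt;/p&gt;

```python
def build_scheduled_actions(function_name, alias, peak_capacity, off_peak_capacity):
    # Build the keyword arguments for two put_scheduled_action calls
    common = {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
    }
    return [
        {
            **common,
            "ScheduledActionName": "scale-up-business-hours",
            # Monday to Friday at 08:00 UTC (assumed peak start)
            "Schedule": "cron(0 8 ? * MON-FRI *)",
            "ScalableTargetAction": {
                "MinCapacity": peak_capacity,
                "MaxCapacity": peak_capacity,
            },
        },
        {
            **common,
            "ScheduledActionName": "scale-down-off-hours",
            # Monday to Friday at 18:00 UTC (assumed peak end)
            "Schedule": "cron(0 18 ? * MON-FRI *)",
            "ScalableTargetAction": {
                "MinCapacity": off_peak_capacity,
                "MaxCapacity": off_peak_capacity,
            },
        },
    ]


def apply_scheduled_actions(appscaling_client, actions):
    # appscaling_client is expected to be boto3.client("application-autoscaling")
    for action in actions:
        appscaling_client.put_scheduled_action(**action)


actions = build_scheduled_actions("my_lambda_function", "prod", 10, 1)
for action in actions:
    print(action["ScheduledActionName"], action["Schedule"])
```

&lt;p&gt;Keeping the argument building separate from the AWS call makes the schedule easy to review and test before anything is applied to a real account.&lt;/p&gt;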

&lt;h3&gt;
  
  
  Implementation example
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: All the code described in this section is available in &lt;a href="https://github.com/federico-trotta/semaphore_deploy" rel="noopener noreferrer"&gt;this repository&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's take AWS as an example to illustrate a simple implementation: we want to deploy a Lambda function on AWS using &lt;a href="https://semaphoreci.com/" rel="noopener noreferrer"&gt;Semaphore CI&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;But before that, you need to install the &lt;a href="https://pypi.org/project/boto3/" rel="noopener noreferrer"&gt;Python package boto3&lt;/a&gt; - if you haven't done it yet - by typing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let's create an example that sets up a demand-based auto-scaling solution by dynamically adjusting the resources allocated to your serverless functions, based on real-time usage metrics in Python (see &lt;code&gt;/function/function.py&lt;/code&gt; in the linked repository). &lt;/p&gt;

&lt;p&gt;First of all, define the metrics that reflect your application's performance and load, such as CPU usage or request latency, and publish them to CloudWatch as custom metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="n"&gt;cloudwatch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cloudwatch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_metric_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MyApp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MetricData&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MetricName&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CPUUsage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Dimensions&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;FunctionName&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;my_lambda_function&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;70.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Unit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Percent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, create alarms based on these metrics to trigger scaling actions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_metric_alarm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;AlarmName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;HighCPUUsageAlarm&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MetricName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CPUUsage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MyApp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Statistic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Average&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Period&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;EvaluationPeriods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;75.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ComparisonOperator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GreaterThanThreshold&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;AlarmActions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;arn:aws:autoscaling:us-west-2:123456789012:scalingPolicy:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:autoScalingGroupName/my-asg:policyName/MyScalingPolicy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



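&lt;p&gt;Finally, link your Lambda function with Application Auto Scaling to adjust its provisioned concurrency based on CloudWatch metrics. Note that provisioned concurrency is configured on a published version or an alias, so in practice the &lt;code&gt;ResourceId&lt;/code&gt; needs a qualifier suffix (for example, &lt;code&gt;function:my_lambda_function:prod&lt;/code&gt;, where &lt;code&gt;prod&lt;/code&gt; is a hypothetical alias):&lt;br&gt;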
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;appscaling&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application-autoscaling&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;appscaling&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_scalable_target&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ServiceNamespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;lambda&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ResourceId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;function:my_lambda_function&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ScalableDimension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;lambda:function:ProvisionedConcurrency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MinCapacity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MaxCapacity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;appscaling&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_scaling_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;PolicyName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MyScalingPolicy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ServiceNamespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;lambda&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ResourceId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;function:my_lambda_function&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ScalableDimension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;lambda:function:ProvisionedConcurrency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;PolicyType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TargetTrackingScaling&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TargetTrackingScalingPolicyConfiguration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TargetValue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# utilization is a fraction between 0 and 1&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PredefinedMetricSpecification&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PredefinedMetricType&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;LambdaProvisionedConcurrencyUtilization&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ScaleOutCooldown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ScaleInCooldown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: For simplicity, the code above is shown as a plain script. In the repository, the Python code is the same but wrapped in a function, since it is deployed as a Lambda function on AWS.&lt;/p&gt;
&lt;/blockquote&gt;
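
&lt;p&gt;As a rough idea of what wrapping the scaling calls in a function can look like, here is a minimal sketch. It is not the code from the repository: the helper &lt;code&gt;configure_autoscaling&lt;/code&gt;, the event keys &lt;code&gt;function_name&lt;/code&gt; and &lt;code&gt;alias&lt;/code&gt;, the &lt;code&gt;prod&lt;/code&gt; alias, and the &lt;code&gt;RecordingClient&lt;/code&gt; stand-in are all assumptions introduced for illustration. The stand-in records the calls instead of contacting AWS, so the sketch can run offline.&lt;/p&gt;

```python
def configure_autoscaling(appscaling_client, function_name, alias, target_utilization=0.75):
    # appscaling_client is expected to be boto3.client("application-autoscaling")
    resource_id = f"function:{function_name}:{alias}"
    dimension = "lambda:function:ProvisionedConcurrency"
    appscaling_client.register_scalable_target(
        ServiceNamespace="lambda",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=1,
        MaxCapacity=10,
    )
    appscaling_client.put_scaling_policy(
        PolicyName="MyScalingPolicy",
        ServiceNamespace="lambda",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            # Utilization is expressed as a fraction between 0 and 1
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
            "ScaleOutCooldown": 60,
            "ScaleInCooldown": 60,
        },
    )
    return resource_id


def lambda_handler(event, context, appscaling_client=None):
    # Entry point matching the handler name "function.lambda_handler"
    if appscaling_client is None:
        import boto3  # imported lazily so the sketch also runs without boto3

        appscaling_client = boto3.client("application-autoscaling")
    resource_id = configure_autoscaling(
        appscaling_client,
        event.get("function_name", "my_lambda_function"),
        event.get("alias", "prod"),
    )
    return {"statusCode": 200, "body": f"Auto scaling configured for {resource_id}"}


class RecordingClient:
    # Offline stand-in that records calls instead of contacting AWS
    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(**kwargs):
            self.calls.append((name, kwargs))

        return record


fake = RecordingClient()
response = lambda_handler({"alias": "prod"}, None, appscaling_client=fake)
print(response["body"])
print([name for name, _ in fake.calls])
```

&lt;p&gt;Passing the client as an optional argument keeps the handler signature compatible with Lambda while making the logic testable without AWS credentials.&lt;/p&gt;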

&lt;p&gt;With the Python function in place, the next step is the &lt;code&gt;deploy.sh&lt;/code&gt; script that performs the deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;# Define variables&lt;/span&gt;
&lt;span class="nv"&gt;FUNCTION_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"my_lambda_function"&lt;/span&gt;
&lt;span class="nv"&gt;ZIP_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"function.zip"&lt;/span&gt;
&lt;span class="nv"&gt;HANDLER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"function.lambda_handler"&lt;/span&gt;
&lt;span class="nv"&gt;ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::123456789012:role/my-lambda-role"&lt;/span&gt;
&lt;span class="nv"&gt;RUNTIME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"python3.8"&lt;/span&gt;
&lt;span class="nv"&gt;TIMEOUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;30

&lt;span class="c"&gt;# Go to /function folder&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt;

&lt;span class="c"&gt;# Install requirements and pack the Python function&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt &lt;span class="nt"&gt;-t&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;  
zip &lt;span class="nt"&gt;-r&lt;/span&gt; ../&lt;span class="nv"&gt;$ZIP_FILE&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;  

&lt;span class="c"&gt;# Go to main directory&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ..

&lt;span class="c"&gt;# Verify if lambda function already exists&lt;/span&gt;
aws lambda get-function &lt;span class="nt"&gt;--function-name&lt;/span&gt; &lt;span class="nv"&gt;$FUNCTION_NAME&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt; &lt;span class="nt"&gt;-eq&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Updating existing function..."&lt;/span&gt;
  aws lambda update-function-code &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--function-name&lt;/span&gt; &lt;span class="nv"&gt;$FUNCTION_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--zip-file&lt;/span&gt; fileb://&lt;span class="nv"&gt;$ZIP_FILE&lt;/span&gt;
&lt;span class="k"&gt;else
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Creating new function..."&lt;/span&gt;
  aws lambda create-function &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--function-name&lt;/span&gt; &lt;span class="nv"&gt;$FUNCTION_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--zip-file&lt;/span&gt; fileb://&lt;span class="nv"&gt;$ZIP_FILE&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--handler&lt;/span&gt; &lt;span class="nv"&gt;$HANDLER&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--runtime&lt;/span&gt; &lt;span class="nv"&gt;$RUNTIME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--role&lt;/span&gt; &lt;span class="nv"&gt;$ROLE_ARN&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--timeout&lt;/span&gt; &lt;span class="nv"&gt;$TIMEOUT&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Remove zip file after upload&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nv"&gt;$ZIP_FILE&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;deploy.sh&lt;/code&gt; bash script does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Goes into the &lt;code&gt;function&lt;/code&gt; directory and installs the Python dependencies listed in the &lt;code&gt;requirements.txt&lt;/code&gt; file into that directory.&lt;/li&gt;
&lt;li&gt;Creates a &lt;code&gt;.zip&lt;/code&gt; file that contains the Python function and its dependencies.&lt;/li&gt;
&lt;li&gt;Checks whether the Lambda function already exists:

&lt;ul&gt;
&lt;li&gt;If it exists, it updates the code with the new one contained in the &lt;code&gt;.zip&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;If it does not exist, it creates a new Lambda function using the &lt;code&gt;.zip&lt;/code&gt; file.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Removes the &lt;code&gt;.zip&lt;/code&gt; file after the deployment has finished.&lt;/li&gt;

&lt;/ul&gt;
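
&lt;p&gt;The update-or-create step described above can also be sketched in Python with boto3. This is an illustrative alternative, not the article's deployment script: &lt;code&gt;deploy_function&lt;/code&gt; and &lt;code&gt;FakeLambdaClient&lt;/code&gt; are names introduced here, and the fake client is an in-memory stand-in so that the example runs without an AWS account. With a real client created via &lt;code&gt;boto3.client("lambda")&lt;/code&gt;, the same function would call &lt;code&gt;get_function&lt;/code&gt;, &lt;code&gt;update_function_code&lt;/code&gt;, and &lt;code&gt;create_function&lt;/code&gt;.&lt;/p&gt;

```python
def deploy_function(lambda_client, function_name, zip_bytes, handler, runtime, role_arn, timeout):
    # lambda_client is expected to be boto3.client("lambda")
    try:
        lambda_client.get_function(FunctionName=function_name)
    except lambda_client.exceptions.ResourceNotFoundException:
        # The function does not exist yet: create it from the zip archive
        lambda_client.create_function(
            FunctionName=function_name,
            Runtime=runtime,
            Role=role_arn,
            Handler=handler,
            Code={"ZipFile": zip_bytes},
            Timeout=timeout,
        )
        return "created"
    # The function exists: only replace its code
    lambda_client.update_function_code(FunctionName=function_name, ZipFile=zip_bytes)
    return "updated"


class FakeLambdaClient:
    # In-memory stand-in for the boto3 Lambda client, used to run the sketch offline
    class exceptions:
        class ResourceNotFoundException(Exception):
            pass

    def __init__(self):
        self.functions = {}

    def get_function(self, FunctionName):
        if FunctionName not in self.functions:
            raise self.exceptions.ResourceNotFoundException(FunctionName)
        return {"Configuration": {"FunctionName": FunctionName}}

    def create_function(self, FunctionName, Code, **config):
        self.functions[FunctionName] = Code["ZipFile"]

    def update_function_code(self, FunctionName, ZipFile):
        self.functions[FunctionName] = ZipFile


client = FakeLambdaClient()
settings = ("function.lambda_handler", "python3.8", "arn:aws:iam::123456789012:role/my-lambda-role", 30)
first = deploy_function(client, "my_lambda_function", b"v1", *settings)
second = deploy_function(client, "my_lambda_function", b"v2", *settings)
print(first, second)
```

&lt;p&gt;Catching only the "not found" error means that other failures, such as missing credentials, surface as errors instead of being mistaken for a function that does not exist yet.&lt;/p&gt;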

&lt;p&gt;Finally, the file &lt;code&gt;semaphore.yaml&lt;/code&gt; defines the CI/CD pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1.0&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Initial Pipeline&lt;/span&gt;
&lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;machine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;e1-standard-2&lt;/span&gt;
    &lt;span class="na"&gt;os_image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu2004&lt;/span&gt;
&lt;span class="na"&gt;blocks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Dependencies&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install AWS CLI&lt;/span&gt;
          &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sudo apt-get update&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sudo apt-get install -y python3-pip&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;pip3 install awscli&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;pip3 install boto3&lt;/span&gt;
      &lt;span class="na"&gt;prologue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;checkout&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to AWS&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws_credentials&lt;/span&gt;
          &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;chmod +x deploy.sh&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./deploy.sh&lt;/span&gt;
      &lt;span class="na"&gt;prologue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;checkout&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;semaphore.yaml&lt;/code&gt; does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the initial part, it specifies a &lt;code&gt;version&lt;/code&gt;, a &lt;code&gt;name&lt;/code&gt; for the pipeline, a machine type (&lt;code&gt;machine&lt;/code&gt;), and an OS image.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Install Dependencies&lt;/code&gt; block:

&lt;ul&gt;
&lt;li&gt;Downloads the latest version of the code with &lt;code&gt;checkout&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Updates the Ubuntu packages.&lt;/li&gt;
&lt;li&gt;Installs &lt;code&gt;pip&lt;/code&gt; to manage Python packages.&lt;/li&gt;
&lt;li&gt;Installs &lt;code&gt;awscli&lt;/code&gt; and &lt;code&gt;boto3&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;The &lt;code&gt;Deploy to AWS&lt;/code&gt; block:

&lt;ul&gt;
&lt;li&gt;Uses the credentials stored in Semaphore &lt;code&gt;secrets&lt;/code&gt; to authenticate the deployment.&lt;/li&gt;
&lt;li&gt;Downloads the latest version of the code with &lt;code&gt;checkout&lt;/code&gt;, makes &lt;code&gt;deploy.sh&lt;/code&gt; executable, and runs it to deploy the Lambda function.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Note that, to make the code work, you need to configure the &lt;code&gt;secrets&lt;/code&gt; in Semaphore CI, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;: this is the ID to access AWS.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt;: this is the secret key to access AWS.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: To learn more about how to use &lt;code&gt;yaml&lt;/code&gt; in Semaphore, read the &lt;a href="https://docs.semaphoreci.com/reference/pipeline-yaml-reference/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Serverless cost optimization strategy 3: monitoring and right-sizing resource usage
&lt;/h2&gt;

&lt;p&gt;Continuous monitoring of serverless functions is another useful way of maintaining cost efficiency in serverless solutions. By regularly reviewing performance metrics and resource usage, you can make informed decisions about resource allocation and configuration, and adjust them accordingly.&lt;/p&gt;

&lt;p&gt;Here are some best practices to implement as a reference:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use monitoring tools&lt;/strong&gt;. Use monitoring tools provided by your serverless platform or third-party solutions to track function performance, execution times, and resource usage. Tools like AWS CloudWatch, Azure Monitor, or Google Cloud Monitoring offer insights into how your functions are performing and where inefficiencies may lie.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Analyze metrics&lt;/strong&gt;. Analyze metrics regularly to identify patterns and anomalies. For example, look for functions with consistently high execution times or memory usage and investigate potential causes. This can help you pinpoint areas where optimizations are needed.&lt;/p&gt;

&lt;p&gt;Also, if you work in a CI/CD environment, you can consider using &lt;a href="https://semaphoreci.com/product/metrics-and-observability" rel="noopener noreferrer"&gt;Semaphore&lt;/a&gt; as it streamlines issue detection and addresses error-prone tasks and unpredictable tests that could cause sporadic build failures.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Right-size resources&lt;/strong&gt;. Based on your analyses, right-size your functions to ensure they have the appropriate resources. This might involve reducing memory allocation for functions that do not require it, or splitting larger functions into smaller, more efficient ones. Right-sizing helps avoid over-provisioning and ensures that you are not paying for unused resources.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
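
&lt;p&gt;As a rough illustration of right-sizing, the sketch below takes a few observed peak memory readings for a function, suggests a memory setting with some headroom, and compares the estimated compute cost before and after. All figures are assumptions: the readings, the 20% headroom, the invocation count, and the price per GB-second are placeholders, so check your provider's current pricing. The sketch also assumes that the execution time stays the same after reducing memory, which may not hold when CPU is allocated in proportion to memory.&lt;/p&gt;

```python
import math


def suggest_memory_mb(max_memory_used_mb, headroom=0.2, minimum_mb=128):
    # Take the highest observed usage, add headroom, and round up to a whole MB
    peak = max(max_memory_used_mb)
    return max(minimum_mb, math.ceil(peak * (1 + headroom)))


def monthly_compute_cost(memory_mb, avg_duration_ms, invocations, price_per_gb_second):
    # Compute cost is billed on GB-seconds: memory (GB) x duration (s) x invocations
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * invocations
    return gb_seconds * price_per_gb_second


observed_mb = [142, 150, 138, 161, 155]  # hypothetical peak memory readings
current_mb = 1024  # hypothetical current memory setting
price = 0.0000166667  # assumed price per GB-second

suggested_mb = suggest_memory_mb(observed_mb)
before = monthly_compute_cost(current_mb, 120, 5_000_000, price)
after = monthly_compute_cost(suggested_mb, 120, 5_000_000, price)
print(f"suggested memory: {suggested_mb} MB")
print(f"estimated monthly compute cost: {before:.2f} vs {after:.2f}")
```

&lt;p&gt;In practice, the readings would come from your monitoring tool, and any new setting should be validated by checking the execution time again after the change.&lt;/p&gt;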

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;In this article, we've shown that serverless cost optimization involves a combination of optimizing function execution time, implementing intelligent scaling strategies, and continuously monitoring and right-sizing resource usage.&lt;/p&gt;

&lt;p&gt;By adopting these strategies, you can ensure that your serverless applications run efficiently and cost-effectively.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Pandas reset_index(): How To Reset Indexes in Pandas</title>
      <dc:creator>Federico Trotta</dc:creator>
      <pubDate>Sat, 27 Apr 2024 14:34:12 +0000</pubDate>
      <link>https://forem.com/federicotrotta/pandas-resetindex-how-to-reset-indexes-in-pandas-475b</link>
      <guid>https://forem.com/federicotrotta/pandas-resetindex-how-to-reset-indexes-in-pandas-475b</guid>
      <description>&lt;p&gt;In data analysis, managing the structure and layout of data before analyzing them is crucial. Python offers versatile tools to manipulate data, including the often-used &lt;a href="https://pandas.pydata.org/"&gt;Pandas&lt;/a&gt; &lt;a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html"&gt;&lt;code&gt;reset_index()&lt;/code&gt;&lt;/a&gt; method.&lt;/p&gt;

&lt;p&gt;This article provides an in-depth exploration of the Pandas &lt;code&gt;reset_index()&lt;/code&gt; method, explaining its importance, usage, and the scenarios where it’s useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Pandas reset_index() and when to use it?
&lt;/h2&gt;

&lt;p&gt;&lt;img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6u9e4roresb0i6tuu344.png" alt="Pandas reset_index() visualized as real pandas playing in the trees by Federico Trotta"&gt;&lt;br&gt;
(Pandas playing in the trees. Image by Federico Trotta.)&lt;/p&gt;

&lt;p&gt;In Pandas, each DataFrame and Series has an index: a set of labels used to identify each row or item. Labels are usually unique, although Pandas does not require them to be.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;reset_index()&lt;/code&gt; method is used to reset the index of the DataFrame or Series, which can involve turning the index into a regular column, or discarding it entirely. This is particularly useful when the index needs reorganizing, or when integrating the index into DataFrame columns for further analysis.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;reset_index()&lt;/code&gt; method is typically used in the following scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reverting an index after group operations&lt;/strong&gt;. Group operations can leave you with grouped or multi-level indexes, which are sometimes inconvenient for further analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrating the index as a feature&lt;/strong&gt;. If the index itself carries valuable data (e.g., time stamps or unique identifiers), you might want to move it into a DataFrame column to use as a feature in data analysis or machine learning models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resetting after sorting or filtering&lt;/strong&gt;. Sorting or filtering can alter the order or number of rows, and resetting the index can be necessary to maintain a contiguous, integer index.&lt;/li&gt;
&lt;/ul&gt;
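&lt;p&gt;As a rough illustration of the first and third scenarios, the sketch below uses a small, made-up sales table (the column names and values are assumptions for the example, not data from a real project). It resets the index once after a group operation and once after sorting:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical sales data, used only for illustration.
sales = pd.DataFrame({
    "city": ["Rome", "Milan", "Rome", "Milan"],
    "amount": [10, 20, 30, 40],
})

# After a group operation, the grouping key becomes the index.
totals = sales.groupby("city").sum()

# reset_index() moves "city" back into a regular column.
totals_flat = totals.reset_index()

# After sorting, the original row labels are out of order;
# drop=True discards them and builds a fresh integer index.
ranked = sales.sort_values("amount", ascending=False).reset_index(drop=True)

print(totals_flat)
print(ranked)
```

&lt;p&gt;In the grouped case, the index is kept as a column because it carries information. In the sorted case, it is dropped because the old row labels no longer mean anything.&lt;/p&gt;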
&lt;h2&gt;
  
  
  How to use Pandas reset_index()
&lt;/h2&gt;

&lt;p&gt;The basic syntax of &lt;code&gt;reset_index()&lt;/code&gt; is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col_level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col_fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each parameter has a specific function:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;level&lt;/code&gt;. Specifies which index levels to reset (for MultiIndex).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;drop&lt;/code&gt;. If True, the old index is discarded and not added as a column in the new DataFrame.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;inplace&lt;/code&gt;. If True, modifies the DataFrame in-place; otherwise, a new DataFrame is returned.&lt;/li&gt;
&lt;li&gt;
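&lt;code&gt;col_level&lt;/code&gt;, &lt;code&gt;col_fill&lt;/code&gt;. Used when the columns are a MultiIndex: &lt;code&gt;col_level&lt;/code&gt; selects the column level the index labels are inserted into, and &lt;code&gt;col_fill&lt;/code&gt; determines how the other levels are named.&lt;/li&gt;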
&lt;/ul&gt;
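&lt;p&gt;The difference between the default behavior and &lt;code&gt;inplace=True&lt;/code&gt; is easy to miss, so here is a minimal sketch of it. The DataFrame is a made-up example; the point is only that the in-place call returns &lt;code&gt;None&lt;/code&gt; and changes the original object:&lt;/p&gt;

```python
import pandas as pd

# Small illustrative DataFrame with a labeled index.
df = pd.DataFrame({"Data": [10, 20, 30]}, index=["a", "b", "c"])

# Default (inplace=False): df is untouched and a new DataFrame is returned.
new_df = df.reset_index()

# inplace=True: df itself is modified and the call returns None.
result = df.reset_index(drop=True, inplace=True)

print(list(new_df.columns))
print(result)
print(df.index.tolist())
```

&lt;p&gt;Assigning the result of an in-place call to a variable is a common mistake, because that variable ends up holding &lt;code&gt;None&lt;/code&gt; instead of a DataFrame.&lt;/p&gt;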

&lt;p&gt;&lt;strong&gt;Usage examples&lt;/strong&gt;&lt;br&gt;
Basic reset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# Create a DataFrame
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;c&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Original DataFrame:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Reset the index
&lt;/span&gt;&lt;span class="n"&gt;reset_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;DataFrame after reset_index():&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reset_df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This results in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Original DataFrame:
   Data
a    10
b    20
c    30
d    40

DataFrame after reset_index():
  index  Data
0     a    10
1     b    20
2     c    30
3     d    40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Dropping an index&lt;/strong&gt;&lt;br&gt;
If the index is irrelevant and not needed as a column, set the parameter &lt;code&gt;drop=True&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;reset_df_drop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reset_df_drop&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This results in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   Data
0    10
1    20
2    30
3    40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Multi-index reset&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Create a MultiIndex DataFrame
&lt;/span&gt;&lt;span class="n"&gt;mindex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MultiIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_tuples&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;first&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;second&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;df_multi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mindex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Original MultiIndex DataFrame:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_multi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Reset the 'second' level of the index
&lt;/span&gt;&lt;span class="n"&gt;reset_multi_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_multi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;second&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;DataFrame after resetting &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;second&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; level:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reset_multi_df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That results in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Original MultiIndex DataFrame:
              Data
first second      
1     a        100
      b        200
2     a        300
      b        400

DataFrame after resetting 'second' level:
      second  Data
first             
1          a   100
1          b   200
2          a   300
2          b   400
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
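&lt;p&gt;For comparison, the sketch below rebuilds the same MultiIndex DataFrame and shows two variations: resetting every level at once, and discarding a single level with &lt;code&gt;drop=True&lt;/code&gt;. The variable names &lt;code&gt;flat_df&lt;/code&gt; and &lt;code&gt;first_only&lt;/code&gt; are made up for this example:&lt;/p&gt;

```python
import pandas as pd

# Same MultiIndex DataFrame as in the example above.
mindex = pd.MultiIndex.from_tuples(
    [(1, "a"), (1, "b"), (2, "a"), (2, "b")], names=["first", "second"]
)
df_multi = pd.DataFrame({"Data": [100, 200, 300, 400]}, index=mindex)

# With no level argument, every index level becomes a regular column.
flat_df = df_multi.reset_index()

# With a level and drop=True, that level is discarded instead of kept.
first_only = df_multi.reset_index(level="second", drop=True)

print(flat_df)
print(first_only)
```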



&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;Pandas &lt;code&gt;reset_index()&lt;/code&gt; is a versatile method that provides essential functionality for manipulating DataFrame and Series indexes. Whether you’re preparing data for analysis, integrating index data as a feature, or simply organizing data post-transformation, understanding how it works will speed up your workflow.&lt;/p&gt;

&lt;p&gt;--&lt;/p&gt;

&lt;p&gt;Hi, my name is Federico and I am a freelance Technical Writer.&lt;/p&gt;

&lt;p&gt;Do you want to start a documentation project, collaborating with me? &lt;a href="https://bio.link/federicotrotta"&gt;Contact me&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Do you want to know more about my work? You can start with my &lt;a href="https://federico-trotta.github.io/index.html"&gt;portfolio&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;--&lt;br&gt;
The article "Pandas reset_index(): How To Reset Indexes in Pandas" was first published &lt;a href="https://federicotrotta.com/pandas-reset_index-how-to-reset-indexes-in-pandas"&gt;on my blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How To Easily Remove a Password From a PDF file</title>
      <dc:creator>Federico Trotta</dc:creator>
      <pubDate>Wed, 24 Apr 2024 13:36:54 +0000</pubDate>
      <link>https://forem.com/federicotrotta/how-to-easily-remove-a-password-from-a-pdf-file-158b</link>
      <guid>https://forem.com/federicotrotta/how-to-easily-remove-a-password-from-a-pdf-file-158b</guid>
      <description>&lt;p&gt;Imagine a scenario (which happened to me): your employer gives you the documentation relating to your financial situation and, at a certain point, you have to give it to your financial advisor.&lt;/p&gt;

&lt;p&gt;Problem: the document is a password-protected PDF (which is good), but you can give it to your financial advisor only by uploading it to a platform. How do you tell your financial advisor the password?&lt;/p&gt;

&lt;p&gt;These are the options I considered to solve the problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can upload the file to the platform and tell the financial advisor the password via email or phone.&lt;/li&gt;
&lt;li&gt;You can name the file &lt;code&gt;password_******&lt;/code&gt; so that they can understand what the password is.&lt;/li&gt;
&lt;li&gt;You can print the file, scan it, and create a new PDF (problem: in my case, it was 30 pages!).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, as a Cyber Security enthusiast, I didn’t want to share the password with anyone, even if it was a unique one, for obvious reasons. Also, I didn’t want to waste time and paper to print 30 pages just to create another PDF.&lt;/p&gt;

&lt;p&gt;So, the only (right) solution was to find something to remove the password from the file, so that it could be opened by my financial advisor.&lt;/p&gt;

&lt;p&gt;In this article, I show you how you can remove a password from a PDF in a couple of minutes, and for free.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to remove a password from a PDF file
&lt;/h2&gt;

&lt;p&gt;Before arriving at the solution I’m showing you, I searched the Internet extensively.&lt;/p&gt;

&lt;p&gt;Let me tell you one thing: you can find a lot of online services that can remove a password from a PDF file. In my case, though, I didn’t want to use any of them for a simple reason: the file was about my financial situation, so I didn't want to leave any trace or record of it in a third-party database (other than my financial advisor’s).&lt;/p&gt;

&lt;p&gt;So, in this scenario, I found &lt;strong&gt;qpdf&lt;/strong&gt;, a command-line tool and library that can transform a PDF file into an equivalent one. It has only one disadvantage: it can be used only via a terminal.&lt;/p&gt;

&lt;p&gt;But don’t worry: it is very easy to use.&lt;/p&gt;

&lt;p&gt;Let’s see.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qpdf&lt;/strong&gt; can be installed and used on any OS, but here we’ll see the procedure to do so on Ubuntu.&lt;br&gt;
(If you are a Windows user, don’t worry: you can install WSL.)&lt;/p&gt;

&lt;p&gt;So, in a Linux environment, install qpdf by typing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ sudo apt-get install qpdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, at this point, we have to be careful if we don’t want to make mistakes. If you are using Ubuntu on Windows (via WSL) you have to move your PDF file to the environment where Ubuntu actually works. You should see something like the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwld9sfb66qz1iijxdqjh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwld9sfb66qz1iijxdqjh.jpg" alt="The Linux environment by Federico Trotta" width="267" height="255"&gt;&lt;/a&gt;&lt;br&gt;
The Linux environment on Windows created by WSL. Image by Federico Trotta.&lt;/p&gt;

&lt;p&gt;So, as you can see, I have Ubuntu 18.04 under the Linux environment. This folder is created by WSL when you install it.&lt;/p&gt;

&lt;p&gt;So, at this point suppose that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your file is called my_file and it is located at the following path: &lt;code&gt;C:/Home/Linux/Ubuntu-18.04/my_file.pdf&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;my_file is protected with the following password: my_fileExample&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Qpdf&lt;/strong&gt; will create a copy of the file named my_file_free located in &lt;code&gt;C:/Home/Linux/Ubuntu-18.04/my_file_free.pdf&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Now, via terminal, you just need to type the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ qpdf --decrypt --password=my_fileExample C:/Home/Linux/Ubuntu-18.04/my_file.pdf C:/Home/Linux/Ubuntu-18.04/my_file_free.pdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the job is done. So the general pattern is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ qpdf --decrypt --password=&amp;lt;YOUR PASSWORD&amp;gt; file.PDF new_file.PDF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
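&lt;p&gt;If you need to repeat this for several files, the same call can be wrapped in a few lines of Python. This is only a sketch: the function names &lt;code&gt;build_qpdf_command&lt;/code&gt; and &lt;code&gt;remove_pdf_password&lt;/code&gt; are made up for illustration, and it assumes that qpdf is installed and available on your PATH.&lt;/p&gt;

```python
import subprocess


def build_qpdf_command(password, source, destination):
    """Return the argument list for the qpdf call shown above."""
    return ["qpdf", "--decrypt", f"--password={password}", source, destination]


def remove_pdf_password(password, source, destination):
    """Run qpdf; raises CalledProcessError if qpdf exits with a non-zero status."""
    subprocess.run(build_qpdf_command(password, source, destination), check=True)


# Hypothetical file names, mirroring the example in this article.
command = build_qpdf_command("my_fileExample", "my_file.pdf", "my_file_free.pdf")
print(command)
```

&lt;p&gt;Passing the arguments as a list, rather than as one long string, means the password and the file names are handed to qpdf as they are, without going through shell quoting.&lt;/p&gt;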



&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;I hope this article helps you remove a password from a PDF.&lt;br&gt;
Protecting your files with passwords is very important for keeping your data safe but, in special cases like this one, it can cause you a lot of headaches.&lt;/p&gt;




&lt;p&gt;The article "How To Easily Remove a Password From a PDF file" was first published on my blog &lt;a href="https://federicotrotta.com/how-to-easily-remove-a-password-from-a-pdf-file/"&gt;here&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Hi, my name is Federico and I am a freelance Technical Writer:&lt;/p&gt;

&lt;p&gt;Do you want to start a documentation project, collaborating with me? &lt;a href="https://federicotrotta.com/"&gt;Contact me&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Do you want to know more about my work? You can start with my &lt;a href="https://federicotrotta.com/case-studies/"&gt;case studies&lt;/a&gt; and my &lt;a href="https://federico-trotta.github.io/"&gt;portfolio&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>pdf</category>
      <category>tutorial</category>
      <category>beginners</category>
    </item>
    <item>
      <title>How To Create a Repository in GitHub</title>
      <dc:creator>Federico Trotta</dc:creator>
      <pubDate>Wed, 24 Apr 2024 13:03:49 +0000</pubDate>
      <link>https://forem.com/federicotrotta/how-to-create-a-repository-in-github-kd6</link>
      <guid>https://forem.com/federicotrotta/how-to-create-a-repository-in-github-kd6</guid>
      <description>&lt;p&gt;As I’ve worked for several years with documents, I understand the need to define the revision index of a document, when it needs to be changed. So, when I first read about version control (and Git and GitHub), I could immediately understood its importance, even if this is not properly the same thing as a document revision.&lt;/p&gt;

&lt;p&gt;One good thing to do when learning to program is to get comfortable with version control, especially for two reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a project needs to be revised, version control lets you see all the previous versions.&lt;/li&gt;
&lt;li&gt;A version control system paired with a hosting platform like GitHub lets you work locally on your project and also store the versions online, so that you can show your projects to the world, share knowledge, and so on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, let’s see how to create a repository on GitHub.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a GitHub account and install Git on your PC
&lt;/h2&gt;

&lt;p&gt;The first step is to create a GitHub account. You can do it &lt;a href="https://github.com/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After that, you have to install Git on your PC. You can download it &lt;a href="https://git-scm.com/downloads"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now, open the terminal and set up your email by typing the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git config --global user.email YOUR_EMAIL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, &lt;code&gt;YOUR_EMAIL&lt;/code&gt; is your complete email address. Make sure you use the same email address for setting up Git and for signing up on GitHub.&lt;/p&gt;

&lt;p&gt;Now, set up your Git name by typing this code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git config --global user.name YOUR_NAME
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where &lt;code&gt;YOUR_NAME&lt;/code&gt; can be your complete name or a nickname: you decide.&lt;/p&gt;

&lt;p&gt;Git is now set up and configured. The next step is the creation of an SSH key.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create an SSH key
&lt;/h2&gt;



&lt;p&gt;An SSH key is a cryptographic credential, made of a private key and a public key, that authenticates you when you access a remote machine (such as GitHub's servers) without sending a password.&lt;/p&gt;

&lt;p&gt;So, let’s see how to set an SSH key.&lt;/p&gt;

&lt;p&gt;Open another terminal and type this code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh-keygen -t ed25519 -C "YOUR_EMAIL"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above command, use the email you’ve used in the previous steps, and keep the double quotes exactly as typed.&lt;/p&gt;

&lt;p&gt;You will see a lot of output after running this command, as it generates a private key and a public key.&lt;/p&gt;

&lt;p&gt;Before going on, start the SSH agent to check that everything works fine by typing this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eval `ssh-agent`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you get something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent pid 125746
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;it means that everything works fine and we can go on.&lt;/p&gt;

&lt;p&gt;Now, you can get the public SSH key. If you look at the output generated by the &lt;code&gt;ssh-keygen -t ed25519 -C "YOUR_EMAIL"&lt;/code&gt; command, you can easily find a line like this:&lt;/p&gt;

&lt;p&gt;Your public key has been saved in &lt;code&gt;/home/YOUR_PC/.ssh/id_ed25519.pub&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This line shows the folder where your keys are stored. We only need the public key. To print it, type the following in the terminal (this is for Linux users; if you are not a Linux user, you can navigate to the file and open it):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat ~/.ssh/id_ed25519.pub
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and we get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh-ed25519 NUMBERS_AND_LETTERS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
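&lt;p&gt;If you are not on Linux and would rather not hunt for the file by hand, a few lines of Python can read it as well. This is a sketch under the assumption that the key was saved in the default location, the &lt;code&gt;.ssh&lt;/code&gt; folder in your home directory; the function name &lt;code&gt;read_public_key&lt;/code&gt; is made up for this example.&lt;/p&gt;

```python
from pathlib import Path


def read_public_key(key_path=None):
    """Return the public key text, stripped of surrounding whitespace."""
    # Assumed default: the .ssh folder in the home directory,
    # which is where ssh-keygen offers to save ed25519 keys.
    if key_path is None:
        key_path = Path.home() / ".ssh" / "id_ed25519.pub"
    return Path(key_path).read_text().strip()
```

&lt;p&gt;Calling &lt;code&gt;print(read_public_key())&lt;/code&gt; prints the same line as the &lt;code&gt;cat&lt;/code&gt; command above. Only ever read the file ending in &lt;code&gt;.pub&lt;/code&gt;: the private key must stay on your machine.&lt;/p&gt;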



&lt;p&gt;Now, copy this result: we’ll paste it on GitHub.&lt;/p&gt;

&lt;p&gt;Go to GitHub and log in to your account. Go to &lt;strong&gt;Settings&lt;/strong&gt; &amp;gt; &lt;strong&gt;SSH and GPG keys&lt;/strong&gt; and click on New SSH key. Here's what you'll see:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdqkz3lqvx8f3du0wa1u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdqkz3lqvx8f3du0wa1u.png" alt="Add SSH KEY in GitHub by Federico Trotta" width="700" height="245"&gt;&lt;/a&gt;&lt;br&gt;
(Adding an SSH key on GitHub. Image by Federico Trotta.)&lt;/p&gt;

&lt;p&gt;Give it a title and paste the key (&lt;code&gt;ssh-ed25519 NUMBERS_AND_LETTERS&lt;/code&gt;) into the Key field.&lt;/p&gt;

&lt;p&gt;Now, we can create our first local repository and connect it to a remote repository on GitHub.&lt;/p&gt;
&lt;h2&gt;
  
  
  Your first GitHub repository
&lt;/h2&gt;

&lt;p&gt;Now, let’s create our remote repository on GitHub. Click on New repository and give it a name. Let’s call it &lt;code&gt;example&lt;/code&gt; and set it to be public (so it can be seen by anyone). This is what you see:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvvm18lieszk0z5mjqln.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvvm18lieszk0z5mjqln.png" alt="Create new repository in GitHub by Federico Trotta" width="700" height="305"&gt;&lt;/a&gt;&lt;br&gt;
(Your first repository. Image by Federico Trotta.)&lt;/p&gt;

&lt;p&gt;The only thing you have to do now is copy the last three lines of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git remote add origin git@github.com:t-YOURNAME/example.git
git branch -M main
git push -u origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get those lines copied: we will paste them into the terminal in a moment.&lt;/p&gt;

&lt;p&gt;Now, on your PC create a local folder where the files will be. We’ll call the folder example so that the local and remote repositories have the same name. &lt;/p&gt;

&lt;p&gt;Now, we are going to use Git.&lt;/p&gt;

&lt;p&gt;In the terminal, move into this folder and type the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will make your local folder a repository. Then, you can see its status by typing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows the status of the repository. Since it is a new one, Git will tell you that there are no commits yet and will list the untracked files in the folder.&lt;/p&gt;

&lt;p&gt;Let’s say you have a file called &lt;code&gt;my_file.py&lt;/code&gt; that you want to keep in the repository (both the local one and the remote one on GitHub). You can stage specific files for the commit or stage all of them. In this case, we just have one, so we type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add my_file.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(In case you want to stage all the files, type: &lt;code&gt;git add .&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;Now, commit it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git commit -m 'initial version'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the commit message is "initial version", but you can write whatever you want.&lt;/p&gt;

&lt;p&gt;Finally, paste the lines copied from GitHub. Here they are again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git remote add origin git@github.com:t-YOURNAME/example.git
git branch -M main
git push -u origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we are done!&lt;/p&gt;

&lt;p&gt;If you now look at your GitHub repository (called example), you will see the &lt;code&gt;my_file.py&lt;/code&gt; file.&lt;/p&gt;
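&lt;p&gt;To recap, the whole sequence of Git commands from this section can be collected in a short Python script. This is just a sketch: the function names &lt;code&gt;git_steps&lt;/code&gt; and &lt;code&gt;publish&lt;/code&gt; are made up, the remote URL is whatever GitHub showed you when you created the repository, and it assumes Git is installed and your SSH key is already registered.&lt;/p&gt;

```python
import subprocess


def git_steps(remote_url, file_name="my_file.py", message="initial version"):
    """Return the Git commands from this section, in order, as argument lists."""
    return [
        ["git", "init"],
        ["git", "add", file_name],
        ["git", "commit", "-m", message],
        ["git", "remote", "add", "origin", remote_url],
        ["git", "branch", "-M", "main"],
        ["git", "push", "-u", "origin", "main"],
    ]


def publish(folder, remote_url):
    """Run every step inside the given folder, stopping at the first failure."""
    for step in git_steps(remote_url):
        subprocess.run(step, cwd=folder, check=True)
```

&lt;p&gt;Calling &lt;code&gt;publish&lt;/code&gt; with the path of your local folder and your own remote URL runs the six commands one after the other, exactly as you typed them in the terminal.&lt;/p&gt;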

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;In this article, we've shown how to create a new repository on GitHub.&lt;/p&gt;

&lt;p&gt;I hope you find it useful!&lt;/p&gt;





&lt;p&gt;The article "How To Create a Repository in GitHub" was first created for my blog &lt;a href="https://federicotrotta.com/how-to-create-a-repository-in-github/"&gt;here&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Hi, I am Federico Trotta and I'm a freelance Technical Writer.&lt;br&gt;
Do you want to collaborate with me? &lt;a href="https://federicotrotta.com/"&gt;Hire me&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>tutorial</category>
      <category>git</category>
      <category>github</category>
    </item>
    <item>
      <title>What if ChatGPT Had Already Reached its Glory?</title>
      <dc:creator>Federico Trotta</dc:creator>
      <pubDate>Tue, 09 Apr 2024 07:31:14 +0000</pubDate>
      <link>https://forem.com/federicotrotta/what-if-chatgpt-had-already-reached-its-glory-f47</link>
      <guid>https://forem.com/federicotrotta/what-if-chatgpt-had-already-reached-its-glory-f47</guid>
      <description>&lt;p&gt;As ChatGPT "was born" nearly a year and a half from now, everyone out there is telling us that AI, especially LLMs, will steal our jobs. No more writers, no more developers, no more marketers, no more operators. The future seems to see AI as the king of the world.&lt;/p&gt;

&lt;p&gt;While there's no doubt that AI is here to stay and support us in our daily jobs, is not so easy to say that it will replace jobs (and what jobs, particularly).&lt;/p&gt;

&lt;p&gt;Also, as a user of ChatGPT, I noticed a degradation in its performance since it came on the market. &lt;/p&gt;

&lt;p&gt;So, in this article, I'd like to raise some questions - and eventually, create discussions - about this topic:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if ChatGPT has already seen its glory days? Will its capabilities really increase in a short period of time, or will it take years to see great improvements?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I'll do so by introducing well-known procedures, like training models, to walk you through what needs to be taken into account when reasoning about the topic.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: In this article, I will talk about ChatGPT for simplicity, as it's the most famous (and maybe the most used) LLM, but the considerations apply to other similar software.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  An introduction to training and evaluating an ML/DL model
&lt;/h2&gt;

&lt;p&gt;When training and evaluating a Machine Learning (ML) or a Deep Learning (DL) model, data scientists always do the same thing: they get the available dataset and split it into the train and the test set.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdjnhrbvf5cvm8159g9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdjnhrbvf5cvm8159g9l.png" alt="Splitting a dataset by Federico Trotta." width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This operation is done to find the model that best fits the data. To do so, data scientists train different models on the train set and calculate some performance metrics. Then, they calculate the same performance metrics using the test set and find the best-performing model.&lt;/p&gt;

&lt;p&gt;So, the importance of this methodology is that ML and DL models have to be evaluated on new and unseen data to verify that they are generalizing well what they have learned in the train set.&lt;br&gt;
This is an important introduction to keep in mind for the subsequent part of this article.&lt;/p&gt;
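&lt;p&gt;The split described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the dataset is a hypothetical list of ten numbered samples, and the 80/20 proportion and the fixed seed are arbitrary choices, not something prescribed by a specific library.&lt;/p&gt;

```python
import random

# Hypothetical dataset: 10 numbered samples standing in for real rows.
dataset = list(range(10))

# Shuffle a copy with a fixed seed so the split is reproducible.
rng = random.Random(42)
shuffled = dataset[:]
rng.shuffle(shuffled)

# Keep 80% of the rows for training and the remaining 20% for testing.
split_index = int(len(shuffled) * 0.8)
train_set = shuffled[:split_index]
test_set = shuffled[split_index:]

# The two sets should not share any row: the test data must be unseen.
overlap = set(train_set).intersection(test_set)
print(len(train_set), len(test_set), len(overlap))
```

&lt;p&gt;The empty overlap is the key property: the model is evaluated only on rows it has never seen during training.&lt;/p&gt;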
&lt;h2&gt;
  
  
  The sets of combinations
&lt;/h2&gt;

&lt;p&gt;In mathematics - in particular, in linear algebra - we talk about the sets of combinations. &lt;/p&gt;

&lt;p&gt;The most famous one is the set of linear combinations (also called the "&lt;strong&gt;Span&lt;/strong&gt;"). To define a linear combination, we use &lt;a href="https://en.wikipedia.org/wiki/Linear_combination"&gt;Wikipedia&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In mathematics, a linear combination is an expression constructed from a set of terms by multiplying each term by a constant and adding the results.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, for example, a linear combination of two variables &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; can be written as:&lt;/p&gt;

&lt;p&gt;

&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;z=ax+by
  z = ax+by
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;z&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;a&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;b&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
&lt;/p&gt;

&lt;p&gt;Where &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; are two constant values (two numbers).&lt;/p&gt;

&lt;p&gt;Anyway, the sets of combinations can also be non-linear. This means that a variable &lt;code&gt;z&lt;/code&gt; can be created as a non-linear combination of &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt;. It could be quadratic, cubic, or it could have another mathematical form.&lt;/p&gt;
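&lt;p&gt;To make the idea concrete, here is a small sketch of a linear and a quadratic combination of &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt;, first on single values and then on two columns of data. The numbers and the constants &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; are hypothetical and chosen only for illustration.&lt;/p&gt;

```python
# Hypothetical constants and values, chosen only for illustration.
a, b = 0.5, 4.0
x, y = 2.0, 3.0

# Linear combination: each term is multiplied by a constant, then summed.
z_linear = a * x + b * y

# One possible non-linear (quadratic) combination of the same terms.
z_quadratic = a * x**2 + b * y**2

# The same idea applied to two columns of a (tiny) dataset.
x_column = [1.0, 2.0, 3.0]
y_column = [10.0, 20.0, 30.0]
z_column = [a * xi + b * yi for xi, yi in zip(x_column, y_column)]

print(z_linear, z_quadratic, z_column)
```

&lt;p&gt;In the last step, the new column carries no information that was not already in the two original columns: it is entirely derived from them.&lt;/p&gt;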

&lt;p&gt;Of course, this applies to variables as well as to other mathematical entities, like datasets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqr6xu8im9r3tjvtqr1ne.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqr6xu8im9r3tjvtqr1ne.png" alt="Datasets combined by Federico Trotta." width="715" height="728"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How sets of combinations influence the performance of a model
&lt;/h2&gt;

&lt;p&gt;Now, considering what we've defined so far, a question may arise:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What happens if the test set is a combination of the train set?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this case, two major events may occur (both or one of the two):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://machinelearningmastery.com/data-leakage-machine-learning/"&gt;Data leakage&lt;/a&gt;&lt;/strong&gt;. "Data leakage is when information from outside the training dataset is used to create the model. This additional information can allow the model to learn or know something that it otherwise would not know and in turn, invalidate the estimated performance of the model being constructed."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overfitting&lt;/strong&gt;. This phenomenon occurs when the model has learned patterns that are specific to the training data and do not hold in general. This results in low performance on unseen data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When these two phenomena occur, the model generalizes poorly. This means that the model may perform well on the test set because it reflects the combinations present in the training set, but its ability to perform on entirely new data is still unknown.&lt;/p&gt;

&lt;p&gt;In other words, the model has an evaluation bias. This means that the evaluation metrics calculated on the test set may not reflect the model's actual performance (that is, they are biased). This could lead to misguided confidence in the model's abilities.&lt;/p&gt;
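&lt;p&gt;A toy example may help to see the evaluation bias. The sketch below assumes a hypothetical "model" that only memorizes the training pairs and always answers 0 for anything it has not seen. It is not a real ML model, just an extreme case of learning the train set by heart, and the labeling rule is invented for the example.&lt;/p&gt;

```python
# Hypothetical labeling rule: the label is 1 when the number is divisible by 3.
def label(n):
    return int(n % 3 == 0)

# Training data: the numbers from 0 to 19 with their labels.
train = [(n, label(n)) for n in range(20)]

# A toy "model" that memorizes the training pairs and guesses 0 otherwise.
memory = dict(train)

def predict(n):
    return memory.get(n, 0)

def accuracy(rows):
    hits = sum(1 for n, y in rows if predict(n) == y)
    return hits / len(rows)

# A leaky test set: its rows are copied from the train set.
leaky_test = train[5:15]

# A fresh test set: rows the model has never seen.
fresh_test = [(n, label(n)) for n in range(100, 130)]

leaky_accuracy = accuracy(leaky_test)
fresh_accuracy = accuracy(fresh_test)
print(leaky_accuracy, round(fresh_accuracy, 3))
```

&lt;p&gt;The score on the leaky test set looks perfect, while the score on fresh data is noticeably lower: the first metric says nothing about how well the model generalizes.&lt;/p&gt;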

&lt;h2&gt;
  
  
  Possible sets of combinations in training LLMs
&lt;/h2&gt;

&lt;p&gt;Now, our interest here is in LLMs. As is well known, these models are trained on a vast amount of data. So, the more data there is, the higher the chance of getting biased data.&lt;/p&gt;

&lt;p&gt;Also, we don't know the actual data used to train ChatGPT, but we know that a significant proportion of the training data came from the Internet.&lt;/p&gt;

&lt;p&gt;So, first of all, on the Internet (but this is a consideration that applies, in general, to books and other sources) a lot of websites describe the same topic. This may lead to possible sets of combinations between the train and the test sets used.&lt;/p&gt;

&lt;p&gt;Also, ML and DL models often need retraining to evaluate whether the model still applies to new incoming data.&lt;/p&gt;

&lt;p&gt;Imagine that ChatGPT was trained and evaluated in September 2022 for the first time (only using the Internet, for simplicity).&lt;/p&gt;

&lt;p&gt;Imagine that the first retraining and re-evaluation was made in March 2023 (still only using the Internet, for simplicity). Some questions that may arise are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How much new content has been generated on the Internet between the first release of ChatGPT in November 2022 and the first retraining?&lt;/li&gt;
&lt;li&gt;How much new content on the Internet has been AI-generated between the first release of ChatGPT in November 2022 and the first retraining?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These questions help to understand whether LLMs are really improving or not because, as stated at the beginning of this article, I've seen more of a degradation in the performance, rather than an improvement (but, sure: I may be biased).&lt;/p&gt;

&lt;p&gt;So, if the point of training (and re-training) is to evaluate a model on new unseen data, we can state that AI-generated content is probably a subset of combinations of the data previously used to train the model, thus leading the model to data leakage, even though the training may be done with proper techniques to avoid it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Will LLMs need years to get improvements in performance with other training? 
&lt;/h2&gt;

&lt;p&gt;When trying to find the model that best fits the data, we know that data quality often has a higher impact than using a "better model" or better fine-tuned hyperparameters of the same model.&lt;/p&gt;

&lt;p&gt;So data quality is more important than model tuning. This particularly applies to LLMs that need a vast amount of data to be trained.&lt;/p&gt;

&lt;p&gt;So, another question may be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Given the fact that AI-generated text on the Internet may lead to creating new data that are a subset of combinations of the train data, how much time should pass before a great and new amount of content is generated so that the performance can increase?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words: will LLMs need years to make actual improvements, contrary to what we read daily in the news?&lt;/p&gt;

&lt;p&gt;Or should we ask: what if ChatGPT cannot actually get better than what we know today? Has it already seen its glory days?&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;In this article, I wanted to reason about how ChatGPT's performance is evolving.&lt;/p&gt;

&lt;p&gt;I believe that re-training it using the Internet may lead to data leakage, and thus to a degradation of the performance, because AI-generated content is no longer a small proportion of the content existing on the Internet, and this content is a subset of combinations of previously created content.&lt;/p&gt;

&lt;p&gt;I'd like this article to create a genuine discussion on this topic, hoping to generate a positive and constructive one. Please: share your thoughts in the comments!&lt;/p&gt;




&lt;p&gt;Hi, I am Federico Trotta and I'm a freelance Technical Writer.&lt;br&gt;
Do you want to collaborate with me? &lt;a href="https://bio.link/federicotrotta"&gt;Hire me&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>community</category>
      <category>ai</category>
      <category>discuss</category>
    </item>
    <item>
      <title>How Python’s Argparse Can Be Useful in Data Science</title>
      <dc:creator>Federico Trotta</dc:creator>
      <pubDate>Mon, 01 Apr 2024 16:37:27 +0000</pubDate>
      <link>https://forem.com/federicotrotta/how-pythons-argparse-can-be-useful-in-data-science-511h</link>
      <guid>https://forem.com/federicotrotta/how-pythons-argparse-can-be-useful-in-data-science-511h</guid>
      <description>&lt;p&gt;When I first approached Python’s Argparse, I had great difficulty understanding how it works because I had never programmed before.&lt;/p&gt;

&lt;p&gt;Also, I asked myself: “How can a command-line interface be useful in Data Science?”. Well, I’ll show it with a practical example.&lt;/p&gt;

&lt;p&gt;But first, let’s explain what &lt;code&gt;Argparse&lt;/code&gt; is.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Python’s Argparse?
&lt;/h2&gt;

&lt;p&gt;Python’s Argparse is a standard library module that lets you pass arguments to a script via the command-line interface. It is not the only option (you can also read &lt;code&gt;sys.argv&lt;/code&gt; directly), but it is arguably the most complete one in the standard library.&lt;/p&gt;

&lt;p&gt;As we can see in its documentation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The &lt;a href="https://docs.python.org/3/library/argparse.html#module-argparse"&gt;argparse&lt;/a&gt; module makes it easy to write user-friendly command-line interfaces. […]. The argparse module also automatically generates help and usage messages and issues errors when users give the program invalid arguments.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How Python’s Argparse can be useful in your Data Science projects: a practical example
&lt;/h2&gt;

&lt;p&gt;Let’s say you have an empirical way to calculate a parameter, and this empirical method requires you to supply a value to achieve a result that is considered good. The problem is that you have to find the right value iteratively. If you work in Jupyter Notebooks, you’ll need to find the exact line of code to modify the parameter, each time.&lt;/p&gt;

&lt;p&gt;For the purpose of this article, I’ve created a dataset with simulated data which reflects the reality of typical distributions, in real cases. Let’s say that our data are measured times in minutes; let’s import the data and see the data frame:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# Import data and show head
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_excel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;example.xlsx&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s the data frame:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhfzal83plr4k61z6fib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhfzal83plr4k61z6fib.png" alt="A data frame by Federico Trotta" width="242" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The purpose of the exercise is to find the measured time that best fits the distribution.&lt;/p&gt;

&lt;p&gt;Let’s say that those measurements are times related to athletes running a fixed distance; let’s say 1 km.&lt;/p&gt;

&lt;p&gt;We want to evaluate the athletes based on the time they need to run 1 km. But how can we fix a reasonable value of time to be achieved? Is one minute a good time? Can the majority of the athletes run 1 km in one minute? When can an athlete be considered too slow, and when too fast?&lt;/p&gt;

&lt;p&gt;That is the purpose of this study.&lt;/p&gt;

&lt;p&gt;As often happens in these cases, the mean value is typically far away from being a good value, because, often, the data are not normally distributed. So we need a different metric, but this metric can rely on the mean time.&lt;/p&gt;

&lt;p&gt;To find the metric, we have to empirically find a factor that, multiplied by the mean time, gives a value that is one of the most frequent values.&lt;/p&gt;

&lt;p&gt;Let’s show a plot for a better understanding:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjsgfysfuh2kbu743bdp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjsgfysfuh2kbu743bdp.png" alt="Frequencies to describe Python's Argparse by Federico Trotta" width="800" height="540"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the mean time (4.4 min) is not a good value to use to evaluate the athletes because the majority of them run 1 km in 3 or 4 minutes. In similar cases, I found that a good value is “0.85*mean time”; but this ‘0.85’ factor is an empirical value and sometimes it can be more, sometimes less (depending on how skewed the data distribution is). So the goal of using Argparse is to modify just the multiplication factor to fit a good final result (a time against which to evaluate your athletes on running 1 km).&lt;/p&gt;
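&lt;p&gt;The arithmetic behind this reasoning can be sketched with the standard library alone. The ten times below are hypothetical (they are not the data in &lt;code&gt;example.xlsx&lt;/code&gt;); they are only chosen so that the distribution is skewed and the mean is 4.4 minutes, as in the plot.&lt;/p&gt;

```python
import statistics

# Hypothetical measured times in minutes (a right-skewed sample).
times = [3, 3, 3, 3, 4, 4, 4, 5, 6, 9]

# The mean is pulled up by the few slow runs.
mean_time = statistics.mean(times)

# The most frequent value sits lower than the mean.
most_frequent = statistics.mode(times)

# Empirical multiplication factor (0.85 is the typical starting point).
factor = 0.85
adjusted_time = mean_time * factor

print(mean_time, most_frequent, round(adjusted_time, 2))
```

&lt;p&gt;With this sample, the adjusted time lands between 3 and 4 minutes, which is where most of the measurements are.&lt;/p&gt;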

&lt;p&gt;So, let’s see a bit of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;

&lt;span class="c1"&gt;# Create parser
&lt;/span&gt;&lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Specify the argument that must be provided
&lt;/span&gt;&lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;multiple&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;multiplication factor (0.85 is typical)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Parse and control the arguments
&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Define factor of percentage
&lt;/span&gt;&lt;span class="n"&gt;fac&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;multiple&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the above code, I’ve created the parser, specified the arguments to parse (in this case, the argument is just one: the multiplication factor), and in the end, after parsing and validating the arguments, I’ve defined the factor as the value returned by Argparse (&lt;code&gt;fac = args.multiple&lt;/code&gt;). With this done, we can calculate the mean time and the adjusted time (as the mean time multiplied by the factor):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Calculate mean values
&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;measures [min]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#mean
&lt;/span&gt;
&lt;span class="c1"&gt;# Define adjusted value
&lt;/span&gt;&lt;span class="n"&gt;adj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;fac&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now plot the graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.patches&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mpatches&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Define figure size in inches and font scale
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rcParams&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;figure.figsize&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;font_scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Plotting the frequencies
&lt;/span&gt;&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;histplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;measures [min]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;binwidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="c1"&gt;# Adding the mean time and the adjusted time
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;axvline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;adj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Vertical line to "adjusted" value 
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;axvline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;green&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Vertical line to "mean" value
&lt;/span&gt;
&lt;span class="c1"&gt;#Create labels
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FREQUENCIES OF THE MEASURED VALUES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VALUES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FREQUENCIES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define mean and adjusted legend
&lt;/span&gt;&lt;span class="n"&gt;blu_line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mpatches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adjusted value: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;adj&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;green_line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mpatches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;green&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mean value: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;blu_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;green_line&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;prop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Display the plot (needed when running the code as a script)
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn10d4lhtraytydkavyma.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn10d4lhtraytydkavyma.png" alt="Frequencies to describe Python's Argparse by Federico Trotta" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the adjusted time (3.7 min) can be a good value to evaluate the athletes, instead of the mean time (4.4 min), since it is close to the most frequently measured times. And how can we use Argparse to arrive here?&lt;/p&gt;

&lt;p&gt;First of all, save your Jupyter Notebook with the &lt;code&gt;.py&lt;/code&gt; extension. Let’s call it &lt;code&gt;exercise.py&lt;/code&gt; and save it in a directory. Open a terminal in that directory and type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 exercise.py --h
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows the help:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffsnlot2kh6jzyv3yf31c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffsnlot2kh6jzyv3yf31c.png" alt="Python's Argparse help by Federico Trotta" width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, if you want to play with the multiplication factors and if you want to try starting from “0.85” you just need to write this in the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 exercise.py 0.85
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the above command, Python will display the plot and, if “0.85” is not a good fit, you can change it quickly and easily, without searching your whole Notebook for the exact line of code to modify!&lt;/p&gt;
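&lt;p&gt;If you want to try several factors without leaving Python, &lt;code&gt;parse_args()&lt;/code&gt; also accepts an explicit list of strings instead of reading the command line. The sketch below is a variation on the script above, not a replacement for it: making the argument optional with &lt;code&gt;nargs="?"&lt;/code&gt; and a default of 0.85 is an assumption added here, and the mean time is hardcoded to a hypothetical 4.4 minutes.&lt;/p&gt;

```python
import argparse

# Same idea as the script above, but the factor is optional here:
# nargs="?" with a default is an assumption added for this sketch.
parser = argparse.ArgumentParser(description="Adjust a mean time by a factor")
parser.add_argument(
    "multiple",
    type=float,
    nargs="?",
    default=0.85,
    help="multiplication factor (0.85 is typical)",
)

# Hypothetical mean time in minutes, hardcoded instead of read from a file.
mean_time = 4.4

# Passing a list to parse_args() simulates different command lines.
for argv in (["0.8"], ["0.9"], []):
    args = parser.parse_args(argv)
    adjusted = mean_time * args.multiple
    print(argv, args.multiple, round(adjusted, 2))
```

&lt;p&gt;The empty list stands for running the script with no argument, in which case the default factor is used.&lt;/p&gt;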

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;This article shows how Python’s Argparse can be useful even in Data Science projects.&lt;/p&gt;

&lt;p&gt;Sometimes, in fact, when analyzing data you may need to adjust some parameters: in that case, Python’s Argparse can be a convenient choice.&lt;/p&gt;




&lt;p&gt;The article "&lt;a href="https://federicotrotta.com/how-pythons-argparse-can-be-useful-in-data-science/"&gt;How Python’s Argparse Can Be Useful in Data Science&lt;/a&gt;" was originally created for my blog.&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>datascience</category>
    </item>
    <item>
      <title>How to calculate RGB values in Python</title>
      <dc:creator>Federico Trotta</dc:creator>
      <pubDate>Mon, 01 Apr 2024 15:41:00 +0000</pubDate>
      <link>https://forem.com/federicotrotta/how-to-calculate-rgb-values-in-python-3ph5</link>
      <guid>https://forem.com/federicotrotta/how-to-calculate-rgb-values-in-python-3ph5</guid>
      <description>&lt;p&gt;When managing images, a good exercise is to calculate RGB values in Python.&lt;/p&gt;

&lt;p&gt;If you’re asking yourself “What does RGB mean?”, don’t worry: this was the first question I asked myself before coding this exercise.&lt;/p&gt;

&lt;p&gt;So, before the code, let’s talk about RGB.&lt;/p&gt;

&lt;h2&gt;
  
  
  Black and white, RGB, Alpha level: basic information about images
&lt;/h2&gt;

&lt;p&gt;RGB stands for “Red, Green, Blue” and it's a model of “additive colors”: summing the three colors at full intensity gives white. In particular, it's the model used in electronic devices to display the pixels of an image.&lt;/p&gt;

&lt;p&gt;This means that when analyzing a colored image, Python, in a way, gives us three numbers: one for Red, one for Green, and one for Blue.&lt;/p&gt;

&lt;p&gt;Of course, a black-and-white (greyscale) image has a single channel, so we can calculate just one value from it.&lt;/p&gt;

&lt;p&gt;The alpha level, instead, represents transparency: for an image with an alpha channel we can calculate four values (one for R, one for G, one for B, and one for the alpha level).&lt;/p&gt;

&lt;h2&gt;
  
  
  RGB values in Python: a preliminary study
&lt;/h2&gt;

&lt;p&gt;All right, let’s use some code here!&lt;/p&gt;

&lt;p&gt;Let’s say we have a folder called images.&lt;/p&gt;

&lt;p&gt;In this folder, we have three images:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One in black and white, named &lt;code&gt;bw.png&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;One colored, named &lt;code&gt;daffodil.jpg&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;One colored with the alpha level, called &lt;code&gt;eclipse.png&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We want to calculate the RGB values in Python for each one of these images. &lt;/p&gt;

&lt;p&gt;To do so, we can use the PIL library, which can load images, and NumPy to transform the images into NumPy arrays.&lt;/p&gt;

&lt;p&gt;So, let’s import the libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tabulate&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tabulate&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, before coding, we have to understand some topics.&lt;/p&gt;

&lt;p&gt;We’ll use NumPy arrays and, first of all, let's remember that the shape of an array is the number of elements in each dimension. Moreover, the &lt;code&gt;ndim&lt;/code&gt; attribute returns the number of dimensions of an array.&lt;/p&gt;
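
&lt;p&gt;To see these two attributes in action without needing any file, here is a small sketch that builds three synthetic images in memory with PIL and converts them to arrays. The sizes and the dictionary keys are made up for illustration. Note that PIL takes the size as (width, height), while NumPy reports (height, width).&lt;/p&gt;

```python
from PIL import Image
import numpy as np

# Synthetic images, 6 px wide and 4 px high, created in memory (no files needed)
size = (6, 4)  # PIL expects (width, height)
images = {
    "greyscale": Image.new("L", size),
    "RGB": Image.new("RGB", size),
    "RGB with alpha": Image.new("RGBA", size),
}

shapes = {}
for name, img in images.items():
    arr = np.array(img)
    shapes[name] = arr.shape  # NumPy reports (height, width[, channels])
    print(name, arr.shape, arr.ndim)
```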

&lt;p&gt;So, let’s calculate the shape and the ndim for each array.&lt;/p&gt;

&lt;p&gt;For the black and white image (&lt;code&gt;bw.png&lt;/code&gt;), with &lt;code&gt;dst_img = "images"&lt;/code&gt; as the path of the folder, we have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst_img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bw.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(512, 512)
2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, this image has a height and a width equal to 512 px. The array has just 2 dimensions, which is consistent with the fact that the image is in black and white: there is no channel axis.&lt;/p&gt;

&lt;p&gt;For the RGB image (&lt;code&gt;daffodil.jpg&lt;/code&gt;) we have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst_img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;daffodil.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(500, 335, 3)
3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, this image has a height equal to 500 px and a width equal to 335 px. The array has 3 dimensions, and the third one holds the 3 color channels, which is consistent with the fact that it is an RGB image.&lt;/p&gt;

&lt;p&gt;In the end, for the RGB+alpha (&lt;code&gt;eclipse.png&lt;/code&gt;) image we have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst_img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eclipse.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(256, 256, 4)
3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This image has a height and a width equal to 256 px. The array still has 3 dimensions, like the RGB image, but the third one now holds 4 channels!&lt;/p&gt;

&lt;p&gt;Using NumPy’s mean function, we can calculate the mean value for each color channel. Now, we can create a loop to calculate our values.&lt;/p&gt;
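
&lt;p&gt;Before the loop, here is a tiny worked example of a per-channel mean. It uses a made-up 2x2 array instead of a real image, so the numbers are easy to check by hand.&lt;/p&gt;

```python
import numpy as np

# A tiny 2x2 "image" with 3 channels: every pixel is (10, 20, 30)
arr = np.zeros((2, 2, 3))
arr[..., 0] = 10
arr[..., 1] = 20
arr[..., 2] = 30

# Averaging over axis 0 (height) and axis 1 (width) leaves one value per channel
channel_means = np.mean(arr, axis=(0, 1))
print(channel_means)

# Without the axis argument, everything is averaged into a single number
overall_mean = np.mean(arr)
print(overall_mean)
```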

&lt;p&gt;Let’s see the whole code and then I’ll explain some details.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to calculate RGB values in Python: an exercise
&lt;/h2&gt;

&lt;p&gt;This is the code I’ve used to calculate the mean values for the three images we’ve seen before. Of course, this is just one way to do it!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Path of the folder that contains the images
&lt;/span&gt;&lt;span class="n"&gt;dst_img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# List the files in dst_img and iterate over them to get the images as arrays
&lt;/span&gt;&lt;span class="n"&gt;list_img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst_img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;list_img&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ext&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Split file name from extension
&lt;/span&gt;    &lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst_img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="c1"&gt;# Create arrays for all the images
&lt;/span&gt;
    &lt;span class="c1"&gt;# Calculate height and width for each image
&lt;/span&gt;    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Calculate the number of dimensions for each array
&lt;/span&gt;    &lt;span class="n"&gt;arr_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndim&lt;/span&gt;

    &lt;span class="c1"&gt;# Calculate the shape for each array
&lt;/span&gt;    &lt;span class="n"&gt;arr_shape&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;arr_dim&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;arr_mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, greyscale=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;arr_mean&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;arr_mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr_mean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;#RGB CASE
&lt;/span&gt;            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, R=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;arr_mean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, G=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;arr_mean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, B=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;arr_mean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;#ALPHA CASE
&lt;/span&gt;            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, R=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;arr_mean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, G=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;arr_mean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, B=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;arr_mean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, ALPHA=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;arr_mean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[bw, greyscale=21.5]
[daffodil, R=109.3, G=85.6, B=5.0 ]
[eclipse, R=109.0, G=109.5, B=39.8, ALPHA=133.6]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Some code explanations
&lt;/h3&gt;

&lt;p&gt;An important part of coding is trying to generalize, so that we can reuse the code in the future if needed (with the due changes, of course).&lt;/p&gt;

&lt;p&gt;In this case, I’ve decided to differentiate the images by studying the &lt;code&gt;shape&lt;/code&gt; and the &lt;code&gt;ndim&lt;/code&gt; values derived from NumPy.&lt;/p&gt;

&lt;p&gt;First, I wanted to derive the general information from all the images, which is why I loaded them all with PIL and converted them to NumPy arrays. The file name and the dimensions can then be calculated right away, because this information is available for every image, whatever its type.&lt;/p&gt;

&lt;p&gt;Then, I handled the black-and-white image with the &lt;code&gt;if arr_dim == 2&lt;/code&gt; statement: as I said before, the black-and-white image has just two dimensions.&lt;br&gt;
After that, I handled the RGB image and the RGB image with the ALPHA channel.&lt;/p&gt;

&lt;p&gt;For these two, I first generalized the calculation of the mean values with &lt;code&gt;arr_mean = np.mean(arr, axis=(0,1))&lt;/code&gt;. I had to use &lt;code&gt;axis=(0,1)&lt;/code&gt; because those images have 3 dimensions, and the mean has to be taken along the first two axes (height and width), which leaves one value per channel.&lt;/p&gt;

&lt;p&gt;Then, I’ve differentiated the RGB image from the RGB+ALPHA image with &lt;code&gt;len(arr_mean)&lt;/code&gt;: since the RGB image has 3 channels, &lt;code&gt;len(arr_mean)&lt;/code&gt; has to be equal to 3; since the RGB+ALPHA image has 4 channels, &lt;code&gt;len(arr_mean)&lt;/code&gt; has to be equal to 4. Hence the nested &lt;code&gt;if&lt;/code&gt;/&lt;code&gt;else&lt;/code&gt; in the code above.&lt;/p&gt;
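
&lt;p&gt;If you want to reuse this branching logic elsewhere, one possible way is to wrap it in a function. The following is only a sketch: the function name &lt;code&gt;describe_means&lt;/code&gt; and the dictionary it returns are my assumptions for illustration, and it is tested here on made-up arrays rather than on the three images.&lt;/p&gt;

```python
import numpy as np


def describe_means(arr):
    """Return the mean values of an image array, labelled by channel."""
    if arr.ndim == 2:
        # Greyscale: a single value for the whole image
        return {"greyscale": float(np.mean(arr))}
    # Colored image: one mean per channel, averaging over height and width
    means = np.mean(arr, axis=(0, 1))
    labels = ["R", "G", "B", "ALPHA"][:len(means)]
    return {label: float(value) for label, value in zip(labels, means)}


# Made-up arrays standing in for a greyscale image and an RGB+ALPHA image
grey = np.full((2, 2), 50)
rgba = np.full((2, 2, 4), 100)
print(describe_means(grey))
print(describe_means(rgba))
```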

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;This article describes what RGB values are and how to calculate RGB values in Python.&lt;/p&gt;

&lt;p&gt;The best thing you can do now is to try it with your images.&lt;/p&gt;




&lt;p&gt;The post "&lt;a href="https://federicotrotta.com/how-to-calculate-rgb-values-in-python/"&gt;How to calculate RGB values in Python&lt;/a&gt;" was originally created for my blog.&lt;/p&gt;

</description>
      <category>python</category>
      <category>images</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
