<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: msc2020</title>
    <description>The latest articles on Forem by msc2020 (@msc2020).</description>
    <link>https://forem.com/msc2020</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1318764%2F6e2e6836-d05c-440e-9d42-36e9092f917a.png</url>
      <title>Forem: msc2020</title>
      <link>https://forem.com/msc2020</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/msc2020"/>
    <language>en</language>
    <item>
      <title>How to export a CSV with data on my DEV posts using its API</title>
      <dc:creator>msc2020</dc:creator>
      <pubDate>Tue, 21 May 2024 21:44:49 +0000</pubDate>
      <link>https://forem.com/msc2020/how-to-export-a-csv-with-my-data-posts-in-dev-using-its-api-382f</link>
      <guid>https://forem.com/msc2020/how-to-export-a-csv-with-my-data-posts-in-dev-using-its-api-382f</guid>
      <description>&lt;p&gt;This post is a quick step-by-step guide to collecting data about "my" publications on DEV (&lt;a href="//dev.to"&gt;dev.to&lt;/a&gt;) through its beta API. We use Python 3.9+ with the Requests, json and Pandas libraries to query DEV API endpoints, load the responses into a &lt;code&gt;DataFrame&lt;/code&gt; and export the result as a &lt;code&gt;.CSV&lt;/code&gt; file. The CSV contains data on the posts published so far by the user &lt;u&gt;msc2020&lt;/u&gt;. You can collect your own data the same way.&lt;/p&gt;




&lt;p&gt;&lt;a id="contents"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Contents ☕
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;DEV API, versions v0 and v1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some API endpoints&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Getting data from the API&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Exporting collected data to CSV&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tests using another username&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conclusion&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a id="dev_api"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  DEV API, versions v0 and v1 [^]
&lt;/h2&gt;

&lt;p&gt;DEV needs no introduction, but it's worth mentioning that it's built on Forem, an "&lt;a href="https://github.com/forem/forem" rel="noopener noreferrer"&gt;open source tool for building communities&lt;/a&gt;" 👊🏼. When visiting the &lt;a href="https://forem.dev/" rel="noopener noreferrer"&gt;Forem community&lt;/a&gt; homepage we noticed the many similarities between the two.&lt;/p&gt;

&lt;p&gt;DEV currently has an API (beta version 0.9.7) with documentation at &lt;a href="https://developers.forem.com/api" rel="noopener noreferrer"&gt;https://developers.forem.com/api&lt;/a&gt;. Two versions are available, &lt;strong&gt;v0&lt;/strong&gt; and &lt;strong&gt;v1&lt;/strong&gt;. The main difference between them is that some v0 endpoints can be accessed without an access token (&lt;code&gt;API_TOKEN&lt;/code&gt;), whereas v1 requires a token on all its endpoints. According to the documentation, endpoints that do not require token authentication use &lt;a href="https://en.wikipedia.org/wiki/Cross-origin_resource_sharing" rel="noopener noreferrer"&gt;CORS&lt;/a&gt; (Cross-origin resource sharing) to control access.&lt;/p&gt;

&lt;p&gt;&lt;a id="endpoints"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Some API endpoints [^]
&lt;/h2&gt;

&lt;p&gt;The table below shows some DEV API endpoints accompanied by information that may be useful.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;small&gt;API version&lt;/small&gt;&lt;/th&gt;
&lt;th&gt;&lt;small&gt;Endpoint&lt;/small&gt;&lt;/th&gt;
&lt;th&gt;&lt;small&gt;HTTP method&lt;/small&gt;&lt;/th&gt;
&lt;th&gt;&lt;small&gt;Requires API_KEY&lt;/small&gt;&lt;/th&gt;
&lt;th&gt;&lt;small&gt;Description&lt;/small&gt;&lt;/th&gt;
&lt;th&gt;&lt;small&gt;Example&lt;/small&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;small&gt;v0&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;/articles&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;GET&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;No&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;Returns all posts (articles, questions, announcements, etc.) published, 30 per page&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl https://dev.to/api/articles&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;small&gt;v0&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;/articles&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;POST&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;Yes&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;Create an article&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;&lt;code&gt;curl -X POST -H "Content-Type: application/json" -H "api-key: API_KEY" -d '{"article": {"title":"Title","body_markdown":"Body","published":false,"tags":["discuss", "javascript"]}}' https://dev.to/api/articles&lt;/code&gt;&lt;/small&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;small&gt;v0&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;&lt;small&gt;/comments&lt;/small&gt;&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;GET&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;No&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;Returns all comments from an article or comments from a podcast, 30 per page&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;&lt;code&gt;curl https://dev.to/api/comments?a_id=270180&lt;/code&gt;&lt;/small&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Lists of API endpoints in version v0 and v1 can be found, respectively, at: &lt;a href="https://developers.forem.com/api/v0" rel="noopener noreferrer"&gt;https://developers.forem.com/api/v0&lt;/a&gt; and &lt;a href="https://developers.forem.com/api/v1" rel="noopener noreferrer"&gt;https://developers.forem.com/api/v1&lt;/a&gt;.&lt;/p&gt;
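&lt;p&gt;As a minimal sketch of the request body used by the &lt;code&gt;POST /articles&lt;/code&gt; row of the table, the snippet below builds that JSON with Python's &lt;code&gt;json&lt;/code&gt; module. The helper name &lt;code&gt;build_article_payload&lt;/code&gt; is hypothetical, the field names are the ones in the curl example, and nothing is sent over the network.&lt;/p&gt;

```python
import json


def build_article_payload(title, body_markdown, tags, published=False):
    """Return the JSON body for creating an article (hypothetical helper)."""
    # All article fields are nested under a single "article" key.
    article = {
        "title": title,
        "body_markdown": body_markdown,
        "published": published,
        "tags": list(tags),
    }
    return json.dumps({"article": article})


payload = build_article_payload("Title", "Body", ["discuss", "javascript"])
print(payload)
```

&lt;p&gt;Building the body from a dictionary avoids hand-written JSON, where a missing brace is easy to overlook.&lt;/p&gt;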

&lt;p&gt;&lt;strong&gt;🙈 Attention:&lt;/strong&gt; Although some v0 &lt;em&gt;endpoints&lt;/em&gt; can be used without an &lt;code&gt;API_TOKEN&lt;/code&gt;, the API website recommends authenticating on all of them.&lt;/p&gt;
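&lt;p&gt;One possible way to follow that recommendation is to add the token to the request headers only when one is available. The sketch below assumes the &lt;code&gt;api-key&lt;/code&gt; header name used in the curl example of the table; the environment variable &lt;code&gt;DEV_API_KEY&lt;/code&gt; and the helper &lt;code&gt;build_headers&lt;/code&gt; are names made up for this illustration.&lt;/p&gt;

```python
import os


def build_headers(api_key=None):
    """Return request headers, adding the api-key header only when a key is given."""
    headers = {}
    if api_key:
        headers["api-key"] = api_key
    return headers


# DEV_API_KEY is a hypothetical environment variable name chosen for this sketch.
headers = build_headers(os.environ.get("DEV_API_KEY"))
print(sorted(headers))
```

&lt;p&gt;The resulting dictionary can be passed as the &lt;code&gt;headers&lt;/code&gt; argument of a Requests call. Reading the token from the environment keeps it out of the script itself.&lt;/p&gt;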

&lt;p&gt;&lt;a id="request"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting data from the API [^]
&lt;/h2&gt;

&lt;p&gt;The code below fetches, via the DEV API, data about my (&lt;code&gt;username = msc2020&lt;/code&gt;) posts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt; &lt;span class="c1"&gt;# install with: pip install requests
&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://dev.to/api/articles&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;querystring&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;default_headers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;querystring&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
output:

[{
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type_of&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;article&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:1850779,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Raspagem de dados de um site de notícias em pt-BR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: ...
&lt;/span&gt;&lt;span class="gp"&gt;  ...&lt;/span&gt;
&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="sh"&gt;'''&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;GET&lt;/code&gt; call above returns a &lt;code&gt;Response&lt;/code&gt; object (&lt;code&gt;response&lt;/code&gt;) from the &lt;code&gt;Requests&lt;/code&gt; library. To convert/&lt;em&gt;parse&lt;/em&gt; the contents of &lt;code&gt;response.text&lt;/code&gt; (type &lt;code&gt;str&lt;/code&gt;) into a list (type &lt;code&gt;list&lt;/code&gt;) of dictionaries (type &lt;code&gt;dict&lt;/code&gt;) we use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="c1"&gt;# python standard library
&lt;/span&gt;
&lt;span class="n"&gt;res_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
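&lt;p&gt;To make the resulting types concrete, here is a small sketch that parses a hand-written, truncated sample in the shape of the output shown earlier. The string &lt;code&gt;sample_text&lt;/code&gt; is an assumption standing in for &lt;code&gt;response.text&lt;/code&gt;; a real response carries many more fields per post.&lt;/p&gt;

```python
import json

# Truncated, hand-written sample in the shape of the API output;
# it stands in for response.text so the snippet runs offline.
sample_text = (
    '[{"type_of": "article", "id": 1850779, '
    '"title": "Raspagem de dados de um site de notícias em pt-BR"}]'
)

posts = json.loads(sample_text)

print(type(posts).__name__)     # list
print(type(posts[0]).__name__)  # dict
print(posts[0]["id"])           # 1850779
```

&lt;p&gt;With a live response, &lt;code&gt;response.json()&lt;/code&gt; from Requests should give the same structure in a single call.&lt;/p&gt;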



&lt;p&gt;To make the collected data easier to analyze with Python libraries, we convert this &lt;code&gt;JSON&lt;/code&gt; into a &lt;code&gt;CSV&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🗒️ Note:&lt;/strong&gt; In the &lt;em&gt;script&lt;/em&gt; above we passed the &lt;code&gt;username&lt;/code&gt; parameter in the &lt;code&gt;GET&lt;/code&gt; call. For the other parameters available on the &lt;code&gt;/articles&lt;/code&gt; &lt;em&gt;endpoint&lt;/em&gt;, see this &lt;a href="https://developers.forem.com/api/v0#tag/articles/operation/getArticles" rel="noopener noreferrer"&gt;section&lt;/a&gt; of the API documentation.&lt;/p&gt;
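&lt;p&gt;Since the endpoint returns 30 posts per page, a user with more posts needs several calls. The sketch below shows one way to loop over pages, assuming the endpoint accepts &lt;code&gt;page&lt;/code&gt; and &lt;code&gt;per_page&lt;/code&gt; query parameters (check the documentation linked above). The names &lt;code&gt;fetch_all_pages&lt;/code&gt; and &lt;code&gt;fake_fetch_page&lt;/code&gt; are hypothetical, and the network call is replaced by a stand-in so the example runs offline.&lt;/p&gt;

```python
def fetch_all_pages(fetch_page, per_page=30, max_pages=100):
    """Collect items from consecutive pages until a page is not full.

    fetch_page is any callable taking (page, per_page) and returning a list;
    in real use it would wrap the GET call to /articles.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        items.extend(batch)
        # A page that is not full is taken to be the last one.
        if len(batch) != per_page:
            break
    return items


# Stand-in for the network call: 65 fake posts served 30 at a time.
fake_posts = [{"id": n} for n in range(65)]


def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return fake_posts[start:start + per_page]


all_posts = fetch_all_pages(fake_fetch_page)
print(len(all_posts))  # 65
```

&lt;p&gt;In real use, &lt;code&gt;fetch_page&lt;/code&gt; could be a small function that calls &lt;code&gt;requests.get(url, params={'username': 'msc2020', 'page': page, 'per_page': per_page}).json()&lt;/code&gt;. Passing the fetcher in as an argument keeps the paging logic testable without touching the API.&lt;/p&gt;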

&lt;p&gt;&lt;a id="csv"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Exporting collected data to CSV [^]
&lt;/h2&gt;

&lt;p&gt;After collecting the &lt;code&gt;JSON&lt;/code&gt; data via API, we use Pandas' &lt;code&gt;to_csv&lt;/code&gt; to export the data to &lt;code&gt;CSV&lt;/code&gt; format.&lt;/p&gt;

&lt;p&gt;Adding this step gives the complete &lt;code&gt;export_posts.py&lt;/code&gt; script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# export_posts.py
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt; &lt;span class="c1"&gt;# pip install requests
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt; &lt;span class="c1"&gt;# pip install pandas
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="c1"&gt;# standard library
&lt;/span&gt;
&lt;span class="c1"&gt;# define username
&lt;/span&gt;&lt;span class="n"&gt;USER_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# run the request
&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://dev.to/api/articles&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;querystring&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;USER_NAME&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;default_headers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;querystring&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# print(response.text)
&lt;/span&gt;
&lt;span class="c1"&gt;# converts request response into a list of dict
&lt;/span&gt;&lt;span class="n"&gt;res_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# convert JSON to Pandas DataFrame
&lt;/span&gt;&lt;span class="n"&gt;df_posts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res_json&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# export post data to CSV
&lt;/span&gt;&lt;span class="n"&gt;df_posts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dataset_articles_published_msc2020.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# displays the first 3 rows of the dataset
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_posts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;output:
. 1) content of the first three lines:
&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;
&lt;span class="n"&gt;type_of&lt;/span&gt;       &lt;span class="nb"&gt;id&lt;/span&gt;  &lt;span class="p"&gt;...&lt;/span&gt;                                               &lt;span class="n"&gt;tags&lt;/span&gt;                                               &lt;span class="n"&gt;user&lt;/span&gt;
&lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="n"&gt;article&lt;/span&gt;  &lt;span class="mi"&gt;1850779&lt;/span&gt;  &lt;span class="p"&gt;...&lt;/span&gt;         &lt;span class="n"&gt;tutorial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;braziliandevs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beginners&lt;/span&gt;  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tw...
1  article  1842575  ...  deeplearning, machinelearning, python, brazili...  {&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;tw&lt;/span&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="n"&gt;article&lt;/span&gt;  &lt;span class="mi"&gt;1835701&lt;/span&gt;  &lt;span class="p"&gt;...&lt;/span&gt;                    &lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tutorial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;braziliandevs&lt;/span&gt;  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tw...

[3 rows x 25 columns]

. 2) a CSV in local directory: `dataset_articles_published_msc2020.csv`
&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtgwromp6897swlcdzme.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtgwromp6897swlcdzme.png" alt="pandas head msc2020" width="800" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;small&gt;Screenshot of the df_posts.head(3) output in a Jupyter notebook&lt;/small&gt;&lt;/small&gt;&lt;/center&gt;
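&lt;p&gt;In the output above the &lt;code&gt;user&lt;/code&gt; column holds a whole dictionary per row, which is awkward to work with in a CSV. As a sketch of one alternative, the snippet below flattens nested dictionaries into dotted column names and writes the rows with the standard library's &lt;code&gt;csv&lt;/code&gt; module instead of Pandas. The helper &lt;code&gt;flatten&lt;/code&gt; is hypothetical and the sample rows are hand-written, reduced to a few fields.&lt;/p&gt;

```python
import csv
import io


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts so {'user': {'name': 'x'}} becomes {'user.name': 'x'}."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hand-written sample rows, reduced to a few of the fields seen in the output.
posts = [
    {"type_of": "article", "id": 1850779, "user": {"name": "msc2020", "username": "msc2020"}},
    {"type_of": "article", "id": 1842575, "user": {"name": "msc2020", "username": "msc2020"}},
]

rows = [flatten(post) for post in posts]
fieldnames = list(rows[0].keys())

# Write to an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

&lt;p&gt;When staying with Pandas, &lt;code&gt;pd.json_normalize(res_json)&lt;/code&gt; produces similar dotted columns before calling &lt;code&gt;to_csv&lt;/code&gt;.&lt;/p&gt;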

&lt;p&gt;&lt;a id="tests"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tests using another username [^]
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;/articles&lt;/code&gt; &lt;em&gt;endpoint&lt;/em&gt; of the DEV API currently also returns data about other users' posts. For example, setting &lt;code&gt;USER_NAME = 'anuragrana'&lt;/code&gt; and changing the output name to &lt;code&gt;dataset_articles_published_user.csv&lt;/code&gt; in the full &lt;code&gt;export_posts.py&lt;/code&gt; script is expected to give the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="s"&gt;type_of       id  ...                                               user flare_tag&lt;/span&gt;
&lt;span class="s"&gt;0  article  1855307  ...  {'name'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Anurag&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Rana'&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anuragran...&lt;/span&gt;&lt;span class="nv"&gt;       &lt;/span&gt;&lt;span class="s"&gt;NaN&lt;/span&gt;
&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;article&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;1276096&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;{'name'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Anurag&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Rana'&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anuragran...&lt;/span&gt;&lt;span class="nv"&gt;       &lt;/span&gt;&lt;span class="s"&gt;NaN&lt;/span&gt;
&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;article&lt;/span&gt;&lt;span class="nv"&gt;   &lt;/span&gt;&lt;span class="s"&gt;262178&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;{'name'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Anurag&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Rana'&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anuragran...&lt;/span&gt;&lt;span class="nv"&gt;       &lt;/span&gt;&lt;span class="s"&gt;NaN&lt;/span&gt;

&lt;span class="s"&gt;[3&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rows&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;26&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;columns]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyiss4vbqsarks4awes4u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyiss4vbqsarks4awes4u.png" alt="pandas head" width="800" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;small&gt;Screenshot of the df_posts.head(3) output from the export_posts.py code in a Jupyter notebook&lt;/small&gt;&lt;/small&gt;&lt;/center&gt;
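&lt;p&gt;The two runs report different column counts (25 and 26), presumably because some records carry extra keys such as &lt;code&gt;flare_tag&lt;/code&gt;. The sketch below shows two small helpers that may be useful when switching users: one derives the output file name from the username instead of editing it by hand, the other lists every key present in a set of records. Both function names and the sample records are assumptions made for illustration.&lt;/p&gt;

```python
def output_filename(username):
    """Build the CSV name from the username instead of hard-coding it."""
    return f"dataset_articles_published_{username}.csv"


def column_names(records):
    """Return every key that appears in at least one record, in first-seen order."""
    seen = {}
    for record in records:
        for key in record:
            seen.setdefault(key, None)
    return list(seen)


# Hand-written stand-ins: the second record has an extra flare_tag key.
records = [
    {"type_of": "article", "id": 1},
    {"type_of": "article", "id": 2, "flare_tag": None},
]

print(output_filename("anuragrana"))  # dataset_articles_published_anuragrana.csv
print(column_names(records))          # ['type_of', 'id', 'flare_tag']
```

&lt;p&gt;Comparing the result of &lt;code&gt;column_names&lt;/code&gt; for two users is a quick way to see which fields differ between their exports.&lt;/p&gt;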

&lt;p&gt;&lt;a id="conclusao"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion [^]
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;CSV&lt;/code&gt; obtained in this post can help with data analysis using Python libraries. With a few adaptations to the code, we can obtain data from other &lt;em&gt;endpoints&lt;/em&gt; of the DEV API. There are many possibilities for using the collected data.&lt;/p&gt;
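&lt;p&gt;As a sketch of what such an adaptation might look like, the helper below assembles the URL for any endpoint under &lt;code&gt;https://dev.to/api&lt;/code&gt; from its name and query parameters. The function name &lt;code&gt;build_url&lt;/code&gt; is hypothetical; the two example calls reuse the &lt;code&gt;/articles&lt;/code&gt; and &lt;code&gt;/comments&lt;/code&gt; endpoints from the table, and no request is sent.&lt;/p&gt;

```python
from urllib.parse import urlencode

API_BASE = "https://dev.to/api"


def build_url(endpoint, **params):
    """Join the API base, an endpoint name and optional query parameters."""
    url = API_BASE + "/" + endpoint.strip("/")
    if params:
        url += "?" + urlencode(params)
    return url


print(build_url("articles", username="msc2020"))
print(build_url("comments", a_id=270180))
```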

&lt;center&gt;☕ 🧘‍♂️ 💻 ☯️ 🪬&lt;/center&gt;

</description>
      <category>webdev</category>
      <category>beginners</category>
      <category>api</category>
      <category>python</category>
    </item>
    <item>
      <title>How to export a CSV with data on my DEV posts using its API</title>
      <dc:creator>msc2020</dc:creator>
      <pubDate>Sun, 19 May 2024 11:39:29 +0000</pubDate>
      <link>https://forem.com/msc2020/como-exportar-um-csv-com-dados-dos-meus-posts-no-devto-usando-sua-api-2ckm</link>
      <guid>https://forem.com/msc2020/como-exportar-um-csv-com-dados-dos-meus-posts-no-devto-usando-sua-api-2ckm</guid>
      <description>&lt;p&gt;In this post we show a quick step-by-step guide to collecting data about "my" publications on DEV (&lt;a href="//dev.to"&gt;dev.to&lt;/a&gt;) using its beta API. We use Python 3.9+ libraries (Requests, json and Pandas) to make requests to DEV API endpoints, convert the responses to the &lt;code&gt;DataFrame&lt;/code&gt; format and then export them as a &lt;code&gt;.CSV&lt;/code&gt; file. This &lt;code&gt;CSV&lt;/code&gt; will contain the data of the posts published so far by the user &lt;code&gt;msc2020&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;&lt;a id="conteudo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Contents ☕
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;DEV API, versions v0 and v1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some API endpoints&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Getting data from the API with Python&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Exporting the data to CSV&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tests using another username&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Possible uses&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conclusion&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;🎶 &lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=vu8nNJVX0hs" rel="noopener noreferrer"&gt;Wayne Shorter (Featuring Milton Nascimento) - Native Dancer (1975) full album&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;a id="versoes_api"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  DEV API, versions v0 and v1 [^]
&lt;/h2&gt;

&lt;p&gt;DEV needs no introduction, but it's worth mentioning that it is built on Forem, an "&lt;a href="https://github.com/forem/forem" rel="noopener noreferrer"&gt;&lt;em&gt;open source&lt;/em&gt; tool for building communities&lt;/a&gt;" 👊🏼. When visiting the Forem community &lt;a href="https://forem.dev/" rel="noopener noreferrer"&gt;homepage&lt;/a&gt; we notice the many similarities between the two.&lt;/p&gt;

&lt;p&gt;DEV currently has an API (beta version 0.9.7) with documentation at &lt;a href="https://developers.forem.com/api" rel="noopener noreferrer"&gt;https://developers.forem.com/api&lt;/a&gt;. There are some differences between the two available versions. The main one is that some &lt;strong&gt;v0&lt;/strong&gt; endpoints can be accessed without an access token (&lt;code&gt;API_TOKEN&lt;/code&gt;), whereas &lt;strong&gt;v1&lt;/strong&gt; uses a token on all its endpoints. According to the documentation, endpoints that do not require token authentication use &lt;a href="https://en.wikipedia.org/wiki/Cross-origin_resource_sharing" rel="noopener noreferrer"&gt;CORS&lt;/a&gt; (&lt;em&gt;Cross-origin resource sharing&lt;/em&gt;) to control access.&lt;/p&gt;

&lt;p&gt;&lt;a id="endpoints"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Some API endpoints [^]
&lt;/h2&gt;

&lt;p&gt;The table below shows some endpoints of the API (&lt;a href="https://dev.to/api/"&gt;https://dev.to/api/&lt;/a&gt;) together with information that may be useful.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;small&gt;Version&lt;/small&gt;&lt;/th&gt;
&lt;th&gt;&lt;small&gt;Endpoint&lt;/small&gt;&lt;/th&gt;
&lt;th&gt;&lt;small&gt;Method&lt;/small&gt;&lt;/th&gt;
&lt;th&gt;&lt;small&gt;API_KEY&lt;/small&gt;&lt;/th&gt;
&lt;th&gt;&lt;small&gt;Description&lt;/small&gt;&lt;/th&gt;
&lt;th&gt;&lt;small&gt;Example&lt;/small&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;small&gt;v0&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;/articles&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;GET&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;No&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;Returns all published posts (articles, questions, announcements, etc.), 30 per page&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;&lt;code&gt;curl https://dev.to/api/articles&lt;/code&gt; &lt;small&gt;&lt;/small&gt;&lt;/small&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;small&gt;v0&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;/articles&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;POST&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;Yes&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;Creates an article&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;&lt;code&gt;curl -X POST -H "Content-Type: application/json" -H "api-key: API_KEY" -d '{"article": {"title":"Title","body_markdown":"Body","published":false,"tags":["discuss", "javascript"]}}' https://dev.to/api/articles&lt;/code&gt; &lt;/small&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;small&gt;v0&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;&lt;small&gt;/comments&lt;/small&gt;&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;GET&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;No&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt;Returns all comments on an article or on a podcast, 30 per page&lt;/small&gt;&lt;/td&gt;
&lt;td&gt;&lt;small&gt; &lt;code&gt;curl https://dev.to/api/comments?a_id=270180&lt;/code&gt; &lt;small&gt;&lt;/small&gt;&lt;/small&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The documentation for the v0 and v1 API endpoints can be found, respectively, at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://developers.forem.com/api/v0" rel="noopener noreferrer"&gt;https://developers.forem.com/api/v0&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://developers.forem.com/api/v1" rel="noopener noreferrer"&gt;https://developers.forem.com/api/v1&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🙈 Attention:&lt;/strong&gt; Although some v0 endpoints do not require a token, the API site recommends using this authentication anyway.&lt;/p&gt;
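&lt;p&gt;A minimal sketch of how a token could be attached to a request, following the &lt;code&gt;api-key&lt;/code&gt; header used in the &lt;code&gt;curl&lt;/code&gt; example for &lt;code&gt;POST /articles&lt;/code&gt;. The helper name &lt;code&gt;build_headers&lt;/code&gt; and the environment variable &lt;code&gt;DEV_API_KEY&lt;/code&gt; are hypothetical names chosen for illustration; they are not defined by the API.&lt;/p&gt;

```python
import os


def build_headers(api_key=None):
    """Return request headers, adding the api-key header only when a key is available."""
    # DEV_API_KEY is an assumed environment variable name, not something the API defines.
    key = api_key or os.environ.get("DEV_API_KEY")
    headers = {"Accept": "application/json"}
    if key:
        headers["api-key"] = key
    return headers


# The resulting dict could be passed as headers=... to requests.get or requests.request.
print(sorted(build_headers("my-secret-token")))
```

&lt;p&gt;Reading the key from the environment keeps it out of the script, which matters if the code is shared or committed.&lt;/p&gt;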

&lt;p&gt;&lt;a id="api"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting data from the API via Python
&lt;/h2&gt;

&lt;p&gt;The code below uses the DEV API to fetch the data for my posts (&lt;code&gt;username = msc2020&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt; &lt;span class="c1"&gt;# install: pip install requests
&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://dev.to/api/articles&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;querystring&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;default_headers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;querystring&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
expected output:

[{
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type_of&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;article&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:1850779,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Raspagem de dados de um site de notícias em pt-BR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: ...
&lt;/span&gt;&lt;span class="gp"&gt;  ...&lt;/span&gt;
&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="sh"&gt;'''&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;GET&lt;/code&gt; call in the code above returns an object (&lt;code&gt;response&lt;/code&gt;) from the &lt;code&gt;Requests&lt;/code&gt; library. To parse the contents of &lt;code&gt;response.text&lt;/code&gt; (&lt;code&gt;str&lt;/code&gt;) into a list of dictionaries (&lt;code&gt;dict&lt;/code&gt;), we use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="c1"&gt;# Python standard library
&lt;/span&gt;
&lt;span class="n"&gt;res_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
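&lt;p&gt;To see what the parsed result looks like without calling the API, the sketch below runs &lt;code&gt;json.loads&lt;/code&gt; on a shortened sample string shaped like the expected output shown earlier. The sample keeps only three fields; real responses carry many more. In Requests, &lt;code&gt;response.json()&lt;/code&gt; performs the same parsing in one call.&lt;/p&gt;

```python
import json

# Shortened sample shaped like the expected output; real responses carry more fields.
sample_text = (
    '[{"type_of": "article", "id": 1850779, '
    '"title": "Raspagem de dados de um site de notícias em pt-BR"}]'
)

posts = json.loads(sample_text)  # a list of dicts, one per post

for post in posts:
    print(post["id"], post["title"])
```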



&lt;p&gt;&lt;strong&gt;🗒️ Note:&lt;/strong&gt; In the request script we pass the &lt;code&gt;username&lt;/code&gt; parameter in the &lt;code&gt;GET&lt;/code&gt; call. To see the other parameters available on the &lt;code&gt;/articles&lt;/code&gt; endpoint, see this &lt;a href="https://developers.forem.com/api/v0#tag/articles/operation/getArticles" rel="noopener noreferrer"&gt;link&lt;/a&gt; to the API documentation.&lt;/p&gt;
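&lt;p&gt;Since the endpoint returns 30 posts per page, a user with more posts needs several requests. The sketch below assumes the &lt;code&gt;page&lt;/code&gt; and &lt;code&gt;per_page&lt;/code&gt; query parameters listed in the API documentation for &lt;code&gt;/articles&lt;/code&gt;. The names &lt;code&gt;collect_all_pages&lt;/code&gt;, &lt;code&gt;fetch_page&lt;/code&gt; and &lt;code&gt;fake_fetch_page&lt;/code&gt; are hypothetical; the fake fetcher stands in for the real HTTP call so the loop can run offline.&lt;/p&gt;

```python
def collect_all_pages(fetch_page, username, per_page=30, max_pages=50):
    """Call fetch_page(params) for page 1, 2, ... until an empty page comes back."""
    posts = []
    for page in range(1, max_pages + 1):
        batch = fetch_page({"username": username, "page": page, "per_page": per_page})
        if not batch:
            break
        posts.extend(batch)
    return posts


def fake_fetch_page(params):
    """Stand-in for the real HTTP call: serves 65 made-up posts, one page at a time."""
    all_posts = [{"id": i} for i in range(65)]
    start = (params["page"] - 1) * params["per_page"]
    return all_posts[start:start + params["per_page"]]


print(len(collect_all_pages(fake_fetch_page, "msc2020")))
```

&lt;p&gt;In real use, &lt;code&gt;fetch_page&lt;/code&gt; could be something like &lt;code&gt;lambda params: requests.get(url, params=params, timeout=10).json()&lt;/code&gt;. The &lt;code&gt;max_pages&lt;/code&gt; cap is a safety limit so a misbehaving response cannot loop forever.&lt;/p&gt;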

&lt;p&gt;To make later analysis of the collected data easier, we convert this &lt;code&gt;JSON&lt;/code&gt; into a &lt;code&gt;CSV&lt;/code&gt;. For those who choose to work with &lt;code&gt;Pandas&lt;/code&gt;, the &lt;code&gt;CSV&lt;/code&gt; format can help a lot.&lt;/p&gt;

&lt;p&gt;&lt;a id="csv"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Exporting the data to CSV
&lt;/h2&gt;

&lt;p&gt;After collecting the &lt;code&gt;JSON&lt;/code&gt; data via the API, we use Pandas' &lt;code&gt;to_csv&lt;/code&gt; to export the data to &lt;code&gt;CSV&lt;/code&gt; format.&lt;/p&gt;

&lt;p&gt;Including this step, we get the complete code &lt;code&gt;exporta_posts.py&lt;/code&gt;: &lt;a id="codigo_completo"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# exporta_posts.py
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt; &lt;span class="c1"&gt;# install: pip install requests
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt; &lt;span class="c1"&gt;# install: pip install pandas
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="c1"&gt;# Python standard library
&lt;/span&gt;
&lt;span class="c1"&gt;# set the username
&lt;/span&gt;&lt;span class="n"&gt;USER_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# make the GET request
&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://dev.to/api/articles&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;querystring&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;USER_NAME&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;default_headers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;querystring&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# print(response.text)
&lt;/span&gt;
&lt;span class="c1"&gt;# convert the response into a list of dicts
&lt;/span&gt;&lt;span class="n"&gt;res_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# convert the JSON to a Pandas DataFrame
&lt;/span&gt;&lt;span class="n"&gt;df_posts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res_json&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# export the post data to CSV (index=False leaves out the row index)
&lt;/span&gt;&lt;span class="n"&gt;df_posts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dataset_articles_published_msc2020.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# show the first 3 rows of the dataset
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_posts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;expected outputs:
. 1) first 3 rows of the CSV:
&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;
&lt;span class="n"&gt;type_of&lt;/span&gt;       &lt;span class="nb"&gt;id&lt;/span&gt;  &lt;span class="p"&gt;...&lt;/span&gt;                                               &lt;span class="n"&gt;tags&lt;/span&gt;                                               &lt;span class="n"&gt;user&lt;/span&gt;
&lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="n"&gt;article&lt;/span&gt;  &lt;span class="mi"&gt;1850779&lt;/span&gt;  &lt;span class="p"&gt;...&lt;/span&gt;         &lt;span class="n"&gt;tutorial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;braziliandevs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beginners&lt;/span&gt;  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tw...
1  article  1842575  ...  deeplearning, machinelearning, python, brazili...  {&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;tw&lt;/span&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="n"&gt;article&lt;/span&gt;  &lt;span class="mi"&gt;1835701&lt;/span&gt;  &lt;span class="p"&gt;...&lt;/span&gt;                    &lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tutorial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;braziliandevs&lt;/span&gt;  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msc2020&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tw...

[3 rows x 25 columns]

. 2) a CSV file created in the local directory: `dataset_articles_published_msc2020.csv`
&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F06ufyqey87tmh58ro6d5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F06ufyqey87tmh58ro6d5.png" title="Print da saída no Jupyter notebook" alt="df_posts.head(3) no jupyter notebook" width="800" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;small&gt;Screenshot of the df_posts.head(3) output in a Jupyter notebook&lt;/small&gt;&lt;/small&gt;&lt;/center&gt;
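&lt;p&gt;The sample output shows that the &lt;code&gt;user&lt;/code&gt; column holds a nested dictionary, which ends up as a single text cell in the CSV. One possible way to handle this, sketched below with plain Python, is to flatten that dictionary into separate &lt;code&gt;user_*&lt;/code&gt; keys before building the DataFrame. The helper name &lt;code&gt;flatten_user&lt;/code&gt; and the &lt;code&gt;user_&lt;/code&gt; prefix are hypothetical choices; the sample record keeps only fields visible in the output.&lt;/p&gt;

```python
def flatten_user(post):
    """Return a copy of post where the nested user dict becomes user_* keys."""
    flat = {key: value for key, value in post.items() if key != "user"}
    for key, value in post.get("user", {}).items():
        flat["user_" + key] = value
    return flat


# Shortened record using only fields visible in the sample output.
sample_post = {
    "type_of": "article",
    "id": 1850779,
    "user": {"name": "msc2020", "username": "msc2020"},
}

print(flatten_user(sample_post))
```

&lt;p&gt;The flattened records could then be passed to &lt;code&gt;pd.DataFrame([flatten_user(p) for p in res_json])&lt;/code&gt;. Pandas also provides &lt;code&gt;pd.json_normalize&lt;/code&gt;, which flattens nested dictionaries in a similar way.&lt;/p&gt;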

&lt;p&gt;&lt;a id="testes"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tests using another username
&lt;/h2&gt;

&lt;p&gt;Currently, it is also possible to obtain data about other users' posts using the &lt;code&gt;articles&lt;/code&gt; endpoint of the DEV API. For example, setting &lt;code&gt;USER_NAME = 'anuragrana'&lt;/code&gt; and changing the output filename to &lt;code&gt;dataset_articles_published_user.csv&lt;/code&gt; in the complete code &lt;code&gt;exporta_posts.py&lt;/code&gt;, the expected output is the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="s"&gt;type_of       id  ...                                               user flare_tag&lt;/span&gt;
&lt;span class="s"&gt;0  article  1855307  ...  {'name'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Anurag&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Rana'&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anuragran...&lt;/span&gt;&lt;span class="nv"&gt;       &lt;/span&gt;&lt;span class="s"&gt;NaN&lt;/span&gt;
&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;article&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;1276096&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;{'name'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Anurag&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Rana'&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anuragran...&lt;/span&gt;&lt;span class="nv"&gt;       &lt;/span&gt;&lt;span class="s"&gt;NaN&lt;/span&gt;
&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;article&lt;/span&gt;&lt;span class="nv"&gt;   &lt;/span&gt;&lt;span class="s"&gt;262178&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="nv"&gt;  &lt;/span&gt;&lt;span class="s"&gt;{'name'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Anurag&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Rana'&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anuragran...&lt;/span&gt;&lt;span class="nv"&gt;       &lt;/span&gt;&lt;span class="s"&gt;NaN&lt;/span&gt;

&lt;span class="s"&gt;[3&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rows&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;26&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;columns]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jwl4lpwjkznvjmxhe2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jwl4lpwjkznvjmxhe2a.png" alt="head jupyter notebook" width="800" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;small&gt;Screenshot of the df_posts.head(3) output from exporta_posts.py in a Jupyter notebook&lt;/small&gt;&lt;/small&gt;&lt;/center&gt;
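&lt;p&gt;This output has 26 columns, including &lt;code&gt;flare_tag&lt;/code&gt;, against 25 in the earlier one, which suggests that not every post carries the same keys. Pandas fills the gaps with &lt;code&gt;NaN&lt;/code&gt;. For comparison, the sketch below does the same with the standard library &lt;code&gt;csv&lt;/code&gt; module instead of Pandas: it takes the union of all keys as the header and leaves missing cells empty. The records and their values are made up for illustration.&lt;/p&gt;

```python
import csv
import io

# Made-up records: only the first one has a flare_tag key.
records = [
    {"id": 1, "type_of": "article", "flare_tag": "discuss"},
    {"id": 2, "type_of": "article"},
]

# Union of keys, in first-seen order, so no column is dropped.
fieldnames = []
for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(records)  # missing keys are written as empty cells (restval)

csv_text = buffer.getvalue()
print(csv_text)
```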

&lt;p&gt;&lt;a id="possibilidades_uso"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Possible uses
&lt;/h2&gt;

&lt;p&gt;The collected data can be explored in several ways. Some of them are listed below:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Use &lt;u&gt;Insomnia&lt;/u&gt; (&lt;a href="https://insomnia.rest/download" rel="noopener noreferrer"&gt;https://insomnia.rest/download&lt;/a&gt;) or &lt;u&gt;Postman&lt;/u&gt;&lt;br&gt;
(&lt;a href="https://www.postman.com/downloads/" rel="noopener noreferrer"&gt;https://www.postman.com/downloads/&lt;/a&gt;) to access other endpoints of &lt;a href="https://developers.forem.com/api/v1" rel="noopener noreferrer"&gt;v1&lt;/a&gt; of the DEV API. For example, the endpoints &lt;code&gt;display_ads/&lt;/code&gt;, &lt;code&gt;follows/tags/&lt;/code&gt;, &lt;code&gt;reactions/&lt;/code&gt; and &lt;code&gt;readinglist&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Perform an &lt;u&gt;exploratory analysis&lt;/u&gt; of the resulting &lt;code&gt;CSV&lt;/code&gt; using libraries such as &lt;a href="https://pandas.pydata.org/" rel="noopener noreferrer"&gt;Pandas&lt;/a&gt;, &lt;a href="https://scikit-learn.org/stable/index.html" rel="noopener noreferrer"&gt;Scikit-learn&lt;/a&gt;, &lt;a href="https://seaborn.pydata.org/" rel="noopener noreferrer"&gt;Seaborn&lt;/a&gt; and &lt;a href="https://plotly.com/python/" rel="noopener noreferrer"&gt;Plotly&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apply an LLM-based model such as &lt;a&gt;&lt;em&gt;Social-LLM: Modeling User Behavior at Scale using Language Models and Social Network Data&lt;/em&gt;&lt;/a&gt; to analyze a &lt;em&gt;dataset&lt;/em&gt; built from data from several selected DEV API endpoints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Contribute to the evolution of the Forem project by collaborating on &lt;a href="https://github.com/forem/forem" rel="noopener noreferrer"&gt;its GitHub&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
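&lt;p&gt;As a small taste of the exploratory analysis mentioned in the list, the sketch below counts how often each tag appears. It assumes the &lt;code&gt;tags&lt;/code&gt; field is a comma-separated string, as it appears in the sample output of &lt;code&gt;df_posts.head(3)&lt;/code&gt;; the two strings are the complete rows from that output, and the truncated row is left out.&lt;/p&gt;

```python
from collections import Counter

# Tag strings copied from rows 0 and 2 of the sample output.
tag_strings = [
    "tutorial, braziliandevs, python, beginners",
    "python, tutorial, braziliandevs",
]

# Split each string on commas, trim whitespace, and count every tag.
tag_counts = Counter(
    tag.strip()
    for tags in tag_strings
    for tag in tags.split(",")
    if tag.strip()
)

print(tag_counts.most_common(3))
```

&lt;p&gt;With the full dataset, the list of strings could come from &lt;code&gt;df_posts['tags']&lt;/code&gt; instead of being typed by hand.&lt;/p&gt;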




&lt;p&gt;&lt;a id="conclusao"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;CSV&lt;/code&gt; obtained in this post can help with data analysis using Python libraries. With a few adaptations to the code, it is possible to obtain data from other DEV API endpoints. If you have an idea for using the data collected from the API, please share it.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;center&gt;Thanks for reading!&lt;/center&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;center&gt;☕ 🧘‍♂️ 💻 ☯️ 🪬&lt;/center&gt;

</description>
      <category>api</category>
      <category>python</category>
      <category>braziliandevs</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Scraping data from a news site in pt-BR</title>
      <dc:creator>msc2020</dc:creator>
      <pubDate>Mon, 13 May 2024 01:20:22 +0000</pubDate>
      <link>https://forem.com/msc2020/raspagem-de-dados-de-um-site-de-noticias-em-pt-br-1f91</link>
      <guid>https://forem.com/msc2020/raspagem-de-dados-de-um-site-de-noticias-em-pt-br-1f91</guid>
      <description>&lt;p&gt;In this post we scrape data (&lt;em&gt;web scraping&lt;/em&gt;) from a news site in Brazilian Portuguese (pt-BR). Using Python 3's &lt;code&gt;Requests&lt;/code&gt; library, we collect the site's HTML content. Next, we extract the &lt;u&gt;most-read news stories and their respective links&lt;/u&gt; with &lt;code&gt;BeautifulSoup&lt;/code&gt;. We introduce some legal aspects of &lt;em&gt;scraping&lt;/em&gt; and present use cases currently running on the Internet.&lt;/p&gt;

&lt;p&gt;&lt;a id="sumario"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Contents ☕
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Prerequisites&lt;/li&gt;
&lt;li&gt;Collecting the HTML content&lt;/li&gt;
&lt;li&gt;Inspecting the HTML snippet of interest&lt;/li&gt;
&lt;li&gt;Searching for the tag with BeautifulSoup&lt;/li&gt;
&lt;li&gt;About &lt;code&gt;robots.txt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Examples of data scraping running in production&lt;/li&gt;
&lt;li&gt;Conclusion and next steps&lt;/li&gt;
&lt;/ul&gt;





&lt;p&gt;📻 Suggested album to accompany the tutorial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=w6PfwriXJbs" rel="noopener noreferrer"&gt;Miles Davis - &lt;em&gt;Round About Midnight (1957) Full Album&lt;/em&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a id="pre_requisitos"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To run the code presented here, you need to install the following Python 3 libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Beautiful Soup&lt;/code&gt; (&lt;code&gt;pip install beautifulsoup4&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Requests&lt;/code&gt; (&lt;code&gt;pip install requests&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a id="coleta_html"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Collecting the HTML content
&lt;/h2&gt;

&lt;p&gt;The first step in extracting the most-read news stories is to obtain the HTML of the site of interest. For this we use the &lt;a href="https://docs.python-requests.org/en/latest/index.html" rel="noopener noreferrer"&gt;&lt;code&gt;Requests&lt;/code&gt;&lt;/a&gt; library. With it, we can make HTTP requests such as &lt;code&gt;GET&lt;/code&gt;, &lt;code&gt;POST&lt;/code&gt;, &lt;code&gt;PUT&lt;/code&gt; and &lt;code&gt;DELETE&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllnzo18im4g05s88vp86.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllnzo18im4g05s88vp86.png" alt="Homepage site UOL" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;small&gt;Homepage of &lt;a href="https://www.uol.com.br"&gt;www.uol.com.br&lt;/a&gt;&lt;/small&gt;&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;The page chosen for scraping is the home page of the UOL news site (&lt;a href="https://www.uol.com.br" rel="noopener noreferrer"&gt;https://www.uol.com.br&lt;/a&gt;). The code below collects this site's HTML content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://www.uol.com.br&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;content_html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

&lt;span class="c1"&gt;# uncomment the line below to see the HTML content
# print(content_html)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpi2hanbfppe9i0rqx9qk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpi2hanbfppe9i0rqx9qk.png" alt="html homepage" width="800" height="540"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;small&gt;Image of the HTML collected with the Requests library.&lt;/small&gt;&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;After collecting the HTML, we use &lt;code&gt;BeautifulSoup&lt;/code&gt; to extract the data of interest. This library provides functions for navigating HTML tags and extracting their elements.&lt;/p&gt;

&lt;p&gt;To turn the collected HTML text (&lt;code&gt;content_html&lt;/code&gt;) into a &lt;code&gt;BeautifulSoup&lt;/code&gt; object, the text must be &lt;em&gt;parsed&lt;/em&gt;, which converts it into a navigable data structure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;

&lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content_html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After &lt;em&gt;parsing&lt;/em&gt; the HTML content (&lt;code&gt;content_html&lt;/code&gt;), we have an object that organizes the tags, and all the HTML content, as a tree. The relationships between tags (parents, children, etc.) are preserved, which makes it easier to navigate the HTML elements. In the code above, this object is called &lt;code&gt;soup&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;As an initial test, let's extract the title (&lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; tag) from the collected HTML. To do so, just access the &lt;code&gt;title&lt;/code&gt; attribute of the &lt;code&gt;soup&lt;/code&gt; object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# expected output:
# &amp;lt;title&amp;gt;UOL - Seu universo online&amp;lt;/title&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Other elements can be extracted in a similar way. For example, to get the first link (&lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt; tag) in the HTML, we use the &lt;code&gt;a&lt;/code&gt; attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
expected output:
&amp;lt;a class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hyperlink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; href=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.ingresso.com/?partnership=home&amp;amp;amp;utm_source=uol.com.br&amp;amp;amp;utm_medium=barrauol&amp;amp;amp;utm_campaign=linkfixo_barrauol&amp;amp;amp;utm_content=barrauol-link-ingressocom&amp;amp;amp;utm_term=barrauol-ingressocom&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; title=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ingresso.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;
Ingresso.com
&amp;lt;/a&amp;gt;
&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can collect all the links in the HTML with the &lt;code&gt;find_all('a')&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;expected output: a list with all the links on the page

[&amp;lt;a class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hyperlink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; href=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.ingresso.com/?partnership=home&amp;amp;amp;utm_source=uol.com.br&amp;amp;amp;utm_medium=barrauol&amp;amp;amp;utm_campaign=linkfixo_barrauol&amp;amp;amp;utm_content=barrauol-link-ingressocom&amp;amp;amp;utm_term=barrauol-ingressocom&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; title=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ingresso.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;
Ingresso.com
&amp;lt;/a&amp;gt;,
 &amp;lt;a class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hyperlink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; href=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://batepapo.uol.com.br/?utm_source=uol.com.br&amp;amp;amp;utm_medium=barrauol&amp;amp;amp;utm_campaign=linkfixo_barrauol&amp;amp;amp;utm_term=barrauol-uolplay&amp;amp;amp;utm_content=barrauol&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; title=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bate-Papo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;
Bate-Papo
&amp;lt;/a&amp;gt;,
&lt;/span&gt;&lt;span class="gp"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="sh"&gt;'''&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Documentation:&lt;/strong&gt; A complete list of &lt;code&gt;BeautifulSoup&lt;/code&gt; commands can be found in its documentation. Click &lt;a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc.ptbr/#" rel="noopener noreferrer"&gt;here&lt;/a&gt; to open it.&lt;/p&gt;




&lt;p&gt;&lt;a id="inspeciona_html"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Inspecting the HTML snippet of interest ^
&lt;/h2&gt;

&lt;p&gt;Now we will see how to collect the &lt;u&gt;most-read stories&lt;/u&gt; section of the news site.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4wgyposaa0jr3wsggclc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4wgyposaa0jr3wsggclc.png" alt="Mais lidas" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Currently (May 2024), this section sits toward the right side of the page. To get the corresponding HTML element, we &lt;a href="https://pt.wikihow.com/Inspecionar-um-Elemento-no-Chrome" rel="noopener noreferrer"&gt;inspect the page&lt;/a&gt;: hover the mouse cursor over the 'Mais lidas' (Most read) section, right-click, and then choose &lt;u&gt;Inspecionar&lt;/u&gt; (Inspect).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzms7pl0i2kt6jkm3yflf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzms7pl0i2kt6jkm3yflf.png" alt="Inspecionar" width="800" height="743"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shortcut:&lt;/strong&gt; The browser's &lt;em&gt;Inspect&lt;/em&gt; tool (the Inspector) can also be opened with &lt;code&gt;Ctrl + Shift + c&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;With the Inspector we can identify the HTML tag that characterizes the element we want.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc1dm137puae3gb23yvx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc1dm137puae3gb23yvx.png" alt="Inspetor check" width="800" height="543"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The figure above shows this step. Note that the highlighted snippet shows that each item of the "Mais Lidas" section is in an &lt;code&gt;&amp;lt;li&amp;gt;&lt;/code&gt; tag whose class (&lt;code&gt;class&lt;/code&gt;) is &lt;code&gt;mostRead__item&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;&lt;a id="procura_tag"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Searching for the tag with BeautifulSoup ^
&lt;/h2&gt;

&lt;p&gt;Now that we have identified the HTML tag that corresponds to the most-read stories, we just need to search for it in the &lt;code&gt;soup&lt;/code&gt; object.&lt;/p&gt;

&lt;p&gt;The code below searches for every &lt;code&gt;&amp;lt;li class="mostRead__item"&amp;gt;&lt;/code&gt; tag and prints the result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;all_most_read&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;li&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;class_&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mostRead__item&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_most_read&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;expected output:
[&amp;lt;li class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mostRead__item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;a class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mostRead__item__hyperlink hyperlink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; href=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://noticias.uol.com.br/cotidiano/ultimas-noticias/2024/05/12/quatro-membros-da-mesma-familia-sao-encontrados-mortos-abracados-no-rs.htm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; title=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Quatro membros da mesma família são encontrados mortos abraçados no RS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;h3 class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title__element mostRead__title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;
Quatro membros da mesma família são encontrados mortos abraçados no RS
&amp;lt;!-- --&amp;gt;&amp;lt;/h3&amp;gt;&amp;lt;!-- --&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;, &amp;lt;li class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mostRead__item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;a class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mostRead__item__hyperlink hyperlink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; href=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://noticias.uol.com.br/cotidiano/ultimas-noticias/2024/05/12/nunca-vi-um-rapaz-ele-conta-como-e-ser-marido-de-presa.htm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; title=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ele narra a rotina de &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;marido&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; de presa: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Nunca vi um rapaz nas visitas&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;h3 class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title__element mostRead__title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;
Ele narra a rotina de &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;marido&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; de presa: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Nunca vi um rapaz nas visitas&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
&amp;lt;!-- --&amp;gt;&amp;lt;/h3&amp;gt;&amp;lt;!-- --&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;, &amp;lt;li class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mostRead__item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;a class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mostRead__item__hyperlink hyperlink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; href=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://noticias.uol.com.br/cotidiano/ultimas-noticias/2024/05/12/rs-prefeito-de-canoas-pede-evacuacao-imediata-de-quem-retornou-as-casas.htm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; title=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Com novo alerta de cheias, prefeitos do RS pedem que moradores deixem casas&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;h3 class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title__element mostRead__title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;
Com novo alerta de cheias, prefeitos do RS pedem que moradores deixem casas
&amp;lt;!-- --&amp;gt;&amp;lt;/h3&amp;gt;&amp;lt;!-- --&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;, &amp;lt;li class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mostRead__item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;a class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mostRead__item__hyperlink hyperlink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; href=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://noticias.uol.com.br/opiniao/coluna/2024/05/12/minha-indignacao.htm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; title=&lt;/span&gt;&lt;span class="sh"&gt;"'&lt;/span&gt;&lt;span class="s"&gt;Confesso que perdi a paciência com mentirosos&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, diz ministro Paulo Pimenta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;h3 class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title__element mostRead__title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;
&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Confesso que perdi a paciência com mentirosos&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, diz ministro Paulo Pimenta
&amp;lt;!-- --&amp;gt;&amp;lt;/h3&amp;gt;&amp;lt;!-- --&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;, &amp;lt;li class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mostRead__item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;a class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mostRead__item__hyperlink hyperlink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; href=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.uol.com.br/esporte/futebol/colunas/futebol-pelo-mundo/2024/05/12/como-mensagem-de-rodrygo-para-o-rs-virou-fake-news-politica-na-europa.htm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; title=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mensagem de Rodrygo para o RS vira fake news política na Europa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;h3 class=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title__element mostRead__title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;
Mensagem de Rodrygo para o RS vira fake news política na Europa
&amp;lt;!-- --&amp;gt;&amp;lt;/h3&amp;gt;&amp;lt;!-- --&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;]
&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



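&lt;p&gt;To better organize the collected data, we can separate the title and the link of each most-read story: &lt;code&gt;get_text()&lt;/code&gt; returns the title, and the &lt;code&gt;href&lt;/code&gt; attribute of the inner &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt; tag holds the link.&lt;/p&gt;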

&lt;h3&gt;
  
  
  Complete code
&lt;/h3&gt;

&lt;p&gt;Including this step, the complete code to collect and display the stories in the most-read section is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
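&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;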

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://www.uol.com.br&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

&lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;news_most_read&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;li&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;class_&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mostRead__item&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;href&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;href&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;expected output:
0. Quatro membros da mesma família são encontrados mortos abraçados no RS
&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;noticias&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uol&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;br&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;cotidiano&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;ultimas&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;noticias&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;quatro&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;membros&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;da&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mesma&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;familia&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;sao&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;encontrados&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mortos&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;abracados&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;htm&lt;/span&gt;

&lt;span class="s"&gt;1. Ele narra a rotina de &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;marido&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; de presa: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Nunca vi um rapaz nas visitas&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;noticias&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uol&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;br&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;cotidiano&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;ultimas&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;noticias&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;nunca&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;vi&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;um&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rapaz&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ele&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;conta&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;como&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ser&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;marido&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;de&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;presa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;htm&lt;/span&gt;

&lt;span class="s"&gt;2. Com novo alerta de cheias, prefeitos do RS pedem que moradores deixem casas
&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;noticias&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uol&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;br&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;cotidiano&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;ultimas&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;noticias&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;prefeito&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;de&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;canoas&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;pede&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;evacuacao&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;imediata&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;de&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;quem&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;retornou&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;casas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;htm&lt;/span&gt;

&lt;span class="s"&gt;3. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Confesso que perdi a paciência com mentirosos&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, diz ministro Paulo Pimenta
&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;noticias&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uol&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;br&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;opiniao&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;coluna&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;minha&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;indignacao&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;htm&lt;/span&gt;

&lt;span class="s"&gt;4. Mensagem de Rodrygo para o RS vira fake news política na Europa
&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;www&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uol&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;br&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;esporte&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;futebol&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;colunas&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;futebol&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;pelo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mundo&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;como&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mensagem&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;de&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rodrygo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;para&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;virou&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;fake&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;news&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;politica&lt;/span&gt;&lt;span 
class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;na&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;europa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;htm&lt;/span&gt;
&lt;span class="sh"&gt;'''&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
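&lt;p&gt;The loop above only prints each story, and the &lt;code&gt;news_most_read&lt;/code&gt; list is created but never filled. As a minimal sketch of one way to keep the results, assuming the titles and links have already been extracted as (title, link) pairs, the hypothetical helper &lt;code&gt;to_records&lt;/code&gt; below (not part of the original code) turns them into a list of dictionaries that can be serialized as JSON. The sample pair is copied from the expected output above, and ranks start at 1 here as a design choice.&lt;/p&gt;

```python
import json


def to_records(pairs):
    """Turn (title, link) pairs into a list of dicts, numbering from 1."""
    records = []
    for rank, (title, link) in enumerate(pairs, start=1):
        # get_text() may return surrounding whitespace, so trim the title
        records.append({'rank': rank, 'title': title.strip(), 'link': link})
    return records


# Sample data taken from the expected output shown above
sample_pairs = [
    (
        '\nMensagem de Rodrygo para o RS vira fake news política na Europa\n',
        'https://www.uol.com.br/esporte/futebol/colunas/futebol-pelo-mundo/2024/05/12/'
        'como-mensagem-de-rodrygo-para-o-rs-virou-fake-news-politica-na-europa.htm',
    ),
]

news_most_read = to_records(sample_pairs)
# ensure_ascii=False keeps accented characters readable in the output
print(json.dumps(news_most_read, ensure_ascii=False, indent=2))
```

&lt;p&gt;In the scraping loop, the pairs could be built, for instance, from &lt;code&gt;row.get_text()&lt;/code&gt; and &lt;code&gt;a["href"]&lt;/code&gt;.&lt;/p&gt;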



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; A scrape like the one we did reads the HTML elements available at the &lt;u&gt;current moment&lt;/u&gt;. Since sites can change as they evolve, whenever that happens the names used in the searches, and the results, must be updated as needed. The UOL site, in particular, has kept its HTML structure for quite a while&lt;a id="continua1"&gt;&lt;/a&gt;, changing only the content.&lt;/p&gt;
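
&lt;p&gt;One way to notice such a change early is to fail loudly when a search returns nothing. The sketch below assumes a hypothetical helper, &lt;code&gt;require_items&lt;/code&gt; (not part of BeautifulSoup or of the original code), that wraps any search result and raises an error if it is empty.&lt;/p&gt;

```python
def require_items(items, description):
    """Return the items as a list, or raise if the search matched nothing."""
    found = list(items)
    if not found:
        raise LookupError(
            f'No elements matched {description!r}; '
            'the page structure may have changed.'
        )
    return found


# Intended use with the search from this post
# (left as a comment because it needs the network):
# rows = require_items(soup.find_all('li', class_='mostRead__item'), 'li.mostRead__item')

print(require_items(['item 1', 'item 2'], 'demo'))
try:
    require_items([], 'li.mostRead__item')
except LookupError as error:
    print(error)
```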




&lt;p&gt;&lt;a id="robots"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  About &lt;code&gt;robots.txt&lt;/code&gt; ^
&lt;/h2&gt;

&lt;p&gt;Some sites do not authorize data scraping and state this in the &lt;code&gt;robots.txt&lt;/code&gt; file. To check the contents of this file, just visit &lt;code&gt;www.&amp;lt;site-name.com.br&amp;gt;/robots.txt&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For the news site chosen here, visiting &lt;a href="//www.uol.com.br/robots.txt"&gt;www.uol.com.br/robots.txt&lt;/a&gt; returns the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mm9ucmqyzj7z18cpupl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mm9ucmqyzj7z18cpupl.png" alt="Print robots" width="796" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The asterisk in &lt;code&gt;User-agent: *&lt;/code&gt; means the rule applies to all agents (software/bots): they may access all of the site's content (&lt;code&gt;Allow: /&lt;/code&gt;) except what is under &lt;code&gt;/carros/dev/&lt;/code&gt; (&lt;code&gt;Disallow: /carros/dev/&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;There are other restrictions as well. The &lt;code&gt;GPTBot&lt;/code&gt; and &lt;code&gt;Google-Extended&lt;/code&gt; agents may not access the site (&lt;code&gt;Disallow: /&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;A list of agents and more information about &lt;code&gt;robots.txt&lt;/code&gt; are available at &lt;a href="https://www.robotstxt.org/db.html" rel="noopener noreferrer"&gt;https://www.robotstxt.org/db.html&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; Although &lt;code&gt;robots.txt&lt;/code&gt; carries recommendations about what the site does or does not allow software to access, the file is not mandatory, so sometimes the responsible webmaster does not provide one. What actually regulates the collection, storage, use, etc. of scraped data is each country's legislation. For personal, non-profit projects that do not send massive numbers of requests, scraping is usually not a problem.&lt;/p&gt;

&lt;p&gt;In any case, it is worth evaluating how the collected data will be used, since depending on the case there may be legal implications.&lt;/p&gt;
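
&lt;p&gt;Python's standard library includes &lt;code&gt;urllib.robotparser&lt;/code&gt;, which can evaluate rules like these programmatically. The sketch below parses a hypothetical rule set modeled on the rules described above (it is not the site's actual file), and the agent name &lt;code&gt;my-scraper&lt;/code&gt; is made up for illustration. Python's parser applies the first rule that matches a path, so the order of the lines matters.&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules modeled on the ones described above (not fetched from the site)
ROBOTS_LINES = [
    'User-agent: GPTBot',
    'Disallow: /',
    '',
    'User-agent: *',
    'Disallow: /carros/dev/',
    'Allow: /',
]


def build_parser(lines):
    """Create a RobotFileParser from robots.txt lines already in memory."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


parser = build_parser(ROBOTS_LINES)

# A generic agent falls under the "*" group
print(parser.can_fetch('my-scraper', 'https://www.uol.com.br/'))
print(parser.can_fetch('my-scraper', 'https://www.uol.com.br/carros/dev/'))
# GPTBot has its own group that disallows everything
print(parser.can_fetch('GPTBot', 'https://www.uol.com.br/'))
```

&lt;p&gt;For a live site, &lt;code&gt;set_url()&lt;/code&gt; followed by &lt;code&gt;read()&lt;/code&gt; downloads and parses the file instead of using an in-memory list.&lt;/p&gt;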




&lt;p&gt;&lt;a id="exemplos"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Examples of data scraping running in production ^
&lt;/h2&gt;

&lt;p&gt;To illustrate what &lt;em&gt;web scraping&lt;/em&gt; makes possible, here are two examples of its use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Querido Diário (&lt;a href="https://queridodiario.ok.org.br" rel="noopener noreferrer"&gt;https://queridodiario.ok.org.br&lt;/a&gt;)
&lt;/h3&gt;

&lt;p&gt;The Querido Diário platform makes available a range of content of public interest, often collected through data scraping. There, users can get information from the Diário Oficial (official gazette) of many Brazilian municipalities.&lt;/p&gt;

&lt;p&gt;Querido Diário data often serves as raw material for news reports, research, and analyses of different kinds.&lt;/p&gt;

&lt;p&gt;To contribute to this open source project, visit its GitHub: &lt;a href="https://github.com/okfn-brasil/querido-diario-comunidade" rel="noopener noreferrer"&gt;https://github.com/okfn-brasil/querido-diario-comunidade&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Mini-app &lt;a id="mini_app"&gt;&lt;/a&gt;^
&lt;/h3&gt;

&lt;p&gt;Another example that uses web scraping is this experimental mini-app we built some time ago:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://newsaggapp-1-j9368482.deta.app/" rel="noopener noreferrer"&gt;https://newsaggapp-1-j9368482.deta.app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It was deployed to production completely free of charge with Deta Space: &lt;a href="https://deta.space/" rel="noopener noreferrer"&gt;https://deta.space/&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;a id="conclusao"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion and next steps ^
&lt;/h2&gt;

&lt;p&gt;Just as we &lt;em&gt;scraped&lt;/em&gt; the "mais lidas" (most read) section, we can collect other sections of the UOL site or of any other site of interest. All it takes is the analysis step with the web browser's Inspector to understand the HTML structure. In addition, what we saw about &lt;code&gt;robots.txt&lt;/code&gt; can serve as a basis when in doubt about whether scraping a given site is feasible. We hope this content is useful to beginners and to anyone interested in the subject. Tips and suggestions are welcome.&lt;/p&gt;

&lt;center&gt;&lt;h3&gt;Thanks for reading! ☕ 💻 🗞️&lt;/h3&gt;&lt;/center&gt;

</description>
      <category>tutorial</category>
      <category>braziliandevs</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>A "Hello World" in Deep Learning</title>
      <dc:creator>msc2020</dc:creator>
      <pubDate>Sun, 05 May 2024 13:20:22 +0000</pubDate>
      <link>https://forem.com/msc2020/um-hello-world-em-deep-learning-oob</link>
      <guid>https://forem.com/msc2020/um-hello-world-em-deep-learning-oob</guid>
      <description>&lt;p&gt;This post presents a step-by-step guide to running a &lt;em&gt;Deep Learning&lt;/em&gt; (DL) model that performs a task which, until not long ago, was a challenge. Starting from a properly organized and labeled dataset containing tens of thousands of handwritten digits (0 to 9), the DL model will be able to predict which digit appears in an input image. Since not everyone has neat handwriting, this task is not trivial, especially for a computer.&lt;/p&gt;

&lt;p&gt;Models of this kind are commonly used to automatically identify text in documents, PDFs, images, and so on. With few changes to what we will see, it is possible to classify content from other datasets. We say a little about this at the end of the post.&lt;/p&gt;

&lt;p&gt;Shall we? ☕&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To run the code in this post we need the following Python 3 libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Matplotlib (&lt;code&gt;pip install matplotlib&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Numpy (&lt;code&gt;pip install numpy&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Scikit-learn (&lt;code&gt;pip install scikit-learn&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  MNIST dataset
&lt;/h2&gt;

&lt;p&gt;Created in 1994, the MNIST dataset (&lt;em&gt;Modified National Institute of Standards and Technology&lt;/em&gt;) is widely used in computer vision and image processing. The current version consists of 60k training images and 10k test images. Each MNIST sample is a 28x28 grayscale image representing a handwritten digit from 0 to 9.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1showgv1onue4hmb6tiz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1showgv1onue4hmb6tiz.png" alt="MNIST digits" width="793" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;small&gt;Sample images from MNIST.&lt;/small&gt;&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;&lt;strong&gt;Downloading MNIST&lt;/strong&gt;: We will use the version of the MNIST dataset available at this site: &lt;a href="http://yann.lecun.com/exdb/mnist/" rel="noopener noreferrer"&gt;http://yann.lecun.com/exdb/mnist/&lt;/a&gt;. Go to the site and download the 4 files provided, placing them in a single directory. For convenience, we created a folder called &lt;code&gt;datasets&lt;/code&gt; and put the downloaded &lt;code&gt;.gz&lt;/code&gt; files in it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fit6jwqmbbrbtk619fq5o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fit6jwqmbbrbtk619fq5o.png" alt="Download MNIST" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;small&gt;MNIST download site. &lt;a href="http://yann.lecun.com/exdb/mnist/" rel="noopener noreferrer"&gt;http://yann.lecun.com/exdb/mnist/&lt;/a&gt;&lt;/small&gt;&lt;/small&gt;&lt;/center&gt;




&lt;h2&gt;
  
  
  Loading the dataset
&lt;/h2&gt;

&lt;p&gt;We will use the following functions to open MNIST, since its files are in &lt;code&gt;.gz&lt;/code&gt; format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;gzip&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_dataset&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gzip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;magic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unpack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;II&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ncols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unpack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;II&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;frombuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;newbyteorder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ncols&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_label&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gzip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;magic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unpack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;II&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;frombuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;newbyteorder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;        
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
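
&lt;p&gt;For context, the functions above read the IDX layout used by the MNIST files: a big-endian header (a magic number followed by the dimensions) and then one raw byte per value. The sketch below is only an illustration with a tiny synthetic file built in memory, not real MNIST data. It assumes the magic number 2051 that IDX image files carry and uses the &lt;code&gt;'!'&lt;/code&gt; prefix of &lt;code&gt;struct&lt;/code&gt;, which also means big-endian.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import gzip
import io
import struct

# Synthetic IDX image file: 2 images of 3x3 pixels (not real MNIST data).
# Header = magic number, number of images, rows, columns (4 bytes each).
header = struct.pack('!IIII', 2051, 2, 3, 3)
pixels = bytes(range(18))
raw = gzip.compress(header + pixels)

# Read it back: 16 header bytes first, then one byte per pixel.
with gzip.open(io.BytesIO(raw), 'rb') as f:
    magic, size, nrows, ncols = struct.unpack('!IIII', f.read(16))
    body = f.read()

print(magic, size, nrows, ncols, len(body))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Label files follow the same idea with a shorter header (magic number and item count only), which is why &lt;code&gt;load_label&lt;/code&gt; reads just 8 bytes before the data.&lt;/p&gt;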



&lt;p&gt;The code below loads the 4 parts of the MNIST dataset: &lt;code&gt;train-images-idx3-ubyte.gz&lt;/code&gt; (&lt;u&gt;training&lt;/u&gt; set); &lt;code&gt;train-labels-idx1-ubyte.gz&lt;/code&gt; (&lt;u&gt;labels&lt;/u&gt; of the &lt;u&gt;training&lt;/u&gt; set); &lt;code&gt;t10k-images-idx3-ubyte.gz&lt;/code&gt; (&lt;u&gt;test&lt;/u&gt; set) and &lt;code&gt;t10k-labels-idx1-ubyte.gz&lt;/code&gt; (&lt;u&gt;labels&lt;/u&gt; of the &lt;u&gt;test&lt;/u&gt; set).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;X_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./datasets/train-images-idx3-ubyte.gz&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./datasets/train-labels-idx1-ubyte.gz&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./datasets/t10k-images-idx3-ubyte.gz&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./datasets/t10k-labels-idx1-ubyte.gz&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After loading the datasets, we check that their dimensions are as expected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# expected value: (60000, 28, 28)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# (60000,)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# (10000, 28, 28)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# (10000,)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that each image in the training and test sets is 28 x 28.&lt;/p&gt;




&lt;h2&gt;
  
  
  Histogram of the labels
&lt;/h2&gt;

&lt;p&gt;An important step when building datasets for DL is making sure the classes are balanced. Training datasets with imbalanced classes tend to introduce bias, causing predictions to favor some classes over others. For example, if there are many samples of the digit 1 and few of the digit 7, the model commonly learns this pattern and tries to reproduce it in its predictions. In that case, the model tends to think that "everything" looks more like the digit 1 and treats the digit 7 as less common. MNIST has 10 classes, one for each digit from 0 to 9, so we expect them to have similar frequencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Side note:&lt;/strong&gt; In practice, bias can lead to severe failures. For example, &lt;a href="https://g1.globo.com/ba/bahia/noticia/2023/09/01/com-mais-de-mil-prisoes-na-ba-sistema-de-reconhecimento-facial-e-criticado-por-racismo-algoritmico-inocente-ficou-preso-por-26-dias.ghtml" rel="noopener noreferrer"&gt;a case that occurred in 2022&lt;/a&gt; led to an innocent person being jailed for 26 days because of an "algorithmic" error, causing real harm and losses. For more details on the case, click &lt;a href="https://g1.globo.com/ba/bahia/noticia/2023/09/01/com-mais-de-mil-prisoes-na-ba-sistema-de-reconhecimento-facial-e-criticado-por-racismo-algoritmico-inocente-ficou-preso-por-26-dias.ghtml" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To evaluate the distribution of the MNIST training labels, we will use a histogram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dpi&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x_ticks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;bins&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_ticks&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;edgecolor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rwidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bins&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#ABCDEF&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Histograma do label de treino y_train&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Classe - Label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Frequência&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xticks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_ticks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpxlwpba2bgrjet82sq21.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpxlwpba2bgrjet82sq21.png" alt="Histogram of the training labels" width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The plot above shows that the classes of the MNIST dataset are roughly balanced. We can therefore move on to training the models and using them to predict handwritten digits given as input.&lt;/p&gt;
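
&lt;p&gt;Class balance can also be checked numerically rather than only by eye. The sketch below is a minimal illustration with a small, made-up list of labels standing in for &lt;code&gt;y_train&lt;/code&gt;; the &lt;code&gt;imbalance_ratio&lt;/code&gt; name is our own, not part of any library.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter

# Toy labels standing in for y_train (hypothetical values).
labels = [0, 1, 1, 2, 2, 2, 1, 0, 2, 1]

counts = Counter(labels)
total = len(labels)

# Share of each class in the dataset.
shares = {digit: counts[digit] / total for digit in sorted(counts)}

# Ratio between the most and the least frequent class; 1.0 means perfectly balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(shares)
print(imbalance_ratio)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing &lt;code&gt;y_train.tolist()&lt;/code&gt; instead of the toy list would apply the same check to the real labels.&lt;/p&gt;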




&lt;h2&gt;
  
  
  Applying Machine Learning and Deep Learning models
&lt;/h2&gt;

&lt;p&gt;To assess the performance of the &lt;em&gt;Deep Learning&lt;/em&gt; model we will use, we will also predict the handwritten digits with a &lt;em&gt;Machine Learning&lt;/em&gt; (ML) model as a baseline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; We will not go into the details of what is or is not &lt;em&gt;Machine Learning&lt;/em&gt; and &lt;em&gt;Deep Learning&lt;/em&gt;. However, keep in mind that &lt;em&gt;Deep Learning&lt;/em&gt; models are usually regarded as a subset of &lt;em&gt;Machine Learning&lt;/em&gt; models. That is because they are models able to learn automatically (without human intervention) and make predictions based on what they have learned. The big difference between the two is that DL models are able to learn more details. In addition, their formulation is more complex and their architecture, the famous &lt;a href="https://en.wikipedia.org/wiki/Neural_network_(machine_learning)" rel="noopener noreferrer"&gt;artificial neural networks&lt;/a&gt;, usually has many hidden layers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj9t3ioth8u9egie419jz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj9t3ioth8u9egie419jz.png" alt="An MLP network" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;small&gt;Image of an MLP. (Network generated with the help of the site &lt;a href="https://alexlenail.me/NN-SVG/" rel="noopener noreferrer"&gt;https://alexlenail.me/NN-SVG&lt;/a&gt;)&lt;/small&gt;&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;Whether a given model is classified as X or Y does not change its intrinsic nature. Questions of nomenclature/taxonomy usually aim to organize and ease the understanding of the object of study, not to create barriers or extra complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  SGD model
&lt;/h3&gt;

&lt;p&gt;The SGD (&lt;em&gt;Stochastic Gradient Descent&lt;/em&gt;) model &lt;code&gt;SGDClassifier&lt;/code&gt; from &lt;code&gt;scikit-learn&lt;/code&gt; is the ML model we will compare with the DL one. With its default settings, it fits a linear SVM (&lt;em&gt;Support Vector Machine&lt;/em&gt;). The code below trains/fits &lt;code&gt;SGDClassifier&lt;/code&gt; on the MNIST dataset. Note that we use a preprocessing step with &lt;code&gt;StandardScaler&lt;/code&gt;, which rescales each feature to mean 0 and standard deviation 1. More details can be found in its &lt;a href="https://scikit-learn.org/stable/modules/sgd.html#sgd" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SGDClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StandardScaler&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_pipeline&lt;/span&gt;

&lt;span class="n"&gt;model_sgd_classifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StandardScaler&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                                     &lt;span class="nc"&gt;SGDClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;model_sgd_classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                         &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
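
&lt;p&gt;To make the standardization step in the pipeline above more concrete, the sketch below applies the same formula, &lt;code&gt;(x - mean) / std&lt;/code&gt;, to a single made-up feature using only the standard library. The values are hypothetical. &lt;code&gt;StandardScaler&lt;/code&gt; does this per feature (here, per pixel position) and uses the population standard deviation, which is what &lt;code&gt;pstdev&lt;/code&gt; computes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from statistics import fmean, pstdev

# Hypothetical intensities of one pixel position across four images.
pixels = [0.0, 51.0, 102.0, 255.0]

mean = fmean(pixels)
std = pstdev(pixels)  # population standard deviation (ddof=0)

# Standardize: subtract the mean, then divide by the standard deviation.
scaled = [(p - mean) / std for p in pixels]

# After scaling, the mean is (numerically) 0 and the standard deviation is 1.
print(fmean(scaled), pstdev(scaled))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;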



&lt;p&gt;After training the &lt;code&gt;SGDClassifier&lt;/code&gt; model, we can already make predictions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multilayer Perceptron Classifier model
&lt;/h3&gt;

&lt;p&gt;We now present the actual "&lt;em&gt;Hello World!&lt;/em&gt;" of DL. The model chosen for it is a neural network known as the &lt;a href="https://en.wikipedia.org/wiki/Multilayer_perceptron" rel="noopener noreferrer"&gt;&lt;em&gt;Multilayer Perceptron&lt;/em&gt;&lt;/a&gt;, or MLP. Its architecture has at least three layers: an input layer, one or more hidden layers, and an output layer. This model was a milestone because it can learn non-linear patterns and be trained on personal computers. In addition, current implementations usually perform well on fairly complex tasks, such as classifying handwritten digits. Although recognizing digits or letters seems simple to humans (especially in their native language), it was not always so for computers. In the case of MNIST, model performance took a big leap from 1998 to 2012, &lt;a href="http://yann.lecun.com/exdb/mnist/" rel="noopener noreferrer"&gt;going from 88% accuracy to 99.77%&lt;/a&gt;.&lt;/p&gt;
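
&lt;p&gt;To give an intuition for what "learning non-linear patterns" means, here is a minimal sketch of a forward pass through a tiny MLP with one hidden layer. The weights are hand-picked for illustration (nothing is trained here) and the function names are our own. With the ReLU non-linearity in the hidden layer, the network computes XOR, a function that no purely linear model can represent.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def relu(value):
    # Rectified linear unit: the non-linearity applied in the hidden layer.
    return max(0.0, value)


def tiny_mlp(x1, x2):
    # Hidden layer with two units (hand-picked, hypothetical weights and biases).
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    # Linear output layer combining the hidden activations.
    return h1 - 2.0 * h2


# The output is 1.0 only when exactly one of the inputs is 1 (XOR).
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, tiny_mlp(x1, x2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A real MLP works the same way, only with many more units and with weights found by training instead of being chosen by hand.&lt;/p&gt;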

&lt;p&gt;The code below trains the &lt;code&gt;MLPClassifier&lt;/code&gt; model, an implementation of an MLP classifier. It is also available in the open-source Scikit-learn library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.neural_network&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MLPClassifier&lt;/span&gt;
&lt;span class="n"&gt;model_mlp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MLPClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model_mlp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Making predictions with the trained models
&lt;/h3&gt;

&lt;p&gt;Now that the models are trained, we can make predictions with the &lt;code&gt;predict&lt;/code&gt; method. Below we use &lt;code&gt;score&lt;/code&gt;, which predicts on the test set (&lt;code&gt;X_test&lt;/code&gt;) and returns the mean accuracy against the true labels (&lt;code&gt;y_test&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_sgd_classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_mlp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
expected output:

0.9027
0.9784
&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Comparing the two scores, we see that the &lt;code&gt;MLPClassifier&lt;/code&gt; performed better. Both models did well, with accuracy above 90%. Still, the gap between them is considerable: &lt;code&gt;0.9784 - 0.9027 = 0.0757&lt;/code&gt;, that is, &lt;u&gt;about 7.57 percentage points&lt;/u&gt;.&lt;/p&gt;
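&lt;p&gt;To make the meaning of the score concrete, here is a minimal sketch in plain Python of the accuracy calculation (fraction of correct predictions) and of the gap between the two scores. The function name &lt;code&gt;accuracy&lt;/code&gt; and the label lists are hypothetical and only illustrate the idea; they are not part of Scikit-learn.&lt;/p&gt;

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Hypothetical labels, for illustration only
y_true = [2, 8, 1, 7, 3]
y_pred = [2, 3, 1, 7, 3]
print(accuracy(y_true, y_pred))  # 4 hits out of 5 samples

# Gap between the two scores reported above, in percentage points
gap_pp = (0.9784 - 0.9027) * 100
print(round(gap_pp, 2))
```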

&lt;h3&gt;
  
  
  Saving and loading a model
&lt;/h3&gt;

&lt;p&gt;Training can take a long time and consume resources such as energy and CPU or GPU lifespan, among others. For that reason, it is worth saving the trained model to disk and loading it whenever we want to make a prediction.&lt;/p&gt;

&lt;p&gt;To save the model we use the &lt;a href="https://docs.python.org/3.12/library/pickle.html" rel="noopener noreferrer"&gt;&lt;code&gt;pickle&lt;/code&gt; library&lt;/a&gt;, which serializes (converts a Python/Scikit-learn object into a byte stream) and deserializes (the reverse process). Since &lt;code&gt;pickle&lt;/code&gt; is part of the Python 3 standard library, there is nothing to install; a regular &lt;code&gt;import&lt;/code&gt; is enough.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pickle&lt;/span&gt;

&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_classifier_mnist.pkl&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pickle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_mlp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code above saves the already trained &lt;code&gt;MLPClassifier&lt;/code&gt; (the &lt;code&gt;model_mlp&lt;/code&gt; object) under the name &lt;code&gt;model_classifier_mnist.pkl&lt;/code&gt;. Note that we use &lt;code&gt;pickle&lt;/code&gt;'s &lt;code&gt;dump()&lt;/code&gt; function for this task.&lt;/p&gt;

&lt;p&gt;To load the saved model, we use &lt;code&gt;load()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pickle_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_classifier_mnist.pkl&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pickle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pickle_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pickle_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
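&lt;p&gt;As a variation, the same save/load round trip can be written with &lt;code&gt;with&lt;/code&gt; blocks, which close the file even if an error occurs. The sketch below uses a plain dictionary (&lt;code&gt;fake_model&lt;/code&gt;, a hypothetical stand-in) instead of a trained model and a temporary file name chosen only for illustration.&lt;/p&gt;

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model; any picklable object works
fake_model = {"name": "mlp", "weights": [0.1, 0.2, 0.3]}

# Temporary location used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "model_example.pkl")

# Save: protocol=pickle.HIGHEST_PROTOCOL is what the value -1 selects
with open(path, "wb") as f:
    pickle.dump(fake_model, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load: the file is closed automatically when the block ends
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == fake_model)
```

&lt;p&gt;Keep in mind that &lt;code&gt;pickle.load()&lt;/code&gt; can execute arbitrary code, so only load files that come from a source you trust.&lt;/p&gt;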



&lt;h3&gt;
  
  
  Test with a random sample
&lt;/h3&gt;

&lt;p&gt;To wrap up, let's run a simple test that picks a handwritten digit at random and runs the prediction with both models. Will both models recognize the handwritten digit?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;idx_random&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# draws a random index
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;idx_random = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;idx_random&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;some_digit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx_random&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# selects the pixel values from the training set
&lt;/span&gt;
&lt;span class="c1"&gt;# displays the handwritten digit as an image
&lt;/span&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dpi&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imshow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;some_digit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;binary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;axis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;off&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvx73h5ibmgcw92v644qi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvx73h5ibmgcw92v644qi.png" alt="Uma amostra aleatória" width="538" height="135"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# predicts the drawn digit with the MLP and SGD models
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_mlp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;some_digit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;)]))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_sgd_classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;some_digit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;)]))&lt;/span&gt;

&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
expected output:

[2]
[8]
&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although we would expect both models to get it right, since the sample comes from the training set, only the MLP model predicted the drawn digit correctly. To make things a bit harder, we can pick a sample from the test set or write a digit in our own handwriting and use it as input.&lt;/p&gt;

&lt;h2&gt;
  
  
  Similar datasets
&lt;/h2&gt;

&lt;p&gt;Other datasets we could have used instead of MNIST:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/CIFAR-10" rel="noopener noreferrer"&gt;CIFAR-10&lt;/a&gt;: has 10 classes with images of airplanes, dogs, cats, cars, trucks, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Fashion_MNIST" rel="noopener noreferrer"&gt;Fashion MNIST&lt;/a&gt;: has 10 classes with images of clothes, shoes, bags and similar items.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.cs.toronto.edu/~kriz/cifar.html" rel="noopener noreferrer"&gt;CIFAR-100&lt;/a&gt;: has 100 classes, with images of animals, vehicles, trees and flowers, among others.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;center&gt;
&lt;h4&gt;We hope you enjoyed it, and thanks for reading!&lt;/h4&gt;🤖 ☕ 🗿 🦥 ♟️&lt;/center&gt;

</description>
      <category>deeplearning</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>braziliandevs</category>
    </item>
    <item>
      <title>Sending and receiving text messages inside images with Python</title>
      <dc:creator>msc2020</dc:creator>
      <pubDate>Mon, 29 Apr 2024 22:51:07 +0000</pubDate>
      <link>https://forem.com/msc2020/envio-e-recebimento-de-mensagens-de-texto-dentro-de-imagens-com-python-37pp</link>
      <guid>https://forem.com/msc2020/envio-e-recebimento-de-mensagens-de-texto-dentro-de-imagens-com-python-37pp</guid>
      <description>&lt;p&gt;Sending and receiving text messages hidden inside images is part of the field of &lt;a href="https://pt.wikipedia.org/wiki/Esteganografia" rel="noopener noreferrer"&gt;Steganography&lt;/a&gt;. In today's post, we show a simple way to do it with Python. ☕&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To follow this tutorial you need to install the &lt;u&gt;&lt;a href="https://pillow.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;&lt;code&gt;Pillow&lt;/code&gt; (&lt;code&gt;PIL&lt;/code&gt;)&lt;/a&gt;&lt;/u&gt; library for Python 3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt; pip &lt;span class="nb"&gt;install &lt;/span&gt;pillow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Representation of an RGB pixel
&lt;/h2&gt;

&lt;p&gt;A digital image is made of pixels, its smallest unit. In turn, each pixel of common image formats such as &lt;code&gt;.JPG&lt;/code&gt;, &lt;code&gt;.PNG&lt;/code&gt; and &lt;code&gt;.JPEG&lt;/code&gt; is associated with three integer values (&lt;code&gt;int&lt;/code&gt;) that represent the intensity of the colors R (&lt;em&gt;red&lt;/em&gt;), G (&lt;em&gt;green&lt;/em&gt;) and B (&lt;em&gt;blue&lt;/em&gt;). Each of the R, G and B values ranges from 0 to 255. Combining these values produces the wide range of colors of the RGB system.&lt;/p&gt;

&lt;p&gt;Some images have an extra component besides the R, G and B color channels, called A (&lt;em&gt;alpha&lt;/em&gt;), which controls the &lt;em&gt;transparency&lt;/em&gt; of the image. This value ranges from 0 to 1 (or from 0 to 255, as in &lt;code&gt;Pillow&lt;/code&gt;). The closer it is to 0, the more transparent the image.&lt;/p&gt;

&lt;p&gt;In this post we use the image below for the tests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi8tc0ogxflclcnovhisy.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi8tc0ogxflclcnovhisy.jpeg" alt="imagem de camaleão" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;small&gt;&lt;u&gt;Source:&lt;/u&gt; &lt;a href="https://it.wikifur.com/wiki/Camaleonte" rel="noopener noreferrer"&gt;https://it.wikifur.com/wiki/Camaleonte&lt;/a&gt;&lt;/small&gt;&lt;/small&gt;&lt;/center&gt;




&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;Let's look at an example with the &lt;code&gt;Pillow&lt;/code&gt; library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# exemplo.py
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;

&lt;span class="c1"&gt;# path to the input image
&lt;/span&gt;&lt;span class="n"&gt;filename_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./images/img_camaleao.jpeg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; 
&lt;span class="c1"&gt;# image download: https://it.wikifur.com/wiki/Camaleonte
&lt;/span&gt;
&lt;span class="c1"&gt;# loads the image as an Image object
&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# prints the format and color bands
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Formato: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Cores: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getbands&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Tamanho: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Valor das cores do pixel na posição &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;(x, y) = (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getpixel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# displays the image
&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
expected terminal output:

Formato: JPEG
Cores: (&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;R&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;G&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;B&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)
Tamanho: (800, 500)
Valor das cores do pixel na posição (x, y) = (400, 100): (184, 215, 95)
&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, the selected pixel belongs to the green chameleon, since the G component has the largest value in the ordered triple (R, G, B) = (184, &lt;strong&gt;215&lt;/strong&gt;, 95). Using a &lt;a href="https://www.rapidtables.com/web/color/RGB_Color.html" rel="noopener noreferrer"&gt;site&lt;/a&gt; to help with the conversion, we can check the color of the chosen pixel:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffcagw2fl5ghe48ycrpmz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffcagw2fl5ghe48ycrpmz.png" alt="imagem da cor (184, 215, 95)" width="87" height="186"&gt;&lt;/a&gt;&lt;/p&gt;
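&lt;p&gt;The same check can be sketched in a few lines of plain Python. The helper names &lt;code&gt;rgb_to_hex&lt;/code&gt; and &lt;code&gt;dominant_channel&lt;/code&gt; below are hypothetical and written only for this illustration; they are not part of &lt;code&gt;Pillow&lt;/code&gt;.&lt;/p&gt;

```python
def rgb_to_hex(rgb):
    """Converts an (R, G, B) tuple of ints in 0..255 to a '#rrggbb' string."""
    r, g, b = rgb
    for value in (r, g, b):
        if value not in range(256):
            raise ValueError("each channel must be an int between 0 and 255")
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


def dominant_channel(rgb):
    """Returns the letter of the channel with the largest value."""
    return "RGB"[rgb.index(max(rgb))]


pixel = (184, 215, 95)  # value read with getpixel() in the example above
print(rgb_to_hex(pixel))        # #b8d75f
print(dominant_channel(pixel))  # G
```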




&lt;h2&gt;
  
  
  Inserting a text message into an RGB image
&lt;/h2&gt;

&lt;p&gt;Just as &lt;code&gt;Pillow&lt;/code&gt; has a method for reading the information of a given pixel of an image, &lt;code&gt;getpixel()&lt;/code&gt;, it also has one for writing a pixel. To write a pixel with colors (R, G, B), where 0 ≤ R, G, B ≤ 255, we use &lt;code&gt;putpixel()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The following code inserts a message into an image with RGB colors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;insert_msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_original&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
    Input:
      . img_original: image with RGB colors
      . msg: a message as a string
    Output:
      . img_with_msg: image with the message written into some pixels
      . msg_encoded: the message encoded as a string of decimal digits
    &lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;

    &lt;span class="n"&gt;img_with_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img_original&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;x_cte&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img_with_msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="c1"&gt;# encodes the message from a string to an integer (kept as a string of digits)
&lt;/span&gt;    &lt;span class="n"&gt;msg_encoded_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;strict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;msg_encoded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;msg_encoded_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;byteorder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;little&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# writes the message into the R color channel of the image
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg_encoded&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;rgb_pixel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_original&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getpixel&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x_cte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
        &lt;span class="n"&gt;red_color&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rgb_pixel&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# (r, g, b)[0]
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;red_color&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;red_color&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
                &lt;span class="n"&gt;rgb_pixel&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="n"&gt;red_color&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;                
                &lt;span class="n"&gt;rgb_pixel&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;img_with_msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putpixel&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x_cte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rgb_pixel&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;            
            &lt;span class="n"&gt;img_with_msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putpixel&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x_cte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rgb_pixel&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# adds pixels at the end as a flag
&lt;/span&gt;    &lt;span class="n"&gt;flag_pixel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;233&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;233&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;233&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg_encoded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;
        &lt;span class="n"&gt;rgb_pixel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img_original&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getpixel&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x_cte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# tries to write a flag_pixel
&lt;/span&gt;            &lt;span class="n"&gt;img_with_msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putpixel&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x_cte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;flag_pixel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# on failure, restores the original pixel
&lt;/span&gt;            &lt;span class="n"&gt;img_with_msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putpixel&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x_cte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;rgb_pixel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;img_with_msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg_encoded&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Strategy used:&lt;/strong&gt; To insert the message into the image, we first encode it as a list of integers and then insert each integer into the red color channel (R), keeping the &lt;code&gt;x&lt;/code&gt; coordinate fixed (&lt;code&gt;x = x_cte&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The insertions alter pixels of the original image. For example, suppose the chosen original pixel is &lt;code&gt;(23, 127, 53)&lt;/code&gt; and the next number in the list of integers that encodes the message is &lt;code&gt;7&lt;/code&gt;. The altered pixel is then &lt;code&gt;(27, 127, 53)&lt;/code&gt;: the last decimal digit of the red value is replaced by the number being inserted.&lt;/p&gt;

&lt;p&gt;This insertion strategy is defined in the step &lt;code&gt;r = red_color % 10&lt;/code&gt;. If a value ends up larger than &lt;code&gt;255&lt;/code&gt;, the maximum allowed, we handle it with &lt;code&gt;try/except&lt;/code&gt;.&lt;/p&gt;
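
&lt;p&gt;The digit replacement described above can be sketched in isolation. This is a minimal sketch, not the post's &lt;code&gt;insert_msg()&lt;/code&gt;: the helper names &lt;code&gt;embed_digit&lt;/code&gt; and &lt;code&gt;read_digit&lt;/code&gt; are hypothetical, and the overflow rule (stepping down by ten rather than relying on &lt;code&gt;try/except&lt;/code&gt;) is an assumption made for illustration.&lt;/p&gt;

```python
def embed_digit(red, digit):
    """Replace the last decimal digit of a red-channel value with `digit`."""
    candidate = red - red % 10 + digit
    # Assumed overflow rule (not from the post): if the result exceeds 255,
    # step down by ten so the last digit is kept and the value stays valid.
    if candidate > 255:
        candidate -= 10
    return candidate


def read_digit(red):
    """Recover the embedded digit, mirroring `r = red_color % 10`."""
    return red % 10


pixel = (23, 127, 53)
new_pixel = (embed_digit(pixel[0], 7),) + pixel[1:]
print(new_pixel)            # (27, 127, 53)
print(embed_digit(255, 9))  # 249
```

&lt;p&gt;Only the red channel changes, and by at most 9 units (or 10 in the assumed overflow case), which is why the message pixels are hard to spot.&lt;/p&gt;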

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This way of inserting the message may not be the most efficient, nor is it meant to be. The goal of the post is to present &lt;em&gt;one&lt;/em&gt; of the many ways to do the task.&lt;/p&gt;

&lt;p&gt;After inserting the encoded message into the image, we end the list of altered pixels with a &lt;em&gt;flag&lt;/em&gt;. The &lt;em&gt;flag&lt;/em&gt; consists of setting the 5 pixels that follow the last message pixel to &lt;code&gt;(233, 233, 233)&lt;/code&gt;. This will help us when recovering the message.&lt;/p&gt;
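
&lt;p&gt;To show how such a flag can mark the end of the message, here is a small sketch that scans one column of pixels, given as a plain list of RGB tuples. The function name &lt;code&gt;find_message_length&lt;/code&gt; is hypothetical, and requiring the 5 flag pixels to be consecutive is an assumption of this sketch.&lt;/p&gt;

```python
FLAG_PIXEL = (233, 233, 233)
FLAG_LENGTH = 5


def find_message_length(column):
    """Return the index where a run of FLAG_LENGTH flag pixels starts, or None."""
    run = 0
    for index, pixel in enumerate(column):
        # Extend the current run on a flag pixel, otherwise start over.
        run = run + 1 if pixel == FLAG_PIXEL else 0
        if run == FLAG_LENGTH:
            return index - FLAG_LENGTH + 1
    return None


# Three message pixels, then the flag, then an untouched pixel.
column = [(21, 10, 10), (34, 20, 20), (47, 30, 30)] + [FLAG_PIXEL] * 5 + [(0, 0, 0)]
print(find_message_length(column))  # 3
```

&lt;p&gt;The returned index equals the number of message pixels that come before the flag.&lt;/p&gt;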

&lt;p&gt;&lt;strong&gt;Fun fact:&lt;/strong&gt; The &lt;a href="https://en.wikipedia.org/wiki/233_(number)" rel="noopener noreferrer"&gt;number &lt;code&gt;233&lt;/code&gt;&lt;/a&gt; has a mystical side, since it is at the same time a &lt;a href="https://en.wikipedia.org/wiki/Prime_number" rel="noopener noreferrer"&gt;prime number&lt;/a&gt;, a &lt;a href="https://en.wikipedia.org/wiki/Safe_and_Sophie_Germain_primes" rel="noopener noreferrer"&gt;Sophie Germain prime&lt;/a&gt;, a &lt;a href="https://en.wikipedia.org/wiki/Ramanujan_prime" rel="noopener noreferrer"&gt;Ramanujan prime&lt;/a&gt; and also a &lt;a href="https://en.wikipedia.org/wiki/Fibonacci_prime" rel="noopener noreferrer"&gt;Fibonacci prime&lt;/a&gt;. Prime numbers are of great importance in cryptography. For example, the &lt;a href="https://www.primegrid.com/" rel="noopener noreferrer"&gt;PrimeGrid&lt;/a&gt; project has been searching for primes of special forms since 2005.&lt;/p&gt;




&lt;h2&gt;
  
  
  Extracting the text message from the image
&lt;/h2&gt;

&lt;p&gt;To extract the message embedded in the image, we use this function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;exctract_msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_with_msg&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
    Input:
      . img_with_msg: image with a message inserted by the `insert_msg()` function
    Output:
     . msg_decoded_str: the message as a string
    &lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;

    &lt;span class="n"&gt;img_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img_with_msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;x_cte&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="c1"&gt;# find the position of the flag
&lt;/span&gt;    &lt;span class="n"&gt;is_flag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;flag_pixel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;233&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;233&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;233&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;list_pixels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;rgb_pixel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getpixel&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x_cte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rgb_pixel&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;flag_pixel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;is_flag&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;is_flag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;j_end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;

    &lt;span class="c1"&gt;# build the list of integers that make up the encoded message
&lt;/span&gt;    &lt;span class="n"&gt;msg_encoded_int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;j_end&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;rgb_pixel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getpixel&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x_cte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
        &lt;span class="n"&gt;red_color&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rgb_pixel&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# (r, g, b)
&lt;/span&gt;        &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;red_color&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;msg_encoded_int&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;list_pixels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rgb_pixel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# decode: turn the list of integers into the string holding the original message
&lt;/span&gt;    &lt;span class="n"&gt;msg_encoded_int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg_encoded_int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;msg_encoded_int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg_encoded_int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;msg_decoded_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg_encoded_int&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_bytes&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;msg_encoded_int&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bit_length&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;little&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;msg_decoded_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg_decoded_bytes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;msg_decoded_str&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In short, to extract the text from the image we reverse what was done in &lt;code&gt;insert_msg()&lt;/code&gt;: we collect the list of integers stored in the altered pixels and decode that list into a string, which is the original message.&lt;/p&gt;
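
&lt;p&gt;The round trip between text and digits can be checked without any image. The decoder below follows the same steps as &lt;code&gt;exctract_msg()&lt;/code&gt; (join the digits, convert to an integer, &lt;code&gt;to_bytes&lt;/code&gt; with &lt;code&gt;'little'&lt;/code&gt;, decode as UTF-8). The encoder is an assumption inferred from that decoder, and both function names are hypothetical.&lt;/p&gt;

```python
def encode_to_digits(message):
    """Turn text into the decimal digits of one big integer (assumed encoder)."""
    number = int.from_bytes(message.encode('utf-8'), 'little')
    return [int(digit) for digit in str(number)]


def decode_from_digits(digits):
    """Rebuild the text from the digits, mirroring the steps in exctract_msg()."""
    number = int(''.join(str(digit) for digit in digits))
    n_bytes = (number.bit_length() + 7) // 8
    return number.to_bytes(n_bytes, 'little').decode('utf-8')


digits = encode_to_digits('Olá, mundo!')
print(len(digits))                 # number of pixels the message would need
print(decode_from_digits(digits))  # Olá, mundo!
```

&lt;p&gt;Each digit occupies one pixel, so the number of digits also tells how tall the image must be to hold the message plus the flag.&lt;/p&gt;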




&lt;h2&gt;
  
  
  Tests
&lt;/h2&gt;

&lt;p&gt;To test the functions, we will use the chameleon image shown earlier.&lt;/p&gt;

&lt;p&gt;The following code uses both functions, to insert and then extract the message. To avoid repeating their code, we put both &lt;code&gt;insert_msg()&lt;/code&gt; and &lt;code&gt;exctract_msg()&lt;/code&gt; in &lt;code&gt;utils.py&lt;/code&gt; and import its contents with &lt;code&gt;from utils import *&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#insere_extrai_msg_em_imagem.py
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;

&lt;span class="c1"&gt;# path to the JPEG image
&lt;/span&gt;&lt;span class="n"&gt;img_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./images/img_camaleao.jpeg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;# https://it.wikifur.com/wiki/Camaleonte
&lt;/span&gt;
&lt;span class="c1"&gt;# chosen message
&lt;/span&gt;&lt;span class="n"&gt;msg_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Há mais força no perdão do que na ofensa, há mais força no reparo do que no erro. Raduan Nassar.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# load the image as an Image object
&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# uncomment the line below to display the original image
# img.show(title='Original image')
&lt;/span&gt;
&lt;span class="c1"&gt;# insert the message into the image (insert_msg returns the image and the encoded message)
&lt;/span&gt;&lt;span class="n"&gt;img_encoded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg_encoded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;insert_msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# uncomment to display the image with the inserted message
# img_encoded.show(title='Image with message')
&lt;/span&gt;
&lt;span class="c1"&gt;# extract the message from the image
&lt;/span&gt;&lt;span class="n"&gt;msg_decoded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;exctract_msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_encoded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg_decoded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
expected output:

&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Há mais força no perdão do que na ofensa, há mais força no reparo do que no erro. Raduan Nassar.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftm5sx6mrjesglo73eubb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftm5sx6mrjesglo73eubb.png" alt="img original vs img com mensagem" width="800" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;small&gt;&lt;u&gt;About the figure:&lt;/u&gt; The image on the left is the original and the one on the right carries the inserted message. Zooming in on the right-hand image reveals a sequence of whitish pixels along part of the border.&lt;/small&gt;&lt;/small&gt;&lt;/center&gt;




&lt;h2&gt;
  
  
  Other ways to insert and extract messages in images
&lt;/h2&gt;

&lt;p&gt;There are more efficient ways to insert and extract messages in digital images. A widely used approach takes the binary encoding of the message and of the image pixels and modifies the image's &lt;a href="https://en.wikipedia.org/wiki/Bit_numbering" rel="noopener noreferrer"&gt;least significant bits (LSB)&lt;/a&gt;. This way, fairly long messages can be transmitted, and the changes to the image are unlikely to be noticeable to the naked eye. Good material on the subject can be found at this &lt;a href="https://www.vivaolinux.com.br/artigo/Esteganografia-e-Esteganalise-transmissao-e-deteccao-de-informacoes-ocultas-em-imagens-digitais/?pagina=1" rel="noopener noreferrer"&gt;link&lt;/a&gt; or &lt;a href="https://dev.to/vapourisation/steganograhy-part-1-2j73"&gt;here&lt;/a&gt;.&lt;/p&gt;
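
&lt;p&gt;As a rough illustration of the LSB idea, the sketch below hides bytes in the lowest bit of a list of channel values (plain integers from 0 to 255). It is a minimal sketch under stated assumptions: the function names &lt;code&gt;embed_bits&lt;/code&gt; and &lt;code&gt;extract_bits&lt;/code&gt; are hypothetical, the receiver is assumed to know the payload length, and no image library is involved.&lt;/p&gt;

```python
def embed_bits(values, payload):
    """Hide `payload` (bytes) in the least significant bit of each value."""
    bits = [int(b) for byte in payload for b in format(byte, '08b')]
    if len(bits) > len(values):
        raise ValueError('payload does not fit in the given values')
    stego = list(values)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        stego[i] = stego[i] - stego[i] % 2 + bit
    return stego


def extract_bits(values, n_bytes):
    """Read `n_bytes` bytes back from the least significant bits."""
    bits = ''.join(str(v % 2) for v in values[:n_bytes * 8])
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


channel = list(range(100, 140))  # 40 hypothetical channel values
stego = embed_bits(channel, b'Hi')
print(extract_bits(stego, 2))                            # b'Hi'
print(max(abs(a - b) for a, b in zip(channel, stego)))   # 1
```

&lt;p&gt;Each value changes by at most 1, compared with up to 9 when a whole decimal digit is replaced, which is why LSB changes are much harder to see.&lt;/p&gt;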

&lt;center&gt;👾 👻 🦎 🐉 🧌 🖼️ 🕴️&lt;/center&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;&lt;em&gt;We hope you enjoyed it, and thank you for reading!&lt;/em&gt;&lt;/strong&gt; &lt;/p&gt;

</description>
      <category>python</category>
      <category>tutorial</category>
      <category>braziliandevs</category>
    </item>
    <item>
      <title>Extracting image metadata with Python</title>
      <dc:creator>msc2020</dc:creator>
      <pubDate>Sun, 14 Apr 2024 14:17:46 +0000</pubDate>
      <link>https://forem.com/msc2020/extracao-de-metadados-de-imagens-com-python-2f8n</link>
      <guid>https://forem.com/msc2020/extracao-de-metadados-de-imagens-com-python-2f8n</guid>
      <description>&lt;p&gt;In this post, we talk about image metadata and show how to use Python tools to extract this kind of information.&lt;/p&gt;

&lt;p&gt;Nowadays, in the "Data/AI Era", it is common to hear that &lt;a href="https://medium.com/geekculture/if-data-is-petrol-a4358b3f2038" rel="noopener noreferrer"&gt;&lt;em&gt;"Data is the new oil"&lt;/em&gt;&lt;/a&gt;. The saying is not hard to understand, but a question remains: "If data is today's oil, what would &lt;a href="https://en.wikipedia.org/wiki/Metadata" rel="noopener noreferrer"&gt;metadata&lt;/a&gt; be comparable to?"&lt;/p&gt;




&lt;h2&gt;
  
  
  A little about images and metadata
&lt;/h2&gt;

&lt;p&gt;Before running the Python code that extracts the metadata, let's quickly go over a few points about digital images.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is an image file structured?
&lt;/h3&gt;

&lt;p&gt;In short, a &lt;a href="https://en.wikipedia.org/wiki/Digital_imaging" rel="noopener noreferrer"&gt;digital image&lt;/a&gt; is a &lt;a href="https://en.wikipedia.org/wiki/File_format" rel="noopener noreferrer"&gt;file&lt;/a&gt; in a format such as &lt;code&gt;PNG&lt;/code&gt;, &lt;code&gt;JPEG&lt;/code&gt; or &lt;code&gt;GIF&lt;/code&gt; that follows the specification defining that format. Like many digital files, images can be represented on the computer as matrices, where each entry of the matrix is associated with a &lt;a href="https://en.wikipedia.org/wiki/Pixel" rel="noopener noreferrer"&gt;pixel&lt;/a&gt;. This correspondence is not one-to-one, since it depends on the configuration of the computer used to work with the image. A digital image with a resolution of 800 x 600 (width x height) corresponds to a total of 480,000 pixels on the computer screen. In general, applying a transformation to an image means performing operations on matrices or submatrices of pixels.&lt;/p&gt;
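
&lt;p&gt;The matrix view can be illustrated with plain Python lists. This is only a sketch: the grayscale values and the &lt;code&gt;invert&lt;/code&gt; helper are hypothetical, and real code would normally use an image library instead of nested lists.&lt;/p&gt;

```python
width, height = 800, 600

# A hypothetical grayscale image: `height` rows with `width` values each.
image = [[0] * width for _ in range(height)]
print(len(image) * len(image[0]))  # 480000


def invert(matrix):
    """Example transformation: produce the negative of a grayscale matrix."""
    return [[255 - value for value in row] for row in matrix]


print(invert([[0, 128], [255, 64]]))  # [[255, 127], [0, 191]]
```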

&lt;h3&gt;
  
  
  What does the prefix &lt;em&gt;meta&lt;/em&gt; mean?
&lt;/h3&gt;

&lt;p&gt;According to the &lt;a href="https://dicionario.priberam.org/meta" rel="noopener noreferrer"&gt;Priberam online dictionary&lt;/a&gt;, the prefix &lt;code&gt;meta&lt;/code&gt; &lt;em&gt;"expresses the notion of reflection on itself (e.g., metalanguage)"&lt;/em&gt;. The term &lt;strong&gt;&lt;em&gt;metadata&lt;/em&gt;&lt;/strong&gt; therefore means "data about some data". In this post, we focus on images and their metadata. The computer's operating system often reads this data for its own purposes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is image metadata mandatory?
&lt;/h3&gt;

&lt;p&gt;Just as each image can have a different format, the information that images carry about themselves can also vary. To illustrate, the GIF (released in 1987), JPEG (introduced in 1992) and WebP (from 2010) formats each have their own specification. The hexadecimal signature used to identify each of them is also different. More details can be found at this &lt;a href="https://en.wikipedia.org/wiki/List_of_file_signatures" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/p&gt;
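
&lt;p&gt;File signatures can be checked by reading the first bytes of a file. The sketch below recognizes the well-known signatures of GIF, JPEG, WebP and PNG; the function name &lt;code&gt;sniff_image_format&lt;/code&gt; is hypothetical, and the check is illustrative rather than a complete format validator.&lt;/p&gt;

```python
def sniff_image_format(header):
    """Guess the image format from the first bytes of a file."""
    if header[:6] in (b'GIF87a', b'GIF89a'):
        return 'GIF'
    if header[:3] == b'\xff\xd8\xff':
        return 'JPEG'
    # WebP is a RIFF container: 'RIFF', 4 size bytes, then 'WEBP'.
    if header[:4] == b'RIFF' and header[8:12] == b'WEBP':
        return 'WebP'
    if header[:8] == b'\x89PNG\r\n\x1a\n':
        return 'PNG'
    return 'unknown'


print(sniff_image_format(b'GIF89a' + bytes(6)))            # GIF
print(sniff_image_format(b'\xff\xd8\xff\xe0' + bytes(8)))  # JPEG
print(sniff_image_format(b'RIFF' + bytes(4) + b'WEBP'))    # WebP
```

&lt;p&gt;For a real file, the header could be read with &lt;code&gt;open(path, 'rb').read(12)&lt;/code&gt; and passed to the function.&lt;/p&gt;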

&lt;p&gt;In addition, images can be produced in particular contexts and by devices with different characteristics, as with satellite images, medical images, geophysical images, etc. This makes a single standard for image metadata harder to achieve. Image metadata tends to vary and may even be removed. WhatsApp is an example of an application that usually &lt;a href="https://www.editprivacy.com/whatsapp-photo-metadata/" rel="noopener noreferrer"&gt;removes some metadata&lt;/a&gt; from images sent through it.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to extract metadata using Python?
&lt;/h2&gt;

&lt;p&gt;Now let's see how to extract the metadata of an image in practice. For this we will use the open source library &lt;u&gt;&lt;a href="https://hachoir.readthedocs.io/en/latest/index.html" rel="noopener noreferrer"&gt;&lt;code&gt;Hachoir&lt;/code&gt;&lt;/a&gt;&lt;/u&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkrvuhbkeqf3pzelky9r.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkrvuhbkeqf3pzelky9r.jpg" alt="Mata-Atlântica" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;small&gt;Source: &lt;a href="http://www.ecolibrary.org/page/DP43" rel="noopener noreferrer"&gt;www.ecolibrary.org/page/DP43&lt;/a&gt; | License: &lt;a href="https://creativecommons.org/licenses/by-nc/3.0/us/" rel="noopener noreferrer"&gt;https://creativecommons.org/licenses/by-nc/3.0/us/&lt;/a&gt;&lt;/small&gt;&lt;/small&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Hachoir
&lt;/h3&gt;

&lt;p&gt;Hachoir is a Python library that lets us extract metadata from files such as images, videos and audio, among others. Its &lt;a href="https://hachoir.readthedocs.io/en/latest/metadata.html#metadata" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; has some interesting examples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installation:&lt;/strong&gt; To install &lt;code&gt;Hachoir&lt;/code&gt;, just run the following command in a terminal with Python 3 installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;hachoir
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Code used:&lt;/strong&gt; To extract the metadata we will use the script below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# extrai_metadados.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hachoir.metadata&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hachoir.parser&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;hachoir.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;HachoirConfig&lt;/span&gt;
&lt;span class="n"&gt;HachoirConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quiet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="n"&gt;img_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mata_atlantica.jpg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="c1"&gt;# www.ecolibrary.org/page/DP43
# license: https://creativecommons.org/licenses/by-nc/3.0/us/
&lt;/span&gt;
&lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img_path&lt;/span&gt;
&lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hachoir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;hachoir_metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hachoir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extractMetadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hachoir_metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exportDictionary&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Metadata&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that the &lt;code&gt;img_path&lt;/code&gt; variable holds the path to the chosen image file. Here we picked a &lt;code&gt;JPG&lt;/code&gt; photo, but we could have chosen another format.&lt;/p&gt;

&lt;p&gt;We can run the &lt;code&gt;extrai_metadados.py&lt;/code&gt; script above with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python extrai_metadados.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Expected output:&lt;/strong&gt; After running the script, we get a Python dictionary (similar to &lt;code&gt;json&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"Author"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Dan L. Perlman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Image width"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1024 pixels"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Image height"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"768 pixels"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Image orientation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Horizontal (normal)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Bits/pixel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Pixel format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YCbCr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Image DPI width"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"93 DPI"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Image DPI height"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"93 DPI"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Creation date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2008-03-20 16:53:27"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Camera focal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"19"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Camera exposure"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1/4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Camera brightness"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"5.16"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Camera model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FinePixS2Pro"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Camera manufacturer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FUJIFILM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Compression"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"JPEG (Baseline)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Copyright"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"© Dan L. Perlman 2008
http://creativecommons.org/licenses/by-nc/3.0/us/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Thumbnail size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"8471 bytes"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"ISO speed rating"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"100"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"EXIF version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0220"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Date-time original"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2004-10-23 09:47:24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Date-time digitized"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2004-10-23 09:47:24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Compressed bits per pixel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Shutter speed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Aperture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"8.5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Exposure bias"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Focal length"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"18"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Flashpix version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0100"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Focal plane width"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.86e+03"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Focal plane height"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.86e+03"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Focal length in 35mm film"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"27"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Producer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Adobe Photoshop CS3 Windows"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Comment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"JPEG quality: 91% (approximate)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Format version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"JFIF 1.02"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"MIME type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"image/jpeg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Endianness"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Big endian"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Above are the metadata extracted from the &lt;a href="http://www.ecolibrary.org/page/DP43"&gt;image&lt;/a&gt; we chose for this test. Some of these items come from EXIF, &lt;a href="https://en.wikipedia.org/wiki/Exif" rel="noopener noreferrer"&gt;a standard&lt;/a&gt; that specifies formats for images produced by devices such as cameras, smartphones, and scanners. With EXIF it is possible to obtain the date and time, information about the device used to produce the image, geolocation, pixel metrics, and the image's copyright.&lt;/p&gt;

&lt;h3&gt;
  
  
  Notes on the extracted metadata
&lt;/h3&gt;

&lt;p&gt;For the &lt;code&gt;JPG&lt;/code&gt; image we used (&lt;code&gt;mata_atlantica.jpg&lt;/code&gt;), the output of the &lt;code&gt;extrai_metadados.py&lt;/code&gt; script includes basic information such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"Image width"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1024 pixels"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Image height"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"768 pixels"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Image orientation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Horizontal (normal)"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Bits/pixel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"24"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Pixel format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YCbCr"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"MIME type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"image/jpeg"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Endianness"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Big endian"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is also information about the device used to produce this image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"Camera focal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"19"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Camera exposure"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1/4"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Camera brightness"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"5.16"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Camera model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FinePixS2Pro"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Camera manufacturer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FUJIFILM"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Shutter speed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Aperture"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"8.5"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Exposure bias"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Focal length"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"18"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Flashpix version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0100"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Focal plane width"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.86e+03"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Focal plane height"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.86e+03"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Focal length in 35mm film"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"27"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fun fact:&lt;/strong&gt; According to the metadata, the camera used to take the photo was a &lt;a href="https://www.dpreview.com/reviews/fujis2pro" rel="noopener noreferrer"&gt;FUJIFILM FinePix S2 Pro&lt;/a&gt;.&lt;/p&gt;
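
&lt;p&gt;As a small aside, the two focal-length fields listed above can be combined into the sensor's crop factor, that is, the ratio between the 35mm-equivalent focal length and the actual one. The sketch below assumes both fields are expressed in millimetres; the &lt;code&gt;camera_metadata&lt;/code&gt; dictionary and the &lt;code&gt;crop_factor&lt;/code&gt; helper are illustrative names and are not part of &lt;code&gt;extrai_metadados.py&lt;/code&gt;.&lt;/p&gt;

```python
# Values copied from the camera metadata shown above (assumed to be in mm).
camera_metadata = {
    "Focal length": "18",
    "Focal length in 35mm film": "27",
}


def crop_factor(metadata):
    """Return the 35mm-equivalent focal length divided by the actual one."""
    actual = float(metadata["Focal length"])
    equivalent = float(metadata["Focal length in 35mm film"])
    return equivalent / actual


print(f"Crop factor: {crop_factor(camera_metadata)}")
```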

&lt;p&gt;The extracted metadata also include the creation date, author, and usage license of the image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"Author"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Dan L. Perlman"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Creation date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2008-03-20 16:53:27"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Date-time original"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2004-10-23 09:47:24"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Date-time digitized"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2004-10-23 09:47:24"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"Copyright"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"© Dan L. Perlman 2008
http://creativecommons.org/licenses/by-nc/3.0/us/"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
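
&lt;p&gt;The date fields above are plain strings, so they need to be parsed before they can be compared. A minimal sketch using only the standard library follows; it assumes the &lt;code&gt;YYYY-MM-DD HH:MM:SS&lt;/code&gt; layout seen in this output, and the helper names &lt;code&gt;parse_date&lt;/code&gt; and &lt;code&gt;days_between&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
from datetime import datetime

# Values copied from the metadata shown above.
metadata = {
    "Creation date": "2008-03-20 16:53:27",
    "Date-time original": "2004-10-23 09:47:24",
}

# Assumed layout of the date strings in this particular output.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_date(value):
    """Convert a metadata date string into a datetime object."""
    return datetime.strptime(value, DATE_FORMAT)


def days_between(metadata, first_key, second_key):
    """Return the whole number of days from first_key to second_key."""
    delta = parse_date(metadata[second_key]) - parse_date(metadata[first_key])
    return delta.days


gap = days_between(metadata, "Date-time original", "Creation date")
print(f"Days between capture and file creation: {gap}")
```

&lt;p&gt;A large gap like this one only suggests that the file was processed well after the photo was taken; other images may use a different date layout or omit these fields entirely.&lt;/p&gt;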



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Although many metadata fields could be extracted from &lt;code&gt;mata_atlantica.jpg&lt;/code&gt;, this depends on the image chosen: for other files, the extracted data may differ from those shown in this example.&lt;/p&gt;

&lt;p&gt;Keep in mind, too, that metadata are &lt;u&gt;editable&lt;/u&gt;. That is, an image may or may not carry detailed information about its creation. The real object used to produce the digital image, on the other hand, does not change (barring particular situations).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Thanks for reading! Do you know of an interesting use case that involves metadata? Share it, it will be welcome!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;center&gt;📖 . 🤖 .. ☕ ... 🐌 .... 🦥 ..... 👾 ..... 👻💡&lt;/center&gt;

&lt;p&gt;&lt;small&gt;&lt;small&gt;&lt;small&gt;* &lt;strong&gt;The cover image of this post was generated with a text-to-image model using the phrase &lt;em&gt;"Data about data about data about data about data ad infinitum in pt-br"&lt;/em&gt;.&lt;/strong&gt;&lt;/small&gt;&lt;/small&gt;&lt;/small&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>python</category>
      <category>braziliandevs</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>A bot to watch videos on YouTube</title>
      <dc:creator>msc2020</dc:creator>
      <pubDate>Sun, 24 Mar 2024 23:07:52 +0000</pubDate>
      <link>https://forem.com/msc2020/bot-para-assistir-videos-no-youtube-2e58</link>
      <guid>https://forem.com/msc2020/bot-para-assistir-videos-no-youtube-2e58</guid>
      <description>&lt;p&gt;In this post, using Python tools, we build a &lt;a href="https://en.wikipedia.org/wiki/Internet_bot" rel="noopener noreferrer"&gt;robot (&lt;u&gt;bot&lt;/u&gt;) 🤖&lt;/a&gt; that can watch YouTube videos automatically.&lt;/p&gt;

&lt;p&gt;We start by installing Selenium and say a little about WebDriver. These tools are widely used in &lt;a href="https://en.wikipedia.org/wiki/Software_testing" rel="noopener noreferrer"&gt;software testing&lt;/a&gt;. Next, we present the Python script we wrote to automate the task of watching a YouTube video, or something like it, N times in M windows open simultaneously. We say "something like it" because new tasks can be created in much the same way, according to each person's preference.&lt;/p&gt;

&lt;p&gt;☕&lt;/p&gt;




&lt;h2&gt;
  
  
  Installing Selenium
&lt;/h2&gt;

&lt;p&gt;On a computer with Python 3, we need to install &lt;a href="https://www.selenium.dev/" rel="noopener noreferrer"&gt;Selenium&lt;/a&gt;. For that we use &lt;code&gt;pip install&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;selenium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Selenium is a tool for automating tasks on the web. It also serves a wide range of automation tasks in the well-known area called &lt;a href="https://en.wikipedia.org/wiki/Robotic_process_automation" rel="noopener noreferrer"&gt;RPA (&lt;em&gt;Robotic Process Automation&lt;/em&gt;)&lt;/a&gt;. It is available as a free download for &lt;a href="https://www.selenium.dev/downloads/" rel="noopener noreferrer"&gt;several languages&lt;/a&gt;, such as Java and Ruby. Here we use the Python version.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdw8zluco5k8vmiemkzi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdw8zluco5k8vmiemkzi.png" alt="homepage selenium" width="637" height="434"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Chrome Driver
&lt;/h2&gt;

&lt;p&gt;Another tool widely used to automate web tests is &lt;a href="https://developer.mozilla.org/en-US/docs/Web/WebDriver" rel="noopener noreferrer"&gt;WebDriver&lt;/a&gt;. With it we can run numerous tasks on different web browsers. Here we use the Chrome browser.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisite:&lt;/strong&gt; To use the WebDriver you need a version of &lt;a href="https://developer.chrome.com/" rel="noopener noreferrer"&gt;Chrome&lt;/a&gt; installed, or an equivalent pairing, e.g., Firefox and the Firefox WebDriver. Click &lt;a&gt;here&lt;/a&gt; to learn more.&lt;/p&gt;

&lt;p&gt;Now we create a directory in which to save the WebDriver. We will call it &lt;code&gt;bot-youtube&lt;/code&gt;. This can be done from the terminal with the command below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;bot-youtube
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Downloading the driver:&lt;/strong&gt; To download the ChromeDriver (WebDriver) that matches your Google Chrome browser, see &lt;a href="https://chromedriver.chromium.org/downloads" rel="noopener noreferrer"&gt;https://chromedriver.chromium.org/downloads&lt;/a&gt; or &lt;a href="https://googlechromelabs.github.io/chrome-for-testing/" rel="noopener noreferrer"&gt;https://googlechromelabs.github.io/chrome-for-testing/&lt;/a&gt; (for more recent versions, i.e., &lt;u&gt;version 115 or newer&lt;/u&gt;). Here we used, and tested, the driver for &lt;u&gt;Linux 64-bit (123.0.6312.58)&lt;/u&gt; with the same version of the Chrome browser. Click &lt;a href="https://www.google.com/intl/pt-BR/chrome/" rel="noopener noreferrer"&gt;here&lt;/a&gt; to download Chrome. If you use Windows or another operating system, just download the matching driver and Chrome browser.&lt;/p&gt;

&lt;p&gt;File structure after downloading the WebDriver:&lt;br&gt;
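
&lt;p&gt;Since the driver has to correspond to the installed browser, a quick check of the two version strings can save a failed run. The sketch below assumes that a driver and a browser are compatible when their major, minor, and build numbers agree and only the last component may differ; &lt;code&gt;build_prefix&lt;/code&gt; and &lt;code&gt;versions_compatible&lt;/code&gt; are hypothetical helper names, and the version strings are typed in by hand rather than read from the installed programs.&lt;/p&gt;

```python
def build_prefix(version):
    """Return the (major, minor, build) part of a dotted version string."""
    parts = version.strip().split(".")
    return tuple(int(part) for part in parts[:3])


def versions_compatible(chrome_version, driver_version):
    """Assumption: matching major.minor.build means the pair works together."""
    return build_prefix(chrome_version) == build_prefix(driver_version)


# Version used in this post for both the browser and the driver.
chrome_version = "123.0.6312.58"
driver_version = "123.0.6312.58"
print(versions_compatible(chrome_version, driver_version))
```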
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bot-youtube/
│
├── chromedriver_linux64/
│   └── chromedriver

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If we use the Windows version of the WebDriver, the default name of the &lt;code&gt;chromedriver_linux64&lt;/code&gt; folder will be different.&lt;/p&gt;
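
&lt;p&gt;Because the folder name depends on the operating system, one option is to search for the driver file instead of hard-coding its folder. This is only a sketch under that assumption: &lt;code&gt;find_chromedriver&lt;/code&gt; is a hypothetical helper, and it looks for a file named &lt;code&gt;chromedriver&lt;/code&gt; (or &lt;code&gt;chromedriver.exe&lt;/code&gt;, the usual name on Windows) anywhere below the given directory.&lt;/p&gt;

```python
from pathlib import Path

# File names the driver executable is expected to have (assumption).
DRIVER_NAMES = ("chromedriver", "chromedriver.exe")


def find_chromedriver(base_dir):
    """Return the first driver executable found below base_dir, or None."""
    base = Path(base_dir)
    if not base.is_dir():
        return None
    # Sorted so the result is deterministic when several copies exist.
    for candidate in sorted(base.rglob("chromedriver*")):
        if candidate.is_file() and candidate.name in DRIVER_NAMES:
            return candidate
    return None


# Prints the path to the driver, or None when nothing was found.
print(find_chromedriver("bot-youtube"))
```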




&lt;h2&gt;
  
  
  Python script
&lt;/h2&gt;

&lt;p&gt;To carry out the automated tasks, so that the bot watches a YouTube video, we use the following Python script (&lt;code&gt;bot_youtube.py&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#bot_youtube.py
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;webdriver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.common.by&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;By&lt;/span&gt;

&lt;span class="n"&gt;PATH_DRIVER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./chromedriver_linux64/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;# path to the driver folder
&lt;/span&gt;&lt;span class="n"&gt;DEFAULT_LINK&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://www.youtube.com/watch?v=XlGLf7cWOJA&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;# link to the video
&lt;/span&gt;&lt;span class="n"&gt;DEFAULT_TIME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;56&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="c1"&gt;# video duration in seconds (s)
&lt;/span&gt;&lt;span class="n"&gt;DEFAULT_N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="c1"&gt;# number of times the video will be watched
&lt;/span&gt;&lt;span class="n"&gt;DEFAULT_N_DRIVERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="c1"&gt;# number of windows that will be opened to watch the video
&lt;/span&gt;
&lt;span class="c1"&gt;# input - video link
&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;1. Enter the link to the YouTube video &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[Suggestion: &amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DEFAULT_LINK&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, interview with Cornelius Lanczos]:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# note: the Lanczos video runs 55:51 min, i.e. about 56 x 60 s = 3360 s
&lt;/span&gt;
&lt;span class="c1"&gt;# check input
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_LINK&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; Chosen URL: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# input - video duration
&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;2. Enter the video duration in seconds&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[E.g.: &amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DEFAULT_TIME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; for the Lanczos video, which runs about 56 min, i.e. 3360 s]:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# check input
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_TIME&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; Chosen duration: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# set the number of times the bot will watch the video
&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;3. Enter the integer N, the number of times the bot will open the window(s) in the browser to watch the video&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[Suggestion: &amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DEFAULT_N&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# check input
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_N&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; Chosen N: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# set the number of windows to open
&lt;/span&gt;&lt;span class="n"&gt;n_drivers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;4. Enter the integer M, the total number of windows that will be open simultaneously to watch the video&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[Suggestion: &amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DEFAULT_N_DRIVERS&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# check input
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;n_drivers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;n_drivers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_N_DRIVERS&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;n_drivers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_drivers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; Chosen M: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n_drivers&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# watch the video N times in M open windows
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;    
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;----- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; of &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; times -----&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;time_to_refresh&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;n_drivers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;drivers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Opening &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n_drivers&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; web drivers . . .&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_drivers&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
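        &lt;span class="n"&gt;drivers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;webdriver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Chrome&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;webdriver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ChromeService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;executable_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PATH_DRIVER&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;chromedriver&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;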
        &lt;span class="n"&gt;drivers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;drivers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;find_element&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;By&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;movie_player&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# press k, YouTube's play/pause shortcut
&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;#drivers[i].minimize_window()
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;  &amp;gt; Driver opened [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;  &amp;gt; Watching the videos [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;time_to_refresh&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; seconds] . . .&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time_to_refresh&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Closing the &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n_drivers&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; web drivers . . .&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_drivers&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;drivers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;quit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;  &amp;gt; Browser closed [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Project files after creating &lt;code&gt;bot_youtube.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bot-youtube/
│
├── chromedriver_linux64/
│   └── chromedriver
│
└── bot_youtube.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running the bot
&lt;/h3&gt;

&lt;p&gt;To run the &lt;code&gt;bot_youtube.py&lt;/code&gt; script we just created, type the following command in the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python bot_youtube.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After starting the script, enter the information about the video you want the bot to watch. The steps for using the bot are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1.&lt;/strong&gt; Enter the link to the video on YouTube.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;2.&lt;/strong&gt; Enter the video duration in &lt;u&gt;seconds (s)&lt;/u&gt;.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;3.&lt;/strong&gt; Choose how many times the bot will watch the video (e.g., 2 or 3 times for testing).&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;4.&lt;/strong&gt; Set the number of windows that should open to watch the video at the same time.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;5.&lt;/strong&gt; Wait for the bot to watch the videos ⏳&lt;/p&gt;
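
&lt;p&gt;Step 2 asks for the duration in seconds, while YouTube displays it as &lt;code&gt;MM:SS&lt;/code&gt; or &lt;code&gt;HH:MM:SS&lt;/code&gt;. As a minimal sketch, the conversion could be done with a small helper. The name &lt;code&gt;duration_to_seconds&lt;/code&gt; is hypothetical and is not part of &lt;code&gt;bot_youtube.py&lt;/code&gt;:&lt;/p&gt;

```python
def duration_to_seconds(text):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into a whole number of seconds.

    Hypothetical helper for illustration; not part of bot_youtube.py.
    """
    parts = text.strip().split(":")
    if len(parts) not in (1, 2, 3):
        raise ValueError(f"unexpected duration format: {text!r}")

    seconds = 0
    for part in parts:
        # Each field is worth 60 times the field to its right.
        seconds = seconds * 60 + int(part)
    return seconds


if __name__ == "__main__":
    print(duration_to_seconds("3:25"))     # 205
    print(duration_to_seconds("1:02:03"))  # 3723
    print(duration_to_seconds("90"))       # 90
```

&lt;p&gt;The returned value is what should be typed at step 2.&lt;/p&gt;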




&lt;p&gt;&lt;strong&gt;Next steps:&lt;/strong&gt; Now that we know how to make a bot watch a video on the web, could "a few more lines of code" make it perform actions such as pausing, skipping ahead, or &lt;em&gt;liking&lt;/em&gt; videos?&lt;/p&gt;
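
&lt;p&gt;One possible starting point, sketched under assumptions: the script already sends &lt;code&gt;k&lt;/code&gt; to the &lt;code&gt;movie_player&lt;/code&gt; element, so other YouTube keyboard shortcuts might be sent the same way. The names &lt;code&gt;PLAYER_SHORTCUTS&lt;/code&gt; and &lt;code&gt;send_player_action&lt;/code&gt; are hypothetical, and the sketch assumes the player accepts these keys just as it accepts &lt;code&gt;k&lt;/code&gt;. Liking a video is not covered here.&lt;/p&gt;

```python
# Hypothetical helper for illustration; not part of bot_youtube.py.
# YouTube keyboard shortcuts: k = play/pause, l = forward 10 s,
# j = back 10 s, m = mute/unmute.
PLAYER_SHORTCUTS = {
    "play_pause": "k",
    "forward_10s": "l",
    "back_10s": "j",
    "mute": "m",
}


def send_player_action(driver, action):
    """Send the shortcut for `action` to the player of an open video page.

    `driver` is expected to be a Selenium WebDriver that has already
    loaded a YouTube video page, such as drivers[i] in the script.
    """
    key = PLAYER_SHORTCUTS[action]  # raises KeyError for an unknown action
    # "id" is the string value of Selenium's By.ID locator strategy.
    player = driver.find_element("id", "movie_player")
    player.send_keys(key)
    return key
```

&lt;p&gt;For example, &lt;code&gt;send_player_action(drivers[0], "forward_10s")&lt;/code&gt; would ask the first window to skip ahead.&lt;/p&gt;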

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0n1l0awtlquzca6umi4f.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0n1l0awtlquzca6umi4f.gif" alt="gif-bot-rodando" width="1024" height="1024"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;⚠️ Warning:&lt;/strong&gt; Some automation tasks should be used with caution! A good discussion of the topic can be found &lt;a href="https://dev.to/crawlnow/is-web-scraping-legal-all-you-need-to-know-4ale"&gt;in this post&lt;/a&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.selenium.dev/" rel="noopener noreferrer"&gt;https://www.selenium.dev/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/questions/67456411/youtube-automation-with-python-and-selenium" rel="noopener noreferrer"&gt;https://stackoverflow.com/questions/67456411/youtube-automation-with-python-and-selenium&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>bots</category>
      <category>rpa</category>
      <category>python</category>
      <category>braziliandevs</category>
    </item>
  </channel>
</rss>
