<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: vinay</title>
    <description>The latest articles on Forem by vinay (@vinaybommana7).</description>
    <link>https://forem.com/vinaybommana7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F338981%2Fc1584fef-9967-4bb2-9901-ed41f51ee9a8.jpg</url>
      <title>Forem: vinay</title>
      <link>https://forem.com/vinaybommana7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vinaybommana7"/>
    <language>en</language>
    <item>
      <title>Creating our own color theme in vscode</title>
      <dc:creator>vinay</dc:creator>
      <pubDate>Mon, 24 May 2021 23:21:23 +0000</pubDate>
      <link>https://forem.com/vinaybommana7/creating-our-own-color-theme-in-vscode-2b9m</link>
      <guid>https://forem.com/vinaybommana7/creating-our-own-color-theme-in-vscode-2b9m</guid>
      <description>&lt;h1&gt;
  
  
  the dilemma
&lt;/h1&gt;

&lt;p&gt;We've all been there: the urge to please our eyes when looking at a particular block of code. You like some nuances of a colour scheme, and some things you just don't like at all. You use a scheme for a while, but there is still that voice telling you it can be better, that your experience writing code still needs improvement.&lt;/p&gt;

&lt;p&gt;This led me to edit out some of the colors I just didn't like in the themes I was using. At first I was drawn to the simplicity of &lt;a href="https://github.com/jamiewilson/predawn" rel="noopener noreferrer"&gt;Predawn&lt;/a&gt;, but the oranges didn't work for me. The minimalistic choice of colors is fine, but not quite enough. Then I found &lt;a href="https://marketplace.visualstudio.com/items?itemName=CrazyFluff.bettermaterialthemedarkerhighcontrast" rel="noopener noreferrer"&gt;material darker with high contrast&lt;/a&gt;, but its color palette is not minimalistic like Predawn's, and the italics it uses for comments are icky.&lt;br&gt;
Speaking of colour palettes, I knew a better dark palette anyway: &lt;a href="https://www.nordtheme.com" rel="noopener noreferrer"&gt;Nord&lt;/a&gt;. So one unproductive Saturday morning I forced myself to edit the colour palette in settings.json.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46dl6gqnlqyrwj5giypl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46dl6gqnlqyrwj5giypl.png" alt="Nord Palette"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsz7p0b3if3dro0ckjt9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsz7p0b3if3dro0ckjt9w.png" alt="Nord Palette"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;workbench.colorCustomizations:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Things like the cursor colour, the background, taking out the italics, and so on. Down the rabbit hole, I ended up reading about how to create your own color scheme from an existing one.&lt;/p&gt;

&lt;p&gt;The Material darker high contrast theme looked like a good starting point for tweaking, since its UI indicators and separation of high-contrast colors are &lt;em&gt;good enough&lt;/em&gt; for me. Enough small talk; let's get into the three easy steps of creating a color scheme&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;apply your own favourite color scheme by hitting
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cmd+shift+p &amp;gt; Preferences: Color theme &amp;gt; &amp;lt;select the theme&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;convert the existing theme to json format
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cmd+shift+p &amp;gt; Developer: Generate Color theme from Current settings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;this will create an untitled file with the color palette and settings for every customisable UI element in vscode.&lt;/p&gt;

&lt;p&gt;tweak the colors to your liking.&lt;/p&gt;

&lt;p&gt;some of the tweaks I've made are&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;colors: &lt;span class="o"&gt;{&lt;/span&gt;
// changed all blues to &lt;span class="c"&gt;#5E81AC like&lt;/span&gt;
&lt;span class="s2"&gt;"activityBarBadge.background"&lt;/span&gt;: &lt;span class="s2"&gt;"#5E81AC"&lt;/span&gt;,
// main editor background
&lt;span class="s2"&gt;"editor.background"&lt;/span&gt;: &lt;span class="s2"&gt;"#212121"&lt;/span&gt;,
// current line number to be more focused
&lt;span class="s2"&gt;"editorLineNumber.activeForeground"&lt;/span&gt;: &lt;span class="s2"&gt;"#eeffff"&lt;/span&gt;,
// list explorer items
&lt;span class="s2"&gt;"list.highlightForeground"&lt;/span&gt;: &lt;span class="s2"&gt;"#5E81AC"&lt;/span&gt;,
// didn&lt;span class="s1"&gt;'t like the terminal cursor colour
"terminalCursor.foreground": "#5E81AC",
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;removed all unnecessary italics (like in comments)&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="3"&gt;
&lt;li&gt;convert the JSON to a vscode color theme extension.
for this step you'll need &lt;code&gt;npm&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;install yo and generator-code
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; yo generator-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;run &lt;code&gt;yo code&lt;/code&gt; and select color theme from list of options&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxsbjlzzw878qmzr2ezo8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxsbjlzzw878qmzr2ezo8.png" alt="yo code options"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the prompt will ask you whether you want to create a color theme from an existing one or start afresh.
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fie0j8je4zumtg2isbfyv.png" alt="afresh"&gt;
&lt;/li&gt;
&lt;li&gt;select "create a fresh color theme" and give it a name.&lt;/li&gt;
&lt;li&gt;now go to &lt;code&gt;&amp;lt;theme-name&amp;gt;/themes/&amp;lt;theme-name&amp;gt;-color-theme.json&lt;/code&gt; and replace its contents with the &lt;code&gt;untitled&lt;/code&gt; file you've edited before.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that your extension is ready, you need to install it to try the theme out.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;copy the entire folder &lt;code&gt;&amp;lt;theme-name&amp;gt;&lt;/code&gt; to &lt;code&gt;~/.vscode/extensions/&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &amp;lt;theme-name&amp;gt; ~/.vscode/extensions/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;restart the editor and hit &lt;code&gt;cmd+shift+p &amp;gt; Preferences: Color Theme&lt;/code&gt;; you'll see your &lt;code&gt;&amp;lt;theme-name&amp;gt;&lt;/code&gt; there.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The comprehensive guide to creating an extension can be found &lt;a href="https://code.visualstudio.com/api/get-started/your-first-extension" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;check out my color theme at &lt;a href="https://github.com/vinaybommana/predusk" rel="noopener noreferrer"&gt;predusk&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/vinaybommana" rel="noopener noreferrer"&gt;
        vinaybommana
      &lt;/a&gt; / &lt;a href="https://github.com/vinaybommana/predusk" rel="noopener noreferrer"&gt;
        predusk
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      predawn and material high contrast theme for vscode
    &lt;/h3&gt;
  &lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;I'll try to publish the extension to the vscode marketplace in the future; for now the theme lives on GitHub.&lt;/p&gt;

&lt;p&gt;give a ❤️ if you like this article, and let's discuss down below which theme you are currently using ✨&lt;/p&gt;

</description>
      <category>vscode</category>
    </item>
    <item>
      <title>How I store Screenshot data in my Linux work environment</title>
      <dc:creator>vinay</dc:creator>
      <pubDate>Tue, 12 May 2020 20:26:07 +0000</pubDate>
      <link>https://forem.com/vinaybommana7/how-i-store-screenshot-data-in-my-linux-work-environment-3epd</link>
      <guid>https://forem.com/vinaybommana7/how-i-store-screenshot-data-in-my-linux-work-environment-3epd</guid>
      <description>&lt;p&gt;In my work environment, Screen capture and taking screenshots is a common thing to share the completed status. Ubuntu has the feature of screen capture in three ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;PrtScr captures the entire screen and saves it to Pictures&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shift + PrtScn captures part of the screen by turning the cursor into a plus sign (same as cmd + shift + 4 on a Mac) and saves the capture to Pictures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ctrl + Shift + PrtScr copies the screen selection to the clipboard.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;The first two options save captured images to the local Pictures folder, which is good, but the image name will be something like Screenshot from 2019-10-30 06-45-37.png. After a while you lose track of dates, and Pictures becomes a mess of screenshots lying around without any info.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;Simple bash scripting, automated with crontab to run at a particular time every day, solved this. First of all, I wanted to organise all the screenshots by date: every screenshot is placed in a folder named after the date it was taken.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y5VqxaZR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3704/1%2AkFx9ttGlN_t-roQB9D_j1A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y5VqxaZR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3704/1%2AkFx9ttGlN_t-roQB9D_j1A.png" alt="screenshot seperator"&gt;&lt;/a&gt;&lt;em&gt;screenshot seperator&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I’ve placed this in crontab to run every 12 hours.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7-SNly5H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2238/1%2A4fhF05WI5yO4YkORrPv7nA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7-SNly5H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2238/1%2A4fhF05WI5yO4YkORrPv7nA.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Screenshots taken each day go into a new folder, and all the images of that particular date are moved into it. This leads to a large number of folders every month. For minimalism, we can organise these folders further into their &lt;em&gt;particular&lt;/em&gt; month, compress that into a tar file, and store it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0BBRm0rb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4096/1%2ARX2djJzjZMrRh02PuJRl_g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0BBRm0rb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4096/1%2ARX2djJzjZMrRh02PuJRl_g.png" alt="compress old screenshots"&gt;&lt;/a&gt;&lt;em&gt;compress old screenshots&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This small snippet searches for screenshots by month, creates a folder, moves all of that month's date folders into it, and compresses the result.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mf-A8UbK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2ApYDSlLpRSfFHmaefGgpNWw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mf-A8UbK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2ApYDSlLpRSfFHmaefGgpNWw.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/vinaybommana/bashNotes"&gt;&lt;strong&gt;vinaybommana/bashNotes&lt;/strong&gt;&lt;br&gt;
github.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>linux</category>
      <category>bash</category>
    </item>
    <item>
      <title>Reading Manga with Python</title>
      <dc:creator>vinay</dc:creator>
      <pubDate>Tue, 12 May 2020 20:21:39 +0000</pubDate>
      <link>https://forem.com/vinaybommana7/reading-manga-with-python-c15</link>
      <guid>https://forem.com/vinaybommana7/reading-manga-with-python-c15</guid>
      <description>&lt;p&gt;Photo by Miika Laaksonen on Unsplash&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Manga ?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Manga&lt;/strong&gt; (漫画, &lt;em&gt;manga&lt;/em&gt;) are &lt;a href="https://en.wikipedia.org/wiki/Comics"&gt;comics&lt;/a&gt; or &lt;a href="https://en.wikipedia.org/wiki/Graphic_novel"&gt;graphic novels&lt;/a&gt; created in &lt;a href="https://en.wikipedia.org/wiki/Japan"&gt;Japan&lt;/a&gt; or using the &lt;a href="https://en.wikipedia.org/wiki/Japanese_language"&gt;Japanese language&lt;/a&gt; and conforming to a style developed in Japan in the late 19th century. They have a long and complex pre-history in earlier &lt;a href="https://en.wikipedia.org/wiki/Japanese_art"&gt;Japanese art&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;let's just say manga are Japanese comics, which are often more popular and interesting than most mainstream comics.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scouting
&lt;/h3&gt;

&lt;p&gt;Let's learn some WebScraping and get some value instead of just collecting data: let's download some manga from the internet and try to read it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Reading manga online is easy: you just go to a site like mangapanda.com, search for a comic, and read it. But what if you want to download the entire comic, compress each chapter into a particular volume, and read it offline?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;when we go to mangapanda.com and search for a particular comic, say naruto, here's the URL we are directed to&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--C6n-igl1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AuVD-rmR0l2HYjPy-qA8qLg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--C6n-igl1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AuVD-rmR0l2HYjPy-qA8qLg.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice the naruto at the end of the URL. Now if we go to the first chapter of naruto, the URL transforms to &lt;a href="http://www.mangapanda.com/naruto/1"&gt;http://www.mangapanda.com/naruto/1&lt;/a&gt;, which is just great for us. Note that this doesn't happen with all the manga sites out there, so watch out for that before trying to scrape any other manga site. We are trying to download the images in naruto chapter 1&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Mqd-bavR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4096/1%2AyyVS6eGaKVRkm03KYZiePQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Mqd-bavR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4096/1%2AyyVS6eGaKVRkm03KYZiePQ.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s write a small function to get the image from the URL&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kg9VCoVe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2556/1%2AC9T0jR6SuRLjC73qj4qSDg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kg9VCoVe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2556/1%2AC9T0jR6SuRLjC73qj4qSDg.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OK, what is happening here? To _download_image we give a URL, say mangapanda.com/naruto/1/3; per our observation this downloads naruto's chapter 1, image 3. Let's break the function down and understand what's going on, line by line.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;requests.get downloads the source of the given URL&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the HTML source is converted into an lxml html tree, which lets us parse tags easily&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;get the img tag with id='img'; the following expression ensures that.&lt;/p&gt;

&lt;p&gt;".//img[@id='img']/@src"&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;after we get the image URL, download the image with requests.get(URL).content&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
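&lt;p&gt;Since the function above only survives as a screenshot, here is a hedged reconstruction of it (names are mine; the site layout is as described):&lt;/p&gt;

```python
# Fetch one manga page and pull out the bytes of its main image.
import requests
from lxml import html

BASE_URL = "http://www.mangapanda.com"

def page_url(manga, chapter, page):
    """Build the reader URL, e.g. page_url("naruto", 1, 3)."""
    return "{}/{}/{}".format(BASE_URL + "/" + manga, chapter, page)

def download_image(manga, chapter, page):
    """Download the page source, find the img tag with id='img', fetch its src."""
    source = requests.get(page_url(manga, chapter, page)).content
    tree = html.fromstring(source)
    img_src = tree.xpath(".//img[@id='img']/@src")[0]
    return requests.get(img_src).content
```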

&lt;h3&gt;
  
  
  Downloading the entire chapter
&lt;/h3&gt;

&lt;p&gt;It's good that the chapters are in the format /chapter/page_number, but how can we download all the images of a particular chapter if we don't know the last page number? If we knew it, we could simply use range and loop over the image numbers to download.&lt;/p&gt;

&lt;p&gt;if we look at the source code, there is this interesting tag.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vU0RDpMP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3804/1%2AUnxK9n429w-hB4ydJ3o0Gw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vU0RDpMP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3804/1%2AUnxK9n429w-hB4ydJ3o0Gw.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The site renders this so that users can select the page number from a dropdown. We can query the lxml tree with .//*[@id='pageMenu']/option[last()]/text() and get the last option under the pageMenu id, which is the end page of the chapter.&lt;/p&gt;

&lt;p&gt;let's wrap this up in a small function&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dafjys7c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3400/1%2AESbc2hiRJV8Ig91o86kA-A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dafjys7c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3400/1%2AESbc2hiRJV8Ig91o86kA-A.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;now we know the page count of the chapters we are going to download. we can get all the images of a chapter in parallel, sort them, and then compress them into a single volume.&lt;/p&gt;

&lt;p&gt;let's use ThreadPoolExecutor and write a function that does this job concurrently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---q7lDCWP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3940/1%2AFJTHvAc0YoJAg_JaGJYQ9A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---q7lDCWP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3940/1%2AFJTHvAc0YoJAg_JaGJYQ9A.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;properties = json.load(open("configs.json"))

base_url = properties.get("base_url") + "/" + properties.get("manga_name")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;we can define manga_name and base_url in configs.json so that we don't have to give the name of the manga every time we download a chapter.&lt;/p&gt;

&lt;p&gt;the download_chapter function creates directories based on the manga_name and the chapter number&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;➜  naruto git:(master) ✗ tree
.
└── 1
    ├── 1.jpg
    ├── 10.jpg
    ├── 11.jpg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now that we've downloaded all the pages in the chapter, let's compress it into the CBZ format, ensuring the pages stay sorted in the proper order&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yM36szLd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3332/1%2AL2SVuq1n6wVmb_1e97G07w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yM36szLd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3332/1%2AL2SVuq1n6wVmb_1e97G07w.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;we can wrap everything up with a classic &lt;strong&gt;main&lt;/strong&gt; so that, given chapter numbers, we can download the entire comic&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GJXtpVMI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3432/1%2AfS8K4iF2sRm4X9XGSddJsA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GJXtpVMI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3432/1%2AfS8K4iF2sRm4X9XGSddJsA.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  In action
&lt;/h3&gt;

&lt;p&gt;we can run the script in the following way&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--M0QVxTHy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2120/1%2AgamhdHl-KANkKYZTwIm-iA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--M0QVxTHy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2120/1%2AgamhdHl-KANkKYZTwIm-iA.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Disclaimer: this is for purely educational purposes only. Do not use it commercially, for piracy, or for attacking mangapanda.com&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>python</category>
      <category>manga</category>
    </item>
    <item>
      <title>Django + MySQL, How to port your web application from SQLite to MySQL</title>
      <dc:creator>vinay</dc:creator>
      <pubDate>Tue, 12 May 2020 20:16:29 +0000</pubDate>
      <link>https://forem.com/vinaybommana7/django-mysql-how-to-port-your-web-application-from-sqlite-to-mysql-3jnl</link>
      <guid>https://forem.com/vinaybommana7/django-mysql-how-to-port-your-web-application-from-sqlite-to-mysql-3jnl</guid>
      <description>&lt;h2&gt;
  
  
  Django’s Object Relational Mapping Pattern
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A model is the single, definitive source of data about your data. It contains the essential fields and behaviors of the data you’re storing. Generally, each model maps to a single database table.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;we've learnt from the official docs that models.py in Django's folder structure is the source of truth about your data: it contains everything you want to store in your database. we generally define tables, pre- and post-save methods, etc., in models&lt;/p&gt;

&lt;p&gt;we want a table with the following requirements: it should contain the name, count, and timestamp of a metric. we'll create such a table in the following way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vAiCD0_2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2388/1%2A4X4FSVMziFu_i16k6xLHUw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vAiCD0_2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2388/1%2A4X4FSVMziFu_i16k6xLHUw.png" alt="SimpleMetricTable"&gt;&lt;/a&gt;&lt;em&gt;SimpleMetricTable&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Elephant in the Room
&lt;/h3&gt;

&lt;p&gt;After a few months we've realised that the sqlite database, the default when creating the project, is not scaling. when you have multiple sources creating and modifying the DataObjects in your SimpleMetricTable, you need to move on.&lt;/p&gt;

&lt;h3&gt;
  
  
  MySQL to the rescue
&lt;/h3&gt;

&lt;p&gt;django.db.backends.sqlite3 is how we tell django to use sqlite as the backend db. we'll configure the mysql database backend first and then tell django to use django.db.backends.mysql.&lt;/p&gt;

&lt;p&gt;the proper configuration takes the following form&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yiwq0ZU4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3604/1%2ACVVLA7UYMLHL9-KSdIEm9A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yiwq0ZU4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3604/1%2ACVVLA7UYMLHL9-KSdIEm9A.png" alt="this can be done in settings.py"&gt;&lt;/a&gt;&lt;em&gt;this can be done in settings.py&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;log in to your mysql database and create a database, say metrics, using CREATE DATABASE metrics; we'll reference this database in mysql.conf&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Xo7NMwUp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2456/1%2AUrFf-w3Ybl5OBpktcX9pmQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Xo7NMwUp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2456/1%2AUrFf-w3Ybl5OBpktcX9pmQ.png" alt="polls/configs/mysql.conf"&gt;&lt;/a&gt;&lt;em&gt;polls/configs/mysql.conf&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But hey 👋🏻 what about the data I've collected so far? how do we port the old data from sqlite.db to our shiny new MySQL?&lt;/p&gt;

&lt;h3&gt;
  
  
  Dumping db to JSON in Django
&lt;/h3&gt;

&lt;p&gt;simply run the following command&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CW_l8E3L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2052/1%2A6_MBpEqs0Go1gWPwcsZAVw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CW_l8E3L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2052/1%2A6_MBpEqs0Go1gWPwcsZAVw.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that this should be done before changing the database from sqlite to MySQL in settings.py. after you've changed the database, we can simply run&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vYUcVa6t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AKWuhVT7TGeLiGXxlpjPCNg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vYUcVa6t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AKWuhVT7TGeLiGXxlpjPCNg.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After a bunch of IntegrityErrors and a couple of google searches, using flags like --exclude auth.permission --exclude contenttypes while dumping the data, we've successfully ported our application to MySQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;you've patted yourself on the back for a good day at work and started packing up while watching your api output in the log. 👁 your application starts throwing 500s on some of the write requests. you check the code and everything looks fine. some requests succeed, but those 500s make you sit back down.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Somewhere at the back of your mind there is an itch that this is due to the MySQL change you've made today. you start checking what type of data is being written into your database. a thousand google searches follow.&lt;/p&gt;

&lt;p&gt;Then you realise the mistake: the SimpleMetricTable you've ported from sqlite to MySQL uses a latin character set and is not accepting utf-8 in MySQL&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT CCSA.character_set_name FROM information_schema.`TABLES` T,
       information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
WHERE CCSA.collation_name = T.table_collation
  AND T.table_schema = "metrics"
  AND T.table_name = "metrics";

ALTER TABLE metrics CONVERT TO CHARACTER SET utf8;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;and you are good to go.&lt;/p&gt;

&lt;p&gt;footnotes:&lt;br&gt;
this was originally written a while ago on Medium&lt;br&gt;
&lt;a href="https://medium.com/@vinaybommana7/django-mysql-how-to-port-your-web-application-from-sqlite-to-mysql-f7487428a0d0"&gt;medium link&lt;/a&gt;&lt;/p&gt;

</description>
      <category>django</category>
      <category>mysql</category>
      <category>orm</category>
    </item>
    <item>
      <title>Analyzing Twitter data with Python: Part 1</title>
      <dc:creator>vinay</dc:creator>
      <pubDate>Wed, 26 Feb 2020 14:19:15 +0000</pubDate>
      <link>https://forem.com/vinaybommana7/analyzing-twitter-data-with-python-part-1-hg</link>
      <guid>https://forem.com/vinaybommana7/analyzing-twitter-data-with-python-part-1-hg</guid>
      <description>&lt;h1&gt;
  
  
  The Question
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;What if we want to understand the impact of a user's tweet on a particular topic?&lt;/strong&gt; Let's say a user tweeted about a particular product, like shoe laces, on Twitter: how likely are their followers to buy that product based on the tweet?&lt;/p&gt;

&lt;p&gt;Let's analyze this scenario using machine learning by constructing a simple model. We'll get data directly from Twitter and try to filter and clean it to train our model. Let's see how much we can learn from this.&lt;/p&gt;

&lt;p&gt;We'll break down the entire process into the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In &lt;code&gt;Part 1&lt;/code&gt; we'll focus on gathering and cleaning the data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Understanding the Flow
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Gathering Data&lt;/li&gt;
&lt;li&gt;Cleaning Data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Gathering Data &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The main aspect of analyzing Twitter data is to &lt;em&gt;get the data&lt;/em&gt;. How can we get Twitter data in large amounts, say 10 million tweets on a particular topic?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We can access Twitter data through &lt;a href="https://developer.twitter.com/"&gt;Twitter's developer&lt;/a&gt; access token authorization.&lt;/li&gt;
&lt;li&gt;We can scrape Twitter directly and get the data.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Accessing from twitter's developer access token
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xJkKfCa3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/a8h6oge62lbr6dt4ekop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xJkKfCa3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/a8h6oge62lbr6dt4ekop.png" alt="Twitter Developer Preview"&gt;&lt;/a&gt;&lt;br&gt;
You can simply &lt;a href="https://developer.twitter.com/en/application/use-case"&gt;apply&lt;/a&gt; for an access token, which is useful for getting tweets through the Twitter API. We can use &lt;a href="https://github.com/tweepy/tweepy"&gt;tweepy&lt;/a&gt; for that.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iGquDPGy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/q64to5l2i6nzunumlp8f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iGquDPGy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/q64to5l2i6nzunumlp8f.png" alt="Twitter API request"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h5&gt;
  
  
  The Problem
&lt;/h5&gt;

&lt;p&gt;The problem with using tweepy and Twitter's API is that there is a rate limit on the number of API calls a particular user can make per hour. If we want a large amount of data, like 10 million tweets, this will take forever. Searching through tweets from a particular period was also not effective for me with Twitter's API. Under these circumstances I decided to scrape Twitter's data using an amazing Python library called &lt;a href="https://github.com/taspinar/twitterscraper"&gt;twitterscraper&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;
  
  
  Scraping Twitter directly
&lt;/h4&gt;

&lt;p&gt;Let's install &lt;code&gt;twitterscraper&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fhGSOcEo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/s6f40as4odtmw5cn67kq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fhGSOcEo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/s6f40as4odtmw5cn67kq.png" alt="twitterscraper"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The best thing about twitterscraper is that we can specify the topic, the time period, the maximum number of tweets, and the output format in which the tweets are to be obtained.&lt;/p&gt;

&lt;p&gt;For the sake of understanding, let's download 1000 tweets and try to clean them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# twitterscraper &amp;lt;topic&amp;gt; --limit &amp;lt;count&amp;gt; --lang &amp;lt;en&amp;gt; --output filename.json&lt;/span&gt;
twitterscraper python &lt;span class="nt"&gt;--limit&lt;/span&gt; 1000 &lt;span class="nt"&gt;--lang&lt;/span&gt; en &lt;span class="nt"&gt;--output&lt;/span&gt; ~/backups/today&lt;span class="se"&gt;\'&lt;/span&gt;stweets.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output from &lt;code&gt;twitterscraper&lt;/code&gt; is &lt;code&gt;json&lt;/code&gt;. Let's convert the data we've obtained into a &lt;code&gt;dataframe&lt;/code&gt; and clean it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cleaning Data &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  loading the downloaded &lt;code&gt;json&lt;/code&gt; to a &lt;code&gt;pandas dataframe&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;codecs&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chained_assignment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="c1"&gt;# this enables us for rewriting dataframe to previous variable
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;

&lt;span class="n"&gt;json_twitter_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;path to json file&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;json_twitter_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yFyxxHGI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zrmn42a3y351fvmy0f35.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yFyxxHGI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zrmn42a3y351fvmy0f35.png" alt="output-1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's clean the data now. From the &lt;code&gt;head()&lt;/code&gt; we can eliminate &lt;code&gt;url&lt;/code&gt;, &lt;code&gt;html&lt;/code&gt; and &lt;code&gt;replies&lt;/code&gt;, and also &lt;code&gt;likes&lt;/code&gt; for now; we'll get back to &lt;code&gt;likes&lt;/code&gt; afterwards.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# dropping html, url, likes and replies
&lt;/span&gt;&lt;span class="n"&gt;json_twitter_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'html'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'url'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'likes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'replies'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, let's rename the columns so we have clean &lt;code&gt;user&lt;/code&gt; and &lt;code&gt;fullname&lt;/code&gt; columns; we'll get each user's &lt;code&gt;user_id&lt;/code&gt; later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# renaming column names
&lt;/span&gt;&lt;span class="n"&gt;json_twitter_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'fullname'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'Tweet_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'retweets'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'Tweet'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'Date'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;twitter_data_backup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json_twitter_data&lt;/span&gt;
&lt;span class="n"&gt;json_twitter_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6kMz0A77--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gkxjrmcr3zt4orbqrhj8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6kMz0A77--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gkxjrmcr3zt4orbqrhj8.png" alt="output-2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Note the &lt;code&gt;retweets&lt;/code&gt; column in the &lt;code&gt;dataframe&lt;/code&gt;:
we can assume that posts with retweets have a larger impact on users, so let's filter for tweets with a retweet count greater than &lt;code&gt;zero&lt;/code&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;json_twitter_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json_twitter_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;json_twitter_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retweets&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;json_twitter_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hGj6MTJ2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/eqyocqqyibwnxzu19x64.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hGj6MTJ2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/eqyocqqyibwnxzu19x64.png" alt="output-3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the data one user can tweet multiple times, so we need to separate users based on their tweet count.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# first remove  date column
&lt;/span&gt;&lt;span class="n"&gt;twitter_data_with_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json_twitter_data&lt;/span&gt;
&lt;span class="n"&gt;json_twitter_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Date'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'Tweet'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;json_twitter_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xMC8b0Ax--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/cacl4xmec0o6t0npruj3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xMC8b0Ax--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/cacl4xmec0o6t0npruj3.png" alt="output-4"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;now group the dataframe based on &lt;code&gt;users&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# rather than dropping duplicated we can `groupby` in pandas
# twitter_data.duplicated(subset='user', keep='first').sum()
&lt;/span&gt;&lt;span class="n"&gt;tweet_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;twitter_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;twitter_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="n"&gt;as_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# tweet_count['mastercodeonlin']
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tweet_count&lt;/code&gt; behaves like a dictionary, so we can now look up the tweet count of a particular user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--250N94CZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/xx1ojyf5iel3p27e0mqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--250N94CZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/xx1ojyf5iel3p27e0mqg.png" alt="code-tweet-count"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We can now add a &lt;code&gt;no_of_tweets&lt;/code&gt; column to the dataframe
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;json_twitter_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'no_of_tweets'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json_twitter_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;get_tweet_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;twitter_data_without_tweet_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json_twitter_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop_duplicates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keep&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"first"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;twitter_data_without_tweet_count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;twitter_data_without_tweet_count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xitCHjsa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/4798584cqwbtmf5711fe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xitCHjsa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/4798584cqwbtmf5711fe.png" alt="output-5"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the next part we'll focus on getting the user_ids of particular users and analyzing the dataframe by converting it into numerical form.&lt;/p&gt;

&lt;p&gt;Stay tuned, we'll have some fun...&lt;/p&gt;

</description>
      <category>python</category>
      <category>twitter</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
