<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Prateek Gupta</title>
    <description>The latest articles on Forem by Prateek Gupta (@prateekguptaiiitk).</description>
    <link>https://forem.com/prateekguptaiiitk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F391114%2F255f4227-5ff9-43bb-b6dd-d99539527438.jpeg</url>
      <title>Forem: Prateek Gupta</title>
      <link>https://forem.com/prateekguptaiiitk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/prateekguptaiiitk"/>
    <language>en</language>
    <item>
      <title>Octograd 2020 - Resume Filtering System</title>
      <dc:creator>Prateek Gupta</dc:creator>
      <pubDate>Thu, 21 May 2020 07:31:51 +0000</pubDate>
      <link>https://forem.com/prateekguptaiiitk/octograd-2020-resume-filtering-system-1k4k</link>
      <guid>https://forem.com/prateekguptaiiitk/octograd-2020-resume-filtering-system-1k4k</guid>
      <description>&lt;p&gt;A smart resume filtering system that shows the best-matching resumes for a given job description.&lt;/p&gt;

&lt;h2&gt;
  
  
  Link to Code
&lt;/h2&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vJ70wriM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/github-logo-ba8488d21cd8ee1fee097b8410db9deaa41d0ca30b004c0c63de0a479114156f.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/prateekguptaiiitk"&gt;
        prateekguptaiiitk
      &lt;/a&gt; / &lt;a href="https://github.com/prateekguptaiiitk/Resume_Filtering"&gt;
        Resume_Filtering
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      A resume filtering system based on natural language processing
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/prateekguptaiiitk/Resume_Classifier/blob/develop/SkyBits-logo-small.png"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Q_GaUiYz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/prateekguptaiiitk/Resume_Classifier/raw/develop/SkyBits-logo-small.png" height="100" width="268"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
Resume Filtering Using Machine Learning&lt;/h1&gt;
&lt;p&gt;Resume filtering on the basis of job descriptions (JDs). This was a summer internship project with &lt;a href="http://sky-bits.com/" rel="nofollow"&gt;Skybits Technologies Pvt. Ltd.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://camo.githubusercontent.com/a68f9d11eee7149055ae212b7fbeeb27326e4893/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c616e67756167652d507974686f6e332d626c75652e737667"&gt;&lt;img src="https://camo.githubusercontent.com/a68f9d11eee7149055ae212b7fbeeb27326e4893/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c616e67756167652d507974686f6e332d626c75652e737667" alt="Language"&gt;&lt;/a&gt; &lt;a href="https://github.com/prateekguptaiiitk/Resume_Filtering/blob/develop/LICENSE"&gt;&lt;img src="https://camo.githubusercontent.com/0b0d14f7fc452c9c67ff2dba2ae6b37536794840/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6963656e73652f7072617465656b6775707461696969746b2f526573756d655f46696c746572696e672e737667" alt="GitHub License"&gt;&lt;/a&gt; &lt;a href="http://hits.dwyl.io/prateekguptaiiitk/Resume_Filtering" rel="nofollow"&gt;&lt;img src="https://camo.githubusercontent.com/416d6f461e8926eb6a58e798206de535e2dbb6e1/687474703a2f2f686974732e6477796c2e696f2f7072617465656b6775707461696969746b2f526573756d655f46696c746572696e672e737667" alt="HitCount"&gt;&lt;/a&gt; &lt;a href="https://mybinder.org/v2/gh/prateekguptaiiitk/Resume_Filtering/develop" rel="nofollow"&gt;&lt;img src="https://camo.githubusercontent.com/483bae47a175c24dfbfc57390edd8b6982ac5fb3/68747470733a2f2f6d7962696e6465722e6f72672f62616467655f6c6f676f2e737667" alt="Binder"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
Introduction&lt;/h2&gt;
&lt;p&gt;The main feature of the current project is that it searches the entire &lt;code&gt;resume&lt;/code&gt; database to select and display the resumes that best fit the provided &lt;code&gt;job description (JD)&lt;/code&gt;. In its current form, this is achieved by assigning each CV a score obtained by comparing it against the corresponding job description. This narrows the pool down to a fraction of the original number of applicants, and the resumes in this final shortlist can then be checked manually for further analysis. The project uses techniques from &lt;code&gt;Machine Learning&lt;/code&gt; and &lt;code&gt;Natural Language Processing&lt;/code&gt; to automate the process.&lt;/p&gt;
&lt;h2&gt;
Directory Structure&lt;/h2&gt;
&lt;pre&gt;
├── Data
│   ├── CVs
│   ├── collectCV.py
│   └── jd.csv
├── Model
│   ├── Model_Training.ipynb
│   ├── Sentence_Extraction.ipynb
│   ├── paragraph_extraction_from_posts.ipynb
│   ├── sample_bitcoin.stackexchange_paras.txt
│   ├── sample_bitcoin.stackexchange_sentences.txt&lt;/pre&gt;…&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/prateekguptaiiitk/Resume_Filtering"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  Project Introduction
&lt;/h2&gt;

&lt;p&gt;The system searches the entire resume database and shortlists the resumes that best fit a given job description. In its current form, it does this by assigning each CV a score obtained by comparing it against the corresponding job description. This narrows the pool down to a fraction of the original number of applicants, and the resumes in this final shortlist can then be checked manually for further analysis.&lt;/p&gt;
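&lt;p&gt;As a rough illustration of the scoring step, here is a minimal Python sketch that ranks CVs by the cosine similarity between the averaged word vectors of each CV and of the job description, then keeps the top &lt;code&gt;k&lt;/code&gt;. It assumes both texts are already tokenized and that word vectors are available as a plain dictionary (in this project they would come from the trained Word2Vec model). The helper names and toy vectors are hypothetical, and averaged vectors with cosine similarity is one common scoring choice, not necessarily the exact method used here.&lt;/p&gt;

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def average_vector(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Mean of the vectors of all in-vocabulary tokens (zero vector if none).

    Assumes embeddings is non-empty and all vectors share one dimension.
    """
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank_resumes(
    jd_tokens: Sequence[str],
    resumes: Dict[str, Sequence[str]],
    embeddings: Dict[str, Vector],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every resume against one JD and return the top_k (name, score) pairs."""
    jd_vec = average_vector(jd_tokens, embeddings)
    scored = [
        (name, cosine(jd_vec, average_vector(tokens, embeddings)))
        for name, tokens in resumes.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors standing in for a trained model.
    toy_embeddings = {
        "python": [0.9, 0.1, 0.0],
        "django": [0.8, 0.2, 0.0],
        "sql": [0.6, 0.4, 0.0],
        "sales": [0.0, 0.1, 0.9],
        "marketing": [0.1, 0.0, 0.8],
    }
    jd = ["python", "django", "sql"]
    cvs = {
        "cv_backend": ["python", "sql", "git"],
        "cv_marketing": ["sales", "marketing"],
        "cv_fullstack": ["django", "python", "javascript"],
    }
    for name, score in rank_resumes(jd, cvs, toy_embeddings, top_k=2):
        print(f"{name}: {score:.3f}")
```

&lt;p&gt;Averaging word vectors is cheap and ignores word order, which suits a first-pass shortlist. Every resume that makes the cut still gets a manual review.&lt;/p&gt;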

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Collected three main datasets.&lt;/li&gt;
&lt;li&gt;Trained a Word2Vec model on the StackOverflow data dump.&lt;/li&gt;
&lt;li&gt;Extracted sections such as Education and Experience from the CVs.&lt;/li&gt;
&lt;li&gt;Finally, scored each CV against every available job description.&lt;/li&gt;
&lt;/ul&gt;
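&lt;p&gt;The post does not describe how the CV sections were extracted. As a hedged illustration only, the Python sketch below swaps in a simple keyword-based approach. It assumes each CV has already been converted to plain text and treats any line matching a small, hypothetical list of headings (Education, Experience, and so on) as the start of a new section. Real resumes vary far more in layout, so this is a starting point, not the project's method.&lt;/p&gt;

```python
from typing import Dict, List

# Hypothetical heading vocabulary, mapping variants to a canonical section name.
SECTION_HEADINGS: Dict[str, str] = {
    "education": "Education",
    "academic background": "Education",
    "experience": "Experience",
    "work experience": "Experience",
    "skills": "Skills",
    "projects": "Projects",
}


def extract_sections(resume_text: str) -> Dict[str, str]:
    """Split plain resume text into sections keyed by canonical heading names.

    A line is treated as a heading when, after trimming whitespace and a
    trailing colon, it matches a key of SECTION_HEADINGS (case-insensitive).
    Lines before the first recognised heading are stored under "Header".
    """
    sections: Dict[str, List[str]] = {"Header": []}
    current = "Header"
    for raw_line in resume_text.splitlines():
        line = raw_line.strip()
        heading = line.rstrip(":").strip().lower()
        if heading in SECTION_HEADINGS:
            current = SECTION_HEADINGS[heading]
            sections.setdefault(current, [])
            continue
        if line:  # skip blank lines
            sections[current].append(line)
    # Drop sections that ended up empty and join each section's lines.
    return {name: "\n".join(lines) for name, lines in sections.items() if lines}


if __name__ == "__main__":
    sample = "Jane Doe\nEducation:\nB.Tech, Computer Science\nWork Experience\nIntern, data team\nSkills\nPython, SQL"
    for name, body in extract_sections(sample).items():
        print(f"[{name}]\n{body}\n")
```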

&lt;h2&gt;
  
  
  Data Collection
&lt;/h2&gt;

&lt;p&gt;Three main datasets were required:&lt;/p&gt;

&lt;h3&gt;
  
  
  StackExchange Network Posts
&lt;/h3&gt;

&lt;p&gt;This dataset was required to train the Word2Vec model. Fortunately, the StackExchange network publishes its data dumps in XML format under a Creative Commons license. A download link for the dataset (44 GB) is available on the &lt;a href="https://archive.org/details/stackexchange"&gt;Internet Archive&lt;/a&gt;.&lt;/p&gt;
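&lt;p&gt;To give a feel for the preprocessing involved, here is a minimal Python sketch that streams one site's &lt;code&gt;Posts.xml&lt;/code&gt; file from the dump, strips the HTML inside each post's &lt;code&gt;Body&lt;/code&gt; attribute, and yields lower-cased token lists, one per sentence. It uses only the standard library. The regex-based sentence splitting and tokenization and the function names are simplifying assumptions, not what the project's notebooks actually do.&lt;/p&gt;

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import BinaryIO, Iterator, List, Union


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities in the text
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(parser.parts)


TOKEN_RE = re.compile(r"[a-z0-9+#]+")       # keeps tokens such as "c++" and "c#"
SENTENCE_SPLIT_RE = re.compile(r"[.!?]\s+")  # naive sentence boundary


def iter_sentences(source: Union[str, BinaryIO]) -> Iterator[List[str]]:
    """Yield lower-cased token lists, one per sentence, from a Posts.xml file.

    source may be a file path or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue
        text = html_to_text(elem.get("Body", "")).lower()
        for sentence in SENTENCE_SPLIT_RE.split(text):
            tokens = TOKEN_RE.findall(sentence)
            if tokens:
                yield tokens
        elem.clear()  # release the row's attributes to limit memory use
```

&lt;p&gt;gensim's &lt;code&gt;Word2Vec&lt;/code&gt; iterates over its &lt;code&gt;sentences&lt;/code&gt; argument more than once (vocabulary building plus each training epoch), so for a real run you would wrap &lt;code&gt;iter_sentences&lt;/code&gt; in a small class with an &lt;code&gt;__iter__&lt;/code&gt; method, or materialize a list for small dumps, rather than passing the generator directly.&lt;/p&gt;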

&lt;h3&gt;
  
  
  Resume Dataset
&lt;/h3&gt;

&lt;p&gt;This dataset was required to test the trained Word2Vec model; the best-matching resumes are filtered out of it. The resumes were downloaded from &lt;a href="https://www.indeed.com/"&gt;indeed.com&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Job Description Dataset
&lt;/h3&gt;

&lt;p&gt;This dataset was also required to test the trained Word2Vec model, since these job descriptions form the basis of resume filtering. A &lt;a href="https://www.kaggle.com/c/job-salary-prediction/data"&gt;Kaggle dataset&lt;/a&gt; containing job descriptions for several job openings was used.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources Used
&lt;/h2&gt;

&lt;p&gt;spaCy Documentation: &lt;a href="https://spacy.io/"&gt;https://spacy.io/&lt;/a&gt;&lt;br&gt;
spaCy GitHub Issue Page: &lt;a href="https://github.com/explosion/spaCy/issues"&gt;https://github.com/explosion/spaCy/issues&lt;/a&gt;&lt;br&gt;
Gensim Word2Vec Documentation: &lt;a href="http://radimrehurek.com/gensim/models/word2vec.html"&gt;http://radimrehurek.com/gensim/models/word2vec.html&lt;/a&gt;&lt;br&gt;
Gensim Word2Vec GitHub repository: link&lt;br&gt;
Google Word2Vec: &lt;a href="https://code.google.com/archive/p/word2vec/"&gt;https://code.google.com/archive/p/word2vec/&lt;/a&gt;&lt;br&gt;
GitHub Repository for Doc2Vec Illustration: &lt;a href="https://github.com/linanqiu/word2vec-sentiments"&gt;https://github.com/linanqiu/word2vec-sentiments&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Thoughts
&lt;/h2&gt;

&lt;p&gt;This project was a great learning experience, and my learning doesn't stop here: I will keep creating and contributing in the future. Although there is definitely room for improvement, the results are satisfactory for a first iteration of the project.&lt;/p&gt;

&lt;p&gt;Thank you octograd2020! Cheers🍻&lt;/p&gt;

</description>
      <category>octograd2020</category>
      <category>devgrad2020</category>
      <category>datascience</category>
      <category>gratitude</category>
    </item>
  </channel>
</rss>
