<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Adityaberi</title>
    <description>The latest articles on Forem by Adityaberi (@adityaberi8).</description>
    <link>https://forem.com/adityaberi8</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F361836%2F27bb21c6-5b0c-494b-b9f4-2e997279655e.jpg</url>
      <title>Forem: Adityaberi</title>
      <link>https://forem.com/adityaberi8</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/adityaberi8"/>
    <language>en</language>
    <item>
      <title>spaCy and grammar go hand in hand! What do you think of it?</title>
      <dc:creator>Adityaberi</dc:creator>
      <pubDate>Mon, 11 May 2020 11:20:03 +0000</pubDate>
      <link>https://forem.com/adityaberi8/spacy-and-grammar-go-hand-in-hand-what-do-you-think-of-it-46kc</link>
      <guid>https://forem.com/adityaberi8/spacy-and-grammar-go-hand-in-hand-what-do-you-think-of-it-46kc</guid>
      <description>&lt;p&gt;Wow, I was studying spaCy and found out how brilliantly it understands the difference between "read" in two different tenses.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;doc = nlp(u'I read books on NLP.')
r = doc[1]

print(f'{r.text:{10}} {r.pos_:{8}} {r.tag_:{6}} {spacy.explain(r.tag_)}')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output: read       VERB     VBP    verb, non-3rd person singular present&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;doc = nlp(u'I read a book on NLP.')
r = doc[1]

print(f'{r.text:{10}} {r.pos_:{8}} {r.tag_:{6}} {spacy.explain(r.tag_)}')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output: read       VERB     VBD    verb, past tense&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the first example, with no other cues to work from, spaCy assumed that "read" was present tense.&lt;/p&gt;

&lt;p&gt;In the second example, the present tense form would be "I am reading a book," so spaCy assigned the past tense.&lt;/p&gt;
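&lt;p&gt;Under the hood, those tags come from the Penn Treebank tag set, and spacy.explain() decodes any such label even without a loaded model. A minimal sketch (assuming spaCy is installed):&lt;/p&gt;

```python
import spacy

# spacy.explain() maps a tag label to a human-readable description
print(spacy.explain('VBP'))  # verb, non-3rd person singular present
print(spacy.explain('VBD'))  # verb, past tense
```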

&lt;p&gt;Wow, that is something incredible!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is your story with spaCy, and what brilliance did you experience? Do let me know!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>help</category>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Python's sum() or NumPy's np.sum()? I found a big difference in time!</title>
      <dc:creator>Adityaberi</dc:creator>
      <pubDate>Mon, 04 May 2020 14:33:28 +0000</pubDate>
      <link>https://forem.com/adityaberi8/what-to-use-python-s-sum-or-numpy-s-np-sum-2ffj</link>
      <guid>https://forem.com/adityaberi8/what-to-use-python-s-sum-or-numpy-s-np-sum-2ffj</guid>
      <description>&lt;p&gt;&lt;strong&gt;My take&lt;/strong&gt;&lt;br&gt;
Use Python's built-in methods (sum()) on Python datatypes, and use NumPy's methods (np.sum()) on NumPy arrays.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;massive_array=np.random.random(100000)
massive_array.size
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;100000&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;massive_array
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;array([0.81947279, 0.24254041, 0.76437261, ..., 0.15969415, 0.34502387,&lt;br&gt;
       0.15858268])&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;%timeit sum(massive_array) #Python sum
%timeit np.sum(massive_array) #Numpy sum
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;16 ms ± 494 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)&lt;br&gt;
50.6 µs ± 4.83 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)&lt;/p&gt;
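&lt;p&gt;If you are not in a Jupyter/IPython session (where the %timeit magic lives), the standard library's timeit module gives a similar comparison. A rough sketch; the absolute numbers will vary by machine:&lt;/p&gt;

```python
import timeit

import numpy as np

massive_array = np.random.random(100000)

# time 10 calls, best of 5 repeats, for each approach
py_time = min(timeit.repeat(lambda: sum(massive_array), number=10, repeat=5))
np_time = min(timeit.repeat(lambda: np.sum(massive_array), number=10, repeat=5))

print(f"built-in sum: {py_time / 10:.6f} s per call")
print(f"np.sum:       {np_time / 10:.6f} s per call")
# np.sum runs its loop in optimized C over a contiguous buffer, while the
# built-in sum pulls each element through the Python iterator protocol
```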

&lt;p&gt;&lt;strong&gt;That's a massive difference!!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What do you guys think???&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>help</category>
    </item>
    <item>
      <title>PRINCIPAL COMPONENT ANALYSIS - WITH HANDS-ON MINI PROJECT!</title>
      <dc:creator>Adityaberi</dc:creator>
      <pubDate>Thu, 30 Apr 2020 14:36:17 +0000</pubDate>
      <link>https://forem.com/adityaberi8/principal-component-analysis-with-hands-on-mini-project-2fgc</link>
      <guid>https://forem.com/adityaberi8/principal-component-analysis-with-hands-on-mini-project-2fgc</guid>
      <description>&lt;p&gt;Principal Component Analysis is an unsupervised statistical technique used to examine interrelations among a set of variables in order to identify the underlying structure of those variables.&lt;br&gt;
It is a linear dimensionality reduction technique that can be used to extract information from a high-dimensional space by projecting it into a lower-dimensional sub-space. It tries to preserve the essential parts that have more variation in the data and remove the non-essential parts with less variation.&lt;/p&gt;

&lt;p&gt;It is sometimes also known as general factor analysis.&lt;/p&gt;

&lt;p&gt;Whereas regression, which I explained in a previous article, determines a single line of best fit to a data set, factor analysis determines several orthogonal lines of best fit to the data set.&lt;/p&gt;

&lt;p&gt;Orthogonal means "at right angles", i.e. the lines are perpendicular to each other in any 'n'-dimensional space.&lt;/p&gt;

&lt;p&gt;Here we have some data plotted along two features, x and y.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zkPLSZe7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/900/1%2A-ljOUp5F6LKIls1VVU_gjQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zkPLSZe7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/900/1%2A-ljOUp5F6LKIls1VVU_gjQ.png" alt="Alt text of image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can add an orthogonal line. The components are a linear transformation that chooses a new axis system for the data set such that the greatest variance of the data comes to lie on the first axis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lYNjQcBu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/900/1%2A_c1Amle4zWLPjzPVK4BMgQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lYNjQcBu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/900/1%2A_c1Amle4zWLPjzPVK4BMgQ.png" alt="Alt text of image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The second greatest variance on the second axis and so on ...&lt;br&gt;
This process allows us to reduce the number of variables used in an analysis.&lt;br&gt;
Also keep in mind that the components are uncorrelated, since in the sample space they are orthogonal to each other.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Px8O1PlZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/900/1%2AQdlIt3hPLiNKzYzYlj8zuA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Px8O1PlZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/900/1%2AQdlIt3hPLiNKzYzYlj8zuA.png" alt="Alt text of image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can continue this analysis into higher dimensions.&lt;/p&gt;
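&lt;p&gt;The geometric idea above can be sketched in a few lines of NumPy: the principal axes are the eigenvectors of the data's covariance matrix, and they come out mutually orthogonal. A quick illustration on synthetic data:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic 2-D data where y is correlated with x
x = rng.normal(size=500)
data = np.column_stack([x, 0.5 * x + rng.normal(scale=0.3, size=500)])
data = data - data.mean(axis=0)

# principal axes = eigenvectors of the covariance matrix
cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# the axes are orthogonal: their dot product is (numerically) zero
print(bool(np.isclose(eigvecs[:, 0] @ eigvecs[:, 1], 0.0)))  # True
```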

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v_an-nIY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/900/1%2AOEFKI_N0FVrPDxQt3-PX7Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v_an-nIY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/900/1%2AOEFKI_N0FVrPDxQt3-PX7Q.png" alt="Alt text of image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WHERE CAN WE APPLY PCA ???&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Visualization&lt;/strong&gt;: When working on any data-related problem, the challenge in today's world is the sheer volume of data and the variables/features that define it. To solve a problem where data is the key, you need extensive data exploration, like finding out how the variables are correlated or understanding the distribution of a few variables. Considering that there are a large number of variables or dimensions along which the data is distributed, visualization can be challenging, almost impossible. &lt;br&gt;
Hence, PCA can do that for you: since it projects the data into a lower dimension, it allows you to visualize the data in a 2D or 3D space with the naked eye.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speeding Machine Learning (ML) Algorithm&lt;/strong&gt;: Since PCA's main idea is dimensionality reduction, you can leverage that to speed up your machine learning algorithm's training and testing time considering your data has a lot of features, and the ML algorithm's learning is too slow.&lt;/p&gt;

&lt;p&gt;Let us walk through the cancer data set and apply PCA.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Breast Cancer&lt;/strong&gt;&lt;br&gt;
The Breast Cancer data set is a real-valued multivariate data that consists of two classes, where each class signifies whether a patient has breast cancer or not. The two categories are: malignant and benign.&lt;/p&gt;

&lt;p&gt;The malignant class has 212 samples, whereas the benign class has 357 samples.&lt;/p&gt;

&lt;p&gt;It has 30 features shared across all classes: radius, texture, perimeter, area, smoothness, fractal dimension, etc.&lt;/p&gt;

&lt;p&gt;You can download the breast cancer dataset from &lt;br&gt;
&lt;a href="https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)"&gt;here&lt;/a&gt;&lt;br&gt;
, or, more easily, load it with the help of the sklearn library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
cancer.keys()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = pd.DataFrame(cancer['data'],columns=cancer['feature_names'])
#(['DESCR', 'data', 'feature_names', 'target_names', 'target'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PCA Visualization&lt;br&gt;
As we've noted before, it is difficult to visualize high-dimensional data. We can use PCA to find the first two principal components and visualize the data in this new, two-dimensional space with a single scatter plot. Before we do this, though, we'll need to scale our data so that each feature has unit variance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.preprocessing import StandardScaler
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scaler = StandardScaler()
scaler.fit(df)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scaled_data = scaler.transform(df)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PCA with scikit-learn follows a process very similar to the other preprocessing functions that come with scikit-learn. We instantiate a PCA object, find the principal components using the fit method, then apply the rotation and dimensionality reduction by calling transform().&lt;/p&gt;

&lt;p&gt;We can also specify how many components we want to keep when creating the PCA object.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.decomposition import PCA
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pca = PCA(n_components=2)
pca.fit(scaled_data)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can transform this data to its first 2 principal components.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x_pca = pca.transform(scaled_data)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scaled_data.shape
x_pca.shape
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Great! We've reduced 30 dimensions to just 2! Let's plot these two dimensions out!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.figure(figsize=(8,6))
plt.scatter(x_pca[:,0],x_pca[:,1],c=cancer['target'],cmap='plasma')
plt.xlabel('First principal component')
plt.ylabel('Second Principal Component')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TPfbk1UB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/900/1%2Ae5vt8Y8rNjvxjn5IY57iWA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TPfbk1UB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/900/1%2Ae5vt8Y8rNjvxjn5IY57iWA.png" alt="Alt text of image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clearly by using these two components we can easily separate these two classes.&lt;/p&gt;
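&lt;p&gt;It is also worth checking how much of the total variance those two components actually retain; scikit-learn exposes this as the explained_variance_ratio_ attribute. A short check along the same lines as the code above:&lt;/p&gt;

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
scaled_data = StandardScaler().fit_transform(cancer['data'])

pca = PCA(n_components=2).fit(scaled_data)

# fraction of the total variance captured by each principal component
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())
```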

&lt;p&gt;This was just a small sneak peek into what PCA is. I hope you got an idea of how it works.&lt;br&gt;
Feel free to respond below for any doubts and clarifications!&lt;br&gt;
I will soon be publishing this article on &lt;a href="https://www.geeksforgeeks.org/"&gt;Geeksforgeeks&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Logistic Regression!</title>
      <dc:creator>Adityaberi</dc:creator>
      <pubDate>Thu, 23 Apr 2020 19:01:16 +0000</pubDate>
      <link>https://forem.com/adityaberi8/logistic-regression-eg1</link>
      <guid>https://forem.com/adityaberi8/logistic-regression-eg1</guid>
      <description>

&lt;p&gt;We're going to start with the definition of regression and then we'll parse out what logistic regression is. As a general term, regression is a statistical process for estimating the relationships among variables. This is often used to make a prediction about some outcome. Linear regression is one type of regression that is used when you have a continuous target variable; for instance, predicting the number of umbrellas sold from the amount of rainfall. The equation for linear regression is y = mx + b. So that's linear regression. But let's come back to logistic regression.&lt;/p&gt;

&lt;p&gt;Logistic regression is a form of regression where the target variable, the thing you're trying to predict, is binary: just zero or one, true or false, or anything like that. So why do we need two different algorithms for regression? Why won't linear regression work for a binary target variable? Imagine a plot where we're just using one x feature along the x axis to predict a binary y outcome. If we use linear regression for a binary target like this, there is no best-fit line that makes any sense: linear regression will try to fit a line through all of the data, and it will end up predicting negative values and values over one, which is impossible.&lt;/p&gt;

&lt;p&gt;Logistic regression is built off of a logistic or sigmoid curve, which looks like the S shape you see below. It is always between zero and one, which makes it a much better fit for a binary classification problem. So what does the equation look like for logistic regression? Basically, it takes the linear regression expression for a line, mx + b, and tucks it up as a negative exponent for e. The full equation is y = 1 / (1 + e^-(mx + b)), and that's what creates this nice sigmoid S curve that makes it a good fit for binary classification problems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F900%2F1%2AxTrY5hMvzVAwCq6BbzI35A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F900%2F1%2AxTrY5hMvzVAwCq6BbzI35A.png" alt="Alt text of image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So when to use logistic regression, and when not to!&lt;br&gt;
&lt;strong&gt;WHEN TO USE&lt;/strong&gt;&lt;br&gt;
Binary target variable&lt;br&gt;
Transparency is important, or you are interested in the significance of predictors&lt;br&gt;
Fairly well-behaved data&lt;br&gt;
Need a quick initial benchmark&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WHEN NOT TO USE&lt;/strong&gt;&lt;br&gt;
Continuous target variable&lt;br&gt;
Massive data&lt;br&gt;
Performance is the only thing that matters&lt;/p&gt;

&lt;p&gt;Let's make the Logistic Regression model, predicting whether a user will purchase the product or not.&lt;/p&gt;

&lt;p&gt;Importing libraries&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
dataset = pd.read_csv('...\\User_Data.csv')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F900%2F0%2AalfiuXojEl8iDIrh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F900%2F0%2AalfiuXojEl8iDIrh.jpg" alt="Alt text of image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, to predict whether a user will purchase the product or not, we need to find the relationship between Age and Estimated Salary. Here, User ID and Gender are not important factors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; input 
x = dataset.iloc[:, [2, 3]].values 

 output 
y = dataset.iloc[:, 4].values
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Splitting the dataset into train and test sets: 75% of the data is used for training the model and 25% is used to test its performance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.cross_validation import train_test_split 
xtrain, xtest, ytrain, ytest = train_test_split( 
x, y, test_size = 0.25, random_state = 0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, it is very important to perform feature scaling here, because the Age and Estimated Salary values lie in very different ranges. If we don't scale the features, the Estimated Salary feature will dominate the Age feature when the model combines them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.preprocessing import StandardScaler 
sc_x = StandardScaler() 
xtrain = sc_x.fit_transform(xtrain) 
xtest = sc_x.transform(xtest) 

print (xtrain[0:10, :])


Output :
[[ 0.58164944 -0.88670699]
 [-0.60673761  1.46173768]
 [-0.01254409 -0.5677824 ]
 [-0.60673761  1.89663484]
 [ 1.37390747 -1.40858358]
 [ 1.47293972  0.99784738]
 [ 0.08648817 -0.79972756]
 [-0.01254409 -0.24885782]
 [-0.21060859 -0.5677824 ]
 [-0.21060859 -0.19087153]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we can see that the Age and Estimated Salary values are scaled and now lie roughly in the -1 to 1 range. Hence, each feature will contribute equally to decision making, i.e. finalizing the hypothesis.&lt;br&gt;
Finally, we train our Logistic Regression model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.linear_model import LogisticRegression 
classifier = LogisticRegression(random_state = 0) 
classifier.fit(xtrain, ytrain)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After training the model, it's time to use it to make predictions on the testing data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;y_pred = classifier.predict(xtest)
Let's test the performance of our model - Confusion Matrix
from sklearn.metrics import confusion_matrix 
cm = confusion_matrix(ytest, y_pred) 

print ("Confusion Matrix : \n", cm)
Confusion Matrix : 
 [[65  3]
 [ 8 24]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Out of 100 test samples:&lt;br&gt;
TruePositive + TrueNegative = 65 + 24&lt;br&gt;
FalsePositive + FalseNegative = 3 + 8&lt;br&gt;
Performance measure - Accuracy&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.metrics import accuracy_score 
print ("Accuracy : ", accuracy_score(ytest, y_pred))




Accuracy :  0.89
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I will soon be publishing this article on &lt;a href="https://www.geeksforgeeks.org/" rel="noopener noreferrer"&gt;Geeksforgeeks&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Understanding NLP</title>
      <dc:creator>Adityaberi</dc:creator>
      <pubDate>Wed, 22 Apr 2020 01:28:32 +0000</pubDate>
      <link>https://forem.com/adityaberi8/understanding-nlp-1n3h</link>
      <guid>https://forem.com/adityaberi8/understanding-nlp-1n3h</guid>
      <description>&lt;p&gt;Not sure how NLP works? &lt;br&gt;
Read this blog to get clear with basic definitions related to NLP while also working on a mini-project.&lt;/p&gt;

&lt;p&gt;Let’s begin with basic definitions:&lt;br&gt;
&lt;strong&gt;Text corpus or corpora&lt;/strong&gt;&lt;br&gt;
The language data that all NLP tasks depend upon is called the text corpus or simply corpus. A corpus is a large set of text data that can be in any language, such as English, French, and so on. The corpus can consist of a single document or a bunch of documents. The source of the text corpus can be social network sites like Twitter, blog sites, open discussion forums like Stack Overflow, books, and several others. In some tasks, like machine translation, we would require a multilingual corpus. For example, we might need both the English and French translations of the same document content for developing a machine translation model. For speech tasks, we would also need human voice recordings and the corresponding transcribed corpus.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Paragraph&lt;/strong&gt;&lt;br&gt;
A paragraph is the largest unit of text handled by an NLP task. Paragraph-level boundaries by themselves may not be of much use unless broken down into sentences, though sometimes the paragraph may be treated as a context boundary. Tokenizers that can split a document into paragraphs are available in some of the Python libraries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sentences&lt;/strong&gt;&lt;br&gt;
Sentences are the next level of lexical unit of language data. A sentence encapsulates a complete meaning, thought, and context. It is usually extracted from a paragraph based on boundaries determined by punctuation such as the period. A sentence may also convey the opinion or sentiment expressed in it. In general, sentences consist of parts-of-speech (POS) entities like nouns, verbs, adjectives, and so on. There are tokenizers available to split paragraphs into sentences based on punctuation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phrases and words&lt;/strong&gt;&lt;br&gt;
Phrases are groups of consecutive words within a sentence that convey a specific meaning. For example, in the sentence "Tomorrow is going to be a rainy day", the part "going to be a rainy day" expresses a specific thought. Some NLP tasks extract key phrases from sentences for search and retrieval applications. The next smallest unit of text is the word. Common tokenizers split sentences into words based on delimiters like spaces and commas. One of the problems in NLP is ambiguity: the same word can have different meanings in different contexts. We will later see how this is handled well when we discuss word embeddings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;N-grams&lt;/strong&gt;&lt;br&gt;
A sequence of characters or words forms an N-gram. For example, a character unigram consists of a single character, a bigram consists of a sequence of two characters, and so on. Similarly, word N-grams consist of a sequence of n words. In NLP, N-grams are used as features for tasks like text classification.&lt;/p&gt;
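&lt;p&gt;Extracting word N-grams is just a sliding window over the token list. A minimal sketch (the ngrams helper here is our own, not from any library):&lt;/p&gt;

```python
def ngrams(tokens, n):
    # slide a window of size n across the token list
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "the quick brown fox".split()
print(ngrams(words, 2))
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```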

&lt;p&gt;&lt;strong&gt;Bag-of-words&lt;/strong&gt;&lt;br&gt;
Bag-of-words, in contrast to N-grams, does not consider word order or sequence. It captures the word occurrence frequencies in the text corpus. Bag-of-words features are also used in tasks like sentiment analysis and topic identification.&lt;/p&gt;
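&lt;p&gt;A bag-of-words count needs nothing more than the standard library. A quick sketch:&lt;/p&gt;

```python
from collections import Counter

# word occurrence frequencies, ignoring order entirely
bow = Counter("the cat sat on the mat".split())
print(bow["the"])  # 2
print(bow["cat"])  # 1
```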

&lt;p&gt;Ready for a mini-project?&lt;br&gt;
We will use the Yelp Review Data Set from Kaggle.&lt;br&gt;
Each observation in this dataset is a review of a particular business by a particular user.&lt;/p&gt;

&lt;p&gt;The “stars” column is the number of stars (1 through 5) assigned by the reviewer to the business. Higher number of stars is better. In other words, it is the rating of the business by the person who wrote the review.&lt;/p&gt;

&lt;p&gt;Create a dataframe called yelp_class that contains the columns of yelp dataframe but for only the 1 or 5 star reviews:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;yelp_class = yelp[(yelp.stars==1) | (yelp.stars==5)]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Create two objects X and y. X will be the ‘text’ column of yelp_class and y will be the ‘stars’ column of yelp_class (your features and target/labels):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X = yelp_class['text']
y = yelp_class['stars']
Import CountVectorizer and create a CountVectorizer object:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Use the fit_transform method on the CountVectorizer object and pass in X (the ‘text’ column). Save this result by overwriting X:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X = cv.fit_transform(X)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Train Test Split&lt;br&gt;
Let’s now split our data into training and testing data.&lt;br&gt;
Use train_test_split to split up the data into X_train, X_test, y_train, y_test. Use test_size=0.3 and random_state=101:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=101)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Training a Model&lt;br&gt;
Time to train a model!&lt;br&gt;
Import MultinomialNB and create an instance of the estimator and call it nb:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now fit nb using the training data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nb.fit(X_train,y_train)
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Predictions and Evaluations&lt;br&gt;
Time to see how our model did!&lt;br&gt;
Use the predict method off of nb to predict labels from X_test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;predictions = nb.predict(X_test)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Create a confusion matrix and classification report using these predictions and y_test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.metrics import confusion_matrix,classification_report
print(confusion_matrix(y_test,predictions))
print('\n')
print(classification_report(y_test,predictions))
[[159  69]
 [ 22 976]]
             precision    recall  f1-score   support

          1       0.88      0.70      0.78       228
          5       0.93      0.98      0.96       998
avg / total       0.92      0.93      0.92      1226
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Great! Now let’s see what happens if we try to include TF-IDF to this process using a pipeline.&lt;/p&gt;

&lt;p&gt;Using Text Processing&lt;br&gt;
Import TfidfTransformer from sklearn.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.feature_extraction.text import  TfidfTransformer
Import Pipeline from sklearn.
from sklearn.pipeline import Pipeline
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now create a pipeline with the following steps: CountVectorizer(), TfidfTransformer(), MultinomialNB():&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipeline = Pipeline([
('bow', CountVectorizer()),  # strings to token integer counts
('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Using the Pipeline&lt;/p&gt;

&lt;p&gt;Time to use the pipeline. Remember this pipeline has all your pre-process steps in it already, meaning we’ll need to re-split the original data. Note that we overwrote X as the CountVectorized version. What we need is just the text.&lt;br&gt;
Train Test Split&lt;br&gt;
Redo the train test split on the yelp_class object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X = yelp_class['text']
y = yelp_class['stars']
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.3,random_state=101)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now fit the pipeline to the training data. Remember you can’t use the same training data as last time, because that data has already been vectorized. We need to pass in just the text and labels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipeline.fit(X_train,y_train)
Pipeline(steps=[('bow', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=&amp;lt;class 'numpy.int64'&amp;gt;, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip_...f=False, use_idf=True)), ('classifier', MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True))])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Predictions and Evaluation&lt;br&gt;
Now use the pipeline to predict on X_test, then create a classification report and confusion matrix. You should notice strange results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;predictions = pipeline.predict(X_test)
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test,predictions))
[[  0 228]
 [  0 998]]
             precision    recall  f1-score   support
          1       0.00      0.00      0.00       228
          5       0.81      1.00      0.90       998
avg / total       0.66      0.81      0.73      1226
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;TF-IDF actually made things worse! The confusion matrix shows that the pipeline now predicts 5 stars for every review, so precision and recall for the minority 1-star class fall to zero.&lt;/p&gt;
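&lt;p&gt;To probe a result like this, one quick check is to fit both pipelines side by side and look at how their predictions are distributed across the classes. Below is a self-contained sketch using a tiny made-up corpus in place of yelp_class, so it only illustrates the mechanics and the numbers will differ from the post:&lt;/p&gt;

```python
# Side-by-side comparison of the two pipelines from the post, run on a
# tiny made-up, imbalanced corpus (a stand-in for yelp_class).
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

texts = ["terrible food bad service"] * 5 + ["great food lovely staff"] * 20
labels = [1] * 5 + [5] * 20

counts_only = Pipeline([('bow', CountVectorizer()),
                        ('classifier', MultinomialNB())])
with_tfidf = Pipeline([('bow', CountVectorizer()),
                       ('tfidf', TfidfTransformer()),
                       ('classifier', MultinomialNB())])

counts_only.fit(texts, labels)
with_tfidf.fit(texts, labels)

# How are the predictions spread across the 1-star and 5-star classes?
for name, model in [('counts only', counts_only), ('with tf-idf', with_tfidf)]:
    preds = model.predict(texts)
    print(name, dict(zip(*np.unique(preds, return_counts=True))))
```

&lt;p&gt;On the real, much larger Yelp vocabulary, the TF-IDF weighting shrinks the per-word evidence enough that the Naive Bayes prior for the majority 5-star class can dominate, which is one plausible reading of the all-5s confusion matrix above.&lt;/p&gt;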

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
      <category>jupyter</category>
    </item>
    <item>
      <title>Stepping Into Blogging</title>
      <dc:creator>Adityaberi</dc:creator>
      <pubDate>Sat, 18 Apr 2020 18:54:59 +0000</pubDate>
      <link>https://forem.com/adityaberi8/stepping-into-blogging-1255</link>
      <guid>https://forem.com/adityaberi8/stepping-into-blogging-1255</guid>
      <description>&lt;p&gt;There is a lot of buzz right now about blogging . But first, it’s important to understand the definition of blogging. It’s noisy out there online. There’s a ton of different types of content coming to your buyers from all sides. And your viewers are doing their own research, educating themselves through various platforms . And you want people to find you. It’s very difficult to be found in today’s crazy, noisy landscape. There’s so much content out there, so much online, that as your viewers are searching for different key terms, how are you making sure that you are the one being found? Blogging is the strategy of creating short-form content, i.e. blogs, to exist on one area of your website, such as your blog, in order to drive traffic to your website and create brand awareness for your company. Now that you know the definition of blogging, start thinking about all the ways that you can apply blogging to your own business.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BENEFITS OF BLOGGING&lt;/strong&gt;&lt;br&gt;
Blogging helps attract the different buyers and customers to your company before they even know who you are. It helps you to establish educational authority in your space, which is extremely important in today’s noisy world. You wanna put yourself out there as a thought leader and as an educator, so that your readers will learn to trust you, and a blog is a great way to get you there. And blogging helps you drive awareness with your audience. Your audience will most likely have never heard of you, and blogging is a great way to get somebody through the door. Blogs are short, they’re often funny, they have a humanistic quality to them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HOW TO CREATE AND DEVELOP CONTENT&lt;/strong&gt;&lt;br&gt;
This is where you Ideate, this is where you come up with the ideas for your various blog posts. Here are some ways to get started as you start thinking about your ideas. Consider Industry, what’s hot right now within your own industry, what are people talking about? What about Non-industry trends? These could be trends that are happening out there in the world that might not necessarily be related to your industry, but that you can take and alter for the purposes of your blog. What are your publications priorities, what are the big messages that you’re trying to put forth as a company? What are your various personas, who are you writing for, and how many personas do you have for your blog? Take a look at Competitor content to see what they might be writing about. Most likely, you’re going to have some SEO, some Search Engine Optimization priorities, These are keywords that you’re trying to rank for in Google, that you want to continue to write blog posts about over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CREATE YOUR OUTLINE&lt;/strong&gt;&lt;br&gt;
This might be a step that you could be able to skip if you are a more seasoned blog writer. But I do want to add it in here, because if you are more of a beginner, you might want to create an outline. So, the way to create an outline for your blog, is to basically, Outline your main sections. Make sure that you Craft your main thesis, so, What is the point that you’re trying to put forth within your blog? I find that this is actually one of the most difficult areas of blog writing. I’ll get a ton of blogs, both internal and external guest blogs, that don’t actually have a thesis, and as I read through the blog, I ask myself, What is the point of this blog? You want to make sure that the point is extremely clear to your reader, and then, Determine your supporting content. What are you going to say throughout your blog that will really support that main thesis that you’ve crafted?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;POST TITLE&lt;/strong&gt;&lt;br&gt;
Your Post Title for a blog has to be something that really draws the reader in, and that gets them to read the post. So, you want to write something that’s interesting, that’s engaging, that asks a question. So, listed here are various different techniques that you can use within your titling convention. Consider titles that start with How To…, Secrets of…, Why Your…, How Much Does…, The Worst…, The Truth About…, or Top 10…, these are all different ways to draw your reader in, your title is arguably one of the most important aspects of your blog writing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your First Paragraph&lt;/strong&gt;&lt;br&gt;
So, your first paragraph is a really a way to draw the reader in to get them to read your blog in its entirety. You might want to ask a question in the beginning of your first paragraph, or state something controversially, then. Consider discussing a pain point or noting a fact. Another way to get going is to tell a story. You can tell a story about your own personal experience, a friend’s experience, your company experience. As an example, take a look at a blog post that was created by Ann Hanley, a Content Marketing thought leader. Her title is: “50 Shades of Mediocrity: Does Content Have to be Good, or Just Good Enough?” In her opening paragraph is, “When a franchise like 50 Shades of Grey enjoys crazy success, is it a signal that content doesn’t have to be good to be crazy successful?” So, the opening line of her first paragraph not only notes something that was popular at the time, but it also discusses something controversial. It states that she feels that 50 Shades of Grey was not great content, and then, Does content even have to be great to be successful? So, by doing something like that, you really draw a reader in, and get them to be interested in reading the rest of your post to answer that question. And, a reader often takes time to formulate their own opinion, which really lends itself to social sharing, or engagement towards the end of the post.&lt;br&gt;
Make Your Post Scannable.&lt;br&gt;
I can’t say enough how important it is for your post to be scannable. With blogs, even though they’re short content, your audience most likely won’t read the whole thing. People are extremely adept at scanning, these days, and by making your blog post easily scannable, you make it easier for you audience to finish your post. So, your blog should be easy to read. They should include headings, bullets, or numbered lists, to break them upthrough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add a Conclusion.&lt;/strong&gt;&lt;br&gt;
It’s hard to conclude a post, but by adding something that invites the reader to interact, you can start getting some engagement at the bottom of your post, and a conversation going. So, a great way to conclude your blog is to invite that interaction, ask readers to comment, ask readers a question, or, point to another resource.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edit, edit and edit some more!&lt;/strong&gt;&lt;br&gt;
Editing your blog post is incredibly important to maintain quality standards for your entire blog, and to make sure you’re not putting out there any silly errors. Remember that your blog is really a reflection of your publication . So, Review your blog post at least two times, Ask a peer to review your blog post. It’s one thing to go over your post yourself, but it’s quite another to have an extra set of eyes. Consider putting it through an online grammar tool. This is something that I’ve seen a lot of my peers do, there are many online grammar tools that you can actually just upload a blog to, and it’ll tell you if there’s any grammar mistakes. And also make sure you check for spelling mistakes. This seems like a no-brainer, but I can’t tell you how many times I’ve actually seen spelling mistakes on blogs, with or without spell check. So, just make sure you’re going through each post and looking for any grammar or spelling mistakes that might be apparent. Writing a blog post can be tough, but if you follow these steps, you can get started creating great content for your blog…&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“Don’t focus on having a great blog. Focus on producing a blog that’s great for your readers.”&lt;/em&gt;&lt;br&gt;
Best of luck on your journey into blogging…&lt;/p&gt;

</description>
      <category>writing</category>
      <category>challenge</category>
      <category>career</category>
    </item>
    <item>
      <title>INTRO TO SIMPLE LINEAR REGRESSION!!!</title>
      <dc:creator>Adityaberi</dc:creator>
      <pubDate>Fri, 17 Apr 2020 13:06:06 +0000</pubDate>
      <link>https://forem.com/adityaberi8/intro-to-simple-linear-regression-1l96</link>
      <guid>https://forem.com/adityaberi8/intro-to-simple-linear-regression-1l96</guid>
      <description>&lt;p&gt;A sneak peek into what Linear Regression is and how it works.&lt;/p&gt;

&lt;p&gt;Linear regression is a simple machine learning method that you can use to predict the value of an observation based on the relationship between the target variable and one or more linearly related numeric predictive features.&lt;/p&gt;

&lt;p&gt;For example: Imagine you have a data-set that describes key characteristics of a set of homes, like land acreage, number of storeys, building area, and sale price. Based on these features and their relationship with the sale price of these homes, you could build a multivariate linear model that predicts the price a house can be sold for based on its features.&lt;/p&gt;

&lt;p&gt;Linear regression is a statistical machine learning method you can use to quantify and make predictions based on relationships between numerical variables which assumes that the data is free from missing values and outliers.&lt;/p&gt;

&lt;p&gt;It also assumes that there’s a linear relationship between predictors and predictants, and that all predictors are independent of each other.&lt;br&gt;
Lastly, it assumes that residuals are normally distributed.&lt;/p&gt;

&lt;p&gt;Ready for a mini-project?&lt;/p&gt;

&lt;p&gt;We have all the libraries we need in our Jupyter Notebook. Now let’s set up our plotting parameters. We want matplotlib to plot inline within our Jupyter Notebook, so we will use %matplotlib inline, and then let’s set the dimensions of our data visualizations to be 10 inches wide and 8 inches high.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;





&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pylab import rcParams
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import scale
%matplotlib inline
rcParams['figure.figsize']=10,8
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;So we’re just going to create some synthetic data in order to do a linear regression. Let’s first create a variable called ‘rooms’. We’re going to set rooms equal to two times a set of uniform random numbers (so we’re going to call the random number generator, np.random.rand), pass in the shape we want (in this case 100 rows and one column), and add three. This is the equation we’re using to generate random values to populate the rooms field, i.e. to create a synthetic variable that represents the number of rooms in a home.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rooms=2*np.random.rand(100,1)+3
rooms[1:10]
array([[4.04467357],
       [3.77241135],
       [3.14321164],
       [4.48142986],
       [3.18493126],
       [3.8132922 ],
       [4.72655406],
       [3.08916389],
       [3.89772928]])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now, let’s create a synthetic variable called ‘price’. We’ll say that price is equal to 265 plus six times the number of rooms, plus the absolute value of some Gaussian noise (so we call the abs function on np.random.randn, again with 100 rows and one column). Then let’s just take a look at the first 10 records, so we’ll say price one through 10 and run this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;price=265+6*rooms +abs(np.random.randn(100,1))
price[1:10]
array([[290.20050075],
       [287.83631918],
       [284.26968068],
       [292.46209605],
       [285.20161696],
       [288.07388113],
       [293.77699261],
       [284.59783984],
       [289.71316513]])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now, let’s create a scatter plot of our synthetic variables just so we get an idea of what they look like and the relationship between them. To do that we’re going to call the plot function, plt.plot, and pass in rooms and price. Price is going to be on our y-axis and rooms is going to be on our x-axis. Let’s also pass the format string 'r.', which tells matplotlib to draw red point markers instead of the default line plot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.plot(rooms,price,'r.')
plt.xlabel("no. of rooms,2020 Average")
plt.ylabel("2020 Avg home price")
plt.show()

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;To see the plot, see the cover image :)&lt;/p&gt;

&lt;p&gt;What this plot says is, as the number of rooms increase, the price of the house increases.&lt;br&gt;
Makes sense, right?&lt;br&gt;
So now, let’s just do a really simple linear regression. So for our model here, we’re going to use rooms as the predictor, so we’re going to say, x is equal to rooms and we want to predict for the price, so y is going to be equal to price. Let’s instantiate a linear regression object, we’ll call it LinReg and then we’ll say LinReg is equal to LinearRegression and then we’ll fit the model to the data. So to do that we will say LinReg.fit and we’ll pass in our variables x and y.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X=rooms
y=price
LinReg= LinearRegression()
LinReg.fit(X,y)
print(LinReg.intercept_,LinReg.coef_)
[265.39215904] [[6.10708427]]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Holding all other features fixed, a 1 unit increase in Rooms is associated with an increase of 6.10708427 in price.&lt;br&gt;
The intercept (often labeled the constant) is the point where the function crosses the y-axis. In some analyses, the regression model only becomes significant when we remove the intercept, and the regression line reduces to Y = bX + error. A regression without a constant means that the regression line goes through the origin, where the dependent variable and the independent variable are both equal to zero.&lt;br&gt;
&lt;/p&gt;
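&lt;p&gt;To see concretely what dropping the constant does, here is a small sketch (an addition to the post, not part of the original analysis) fitting the same kind of synthetic data with and without an intercept, via scikit-learn’s fit_intercept flag:&lt;/p&gt;

```python
# Comparing a fit with and without the intercept; fit_intercept=False
# forces the regression line through the origin (Y = bX + error).
# The data-generating equation mirrors the post's synthetic setup;
# the fixed seed is an assumption added here for reproducibility.
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(0)
rooms = 2 * np.random.rand(100, 1) + 3
price = 265 + 6 * rooms + abs(np.random.randn(100, 1))

with_const = LinearRegression().fit(rooms, price)
no_const = LinearRegression(fit_intercept=False).fit(rooms, price)

print(with_const.intercept_, with_const.coef_)  # intercept near 265, slope near 6
print(no_const.intercept_, no_const.coef_)      # intercept forced to 0, slope inflated
```

&lt;p&gt;Because the line is forced through the origin, the no-intercept fit has to absorb the constant into the slope, which is why its coefficient comes out far larger than 6.&lt;/p&gt;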

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(LinReg.score(X,y))
0.9679030603885265
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
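&lt;p&gt;As one last optional step (an addition to the mini-project), the fitted model can be used to predict the price of a new home. With the synthetic equation price = 265 + 6*rooms plus a small positive noise term, a hypothetical 4-room home should come out near 289:&lt;/p&gt;

```python
# Recreate the post's synthetic data (the fixed seed is an assumption
# added here for reproducibility), fit the model, predict a new observation.
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(101)
rooms = 2 * np.random.rand(100, 1) + 3
price = 265 + 6 * rooms + abs(np.random.randn(100, 1))

LinReg = LinearRegression()
LinReg.fit(rooms, price)

# Predicted price for a hypothetical 4-room home
print(LinReg.predict([[4.0]]))
print(LinReg.score(rooms, price))
```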



&lt;p&gt;Our linear regression model is performing really well! Our R-squared value is close to 1, and that’s a good thing!&lt;br&gt;
This was just a small sneak peek into what Linear Regression is. I hope you got an idea of how Linear Regression works through the mini-project!&lt;br&gt;
Feel free to respond to this blog below with any doubts and clarifications!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
      <category>jupyter</category>
    </item>
    <item>
      <title>IMPOSTER SYNDROME</title>
      <dc:creator>Adityaberi</dc:creator>
      <pubDate>Fri, 17 Apr 2020 11:15:47 +0000</pubDate>
      <link>https://forem.com/adityaberi8/understanding-nlp-12le</link>
      <guid>https://forem.com/adityaberi8/understanding-nlp-12le</guid>
<description>&lt;p&gt;“Perfection is a delusion. Perfection requires someone else to be inferior in order to win. Shake off imposturous ideas like a deer that shakes off fear after a life threatening chase.”&lt;br&gt;
― Deborah Bravandt&lt;/p&gt;

&lt;p&gt;Impostor Syndrome is the overwhelming feeling that you don’t deserve your success. It convinces you that you’re not as intelligent, creative or talented as you may seem. It is the suspicion that your achievements are down to luck, good timing or just being in the “right place at the right time.” And it is accompanied by the fear that, one day, you’ll be exposed as a fraud.&lt;br&gt;
As for the challenges that I face in my college, and that many people in the tech industry face, one of the big ones has just been questioning every day whether I’m doing it right, whether I belong, whether somebody is gonna walk into the room, point at me and say, you know, you don’t know what you’re doing. Dealing with that has been a constant process of paying attention to when people compliment me, believing that they actually mean it and being able to take that in. It’s also been a question of just continuing on and doing it, because as I code more, I get more confident in what I’m doing. I see that I build things, they don’t break, and I don’t have people walking up to me in the street and saying you don’t know what you’re doing. And I think part of development is actually building code that other people are working with and looking at. Your code is publicly exposed when you’re posting it to something called GitHub. And if you’re not getting a bunch of people sending you messages every time you post saying this is ridiculous and it doesn’t work, then you’re clearly doing something right, and it’s important to take that in and use it as a tool to help yourself recognize that yes, you do know what you’re doing, you do have skills and experience to bear in this industry in the work that you’re doing in development. I’ve been in situations where I am sure on paper that I am the expert. But there’s somebody else in the room who seems like more of an expert. And I find myself second guessing my ability. You feel like an imposter sometimes.&lt;/p&gt;

&lt;p&gt;In my experience, the developers who never feel that way are often the absolute worst developers. The code they write might be really complex and advanced, but no one else can understand it, and if there’s a bug in it, there’s only one person who can fix it. That doesn’t mean they’re a good developer. That means they’re a bad developer, ’cause they’re writing code that isn’t very useful to the rest of the team. So for me, I’d much rather work with somebody who has less experience and maybe second guesses themself a little bit, but is willing to work on code that every person on the team can understand. I’ve found that a lot of times it’s surprising how much there is that other people don’t know as well. So I’ve certainly had times where I go and talk to a developer or a senior and I say, you know what, can you help me with this? I don’t understand it. And they go straight to Google and look things up because they don’t know it either. They might be ahead of me. They know more than me, so they’re able to refine the search to help me find what I need. But the reality is that often we project onto other people a perfection that they don’t have either. And so I think when we get past that, when we can kinda pull back that veneer a little bit and be vulnerable enough to say I don’t know everything, that’s when we can start to build the relationships that help us overcome the limitations we have and work together as a team on accomplishing things.&lt;/p&gt;

&lt;p&gt;“You see, cuckoos are parasites. They lay their eggs in other birds’ nests. When the egg hatches, the baby cuckoo pushes the other baby birds out of the nest. The poor parent birds work themselves to death trying to find enough food to feed the enormous cuckoo child who has murdered their babies and taken their places.”&lt;br&gt;
“Enormous?” said Jace. “Did you just call me fat?”&lt;br&gt;
“It was an analogy.”&lt;br&gt;
“I am not fat.”&lt;br&gt;
― Cassandra Clare, City of Ashes&lt;/p&gt;

&lt;p&gt;When I first started going to technical talks in gravitas and going to conferences in my college, I really felt like the speakers were so much smarter than me, because they stand up there with the stage and the mic, and it’s so intimidating. And when I started actually doing the meet up, I realized that I spend weeks getting my PowerPoint slides and all my notes and everything in order, and I make sure that I understand the material so that I can teach it. The only reason why those guys are so intimidating is because they spent all this time putting together their PowerPoint slides and their notes and getting everything in order. It’s not because they’re smarter than us. It’s because they prepared that talk. And when I prepared a talk, I learned that somebody would come along in my meet up and ask me to cover a specific topic. Well, I don’t really know that topic, but none of us do, and it’s an interesting topic. So I would go and do some research, and I would put together a PowerPoint presentation. And I would get up there, and now I’ve learned it very well, so now I feel more confident with it.&lt;/p&gt;

&lt;p&gt;Now I can actually use this. But the reason why I learned it was because some guy came up and asked me to teach it. But there’s no magic. It’s not because I’m smarter. That’s not it. It’s because I prepared that talk. What I would say to anyone who is feeling like they’re an imposter in their field is number one, most people feel that way. It’s very common to feel like you don’t know what you’re doing. And number two, that it does go away after a while. It will probably never disappear. I still feel it to this day when I’m writing or working on something that I think I’m perfect in and I lose confidence about being able to figure something out or I feel like it’s taking me too long or I feel like I just can’t make something work that I should be able to make work. But it does get better over time as you start to solve problems, as you start to get some constructive feedback on your work, as you start to build up a catalog of things that you’ve completed, you learn how to handle it and realize that the feeling is normal, but you actually can solve problems and you can find solutions…….&lt;/p&gt;

</description>
      <category>codenewbie</category>
      <category>beginners</category>
      <category>startup</category>
    </item>
  </channel>
</rss>
