<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: NikitaLafinskiy</title>
    <description>The latest articles on Forem by NikitaLafinskiy (@nikitalafinskiy).</description>
    <link>https://forem.com/nikitalafinskiy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F989645%2F6f88f4cd-2a6e-4ec7-83b5-639a3a6748b2.jpg</url>
      <title>Forem: NikitaLafinskiy</title>
      <link>https://forem.com/nikitalafinskiy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/nikitalafinskiy"/>
    <language>en</language>
    <item>
      <title>Predicting edibility of mushrooms (classification) with Neural Networks</title>
      <dc:creator>NikitaLafinskiy</dc:creator>
      <pubDate>Tue, 13 Dec 2022 19:41:29 +0000</pubDate>
      <link>https://forem.com/nikitalafinskiy/predicting-edibility-of-mushrooms-classification-with-neural-networks-2mjo</link>
      <guid>https://forem.com/nikitalafinskiy/predicting-edibility-of-mushrooms-classification-with-neural-networks-2mjo</guid>
      <description>&lt;p&gt;Here we are going to build a classification model for the&lt;a href="https://www.kaggle.com/datasets/uciml/mushroom-classification"&gt;Mushroom Classification Dataset&lt;/a&gt;. The model is going to classify if the mushroom is edible (e) or poisonous (p). Other attributes are: &lt;br&gt;
You can find description of all the attributes following the kaggle &lt;a href="https://www.kaggle.com/datasets/uciml/mushroom-classification"&gt;link&lt;/a&gt;.&lt;br&gt;
First let's import pandas and use it to read the downloaded dataset. Then we will display the downloaded dataset&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;m_df = pd.read_csv("/content/mushrooms.csv")
m_df
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aIwQNnJf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3avdrm98d15545fmlczv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aIwQNnJf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3avdrm98d15545fmlczv.png" alt="Image description" width="880" height="236"&gt;&lt;/a&gt;&lt;br&gt;
Now we should explore the dataset:&lt;/p&gt;

&lt;p&gt;Check for unfilled column values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;m_df.isnull().sum()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class                       0
cap-shape                   0
cap-surface                 0
cap-color                   0
bruises                     0
odor                        0
gill-attachment             0
gill-spacing                0
gill-size                   0
gill-color                  0
stalk-shape                 0
stalk-root                  0
stalk-surface-above-ring    0
stalk-surface-below-ring    0
stalk-color-above-ring      0
stalk-color-below-ring      0
veil-type                   0
veil-color                  0
ring-number                 0
ring-type                   0
spore-print-color           0
population                  0
habitat                     0
dtype: int64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No columns contain missing (NaN) values.&lt;/p&gt;
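&lt;p&gt;One caveat: &lt;code&gt;isnull()&lt;/code&gt; only detects NaN values. In this particular dataset, unknown &lt;code&gt;stalk-root&lt;/code&gt; entries are encoded as the literal string &lt;code&gt;"?"&lt;/code&gt;, so a string comparison is a useful extra check. A minimal sketch on a toy frame standing in for &lt;code&gt;m_df&lt;/code&gt;:&lt;/p&gt;

```python
import pandas as pd

# Toy stand-in for m_df: "?" marks an unknown value, as the UCI
# mushroom data does for stalk-root, so isnull() alone would miss it
toy = pd.DataFrame({
    "stalk-root": ["e", "?", "b", "?"],
    "odor": ["n", "a", "l", "n"],
})
placeholder_counts = (toy == "?").sum()
print(placeholder_counts)  # stalk-root: 2, odor: 0
```

&lt;p&gt;On the real dataset, rows with &lt;code&gt;"?"&lt;/code&gt; can either be dropped or the placeholder kept as its own category (one-hot encoding handles the latter automatically).&lt;/p&gt;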

&lt;p&gt;Now check for the datatypes of the dataset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;m_df.dtypes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class                       object
cap-shape                   object
cap-surface                 object
cap-color                   object
bruises                     object
odor                        object
gill-attachment             object
gill-spacing                object
gill-size                   object
gill-color                  object
stalk-shape                 object
stalk-root                  object
stalk-surface-above-ring    object
stalk-surface-below-ring    object
stalk-color-above-ring      object
stalk-color-below-ring      object
veil-type                   object
veil-color                  object
ring-number                 object
ring-type                   object
spore-print-color           object
population                  object
habitat                     object
dtype: object
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we can see, every column is categorical (&lt;code&gt;object&lt;/code&gt; dtype); there are no numerical columns.&lt;/p&gt;
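&lt;p&gt;Since every column is categorical, the width of the one-hot encoded feature matrix will be the sum of the per-column cardinalities, which &lt;code&gt;nunique()&lt;/code&gt; lets us preview. A small sketch (toy frame standing in for &lt;code&gt;m_df&lt;/code&gt;):&lt;/p&gt;

```python
import pandas as pd

# Toy stand-in for m_df: each distinct value in a column becomes its
# own one-hot column, so the encoded width is the sum of cardinalities
toy = pd.DataFrame({
    "cap-shape": ["x", "b", "x", "f"],
    "odor": ["n", "a", "n", "n"],
})
cardinalities = toy.nunique()
onehot_width = int(cardinalities.sum())
print(onehot_width)  # 3 + 2 = 5
```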

&lt;p&gt;After analyzing the dataset we should split it into two: the labels dataset and the features dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X = m_df.drop("class", axis=1)
y = m_df["class"] 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We cannot train the model and evaluate its predictions on the same dataset, just as a student should not sit an exam they were allowed to see in advance, so we have to split the dataset into training and testing sets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

# The labels should be numerical so we replace all of the labels 
# with ones and zeros
y.replace("p", 0, inplace=True)
y.replace("e", 1, inplace=True)
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=(0.2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
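&lt;p&gt;Before going further it is worth checking that the two classes are reasonably balanced, since accuracy is misleading on skewed data; this dataset is roughly 52% edible to 48% poisonous. A sketch with toy labels standing in for &lt;code&gt;y&lt;/code&gt;:&lt;/p&gt;

```python
import pandas as pd

# Toy stand-in for y after mapping p -> 0 and e -> 1
y_toy = pd.Series([1, 0, 1, 1, 0, 0, 1, 0])

# normalize=True returns class proportions instead of raw counts
balance = y_toy.value_counts(normalize=True)
print(balance)  # 0.5 for each class in this toy series
```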



&lt;p&gt;All the values in the dataset are categorical. The model cannot learn from data in this form, so we should one-hot encode it: convert each categorical value into its own column holding a binary value of 1 or 0. We might also want to plot all of the mushroom samples to better visualize how the attributes relate to the labels. We can do that via dimensionality reduction, in this example using PCA.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

cat_features = [*X.columns]
cat_transformer = Pipeline(steps=[("onehot", OneHotEncoder(sparse=False)), ("pca", PCA(n_components=2))])

ct = ColumnTransformer(transformers=[("categories", cat_transformer, cat_features)])
ct.fit(X)
X_train_t = ct.transform(X_train)
X_test_t = ct.transform(X_test)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plotting the PCA graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.scatter(X_train_t[:, 0], X_train_t[:, 1], alpha=0.4, color="lightblue");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
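&lt;p&gt;A 2-D scatter is only a faithful picture if the first two principal components capture a reasonable share of the variance, which the fitted PCA step reports via &lt;code&gt;explained_variance_ratio_&lt;/code&gt;. A self-contained sketch on a toy binary matrix (standing in for the one-hot encoded features); on the real pipeline the fitted step can be reached as &lt;code&gt;ct.named_transformers_["categories"].named_steps["pca"]&lt;/code&gt;:&lt;/p&gt;

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy binary matrix standing in for the one-hot encoded mushroom features
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 2, size=(100, 6)).astype(float)

pca = PCA(n_components=2).fit(X_toy)
# Fraction of total variance captured by each of the two components
print(pca.explained_variance_ratio_)
```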



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bBerVhZB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vdwhwgtcx5oylsbwvuk3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bBerVhZB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vdwhwgtcx5oylsbwvuk3.png" alt="Image description" width="370" height="249"&gt;&lt;/a&gt;&lt;br&gt;
We can also see correlations of different features to the labels on the heatmap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sns.heatmap(pd.get_dummies(m_df).corr())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7so_UrED--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rhwn6dvyso4aew2gvdcw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7so_UrED--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rhwn6dvyso4aew2gvdcw.png" alt="Image description" width="486" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we can finally start training our model. We only have two distinct labels, so we should use the &lt;code&gt;sigmoid&lt;/code&gt; activation function in the output layer, since it returns values in the range from 0 to 1, together with the &lt;code&gt;binary_crossentropy&lt;/code&gt; loss. We are also going to use the &lt;code&gt;LearningRateScheduler&lt;/code&gt; callback to try to find the ideal learning rate by analyzing loss and accuracy at learning rates that sweep over the 70 epochs from &lt;code&gt;1e-4&lt;/code&gt; up to &lt;code&gt;(1e-4 * (10**(69/15)))&lt;/code&gt; (about 3.98).&lt;br&gt;
&lt;/p&gt;
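&lt;p&gt;As a quick check of those endpoints, the schedule formula can be evaluated at the first and last epoch (epochs run from 0 to 69 when training for 70 epochs):&lt;/p&gt;

```python
# Evaluate the exponential schedule 1e-4 * 10**(epoch/15) at the
# first (0) and last (69) epoch of a 70-epoch run
lrs = [1e-4 * 10 ** (epoch / 15) for epoch in (0, 69)]
print(lrs)  # [0.0001, ~3.98]
```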

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

model.compile(loss="binary_crossentropy", optimizer=tf.keras.optimizers.Adam(), metrics=[tf.keras.metrics.binary_crossentropy, "accuracy"])

lr_cb = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-4 * (10**(epoch/15)))

history = model.fit(X_train_t, y_train, epochs=70, verbose=1,  callbacks=[lr_cb])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can visualize the &lt;code&gt;history&lt;/code&gt; of the training process of our model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hist = pd.DataFrame(history.history).drop("binary_crossentropy", axis=1)
hist.plot()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--H9UgXY60--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p2xuwr8do7bj912dl3np.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--H9UgXY60--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p2xuwr8do7bj912dl3np.png" alt="Image description" width="372" height="248"&gt;&lt;/a&gt;&lt;br&gt;
As we can see on the graph, the accuracy of the model starts dropping once the learning rate increases beyond a certain point.&lt;br&gt;
Let's take a closer look at how the loss varies with the learning rate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lrs = 1e-4 * (10**(np.arange(0,70)/15)) # generate an array of
# learning rates with the same number of epochs
plt.semilogx(lrs, hist.loss)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Gc-L79Xo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f64rh8hqn8y16tjkmpwy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Gc-L79Xo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f64rh8hqn8y16tjkmpwy.png" alt="Image description" width="372" height="252"&gt;&lt;/a&gt;&lt;br&gt;
The loss decreases steeply between learning rates of about 1e-3 and 1e-2. We should take a value near the lowest point of the curve, around 0.0097 (roughly 1e-2, an order of magnitude above Adam's default learning rate of 1e-3).&lt;br&gt;
Now let's fit the model with the ideal learning rate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

model.compile(loss="binary_crossentropy", optimizer=tf.keras.optimizers.Adam(learning_rate=0.0097), metrics=[tf.keras.metrics.binary_crossentropy, "accuracy"])

history = model.fit(X_train_t, y_train, epochs=15, verbose=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can use the testing dataset to make predictions with our model and draw a confusion matrix to better visualize the results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;y_pred = np.round(model.predict(X_test_t)[:, 0])
ConfusionMatrixDisplay.from_predictions(y_true=y_test, y_pred=y_pred);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vWDt-uA4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6axdaya3slutomcc39h8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vWDt-uA4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6axdaya3slutomcc39h8.png" alt="Image description" width="312" height="266"&gt;&lt;/a&gt;&lt;br&gt;
Now let's write a function that returns several metrics based on the model's predictions and the actual values, and use it to evaluate our model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def metrics(y_test, y_pred):
    f1 = f1_score(y_test, y_pred)
    rec = recall_score(y_test, y_pred)
    acc = accuracy_score(y_test,y_pred)
    prec = precision_score(y_test,y_pred)
    return f"Accuracy: {round(acc, 2)}, Precision: {round(prec, 2)}, Recall: {round(rec, 2)}, F1: {round(f1, 2)}"  
metrics(y_test, y_pred)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;code&gt;Accuracy: 0.94, Precision: 0.96, Recall: 0.93, F1: 0.94&lt;/code&gt;&lt;br&gt;
Now our model is finished! I hope this article was helpful, and thank you for reading!&lt;/p&gt;
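&lt;p&gt;As a sanity check for hand-rolled metric functions like the one above, scikit-learn's &lt;code&gt;classification_report&lt;/code&gt; computes the same quantities in one call. A sketch on toy predictions (stand-ins for &lt;code&gt;y_test&lt;/code&gt; and &lt;code&gt;y_pred&lt;/code&gt;):&lt;/p&gt;

```python
from sklearn.metrics import classification_report

# Toy stand-ins for y_test / y_pred: four true positives, one missed
y_true_toy = [1, 0, 1, 1, 0, 1]
y_pred_toy = [1, 0, 0, 1, 0, 1]

# Prints per-class precision, recall, F1 and support in one table
print(classification_report(y_true_toy, y_pred_toy))
```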

</description>
      <category>tensorflow</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
