<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Princess Mae Sanchez</title>
    <description>The latest articles on Forem by Princess Mae Sanchez (@cessamaeeee).</description>
    <link>https://forem.com/cessamaeeee</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3590371%2F73a7b464-d9d7-42c6-9ad1-3b73ae523e1e.png</url>
      <title>Forem: Princess Mae Sanchez</title>
      <link>https://forem.com/cessamaeeee</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/cessamaeeee"/>
    <language>en</language>
    <item>
      <title>Machine Learning Fundamentals: Everything I Wish I Knew When I Started</title>
      <dc:creator>Princess Mae Sanchez</dc:creator>
      <pubDate>Sun, 02 Nov 2025 06:33:09 +0000</pubDate>
      <link>https://forem.com/cessamaeeee/machine-learning-fundamentals-everything-i-wish-i-knew-when-i-started-41d5</link>
      <guid>https://forem.com/cessamaeeee/machine-learning-fundamentals-everything-i-wish-i-knew-when-i-started-41d5</guid>
      <description>&lt;p&gt;Hi! When I started learning about Machine Learning, I encountered a lot of unfamiliar terminology that left me feeling overwhelmed. As a beginner, I immediately dove into conceptual learning and practical coding. However, this approach had its limitations since it focused on only one aspect of ML. This led me to become curious about the wider scope of the field and made me realize how small my current knowledge was compared to how much I still needed to learn.&lt;/p&gt;

&lt;p&gt;With that realization, I decided to study the broader perspective and familiarize myself with concepts I might encounter in the future. If you want to understand Machine Learning fundamentals from the ground up, this blog is for you.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is Machine Learning?
&lt;/h1&gt;

&lt;p&gt;Machine Learning (ML) is a branch of Artificial Intelligence that enables computers to learn patterns from data and make predictions or decisions without being explicitly programmed. Instead of writing detailed rules for every scenario, ML models discover relationships within data through mathematical algorithms and statistical methods.&lt;/p&gt;

&lt;p&gt;Think of it this way: rather than programming a computer with rules like "if the email contains the word 'lottery,' mark it as spam," machine learning lets the computer analyze thousands of emails and figure out the patterns of spam on its own.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Building Blocks of Machine Learning
&lt;/h1&gt;

&lt;p&gt;Every ML system consists of several core components working together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dataset&lt;/strong&gt; forms the foundation—this is the collection of data used to train and test models, whether it's CSV files, images, text, or sensor readings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Features (X)&lt;/strong&gt; are the input variables that help make predictions. These could be pixels in an image, words in a text, a person's age, or a product's price.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Labels or Targets (y)&lt;/strong&gt; represent the correct output that the model must learn to predict, such as whether an email is spam or not spam, a house's price, or a product's category.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Model or Algorithm&lt;/strong&gt; is the mathematical function that learns from the data. Popular examples include Linear Regression, Decision Trees, and Naive Bayes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loss Functions&lt;/strong&gt; measure how far predictions are from actual values during training; Mean Squared Error and Cross-Entropy are common examples.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimizers&lt;/strong&gt; improve model parameters during training, with Gradient Descent and Adam being common choices.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation Metrics&lt;/strong&gt; check how well the model performs, using measures like Accuracy, Precision, Recall, and F1-score.&lt;/li&gt;
&lt;/ul&gt;
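
&lt;p&gt;To make these building blocks concrete, here is a minimal sketch on six made-up data points. The "hours studied" feature and the pass/fail labels are my own illustration, not a real dataset, and the model is scored on its own training data only to keep the example short.&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss

# Dataset: six made-up examples with one feature (hours studied).
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])  # features
y = np.array([0, 0, 0, 1, 1, 1])                          # labels (0 = fail, 1 = pass)

# Model: logistic regression. fit() runs an optimizer that minimizes
# cross-entropy loss plus a small regularization term.
model = LogisticRegression()
model.fit(X, y)

# Loss function: cross-entropy between predicted probabilities and true labels.
loss = log_loss(y, model.predict_proba(X))

# Evaluation metric: fraction of predictions that match the labels.
accuracy = accuracy_score(y, model.predict(X))

print(f"Cross-entropy loss: {loss:.3f}")
print(f"Accuracy: {accuracy:.2f}")
```

&lt;p&gt;In a real project you would compute the metric on held-out test data rather than on the rows the model was trained on.&lt;/p&gt;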

&lt;h1&gt;
  
  
  The Four Major Categories of Machine Learning
&lt;/h1&gt;

&lt;h2&gt;
  
  
  1. Supervised Learning: Learning with a Teacher
&lt;/h2&gt;

&lt;p&gt;Supervised learning is like studying with an answer key. The model learns from labeled data where you already know the correct answers. You're essentially teaching the model to map inputs to outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt; You know what you're predicting, labels exist, and you want to classify or predict something.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supervised learning splits into two main types:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Classification&lt;/strong&gt; predicts discrete categories—spam or not spam, disease or no disease. Models like Logistic Regression, Naive Bayes, SVM, and Random Forest excel at this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regression&lt;/strong&gt; predicts continuous numbers—house prices, temperature forecasts. Linear Regression, Decision Tree Regressor, and Random Forest Regressor are go-to choices here.&lt;/li&gt;
&lt;/ul&gt;
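
&lt;p&gt;Here is a small side-by-side sketch of the two types. All numbers are invented for illustration: the prices are exactly three times the size so the regression result is easy to check, and the "links" and "exclamation marks" features are hypothetical.&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Regression: predict a continuous number (price) from size.
sizes = np.array([[50.0], [80.0], [100.0], [120.0]])  # square metres (made up)
prices = np.array([150.0, 240.0, 300.0, 360.0])       # thousands, exactly 3 * size
regressor = LinearRegression().fit(sizes, prices)
predicted_price = regressor.predict([[90.0]])[0]

# Classification: predict a discrete category (1 = spam, 0 = not spam).
# Features: [number of links, number of exclamation marks] (made up).
messages = np.array([[0, 0], [1, 0], [4, 3], [5, 6]])
labels = np.array([0, 0, 1, 1])
classifier = DecisionTreeClassifier(random_state=0).fit(messages, labels)
predicted_label = classifier.predict([[6, 5]])[0]

print(f"Predicted price: {predicted_price:.1f}")
print(f"Predicted label: {predicted_label}")
```

&lt;p&gt;The regressor returns a number on a continuous scale, while the classifier can only answer with one of the categories it saw during training.&lt;/p&gt;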

&lt;h3&gt;
  
  
  Popular Supervised Learning Models
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Linear Regression&lt;/strong&gt; finds the best-fitting line through data points, making it great for simple numeric predictions like house prices based on square footage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logistic Regression&lt;/strong&gt; predicts binary categories using a sigmoid curve to output probabilities. Despite its name, it's used for classification, not regression.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision Trees&lt;/strong&gt; split data based on rules, like playing twenty questions. They're easy to visualize but can overfit on small datasets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Random Forests&lt;/strong&gt; combine many decision trees voting together, providing more accuracy and stability than a single tree.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support Vector Machines (SVM)&lt;/strong&gt; draw the best boundary separating classes, working well for small, clean datasets but struggling with large or noisy data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Naive Bayes&lt;/strong&gt; uses probability and Bayes' theorem, assuming features are independent. It works exceptionally well for text data like spam filters and sentiment analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;K-Nearest Neighbors (KNN)&lt;/strong&gt; predicts based on neighboring data points. It's simple but can be slow on large datasets.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Unsupervised Learning: Finding Hidden Patterns
&lt;/h2&gt;

&lt;p&gt;Unsupervised learning is like exploring without a map. The model discovers patterns or structures in unlabeled data where no known outcomes exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt; You have no labels, and you want the model to explore or group your data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clustering&lt;/strong&gt; groups similar data points together, useful for customer segmentation. K-Means, DBSCAN, and Agglomerative Clustering are common approaches.&lt;/li&gt;
&lt;li&gt;
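&lt;strong&gt;Dimensionality Reduction&lt;/strong&gt; simplifies data while preserving key information, essential for visualizing high-dimensional data. PCA and t-SNE are popular unsupervised techniques. Linear Discriminant Analysis (LDA) is often listed alongside them, but it needs class labels, so it is technically a supervised method.&lt;/li&gt;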
&lt;li&gt;
&lt;strong&gt;Association&lt;/strong&gt; finds relationships between variables, like market basket analysis discovering that customers who buy milk often buy bread.&lt;/li&gt;
&lt;/ul&gt;
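
&lt;p&gt;As a rough sketch of the first two ideas, the example below clusters six made-up customers and then squeezes their three columns down to two. The column meanings and values are assumptions chosen so the two groups are obvious.&lt;/p&gt;

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Unlabeled data: two obvious groups of customers
# (columns: monthly visits, average spend, items per order; values are made up).
customers = np.array([
    [1.0, 10.0, 1.0], [2.0, 12.0, 1.0], [1.0, 11.0, 2.0],
    [9.0, 95.0, 8.0], [10.0, 100.0, 9.0], [9.0, 98.0, 9.0],
])

# Clustering: ask K-Means for two groups; no labels are provided.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(customers)

# Dimensionality reduction: project three columns down to two for plotting.
pca = PCA(n_components=2)
customers_2d = pca.fit_transform(customers)

print("Cluster ids:", cluster_ids)
print("Reduced shape:", customers_2d.shape)
```

&lt;p&gt;The cluster numbers themselves are arbitrary. What matters is that the first three customers share one id and the last three share the other.&lt;/p&gt;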

&lt;h2&gt;
  
  
  3. Reinforcement Learning: Learning Through Experience
&lt;/h2&gt;

&lt;p&gt;Reinforcement learning is how models learn by trial and error, receiving rewards for good actions and penalties for bad ones. There's no fixed dataset—instead, the model interacts with an environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt; Sequential decision problems where the model must learn from experience.&lt;br&gt;
The core concepts include an &lt;strong&gt;Agent&lt;/strong&gt; (the learner), an &lt;strong&gt;Environment&lt;/strong&gt; (the situation), &lt;strong&gt;Actions&lt;/strong&gt; (what the agent does), &lt;strong&gt;Rewards&lt;/strong&gt; (feedback), and a &lt;strong&gt;Policy&lt;/strong&gt; (the strategy learned).&lt;/p&gt;

&lt;p&gt;Real-world applications include self-driving cars (rewarded for staying on the road), game AIs (rewarded for winning), and robotics.&lt;/p&gt;
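
&lt;p&gt;The agent, action, reward, and policy vocabulary is easier to follow with a toy example. The sketch below is a two-armed bandit, which is a stripped-down reinforcement learning setting with no states. The win probabilities, the exploration rate, and the step count are all made-up values.&lt;/p&gt;

```python
import random

random.seed(0)

# Environment: two slot machines with hidden win probabilities (made-up values).
win_probability = [0.2, 0.8]

# Agent: keeps a running estimate of the average reward of each action.
estimates = [0.0, 0.0]
counts = [0, 0]
epsilon = 0.1  # share of steps spent exploring at random

for step in range(2000):
    # Policy: explore with probability epsilon, otherwise take the best-looking action.
    explore = random.choices([True, False], weights=[epsilon, 1 - epsilon])[0]
    if explore:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: estimates[a])

    # Reward: 1 for a win, 0 for a loss, drawn by the environment.
    p = win_probability[action]
    reward = random.choices([1, 0], weights=[p, 1 - p])[0]

    # Learning: move the estimate toward the observed reward (running average).
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print("Estimated rewards:", [round(e, 2) for e in estimates])
print("Times each action was chosen:", counts)
```

&lt;p&gt;Nobody tells the agent which machine is better. It works that out from the rewards it collects, which is the core idea behind the larger applications above.&lt;/p&gt;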

&lt;h2&gt;
  
  
  4. Deep Learning: The Power of Neural Networks
&lt;/h2&gt;

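&lt;p&gt;Deep learning uses neural networks with many layers to handle complex data like images, text, and sound. Strictly speaking, it is a family of models rather than a separate learning paradigm: a neural network can be trained in a supervised, unsupervised, or reinforcement setting. It is powerful, but it requires lots of data and computing power.&lt;/p&gt;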

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt; Data is large and complex, and traditional models struggle to extract patterns.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Artificial Neural Networks (ANNs)&lt;/strong&gt; provide basic deep learning for tasks like stock trend prediction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convolutional Neural Networks (CNNs)&lt;/strong&gt; capture spatial patterns, excelling at image recognition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recurrent Neural Networks (RNNs)&lt;/strong&gt; and &lt;strong&gt;LSTMs&lt;/strong&gt; capture sequential patterns for speech, text, and time series.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transformers&lt;/strong&gt; handle long text sequences, powering chatbots, translation, and models like GPT.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Essential Concepts Every ML Practitioner Should Know
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Data Preprocessing
&lt;/h2&gt;

&lt;p&gt;Before training any model, data must be cleaned and prepared. This involves handling missing values, normalizing or scaling numerical features, encoding categorical data, and splitting data into training and testing sets using functions like &lt;code&gt;train_test_split()&lt;/code&gt;.&lt;/p&gt;
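
&lt;p&gt;Here is one way these preprocessing steps might look on a tiny made-up table. The column names, the median fill, and the one-hot encoding are illustrative choices, not the only reasonable ones.&lt;/p&gt;

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up data with a missing value and a categorical column.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 38.0, 29.0, 50.0, 45.0],
    "city": ["A", "B", "A", "C", "B", "A", "C", "B"],
    "bought": [0, 0, 0, 1, 1, 0, 1, 1],
})

# Handle missing values: fill the gap with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Encode categorical data: one 0/1 column per city.
df = pd.get_dummies(df, columns=["city"], dtype=float)

# Split into training and testing rows.
X = df.drop(columns=["bought"])
y = df["bought"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Scale: learn mean and standard deviation on the training rows only,
# then apply the same transformation to the test rows.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

&lt;p&gt;Fitting the scaler on the training rows only keeps information from the test set out of training.&lt;/p&gt;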

&lt;h2&gt;
  
  
  Feature Extraction
&lt;/h2&gt;

&lt;p&gt;ML models require numerical data, so text or images must be converted into numbers. &lt;code&gt;CountVectorizer()&lt;/code&gt; converts text into a bag-of-words model using word counts. More advanced representations include TF-IDF, Word2Vec, GloVe, and BERT embeddings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model Training and Evaluation
&lt;/h2&gt;

&lt;p&gt;During training, the model learns patterns from data. For example, &lt;code&gt;MultinomialNB()&lt;/code&gt; is a Naive Bayes classifier, built on Bayes' theorem, that works well for text classification.&lt;/p&gt;

&lt;p&gt;After training, you must measure performance using metrics like &lt;code&gt;accuracy_score()&lt;/code&gt; (how often predictions match actual results) and &lt;code&gt;classification_report()&lt;/code&gt; (detailed metrics including precision, recall, and F1-score).&lt;/p&gt;
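
&lt;p&gt;To see how these metrics differ, here is a worked example on ten made-up predictions (1 = spam, 0 = ham). The labels are invented so that each number can be checked by hand.&lt;/p&gt;

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up results for ten messages (1 = spam, 0 = ham).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Accuracy: 8 of the 10 predictions match the true labels.
accuracy = accuracy_score(y_true, y_pred)
# Precision: of the 4 messages flagged as spam, 3 really are spam.
precision = precision_score(y_true, y_pred)
# Recall: of the 4 real spam messages, 3 were caught.
recall = recall_score(y_true, y_pred)
# F1-score: the harmonic mean of precision and recall.
f1 = f1_score(y_true, y_pred)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

&lt;p&gt;Accuracy looks at all predictions together, while precision and recall focus on the positive class, which is why they are more informative when one class is rare.&lt;/p&gt;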

&lt;h1&gt;
  
  
  The Machine Learning Workflow
&lt;/h1&gt;

&lt;p&gt;A typical ML project follows these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Import libraries (pandas, numpy, sklearn)&lt;/li&gt;
&lt;li&gt;Load your dataset&lt;/li&gt;
&lt;li&gt;Explore and clean the data&lt;/li&gt;
&lt;li&gt;Engineer features and convert them to numeric format&lt;/li&gt;
&lt;li&gt;Split data into training and testing sets&lt;/li&gt;
&lt;li&gt;Choose and train your model&lt;/li&gt;
&lt;li&gt;Make predictions on test data&lt;/li&gt;
&lt;li&gt;Evaluate performance&lt;/li&gt;
&lt;li&gt;Tune and improve through iteration&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Choosing the Right Model for Your Problem
&lt;/h1&gt;

&lt;p&gt;The key questions to ask yourself:&lt;br&gt;
&lt;strong&gt;Do I have labeled data?&lt;/strong&gt; If yes, use supervised learning. If no, consider unsupervised learning.&lt;br&gt;
&lt;strong&gt;Am I predicting categories or numbers?&lt;/strong&gt; Categories call for classification models, while numbers need regression.&lt;br&gt;
&lt;strong&gt;Is my data text, images, or structured?&lt;/strong&gt; Text works well with Naive Bayes, images need CNNs, and structured data suits Random Forests or Gradient Boosting.&lt;br&gt;
&lt;strong&gt;How much data do I have?&lt;/strong&gt; Large datasets can support complex models like neural networks, while smaller datasets may need simpler approaches.&lt;/p&gt;

&lt;h1&gt;
  
  
  Popular Tools and Libraries
&lt;/h1&gt;

&lt;p&gt;The Python ecosystem offers powerful libraries for every ML need:&lt;br&gt;
&lt;strong&gt;NumPy&lt;/strong&gt; handles numerical operations and arrays efficiently. &lt;strong&gt;Pandas&lt;/strong&gt; excels at data manipulation and cleaning. &lt;strong&gt;Matplotlib and Seaborn&lt;/strong&gt; create stunning visualizations. &lt;strong&gt;Scikit-learn&lt;/strong&gt; provides core ML algorithms for classification, regression, and clustering. &lt;strong&gt;TensorFlow and PyTorch&lt;/strong&gt; power deep learning projects. &lt;strong&gt;XGBoost and LightGBM&lt;/strong&gt; deliver high-performance tree-based models.&lt;/p&gt;

&lt;h1&gt;
  
  
  Getting Started with Machine Learning
&lt;/h1&gt;

&lt;p&gt;Machine learning might seem intimidating at first, but it's more accessible than ever. Start with simple projects using structured data and supervised learning. Practice with real datasets, experiment with different algorithms, and gradually work your way up to more complex problems.&lt;/p&gt;

&lt;p&gt;Remember: every ML expert started as a beginner. The key is consistent practice, curiosity, and a willingness to learn from both successes and failures. Whether you're building a spam filter, predicting house prices, or creating recommendation systems, the fundamentals covered here will serve as your foundation.&lt;/p&gt;

&lt;p&gt;Ready to dive in? Pick a dataset that interests you, choose a simple model, and start experimenting. The world of machine learning awaits!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;What's your first machine learning project going to be? Share your ideas in the comments below!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>🚫📩 Build a Spam Message Classifier with Python (Step-by-Step for Beginners)</title>
      <dc:creator>Princess Mae Sanchez</dc:creator>
      <pubDate>Fri, 31 Oct 2025 12:58:07 +0000</pubDate>
      <link>https://forem.com/cessamaeeee/build-a-spam-message-classifier-with-python-step-by-step-for-beginners-2op6</link>
      <guid>https://forem.com/cessamaeeee/build-a-spam-message-classifier-with-python-step-by-step-for-beginners-2op6</guid>
      <description>&lt;p&gt;Hey there! 👋&lt;/p&gt;

&lt;p&gt;I recently finished &lt;strong&gt;Kaggle’s Intro to Machine Learning&lt;/strong&gt; course, and to put my new skills into practice, I built a &lt;strong&gt;Spam Message Classifier&lt;/strong&gt; — an AI that can tell whether a text message is spam or not.&lt;/p&gt;

&lt;p&gt;If you’ve ever wondered how Gmail filters spam emails automatically, this post will help you understand how that works (and how you can make one yourself)!&lt;/p&gt;

&lt;p&gt;Don’t worry if you’re starting from zero. I’ll explain everything line by line — no background knowledge required. 🧠✨&lt;/p&gt;

&lt;h2&gt;
  
  
  🎯 What You’ll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;How to train a simple AI model to detect spam messages&lt;/li&gt;
&lt;li&gt;How to clean and prepare a dataset&lt;/li&gt;
&lt;li&gt;How to evaluate your model’s performance&lt;/li&gt;
&lt;li&gt;Why learning this is useful and where you can go next&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Import the Libraries
&lt;/h2&gt;

&lt;p&gt;Let’s start by importing the tools we’ll need.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd 
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
print("✅ Libraries imported successfully!")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What’s Happening Here?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;pandas – lets us handle datasets easily (like working with Excel inside Python).&lt;/li&gt;
&lt;li&gt;numpy – for math and number operations.&lt;/li&gt;
&lt;li&gt;sklearn (scikit-learn) – our main machine learning library.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;train_test_split()&lt;/code&gt; – divides data into training and testing parts.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CountVectorizer()&lt;/code&gt; – converts words into numbers (AI can’t read text directly).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MultinomialNB()&lt;/code&gt; – our machine learning model (Naive Bayes classifier).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;accuracy_score()&lt;/code&gt; &amp;amp; &lt;code&gt;classification_report()&lt;/code&gt; – used to check how well our model performs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 2: Load the Dataset
&lt;/h2&gt;

&lt;p&gt;We’ll use a public dataset from Kaggle called &lt;strong&gt;SMS Spam Collection&lt;/strong&gt;.&lt;br&gt;
This dataset contains thousands of real text messages labeled as either spam or ham (ham = not spam).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = pd.read_csv('/kaggle/input/d/uciml/sms-spam-collection-dataset/spam.csv', encoding='latin-1')
df = df[['v1', 'v2']]
df.columns = ['label', 'message']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Understand the Data
&lt;/h2&gt;

&lt;p&gt;Before training any model, we must understand what our data looks like.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(f"Total emails: {len(df)}")
print(df.head())
print(df['label'].value_counts())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps us know how many spam and ham messages exist — super important for checking balance in our dataset.&lt;/p&gt;
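
&lt;p&gt;One way to make the balance check concrete is to turn the counts into shares and work out a baseline. The sketch below uses a made-up stand-in for the label column (87 ham, 13 spam), not the real file, so your numbers will differ.&lt;/p&gt;

```python
import pandas as pd

# Stand-in for df['label'] with made-up counts (the real file has thousands of rows).
labels = pd.Series(["ham"] * 87 + ["spam"] * 13, name="label")

# Share of each class instead of raw counts.
proportions = labels.value_counts(normalize=True)
print(proportions)

# Baseline: accuracy of a "model" that always predicts the most common class.
baseline_accuracy = proportions.max()
print(f"Always-guess-ham accuracy: {baseline_accuracy * 100:.1f}%")
```

&lt;p&gt;If always guessing "ham" already scores this high, a trained model has to beat that number before its accuracy means much.&lt;/p&gt;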

&lt;h2&gt;
  
  
  Step 4: Convert Labels to Numbers
&lt;/h2&gt;

&lt;p&gt;AI works with numbers, not words. So we’ll map:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;spam → 1&lt;/li&gt;
&lt;li&gt;ham → 0
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df['label'] = df['label'].map({'spam': 1, 'ham': 0})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Split the Data
&lt;/h2&gt;

&lt;p&gt;We need to test the model on unseen data to see if it really learned, not memorized.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X = df['message']
y = df['label']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;80%&lt;/strong&gt; of the data is for training.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;20%&lt;/strong&gt; is for testing.&lt;/li&gt;
&lt;/ul&gt;
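
&lt;p&gt;Because spam is the smaller class, it can help to keep the spam share equal in both parts. The sketch below adds the optional &lt;code&gt;stratify&lt;/code&gt; argument, which the code above does not use, on a made-up stand-in dataset.&lt;/p&gt;

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for the real data: 80 ham and 20 spam messages.
X = pd.Series([f"message {i}" for i in range(100)])
y = pd.Series([0] * 80 + [1] * 20)

# stratify=y keeps the spam share the same in the training and testing parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(len(X_train), len(X_test))      # 80 training rows, 20 testing rows
print(y_train.mean(), y_test.mean())  # spam share in each part
```

&lt;p&gt;Without stratification the split is still random and usually fine on a dataset this size, but the spam share in the test set can drift a little from run to run.&lt;/p&gt;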

&lt;h2&gt;
  
  
  Step 6: Convert Text to Numbers
&lt;/h2&gt;

&lt;p&gt;AI can’t “read” words. We need to represent each message as a vector of numbers using &lt;strong&gt;CountVectorizer&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
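
&lt;p&gt;Why &lt;code&gt;fit_transform&lt;/code&gt; on the training data but only &lt;code&gt;transform&lt;/code&gt; on the test data? A tiny sketch with made-up messages shows the difference: the vocabulary is learned once, and words the vectorizer has never seen are simply ignored later.&lt;/p&gt;

```python
from sklearn.feature_extraction.text import CountVectorizer

train_messages = ["win a free prize", "see you at lunch"]
test_messages = ["free lunch voucher"]

vectorizer = CountVectorizer()
# fit_transform learns the vocabulary from the training messages only.
# The default tokenizer skips one-letter words, so "a" is not included.
train_vectors = vectorizer.fit_transform(train_messages)
# transform reuses that vocabulary; "voucher" was never seen, so it is dropped.
test_vectors = vectorizer.transform(test_messages)

print(sorted(vectorizer.vocabulary_))
print(test_vectors.toarray())
```

&lt;p&gt;The test vector has exactly as many columns as the training vectors, which is what lets the trained model accept it.&lt;/p&gt;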



&lt;h2&gt;
  
  
  Step 7: Train the Model
&lt;/h2&gt;

&lt;p&gt;Now comes the fun part — training the machine learning model!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model = MultinomialNB()
model.fit(X_train_vec, y_train)
print("✅ Model trained successfully!")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This teaches the model which words often appear in spam messages (like &lt;em&gt;“free”&lt;/em&gt;, &lt;em&gt;“win”&lt;/em&gt;, &lt;em&gt;“click”&lt;/em&gt;) and which appear in normal ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 8: Test the Model
&lt;/h2&gt;

&lt;p&gt;Let’s see how well it performs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;y_pred = model.predict(X_test_vec)
accuracy = accuracy_score(y_test, y_pred)
print(f"🎯 Model Accuracy: {accuracy * 100:.2f}%")
print(classification_report(y_test, y_pred, target_names=['Ham', 'Spam']))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



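&lt;p&gt;If your accuracy is around &lt;strong&gt;95–98%&lt;/strong&gt;, that’s great! 🎉&lt;br&gt;
Because most messages in this dataset are ham, accuracy alone can look better than it is. Also check the recall in the Spam row of the report: it tells you what share of the actual spam messages your model caught.&lt;/p&gt;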
&lt;h2&gt;
  
  
  Step 9: Test It with New Messages
&lt;/h2&gt;

&lt;p&gt;Let’s make our own mini spam detector function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def predict_spam(message):
    message_vec = vectorizer.transform([message])
    prediction = model.predict(message_vec)[0]
    probability = model.predict_proba(message_vec)[0]

    result = "🚫 SPAM" if prediction == 1 else "✅ HAM (Not Spam)"
    confidence = probability[prediction] * 100

    print(f"Message: '{message}'")
    print(f"Prediction: {result}")
    print(f"Confidence: {confidence:.1f}%\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Try it out! 👇&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test_messages = [
    "Congratulations! You've won a free iPhone. Click here now!",
    "Hey, are we still meeting for lunch tomorrow?",
    "URGENT: Your account will be closed. Verify now!",
    "Can you send me the project report by Friday?"
]

for msg in test_messages:
    predict_spam(msg)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2fwk1e7b5zcr1n3v8ibi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2fwk1e7b5zcr1n3v8ibi.png" alt=" " width="517" height="308"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You’ll see predictions with confidence levels, a much simpler version of the idea behind production spam filters like Gmail’s!&lt;/p&gt;

&lt;h2&gt;
  
  
  Purpose: Why Learn This?
&lt;/h2&gt;

&lt;p&gt;Understanding how to build a spam classifier is your first step into practical AI.&lt;/p&gt;

&lt;p&gt;Here’s why it matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It teaches &lt;strong&gt;data preprocessing&lt;/strong&gt; (cleaning, labeling, transforming text).&lt;/li&gt;
&lt;li&gt;You learn &lt;strong&gt;how machine learning models actually learn&lt;/strong&gt; from patterns.&lt;/li&gt;
&lt;li&gt;It’s the foundation of &lt;strong&gt;Natural Language Processing (NLP)&lt;/strong&gt; — the same technology behind chatbots, Google Translate, and Siri!&lt;/li&gt;
&lt;li&gt;You can now &lt;strong&gt;deploy&lt;/strong&gt; this model in a small web app using &lt;strong&gt;Flask&lt;/strong&gt;, so anyone can type a message and check if it’s spam.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;If you made it this far, congrats! You didn’t just copy code — you built your first working AI model.&lt;br&gt;
Keep experimenting, keep learning, and soon you’ll be deploying your own intelligent apps to the world. 🌍&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>kaggle</category>
    </item>
    <item>
      <title>My ML Learning Journey: From Confusion to Building a Working Model</title>
      <dc:creator>Princess Mae Sanchez</dc:creator>
      <pubDate>Fri, 31 Oct 2025 06:07:00 +0000</pubDate>
      <link>https://forem.com/cessamaeeee/my-ml-learning-journey-from-confusion-to-building-a-working-model-235p</link>
      <guid>https://forem.com/cessamaeeee/my-ml-learning-journey-from-confusion-to-building-a-working-model-235p</guid>
      <description>&lt;p&gt;I'm learning machine learning, and I want to share this journey with you. Not as an expert—I literally started Kaggle's &lt;strong&gt;"Intro to Machine Learning"&lt;/strong&gt; course last week—but as someone who just figured out how to build their first predictive model and wants to help others do the same.&lt;/p&gt;

&lt;p&gt;If you've been curious about AI and machine learning but thought it was too complicated, or if terms like "neural networks" and "algorithms" sound intimidating, this post is for you. Let me show you that it's actually way more approachable than you think!&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I'm Learning Machine Learning
&lt;/h2&gt;

&lt;p&gt;I've been fascinated by AI for a while now. Every time I see AI-powered recommendations on Netflix, autocomplete on my phone, or ChatGPT writing code, I wonder: &lt;strong&gt;"How does this actually work?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I wanted to go beyond just using AI tools—I wanted to understand the fundamentals. That's when I discovered Kaggle's free "Intro to Machine Learning" course, and honestly? It's been one of the best decisions I've made this year.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My goal:&lt;/strong&gt; Understand how machines learn from data and build my own models (even simple ones!)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why I'm sharing publicly:&lt;/strong&gt; Learning in public keeps me accountable, helps me remember concepts better by teaching them, and hopefully helps someone else who's just starting out.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned This Week
&lt;/h2&gt;

&lt;p&gt;Here are the key concepts I wrapped my head around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to load and explore data with Pandas&lt;/li&gt;
&lt;li&gt;The difference between features (X) and targets (y)&lt;/li&gt;
&lt;li&gt;Building a decision tree model&lt;/li&gt;
&lt;li&gt;Why you can't just test on training data (this was a big "aha!" moment)&lt;/li&gt;
&lt;li&gt;What overfitting and underfitting actually mean&lt;/li&gt;
&lt;li&gt;How Random Forests make better predictions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now let me teach you what I learned!&lt;/p&gt;

&lt;h2&gt;
  
  
  Tutorial: Build Your First Machine Learning Model (Seriously, You Can Do This!)
&lt;/h2&gt;

&lt;p&gt;Let me walk you through building a house price predictor using the Melbourne Housing dataset—the same project I just completed. We'll go step by step, and I'll explain everything in plain English.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What You'll Need&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Kaggle account (it's free!)&lt;/li&gt;
&lt;li&gt;Basic Python knowledge (if you know variables and functions, you're good)&lt;/li&gt;
&lt;li&gt;The Melbourne Housing dataset from Kaggle (it's already available when you start!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pro tip: I'm doing this directly in a Kaggle notebook - no setup required! Just click "New Notebook" on Kaggle and you're ready to code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Understanding the Problem
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Predict how much a house in Melbourne will cost based on its features (size, number of rooms, land size, etc.)&lt;br&gt;
&lt;strong&gt;Think of it like this:&lt;/strong&gt; If I told you a Melbourne house has 4 bedrooms, 2 bathrooms, 500 sqm land, and 150 sqm building area, could you guess roughly what it costs? You'd probably compare it to other houses you know. That's exactly what we're teaching the computer to do!&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Setting Up - Import Libraries
&lt;/h2&gt;

&lt;p&gt;First, we need to bring in our tools:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7tkr9fyn7v3r1cagk3m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7tkr9fyn7v3r1cagk3m.png" alt=" " width="584" height="138"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What just happened?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pandas&lt;/code&gt; is like Excel for Python—it handles data tables&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sklearn&lt;/code&gt; (scikit-learn) contains all our machine learning tools&lt;/li&gt;
&lt;li&gt;We're importing specific tools we'll need for building and testing models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like: Opening your toolbox before starting a project - we're grabbing the hammer, screwdriver, and wrench we'll need!&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Loading the Data
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fheinutv64iw2w64axtgl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fheinutv64iw2w64axtgl.png" alt=" " width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important note for Kaggle users:&lt;/strong&gt; When you attach a dataset in Kaggle, it's stored in &lt;code&gt;/kaggle/input/[dataset-name]/&lt;/code&gt;. That's why we use that special path!&lt;/p&gt;
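
&lt;p&gt;Since the code above is a screenshot, here is a text sketch of what loading might look like. To keep it runnable anywhere, it reads three made-up rows from a string instead of the real file. On Kaggle you would pass the dataset's path under &lt;code&gt;/kaggle/input/&lt;/code&gt; to &lt;code&gt;pd.read_csv&lt;/code&gt; instead.&lt;/p&gt;

```python
import io

import pandas as pd

# Stand-in for the real file: three made-up rows with the columns used later.
# The second field of the first row is left empty on purpose (a missing value).
csv_text = """Rooms,Bathroom,Landsize,BuildingArea,Price
2,1,200,,900000
3,2,150,80,1000000
4,2,500,150,1500000
"""

melbourne_data = pd.read_csv(io.StringIO(csv_text))

print(melbourne_data.shape)   # (rows, columns)
print(melbourne_data.head())  # first few rows
```

&lt;p&gt;Empty cells in the file show up as &lt;code&gt;NaN&lt;/code&gt; in the table, which is how pandas marks missing values.&lt;/p&gt;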

&lt;h2&gt;
  
  
  Step 4: Exploring the Data
&lt;/h2&gt;

&lt;p&gt;Before building any model, you need to understand your data:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70g0rdvvotfmhwbma4r7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70g0rdvvotfmhwbma4r7.png" alt=" " width="421" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this tells me:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;count&lt;/strong&gt;: How many houses have this information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;mean&lt;/strong&gt;: The average value (e.g., average price is around $1M!)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;min/max&lt;/strong&gt;: The range of values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;50%&lt;/strong&gt;: The median (middle value). Comparing it with the mean helps spot skew and outliers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing values&lt;/strong&gt;: Some houses don't have all information filled in&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this matters&lt;/strong&gt;: The Melbourne dataset has 13,580 houses, but I noticed that some columns like BuildingArea only have 7,130 values. That means almost half the houses are missing this info! We need to handle this.&lt;/p&gt;
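
&lt;p&gt;Here is a text sketch of the kind of exploration described above, using four made-up rows instead of the real dataset. The column names match the ones in this post, but the values are invented.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Four made-up houses; two of them are missing BuildingArea.
melbourne_data = pd.DataFrame({
    "Rooms": [2, 3, 4, 3],
    "BuildingArea": [np.nan, 80.0, 140.0, np.nan],
    "Price": [900000.0, 1000000.0, 1500000.0, 1100000.0],
})

# describe() reports count, mean, min, max, and quartiles for each numeric column.
summary = melbourne_data.describe()
print(summary)

# isna().sum() counts the missing values in each column directly.
missing_per_column = melbourne_data.isna().sum()
print(missing_per_column)
```

&lt;p&gt;A &lt;code&gt;count&lt;/code&gt; that is lower than the number of rows is the sign that a column has gaps.&lt;/p&gt;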

&lt;h2&gt;
  
  
  Step 5: Choosing and Cleaning Our Data
&lt;/h2&gt;

&lt;p&gt;Here's where we decide what to use and clean up the missing values:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7w11oq39xdwwp13coiy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7w11oq39xdwwp13coiy.png" alt=" " width="763" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Breaking this down:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Choosing features&lt;/strong&gt; - I picked characteristics that logically affect price:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Rooms&lt;/em&gt;&lt;/strong&gt;: More rooms = usually more expensive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Bathroom&lt;/em&gt;&lt;/strong&gt;: More bathrooms = usually more expensive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Landsize&lt;/em&gt;&lt;/strong&gt;: Bigger lot = usually more expensive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;BuildingArea&lt;/em&gt;&lt;/strong&gt;: Bigger house = usually more expensive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;.dropna()&lt;/code&gt; - This removes any house that's missing data in our chosen columns&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Started with 13,580 houses&lt;/li&gt;
&lt;li&gt;After cleaning: about 6,196 complete houses&lt;/li&gt;
&lt;li&gt;We lose more than half the houses, but every remaining row is complete!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why X and y?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In math, &lt;strong&gt;X&lt;/strong&gt; represents input variables and &lt;strong&gt;y&lt;/strong&gt; is what we're solving for&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X&lt;/strong&gt; (features) → goes into the model → &lt;strong&gt;y&lt;/strong&gt; (prediction) comes out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-world analogy:&lt;/strong&gt; It's like doing a survey - you can only use complete responses, so you filter out any surveys with missing answers.&lt;/p&gt;
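&lt;p&gt;A rough sketch of this step, using a small invented table rather than the real file. The &lt;code&gt;subset=&lt;/code&gt; argument is my assumption for illustration (the screenshot may call &lt;code&gt;dropna()&lt;/code&gt; differently): it drops a row only when one of the columns we actually use is missing. Calling &lt;code&gt;dropna()&lt;/code&gt; with no arguments drops rows with a gap in any column, which can remove more houses.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Invented rows for illustration; only the column names match the dataset.
houses = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 5],
    "Bathroom": [1, 2, 2, 1, 3],
    "Landsize": [150.0, 420.0, 600.0, 300.0, 750.0],
    "BuildingArea": [79.0, np.nan, 142.0, 110.0, np.nan],
    "Price": [850000.0, 1200000.0, 1500000.0, 990000.0, 2100000.0],
})

features = ["Rooms", "Bathroom", "Landsize", "BuildingArea"]

# Keep only rows that have every feature and the price filled in.
clean = houses.dropna(subset=features + ["Price"])

X = clean[features]   # inputs the model learns from
y = clean["Price"]    # the value we want to predict

print(len(houses), "rows before,", len(clean), "rows after")
```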

&lt;h2&gt;
  
  
  Step 6: The Critical Step - Split Your Data!
&lt;/h2&gt;

&lt;p&gt;This is where I made my first mistake, so pay close attention!&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff78fkea1r3quhddpxh74.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff78fkea1r3quhddpxh74.png" alt=" " width="800" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why split the data? Here's the critical lesson:&lt;/strong&gt;&lt;br&gt;
Imagine you're studying for a test. You memorize 10 practice questions. Then the test has those EXACT 10 questions. You ace it! But did you actually learn the material, or did you just memorize?&lt;/p&gt;

&lt;p&gt;Same with ML models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Training data&lt;/strong&gt; (75%): The model learns patterns from these houses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation data&lt;/strong&gt; (25%): We test on houses the model has NEVER seen&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's &lt;code&gt;random_state=1&lt;/code&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensures we get the same random split every time&lt;/li&gt;
&lt;li&gt;Makes results reproducible (crucial for debugging!)&lt;/li&gt;
&lt;li&gt;You can use any number (1, 42, 123, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Holding back validation data doesn't stop &lt;strong&gt;overfitting&lt;/strong&gt; (memorization) by itself, but it lets us catch it: we can check that the model actually learned patterns instead of just memorizing answers!&lt;/p&gt;
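&lt;p&gt;To make the split concrete, here's a small sketch with made-up arrays instead of the real houses. I set &lt;code&gt;test_size=0.25&lt;/code&gt; explicitly to match the 75/25 split described above, and I run the split twice to show that the same &lt;code&gt;random_state&lt;/code&gt; gives the same result.&lt;/p&gt;

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 20 made-up "houses" with 2 features each, plus a made-up target.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# test_size=0.25 holds back 25% of the rows; random_state fixes the shuffle.
train_X, val_X, train_y, val_y = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# Splitting again with the same seed reproduces the same validation rows.
again = train_test_split(X, y, test_size=0.25, random_state=1)

print(train_X.shape, val_X.shape)
```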

&lt;h2&gt;
  
  
  Step 7: Building Your First Model - Decision Tree
&lt;/h2&gt;

&lt;p&gt;Here's where the magic happens:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwicsuvptqrp26xh0kkda.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwicsuvptqrp26xh0kkda.png" alt=" " width="563" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Breaking it down:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;DecisionTreeRegressor&lt;/code&gt; - This is our model type. Think of it as a flowchart that asks questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Does the house have more than 3 rooms?"&lt;/li&gt;
&lt;li&gt;"Is the land size larger than 200 sqm?"&lt;/li&gt;
&lt;li&gt;"Is the building area larger than 100 sqm?"&lt;/li&gt;
&lt;li&gt;Based on the answers, it navigates to a prediction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;random_state=1&lt;/code&gt; - Ensures consistent results every time&lt;/p&gt;

&lt;p&gt;&lt;code&gt;.fit(train_X, train_y)&lt;/code&gt; - This is the training! The model studies the training houses and learns patterns&lt;/p&gt;
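&lt;p&gt;If you want to see the "flowchart" for yourself, here's a sketch on six invented houses (the numbers are mine, not from the dataset). &lt;code&gt;export_text&lt;/code&gt; from scikit-learn prints the questions the tree learned.&lt;/p&gt;

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Six made-up houses: [Rooms, Landsize] and an invented price for each.
X = np.array([[2, 150], [3, 300], [3, 420], [4, 600], [5, 750], [2, 200]])
y = np.array([850000, 990000, 1200000, 1500000, 2100000, 880000])

model = DecisionTreeRegressor(random_state=1)
model.fit(X, y)  # the model studies the houses and builds its questions

# Print the learned tree as an indented list of questions and answers.
print(export_text(model, feature_names=["Rooms", "Landsize"]))

# Predictions on the same houses it was trained on.
preds = model.predict(X)
```

&lt;p&gt;Notice that an unrestricted tree reproduces the prices of its own training houses exactly. That's the memorization problem from Step 6, and it's why we score the model on validation data instead.&lt;/p&gt;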

&lt;h2&gt;
  
  
  Step 8: Measuring Accuracy - Mean Absolute Error (MAE)
&lt;/h2&gt;

&lt;p&gt;Now let's see how good our model is:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyahh0gamizz5l3wk6wj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyahh0gamizz5l3wk6wj.png" alt=" " width="579" height="114"&gt;&lt;/a&gt;&lt;/p&gt;
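&lt;p&gt;Mean Absolute Error is simply the average size of the misses: for each house, take the difference between the actual and predicted price, ignore the sign, and average the results. Here's a sketch with three invented prices, computed by hand and with scikit-learn's &lt;code&gt;mean_absolute_error&lt;/code&gt;, to show they agree.&lt;/p&gt;

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Invented actual and predicted prices for three houses.
actual = np.array([1000000, 800000, 1500000])
predicted = np.array([900000, 850000, 1450000])

# By hand: average of the absolute differences.
by_hand = np.mean(np.abs(actual - predicted))

# The same calculation using scikit-learn.
from_sklearn = mean_absolute_error(actual, predicted)

print(f"MAE: ${from_sklearn:,.0f}")
```

&lt;p&gt;Lower is better, and the number is in the same unit as the target (dollars here), which makes it easy to explain.&lt;/p&gt;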

&lt;h2&gt;
  
  
  Step 9: Making It Better with Random Forest
&lt;/h2&gt;

&lt;p&gt;Single decision trees are okay, but Random Forests usually do much better. Here's the concept:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analogy:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;One decision tree&lt;/em&gt;&lt;/strong&gt; = asking one real estate agent's opinion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Random Forest&lt;/em&gt;&lt;/strong&gt; = asking 100 agents and averaging their opinions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which would you trust more? The wisdom of the crowd!&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fna76aidfk9g2fps0tow7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fna76aidfk9g2fps0tow7.png" alt=" " width="590" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is Random Forest better?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creates many different decision trees (typically 100 trees)&lt;/li&gt;
&lt;li&gt;Each tree is trained on a slightly different random sample of the data (rows drawn with replacement)&lt;/li&gt;
&lt;li&gt;Each tree makes its own prediction&lt;/li&gt;
&lt;li&gt;Final prediction = average of all trees&lt;/li&gt;
&lt;li&gt;Result: Usually more accurate, more stable, and less prone to overfitting!&lt;/li&gt;
&lt;/ol&gt;
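&lt;p&gt;The steps above can be sketched like this. The data is synthetic (I generate it from a formula I made up, so the numbers won't match the Melbourne results), and whether the forest beats the single tree depends on the data. The last part checks step 4 directly: the forest's prediction is the average of its trees' predictions.&lt;/p&gt;

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: price follows an invented formula plus noise.
rng = np.random.default_rng(1)
rooms = rng.integers(1, 6, size=400)
landsize = rng.uniform(100, 900, size=400)
price = 200000 * rooms + 1000 * landsize + rng.normal(0, 100000, size=400)
X = np.column_stack([rooms, landsize])

train_X, val_X, train_y, val_y = train_test_split(X, price, random_state=1)

# One tree versus a forest of 100 trees, trained on the same rows.
tree = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)
forest = RandomForestRegressor(n_estimators=100, random_state=1).fit(train_X, train_y)

tree_mae = mean_absolute_error(val_y, tree.predict(val_X))
forest_mae = mean_absolute_error(val_y, forest.predict(val_X))
print(f"Tree MAE: {tree_mae:,.0f}   Forest MAE: {forest_mae:,.0f}")

# Average the individual trees by hand and compare with the forest's answer.
per_tree = np.array([t.predict(val_X) for t in forest.estimators_])
averaged = per_tree.mean(axis=0)
```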

&lt;p&gt;&lt;strong&gt;The comparison:&lt;/strong&gt;&lt;br&gt;
Just by switching algorithms, we cut the mean absolute error by over $50,000!&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 10: Comparing Models Side by Side
&lt;/h2&gt;

&lt;p&gt;Let's see the comparison clearly:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5fy0cgrywy8ad2tao9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5fy0cgrywy8ad2tao9z.png" alt=" " width="567" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Random Forest is clearly the winner! 🏆&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 11: Seeing Real Predictions
&lt;/h2&gt;

&lt;p&gt;Let's look at how our best model (Random Forest) actually predicts on specific houses:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9p85m5l0cxw1dwxurou5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9p85m5l0cxw1dwxurou5.png" alt=" " width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 12: Train Final Model on ALL Data
&lt;/h2&gt;

&lt;p&gt;Once you're satisfied with your model's performance, train it on ALL available data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8nmf4fyu17iesslp3nt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8nmf4fyu17iesslp3nt.png" alt=" " width="653" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why do this?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You already validated that Random Forest works well&lt;/li&gt;
&lt;li&gt;We were holding back 25% of data for validation&lt;/li&gt;
&lt;li&gt;Now we use ALL 6,196 houses to train&lt;/li&gt;
&lt;li&gt;More training examples usually make the model a bit more accurate for real predictions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Think of it like:&lt;/strong&gt; You practiced with 75% of your study materials and tested yourself on 25%. Now that you know you understand the material, you study ALL of it before the real exam.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 13: Predict New House Prices!
&lt;/h2&gt;

&lt;p&gt;Now comes the fun part - predicting prices for houses not in our dataset!&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz1yq77l1l86dfyshd9qo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz1yq77l1l86dfyshd9qo.png" alt=" " width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;
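&lt;p&gt;Steps 12 and 13 together might look like the sketch below. Again, the rows are invented and the new house is hypothetical, so the predicted number means nothing on its own. The point is the shape of the code: fit on every row, then pass the new house with the same columns in the same order.&lt;/p&gt;

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

features = ["Rooms", "Bathroom", "Landsize", "BuildingArea"]

# Invented, already-cleaned rows standing in for the full dataset.
houses = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 5, 2],
    "Bathroom": [1, 2, 2, 1, 3, 1],
    "Landsize": [150.0, 420.0, 600.0, 300.0, 750.0, 200.0],
    "BuildingArea": [79.0, 120.0, 142.0, 110.0, 210.0, 85.0],
    "Price": [850000.0, 1200000.0, 1500000.0, 990000.0, 2100000.0, 880000.0],
})

# Step 12: no train/validation split this time, the model sees every row.
final_model = RandomForestRegressor(random_state=1)
final_model.fit(houses[features], houses["Price"])

# Step 13: a hypothetical house that is not in the table.
new_house = pd.DataFrame(
    [{"Rooms": 3, "Bathroom": 2, "Landsize": 400.0, "BuildingArea": 120.0}]
)[features]

estimate = final_model.predict(new_house)[0]
print(f"Estimated price: ${estimate:,.0f}")
```

&lt;p&gt;One design note: because a forest averages prices it saw during training, its estimate always lands between the cheapest and the most expensive training house. It can't extrapolate beyond that range.&lt;/p&gt;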

&lt;p&gt;&lt;strong&gt;🎉 You just built a machine learning model that can predict Melbourne house prices!&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: Finding the Optimal Model Complexity
&lt;/h2&gt;

&lt;p&gt;Want to see how different tree sizes affect accuracy? Here's a bonus experiment:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fto6hf4p21e4khqk62c4l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fto6hf4p21e4khqk62c4l.png" alt=" " width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this shows:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Too few nodes (5) = too simple, misses patterns&lt;/li&gt;
&lt;li&gt;Too many nodes (500) = memorizes training data&lt;/li&gt;
&lt;li&gt;Sweet spot around 100-250 nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the &lt;strong&gt;overfitting vs underfitting&lt;/strong&gt; tradeoff in action!&lt;/p&gt;
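&lt;p&gt;Here's a sketch of that experiment on synthetic data. I'm assuming "nodes" refers to scikit-learn's &lt;code&gt;max_leaf_nodes&lt;/code&gt; setting, and the helper name &lt;code&gt;get_mae&lt;/code&gt; and the candidate sizes are my own choices for illustration. The best size on this made-up data won't necessarily match the sweet spot on the Melbourne houses.&lt;/p&gt;

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (invented formula, not the Melbourne dataset).
rng = np.random.default_rng(1)
rooms = rng.integers(1, 6, size=600)
landsize = rng.uniform(100, 900, size=600)
price = 200000 * rooms + 1000 * landsize + rng.normal(0, 100000, size=600)
X = np.column_stack([rooms, landsize])
train_X, val_X, train_y, val_y = train_test_split(X, price, random_state=1)


def get_mae(max_leaf_nodes):
    """Fit a tree capped at max_leaf_nodes leaves; return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=1)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Try several tree sizes and keep the one with the lowest validation error.
candidates = [5, 50, 100, 250, 500]
scores = {n: get_mae(n) for n in candidates}
best_size = min(scores, key=scores.get)

for n, mae in scores.items():
    print(f"max_leaf_nodes={n:4d}  validation MAE={mae:,.0f}")
print("Best size on this data:", best_size)
```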

&lt;h2&gt;
  
  
  Resources That Helped Me
&lt;/h2&gt;

&lt;p&gt;Here's what I found most useful:&lt;br&gt;
&lt;strong&gt;Free Courses&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://www.kaggle.com/learn/intro-to-machine-learning" rel="noopener noreferrer"&gt;Kaggle's Intro to Machine Learning&lt;/a&gt; - Where I started (highly recommend!)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scikit-learn docs&lt;/strong&gt; - Super clear with tons of examples&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pandas docs&lt;/strong&gt; - Essential for data manipulation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tips for Absolute Beginners
&lt;/h2&gt;

&lt;p&gt;If you're just starting out like me, here's my advice:&lt;br&gt;
&lt;strong&gt;1. You Don't Need a Math PhD&lt;/strong&gt;&lt;br&gt;
I'm not a math genius. I haven't taken calculus in years. You can still learn ML! Start with the practical stuff, and the math will make more sense later.&lt;br&gt;
&lt;strong&gt;2. Code Along, Don't Just Watch&lt;/strong&gt;&lt;br&gt;
I learn by doing. Watch a tutorial, then code it yourself. Change things. Break stuff. See what happens.&lt;br&gt;
&lt;strong&gt;3. Start with Kaggle&lt;/strong&gt;&lt;br&gt;
Kaggle gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free courses with interactive coding&lt;/li&gt;
&lt;li&gt;Datasets ready to use&lt;/li&gt;
&lt;li&gt;A community of learners&lt;/li&gt;
&lt;li&gt;Real competitions to test your skills&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Don't Get Stuck on Theory&lt;/strong&gt;&lt;br&gt;
I spent 2 days trying to understand decision tree math. Then I just built one and it clicked. Sometimes you need to do it to get it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Learn in Public&lt;/strong&gt;&lt;br&gt;
Sharing my learning journey:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keeps me accountable&lt;/li&gt;
&lt;li&gt;Helps me remember by teaching others&lt;/li&gt;
&lt;li&gt;Connects me with other learners&lt;/li&gt;
&lt;li&gt;Creates a portfolio of my progress&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;6. It's Okay to Be Confused&lt;/strong&gt;&lt;br&gt;
I was confused for 90% of this week. That's normal! Push through it. Things will click.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Truth About Learning ML
&lt;/h2&gt;

&lt;p&gt;Let me be real with you:&lt;br&gt;
&lt;strong&gt;It's not as hard as you think.&lt;/strong&gt; The basics of ML are surprisingly accessible. You don't need to understand complex math to build your first models.&lt;br&gt;
&lt;strong&gt;It's harder than it looks.&lt;/strong&gt; There's a lot of trial and error. Your first models will probably be bad. That's okay!&lt;br&gt;
&lt;strong&gt;It's incredibly rewarding.&lt;/strong&gt; When your model makes its first decent prediction, it feels like magic (even though you know it's not).&lt;br&gt;
&lt;strong&gt;It's an ongoing journey.&lt;/strong&gt; A week in, I've barely scratched the surface. There's so much more to learn, and that's exciting!&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;A week ago, I thought machine learning was this impossibly complex field. Today, I built a model that can predict house prices with reasonable accuracy.&lt;/p&gt;

&lt;p&gt;Is it perfect? No.&lt;br&gt;
Am I an expert? Definitely not.&lt;br&gt;
Did I learn a ton and have fun doing it? Absolutely!&lt;/p&gt;

&lt;p&gt;If you've been curious about ML but haven't taken the first step—this is your sign. &lt;strong&gt;Start today.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;You don't need to be ready, you don't need to know everything, you just need to begin.&lt;/p&gt;

&lt;p&gt;The best time to start learning ML was yesterday. The second best time is right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let's do this! 🚀&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thanks for reading! Now go build something cool! 💻✨&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>kaggle</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>From Beginner to Cyber-Aware: Lessons from My First Cybersecurity Course</title>
      <dc:creator>Princess Mae Sanchez</dc:creator>
      <pubDate>Fri, 31 Oct 2025 01:08:54 +0000</pubDate>
      <link>https://forem.com/cessamaeeee/from-beginner-to-cyber-aware-lessons-from-my-first-cybersecurity-course-469l</link>
      <guid>https://forem.com/cessamaeeee/from-beginner-to-cyber-aware-lessons-from-my-first-cybersecurity-course-469l</guid>
      <description>&lt;h2&gt;
  
  
  I Just Completed Cisco's Introduction to Cybersecurity Course – Here's What I Learned 🔐
&lt;/h2&gt;

&lt;p&gt;Over the past few weeks, I dove deep into the world of cybersecurity through Cisco's Introduction to Cybersecurity course. As someone who believes in learning in public, I wanted to share my key takeaways and reflections from this journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Cybersecurity Matters (Now More Than Ever)
&lt;/h2&gt;

&lt;p&gt;Before this course, I knew cybersecurity was important, but I didn't realize just how critical it is at every level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Personal Level:&lt;/strong&gt; Our identities, banking details, and private conversations are all digital now&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Organizational Level:&lt;/strong&gt; Companies face attacks that could destroy their reputation overnight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Government Level:&lt;/strong&gt; National security and economic stability are at stake&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reality? We're all targets. But knowledge is power, and understanding threats is the first step to protection.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Concepts That Changed My Perspective
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The CIA Triad (Not That CIA!)
&lt;/h3&gt;

&lt;p&gt;One of the foundational concepts I learned is the CIA Triad:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Confidentiality&lt;/strong&gt; – Keeping information private&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrity&lt;/strong&gt; – Ensuring data isn't tampered with&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Availability&lt;/strong&gt; – Making sure authorized users can access what they need&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This simple framework now influences how I think about every digital interaction. Is my data confidential? Can I trust its integrity? Will it be available when I need it?&lt;/p&gt;

&lt;h3&gt;
  
  
  2. You're Only as Strong as Your Weakest Link
&lt;/h3&gt;

&lt;p&gt;The course highlighted something crucial: &lt;strong&gt;most attacks exploit human psychology, not just technology.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Social engineering attacks like phishing and pretexting work because they target our natural instincts to be helpful or our fear of authority. The best firewall in the world can't protect against someone who gives away their password to a convincing fake email.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This taught me that technical skills alone aren't enough – security awareness and critical thinking are equally important.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  3. The Malware Ecosystem is Terrifyingly Sophisticated
&lt;/h3&gt;

&lt;p&gt;I learned about the different types of malware and was genuinely shocked by how sophisticated they've become:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ransomware&lt;/strong&gt; that holds your data hostage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rootkits&lt;/strong&gt; that hide so deep they're nearly impossible to detect&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worms&lt;/strong&gt; that spread automatically without human interaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💡 &lt;strong&gt;Fun fact:&lt;/strong&gt; The Code Red worm infected more than 300,000 servers in just 19 hours back in 2001. Imagine what today's malware can do with faster internet speeds and more connected devices!&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Defense in Depth is Not Optional
&lt;/h3&gt;

&lt;p&gt;One major lesson: there's no single "silver bullet" for cybersecurity. Organizations need multiple layers of protection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Firewalls&lt;/li&gt;
&lt;li&gt;Intrusion Detection/Prevention Systems (IDS/IPS)&lt;/li&gt;
&lt;li&gt;Encryption&lt;/li&gt;
&lt;li&gt;Access controls&lt;/li&gt;
&lt;li&gt;User training&lt;/li&gt;
&lt;li&gt;Regular backups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like a medieval castle: walls, moat, guards, and locked doors. If attackers breach one layer, others are there to stop them. 🏰&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Changes I'm Making
&lt;/h2&gt;

&lt;p&gt;This course wasn't just theoretical – it inspired immediate action:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implemented a password manager&lt;/strong&gt; – No more reusing passwords!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enabled Two-Factor Authentication (2FA)&lt;/strong&gt; on all critical accounts&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Started using a VPN&lt;/strong&gt; on public Wi-Fi&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set up the 3-2-1 backup rule&lt;/strong&gt; – 3 copies of data, 2 different media types, 1 off-site&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Became more skeptical&lt;/strong&gt; of unsolicited emails and requests&lt;/p&gt;




&lt;h2&gt;
  
  
  What Surprised Me Most
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Penetration testing is a legitimate career!&lt;/strong&gt; Companies actually hire ethical hackers (white hat hackers) to break into their systems and find vulnerabilities before the bad guys do. The systematic approach – planning, scanning, gaining access, maintaining access, and reporting – is like a puzzle that helps organizations stay secure.&lt;/p&gt;

&lt;p&gt;Also, the legal and ethical considerations around cybersecurity are complex. Just because you &lt;em&gt;can&lt;/em&gt; hack something doesn't mean you &lt;em&gt;should&lt;/em&gt;. The skills learned must always be used within legal and ethical bounds.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaway: Cybersecurity is Everyone's Responsibility
&lt;/h2&gt;

&lt;p&gt;The biggest lesson? &lt;strong&gt;Cybersecurity isn't just for IT professionals.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Whether you're a developer, designer, marketer, or manager, you play a role in keeping your organization secure. Every employee is both a potential target and a line of defense.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next for Me?
&lt;/h2&gt;

&lt;p&gt;This course has sparked a genuine interest in cybersecurity. I'm planning to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Continue exploring more advanced Cisco courses&lt;/li&gt;
&lt;li&gt;Practice with hands-on labs and tools&lt;/li&gt;
&lt;li&gt;Stay updated on the latest threats and trends&lt;/li&gt;
&lt;li&gt;Share what I learn with my network&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I'm also thinking about how cybersecurity principles apply to my current role. How can I build security awareness into everything I do? How can I advocate for better security practices in my team?&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources That Helped Me
&lt;/h2&gt;

&lt;p&gt;For anyone interested in starting their cybersecurity journey:&lt;/p&gt;

&lt;p&gt;📚 &lt;a href="https://www.netacad.com/courses/introduction-to-cybersecurity?courseLang=en-US" rel="noopener noreferrer"&gt;Cisco's Introduction to Cybersecurity course&lt;/a&gt; (free and beginner-friendly!)&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;If you're on the fence about exploring cybersecurity, I encourage you to take the leap. The field is fascinating, constantly evolving, and increasingly critical.&lt;/p&gt;

&lt;p&gt;In a world where data breaches make headlines weekly and ransomware attacks can cripple entire organizations, understanding cybersecurity isn't optional – it's essential.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's your experience with cybersecurity? Have you taken any courses or certifications? Drop a comment below – I'd love to hear your thoughts!&lt;/strong&gt; 💬&lt;/p&gt;




&lt;p&gt;🔐 Stay safe online, everyone!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P.S. If you found this helpful, feel free to bookmark it or share it with someone starting their cybersecurity journey!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cybersecurity</category>
      <category>cisco</category>
      <category>community</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
