<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Victor Chaba</title>
    <description>The latest articles on Forem by Victor Chaba (@chabavictor).</description>
    <link>https://forem.com/chabavictor</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F940330%2Fd8c57883-4eed-4db3-b825-ff14ae824a87.jpeg</url>
      <title>Forem: Victor Chaba</title>
      <link>https://forem.com/chabavictor</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/chabavictor"/>
    <language>en</language>
    <item>
      <title>Generic Folder Structure for your Machine Learning Projects.</title>
      <dc:creator>Victor Chaba</dc:creator>
      <pubDate>Mon, 28 Aug 2023 09:52:12 +0000</pubDate>
      <link>https://forem.com/luxdevhq/generic-folder-structure-for-your-machine-learning-projects-4coe</link>
      <guid>https://forem.com/luxdevhq/generic-folder-structure-for-your-machine-learning-projects-4coe</guid>
      <description>&lt;p&gt;A well-organized structure makes a machine learning project easier to understand and modify, and using the same structure across projects reduces confusion. Since there is no one-size-fits-all solution, we will look at three ways to set up a machine learning project structure: creating the folders and files manually, running a custom-made &lt;code&gt;template.py&lt;/code&gt; script, and using the &lt;a href="https://www.cookiecutter.io/templates" rel="noopener noreferrer"&gt;Cookiecutter&lt;/a&gt; package.&lt;/p&gt;

&lt;h5&gt;... where human hands dance and minds orchestrate, we embark on a journey devoid of automation.&lt;/h5&gt;

&lt;p&gt;&lt;u&gt;The manual execution, in short&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Project Root:&lt;/strong&gt; This is the main folder that contains your entire machine learning project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Data:&lt;/strong&gt; This folder is dedicated to storing your datasets and any relevant data files. It can be further divided into subfolders such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Raw:&lt;/em&gt; Contains the original, unprocessed data files.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Processed:&lt;/em&gt; Contains preprocessed data that has undergone cleaning, transformation, and feature engineering.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;External:&lt;/em&gt; Store any external data sources that you use for your project.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Notebooks:&lt;/strong&gt; This folder is for Jupyter notebooks or any other interactive notebooks you use for experimentation, analysis, and model development. You can organize it with subfolders like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;em&gt;Exploratory:&lt;/em&gt; Notebook(s) for data exploration and visualization.&lt;/li&gt;
&lt;li&gt; &lt;em&gt;Modeling:&lt;/em&gt; Notebook(s) for model development, training, and evaluation.&lt;/li&gt;
&lt;li&gt; &lt;em&gt;Inference:&lt;/em&gt; Notebook(s) for deploying and using trained models for predictions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Scripts:&lt;/strong&gt; This folder contains reusable code scripts or modules that you use in your project. It may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Preprocessing:&lt;/em&gt; Scripts for data cleaning, transformation, and feature engineering.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Model:&lt;/em&gt; Scripts for defining and training machine learning models.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Evaluation:&lt;/em&gt; Scripts for model evaluation, metrics calculation, and validation.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Utilities:&lt;/em&gt; General-purpose utility scripts or helper functions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Models:&lt;/strong&gt; This folder is dedicated to storing trained models or model checkpoints. It can be further organized into subfolders based on different experiments, versions, or architectures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Documentation:&lt;/strong&gt; Include any project-related documentation, such as README files, data dictionaries, or project specifications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Results:&lt;/strong&gt; Store output files, reports, or visualizations generated by your models or experiments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. Config:&lt;/strong&gt; Store configuration files or parameters used in your project, such as hyperparameters, model configurations, or experiment settings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9. Environment:&lt;/strong&gt; Include files related to the project environment, such as &lt;code&gt;requirements.txt&lt;/code&gt; or &lt;code&gt;environment.yml&lt;/code&gt;, specifying the dependencies and packages required to run your project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10. Tests:&lt;/strong&gt; If you have unit tests or integration tests for your code, you can create a folder to store them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;11. Logs:&lt;/strong&gt; Store log files or output logs generated during training or inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;12. Saved Objects:&lt;/strong&gt; If your project saves intermediate objects or serialized data, such as pickled files or serialized models, you can create a folder to store them.&lt;/p&gt;
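
&lt;p&gt;Because this layout is built by hand, it is easy to miss a folder. The short Python sketch below lists which of the folders described above are still missing. The lowercase on-disk names (&lt;code&gt;data/raw&lt;/code&gt;, &lt;code&gt;saved_objects&lt;/code&gt;, and so on) and the helper &lt;code&gt;missing_dirs&lt;/code&gt; are my own assumptions, so adjust &lt;code&gt;EXPECTED_DIRS&lt;/code&gt; to match how you actually name things.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pathlib import Path

# Hypothetical on-disk names for the folders described above; rename freely.
EXPECTED_DIRS = [
    "data/raw", "data/processed", "data/external",
    "notebooks/exploratory", "notebooks/modeling", "notebooks/inference",
    "scripts/preprocessing", "scripts/model", "scripts/evaluation", "scripts/utilities",
    "models", "docs", "results", "config", "tests", "logs", "saved_objects",
]


def missing_dirs(project_root, expected=EXPECTED_DIRS):
    """Return the expected folders that do not exist yet under project_root."""
    root = Path(project_root)
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the project root to see what still needs to be created
    for name in missing_dirs("."):
        print(f"missing: {name}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;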

&lt;h5&gt;... where machines command and algorithms dictate, we venture into a realm free from human intervention.&lt;/h5&gt;

&lt;p&gt;&lt;u&gt;Template.py&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;template.py&lt;/code&gt; file is a small scaffolding script: it holds the list of files a project should start with and creates each one, together with its parent folders, if it does not already exist. Files that already have content are skipped, so it is safe to re-run the script after you extend the list.&lt;br&gt;
Below is the version I commonly use. Copy the code, save it as &lt;code&gt;template.py&lt;/code&gt; in your project root, then run it with &lt;code&gt;python template.py&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO, format='[%(asctime)s]: %(message)s')

project_name = "textSummarizer"

# Files to create, relative to the folder the script is run from
list_of_files = [
    ".github/workflows/.gitkeep",
    f"src/{project_name}/__init__.py",
    f"src/{project_name}/components/__init__.py",
    f"src/{project_name}/utils/__init__.py",
    f"src/{project_name}/utils/common.py",
    f"src/{project_name}/logging/__init__.py",
    f"src/{project_name}/config/__init__.py",
    f"src/{project_name}/config/configuration.py",
    f"src/{project_name}/pipeline/__init__.py",
    f"src/{project_name}/entity/__init__.py",
    f"src/{project_name}/constants/__init__.py",
    "config/config.yaml",
    "params.yaml",
    "app.py",
    "main.py",
    "Dockerfile",
    "requirements.txt",
    "setup.py",
    "research/trials.ipynb",
]

for filepath in map(Path, list_of_files):
    # Create any missing parent folders (skipped for files in the project root)
    if filepath.parent != Path("."):
        filepath.parent.mkdir(parents=True, exist_ok=True)
        logging.info(f"Creating directory: {filepath.parent} for the file {filepath.name}")

    # Create the file only if it is missing or empty, so existing work is never overwritten
    if not filepath.exists() or filepath.stat().st_size == 0:
        filepath.touch()
        logging.info(f"Creating empty file: {filepath}")
    else:
        logging.info(f"{filepath.name} already exists")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your folder structure should resemble something like this👇&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7ovesmy710w6x4yakhv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7ovesmy710w6x4yakhv.png" alt="template.py structure" width="800" height="586"&gt;&lt;/a&gt;&lt;/p&gt;
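
&lt;p&gt;If you would rather check the result in the terminal than compare it with a screenshot, a small helper like the one below prints the tree under a folder. It is a sketch of my own, not part of the template above, and &lt;code&gt;tree_lines&lt;/code&gt; is a hypothetical name. Hidden entries such as &lt;code&gt;.github&lt;/code&gt; are included, so you can confirm that the workflow folder was created.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pathlib import Path


def tree_lines(root, indent="    "):
    """Return one line per file or folder under root, indented by depth."""
    root = Path(root)
    lines = []
    # Sorting paths puts every folder directly before its own contents
    for path in sorted(root.rglob("*")):
        depth = len(path.relative_to(root).parts) - 1
        suffix = "/" if path.is_dir() else ""
        lines.append(f"{indent * depth}{path.name}{suffix}")
    return lines


if __name__ == "__main__":
    print("\n".join(tree_lines(".")))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;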

&lt;p&gt;&lt;u&gt;&lt;a href="https://www.cookiecutter.io/templates" rel="noopener noreferrer"&gt;The cookiecutter&lt;/a&gt;&lt;/u&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make sure a recent version of Python and pip is installed in your environment.&lt;/li&gt;
&lt;li&gt;Install cookiecutter
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install cookiecutter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3. Create an empty repository on github.com (e.g., my-test).&lt;/p&gt;

&lt;p&gt;Note: Leave every option under ‘Initialize this repository with:’ unchecked. The repository then starts without any commits, so your first push from the local project does not clash with files GitHub created for you.&lt;/p&gt;

&lt;p&gt;4. Create the project structure.&lt;/p&gt;

&lt;p&gt;In a terminal, go to the local folder where you want to set up the project and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have used this template before, cookiecutter first asks whether it may replace its cached copy:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;You've downloaded \.cookiecutters\cookiecutter-data-science before. Is it okay to delete and re-download it? [yes]:yes&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;It then prompts for the project settings. The answers used here appear after each colon:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project_name [project_name]: my-test
repo_name [my-test]: my-test
author_name [Your name (or your organization/company/team)]: Your name
description [A short description of the project.]: This is a test proj
Select open_source_license:
1 - MIT
2 - BSD-3-Clause
3 - No license file
Choose from 1, 2, 3 [1]: 1
s3_bucket [[OPTIONAL] your-bucket-for-syncing-data (do not include 's3://')]:
aws_profile [default]:
Select python_interpreter:
1 - python3
2 - python
Choose from 1, 2 [1]: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You can ignore the ‘s3_bucket’ and ‘aws_profile’ options by pressing Enter to keep their defaults.&lt;/p&gt;
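
&lt;p&gt;Cookiecutter can also be driven from Python, which is handy if you create projects like this often. In the sketch below, &lt;code&gt;cookiecutter.main.cookiecutter&lt;/code&gt; is cookiecutter's own Python entry point, &lt;code&gt;checkout="v1"&lt;/code&gt; plays the role of the &lt;code&gt;-c v1&lt;/code&gt; flag, and &lt;code&gt;no_input=True&lt;/code&gt; skips the prompts so the answers come from &lt;code&gt;extra_context&lt;/code&gt;. The helper names &lt;code&gt;build_context&lt;/code&gt; and &lt;code&gt;generate_project&lt;/code&gt; are hypothetical, and the context keys are copied from the prompts shown above; the S3 and AWS options are left out so their defaults apply.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LICENSES = ("MIT", "BSD-3-Clause", "No license file")


def build_context(project_name, author_name, description,
                  license_name="MIT", python_interpreter="python3"):
    """Answers for the cookiecutter-data-science v1 prompts (keys taken from the prompts)."""
    if license_name not in LICENSES:
        raise ValueError(f"license_name must be one of {LICENSES}")
    return {
        "project_name": project_name,
        "repo_name": project_name,
        "author_name": author_name,
        "description": description,
        "open_source_license": license_name,
        "python_interpreter": python_interpreter,
    }


def generate_project(context, output_dir="."):
    # Imported here so build_context can be used even without cookiecutter installed
    from cookiecutter.main import cookiecutter
    return cookiecutter(
        "https://github.com/drivendata/cookiecutter-data-science",
        checkout="v1",          # same as the -c v1 flag
        no_input=True,          # do not prompt; use defaults plus extra_context
        extra_context=context,  # the answers you would otherwise type
        output_dir=output_dir,
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;generate_project(build_context("my-test", "Your name", "This is a test proj"))&lt;/code&gt; should then produce the same folder as the interactive run, without asking any questions.&lt;/p&gt;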

&lt;ol start="5"&gt;
&lt;li&gt;Add the project to the Git repository.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd my-test
# Initialize the git repository
git init
# Add all the files and folders
git add .
# Commit the files
git commit -m "Initialized the repo with cookiecutter data science structure"
# Set the remote repo URL
git remote add origin https://github.com/your_user_id/my-test.git
# Check the remote
git remote -v
# Push the changes from the local repo to GitHub
# (use main instead of master if that is your local default branch name)
git push origin master
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The final structure should look like below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrngbrozkwb9pz59y67f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrngbrozkwb9pz59y67f.png" alt="Cookiecutter template structure" width="700" height="829"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;data&lt;/em&gt; folder stays in your local project and does not appear on GitHub, because the template lists it in the &lt;code&gt;.gitignore&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;Remember, these are only suggested structures, and you can modify them to suit your specific needs and preferences. The key is to keep a logical, organized layout that makes your project easy to navigate and understand.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Cover photo from ccjk.com&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>folderstructure</category>
    </item>
    <item>
      <title>Understanding the Differences: Fine-Tuning vs. Transfer Learning</title>
      <dc:creator>Victor Chaba</dc:creator>
      <pubDate>Fri, 25 Aug 2023 13:31:10 +0000</pubDate>
      <link>https://forem.com/luxdevhq/understanding-the-differences-fine-tuning-vs-transfer-learning-370</link>
      <guid>https://forem.com/luxdevhq/understanding-the-differences-fine-tuning-vs-transfer-learning-370</guid>
      <description>&lt;p&gt;In machine learning and deep learning, two popular techniques for reusing pre-trained models are fine-tuning and transfer learning. Both let us build on the features a model has already learned from a large dataset instead of training from scratch. In this article, we will look at both techniques in detail, highlight their differences, and walk through Python code snippets that show how to implement them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transfer Learning: A Brief Overview&lt;/strong&gt;&lt;br&gt;
Transfer learning involves using a pre-trained model as a starting point for a new task or domain. The idea is to leverage the knowledge acquired by the pre-trained model on a large dataset and apply it to a related task with a smaller dataset. By doing so, we can benefit from the general features and patterns learned by the pre-trained model, saving time and computational resources.&lt;/p&gt;

&lt;p&gt;Transfer learning typically involves two main steps:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Feature Extraction:&lt;/strong&gt;&lt;/em&gt; In this step, we use the pre-trained model as a fixed feature extractor. We remove the final layers responsible for classification and replace them with new layers that are specific to our task. The pre-trained model’s weights are frozen, and only the weights of the newly added layers are trained on the smaller dataset.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Fine-Tuning:&lt;/em&gt;&lt;/strong&gt; Fine-tuning takes the process a step further by unfreezing some of the pre-trained model’s layers and allowing them to be updated with the new dataset. This step enables the model to adapt and learn more specific features related to the new task or domain.&lt;br&gt;
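Strictly speaking, fine-tuning is itself a form of transfer learning. For the comparison in this article, “transfer learning” means the feature-extraction setup on its own, with every pre-trained layer frozen, and “fine-tuning” means the variant that also updates some of those layers.&lt;br&gt;
Now, let’s take a closer look at the implementation of transfer learning using Python code snippets.&lt;br&gt;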
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
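&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.utils import to_categorical

# Placeholder data so the snippet runs end to end. Replace it with your own images
# (passed through tensorflow.keras.applications.vgg16.preprocess_input) and one-hot labels.
num_classes = 10
train_images = np.random.rand(32, 224, 224, 3).astype('float32')
train_labels = to_categorical(np.random.randint(0, num_classes, 32), num_classes)
val_images = np.random.rand(8, 224, 224, 3).astype('float32')
val_labels = to_categorical(np.random.randint(0, num_classes, 8), num_classes)

# Load the pre-trained VGG16 model without its ImageNet classification layers
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Freeze the weights of the pre-trained layers
for layer in base_model.layers:
    layer.trainable = False
# Add new classification layers
x = Flatten()(base_model.output)
x = Dense(256, activation='relu')(x)
output = Dense(num_classes, activation='softmax')(x)
# Create the new model
model = Model(inputs=base_model.input, outputs=output)
# Compile and train the model on the new dataset (only the two Dense layers are updated)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10, validation_data=(val_images, val_labels))
&lt;/code&gt;&lt;/pre&gt;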

&lt;/div&gt;



&lt;p&gt;In the code snippet above, we use the VGG16 model, a popular pre-trained model for image classification, as our base model. We freeze the weights of the pre-trained layers, add new classification layers on top of the base model, and compile the new model for training. The model is then trained on the new dataset, leveraging the pre-trained weights as a starting point.&lt;/p&gt;
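
&lt;p&gt;Keras hides the mechanics of freezing behind &lt;code&gt;layer.trainable&lt;/code&gt;, but the idea is simply that frozen weights never receive gradient updates. The NumPy sketch below illustrates this on a toy two-layer linear model rather than on VGG16: &lt;code&gt;w_base&lt;/code&gt; stands in for the pre-trained layers and &lt;code&gt;w_head&lt;/code&gt; for the newly added head. The function &lt;code&gt;train_toy&lt;/code&gt;, the variable names, and the random data are all hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np


def train_toy(x, y, w_base, w_head, freeze_base, lr=0.01, steps=100):
    """Gradient descent on y_hat = (x @ w_base) @ w_head with a mean squared error loss."""
    w_base, w_head = w_base.copy(), w_head.copy()
    n = len(x)
    for _ in range(steps):
        features = x @ w_base          # output of the "pre-trained" layer
        err = features @ w_head - y    # prediction error of the new head
        grad_head = (2 / n) * features.T @ err
        grad_base = (2 / n) * x.T @ (err @ w_head.T)
        w_head -= lr * grad_head
        if not freeze_base:            # a frozen base is simply never updated
            w_base -= lr * grad_base
    return w_base, w_head


rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))
y = rng.normal(size=(32, 1))
w_base = rng.normal(size=(4, 3))             # stands in for pre-trained weights
w_head = rng.normal(scale=0.1, size=(3, 1))  # new, randomly initialized head

frozen_base, _ = train_toy(x, y, w_base, w_head, freeze_base=True)
tuned_base, _ = train_toy(x, y, w_base, w_head, freeze_base=False)
print(np.array_equal(frozen_base, w_base))  # True: feature extraction leaves the base alone
print(np.array_equal(tuned_base, w_base))   # False: fine-tuning updates the base too
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;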

&lt;p&gt;&lt;strong&gt;Fine-Tuning: A Closer Look&lt;/strong&gt;&lt;br&gt;
While transfer learning involves freezing the pre-trained model’s weights and only training the new layers, fine-tuning takes it a step further by allowing the pre-trained layers to be updated. This additional step is beneficial when the new dataset is large enough and similar to the original dataset on which the pre-trained model was trained.&lt;/p&gt;

&lt;p&gt;Fine-tuning involves the following steps:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Feature Extraction:&lt;/strong&gt;&lt;/em&gt; Similar to transfer learning, we use the pre-trained model as a feature extractor. We replace the final classification layers with new layers specific to our task and freeze the weights of the pre-trained layers.&lt;br&gt;
&lt;em&gt;&lt;strong&gt;Fine-Tuning:&lt;/strong&gt;&lt;/em&gt; In this step, we unfreeze some of the pre-trained layers and allow them to be updated during training. This process enables the model to learn more task-specific features while preserving the general knowledge acquired from the original dataset.&lt;br&gt;
Now, let’s explore the implementation of fine-tuning using Python code snippets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
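&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

# Placeholder data so the snippet runs end to end. Replace it with your own images
# (passed through tensorflow.keras.applications.vgg16.preprocess_input) and one-hot labels.
num_classes = 10
train_images = np.random.rand(32, 224, 224, 3).astype('float32')
train_labels = to_categorical(np.random.randint(0, num_classes, 32), num_classes)
val_images = np.random.rand(8, 224, 224, 3).astype('float32')
val_labels = to_categorical(np.random.randint(0, num_classes, 8), num_classes)

# Load the pre-trained VGG16 model and add new classification layers
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
x = Flatten()(base_model.output)
x = Dense(256, activation='relu')(x)
output = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=output)

# Step 1 (feature extraction): freeze the whole base and train only the new head
base_model.trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, validation_data=(val_images, val_labels))

# Step 2 (fine-tuning): keep the first 15 layers (input to block4_pool) frozen, unfreeze block5
base_model.trainable = True
for layer in base_model.layers[:15]:
    layer.trainable = False
# Recompile so the change takes effect; a small learning rate avoids wiping out the pre-trained weights
model.compile(optimizer=Adam(learning_rate=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10, validation_data=(val_images, val_labels))
&lt;/code&gt;&lt;/pre&gt;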

&lt;/div&gt;



&lt;p&gt;In the code snippet above, we again use the VGG16 model as our base model and follow the same steps as in transfer learning to replace the classification layers and freeze the initial layers. However, in fine-tuning, we unfreeze the later layers, here the last convolutional block (&lt;code&gt;layers[15:]&lt;/code&gt;, i.e. &lt;code&gt;block5&lt;/code&gt;), so that they are updated during training. This way, the model can learn more task-specific features while still benefiting from the pre-trained weights.&lt;/p&gt;
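
&lt;p&gt;The cut-off index 15 is easy to get wrong. The pure-Python sketch below lists the layer names of &lt;code&gt;VGG16(include_top=False)&lt;/code&gt; in order and checks that freezing &lt;code&gt;layers[:15]&lt;/code&gt; leaves exactly block 5 trainable. The input layer's name differs between Keras versions, so &lt;code&gt;"input_layer"&lt;/code&gt; is only a placeholder, and the two helper functions are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Layer names of VGG16(include_top=False), in order. The input layer's name
# varies between Keras versions, so "input_layer" is only a placeholder.
VGG16_LAYERS = [
    "input_layer",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]


def trainable_by_index(names, k):
    """Mirror layers[:k] frozen and layers[k:] trainable."""
    frozen = set(names[:k])
    return {name: name not in frozen for name in names}


def trainable_by_prefix(names, prefix):
    """Unfreeze only the layers whose name starts with prefix, e.g. "block5"."""
    return {name: name.startswith(prefix) for name in names}


by_index = trainable_by_index(VGG16_LAYERS, 15)
print([name for name, trainable in by_index.items() if trainable])
# ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
print(by_index == trainable_by_prefix(VGG16_LAYERS, "block5"))  # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Inside Keras, the name-based version would set &lt;code&gt;layer.trainable = layer.name.startswith('block5')&lt;/code&gt; for each layer in &lt;code&gt;base_model.layers&lt;/code&gt;, which states the intent (unfreeze block 5) instead of relying on a counted index.&lt;/p&gt;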

&lt;p&gt;&lt;strong&gt;Key Differences between Fine-Tuning and Transfer Learning&lt;/strong&gt;&lt;br&gt;
Now that we have explored the implementation of both fine-tuning and transfer learning, let’s summarize the key differences between the two techniques:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training Approach:&lt;/strong&gt; In transfer learning, we freeze all the pre-trained layers and only train the new layers added on top. In fine-tuning, we unfreeze some of the pre-trained layers and allow them to be updated during training.&lt;br&gt;
&lt;strong&gt;Domain Similarity:&lt;/strong&gt; Transfer learning is suitable when the new task or domain is somewhat similar to the original task or domain on which the pre-trained model was trained. Fine-tuning is more effective when the new dataset is large enough and closely related to the original dataset.&lt;br&gt;
&lt;strong&gt;Computational Resources:&lt;/strong&gt; Transfer learning requires fewer computational resources since only the new layers are trained. Fine-tuning, on the other hand, may require more resources, especially if we unfreeze and update a significant number of pre-trained layers.&lt;br&gt;
&lt;strong&gt;Training Time:&lt;/strong&gt; Transfer learning generally requires less training time since we are training fewer parameters. Fine-tuning may take longer, especially if we are updating a larger number of pre-trained layers.&lt;br&gt;
&lt;strong&gt;Dataset Size:&lt;/strong&gt; Transfer learning is effective when the new dataset is small, as it leverages the pre-trained model’s knowledge on a large dataset. Fine-tuning is more suitable for larger datasets, as it allows the model to learn more specific features related to the new task.&lt;br&gt;
It’s important to note that the choice between fine-tuning and transfer learning depends on the specific task, dataset, and available computational resources. Experimentation and evaluation are key to determining the most effective approach for a given scenario.&lt;/p&gt;
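
&lt;p&gt;The differences in resources and training time come down to how many weights receive gradient updates. The sketch below counts them by hand for the two VGG16 snippets in this article, assuming 224 x 224 inputs (so the convolutional base outputs a 7 x 7 x 512 feature map) and a hypothetical &lt;code&gt;num_classes = 10&lt;/code&gt;. Pooling layers have no weights, so only the convolution and dense layers are counted; the helper names are my own.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k Conv2D layer."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out


# (input channels, output channels) of the 13 conv layers in VGG16, grouped by block
VGG16_CONVS = [
    [(3, 64), (64, 64)],
    [(64, 128), (128, 128)],
    [(128, 256), (256, 256), (256, 256)],
    [(256, 512), (512, 512), (512, 512)],
    [(512, 512), (512, 512), (512, 512)],
]


def trainable_params(num_classes, unfrozen_blocks=0):
    """New head (Flatten, Dense(256), Dense(num_classes)) plus the last unfrozen_blocks conv blocks."""
    head = dense_params(7 * 7 * 512, 256) + dense_params(256, num_classes)
    blocks = VGG16_CONVS[len(VGG16_CONVS) - unfrozen_blocks:]
    base = sum(conv_params(3, c_in, c_out) for block in blocks for c_in, c_out in block)
    return head + base


for blocks in (0, 1):
    print(blocks, trainable_params(10, unfrozen_blocks=blocks))
# 0 6425354   (feature extraction: only the new head)
# 1 13504778  (fine-tuning with block5 unfrozen)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumptions, unfreezing block 5 roughly doubles the number of trainable parameters, from about 6.4 million to about 13.5 million, which is where the extra memory and training time of fine-tuning come from.&lt;/p&gt;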

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Fine-tuning and transfer learning are powerful techniques that allow us to leverage pre-trained models in machine learning and deep learning tasks. While transfer learning freezes all the pre-trained layers and only trains the new layers, fine-tuning goes a step further by allowing the pre-trained layers to be updated. Both techniques have their advantages and are suitable for different scenarios.&lt;/p&gt;

&lt;p&gt;By understanding the differences between these techniques, you can make informed decisions when applying them to your own machine learning projects.&lt;/p&gt;


</description>
      <category>finetuning</category>
      <category>deeplearning</category>
      <category>transferlearning</category>
    </item>
  </channel>
</rss>
