<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Mukul Wadhwa</title>
    <description>The latest articles on Forem by Mukul Wadhwa (@muhcool).</description>
    <link>https://forem.com/muhcool</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3679570%2F57704138-2512-4e2f-a0e0-c4260c3cd148.jpg</url>
      <title>Forem: Mukul Wadhwa</title>
      <link>https://forem.com/muhcool</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/muhcool"/>
    <language>en</language>
    <item>
      <title>Setting Up a Data Science Environment on the Cloud (Without the Usual Setup Pain)</title>
      <dc:creator>Mukul Wadhwa</dc:creator>
      <pubDate>Fri, 26 Dec 2025 12:31:07 +0000</pubDate>
      <link>https://forem.com/muhcool/setting-up-a-data-science-environment-on-the-cloud-without-the-usual-setup-pain-410a</link>
      <guid>https://forem.com/muhcool/setting-up-a-data-science-environment-on-the-cloud-without-the-usual-setup-pain-410a</guid>
      <description>&lt;p&gt;If you’ve worked with data science or machine learning, you already know this part is not fun: &lt;/p&gt;

&lt;p&gt;Installing Python packages &lt;br&gt;
Fixing dependency conflicts &lt;br&gt;
Matching library versions &lt;br&gt;
Repeating the same setup on every new machine &lt;/p&gt;

&lt;p&gt;Before you even write your first line of actual ML code, you’ve already burned an hour. &lt;/p&gt;

&lt;p&gt;In this post, I’ll walk through: &lt;/p&gt;

&lt;p&gt;what a practical data science environment actually needs &lt;br&gt;
common mistakes people make during setup &lt;br&gt;
and one clean way to avoid the whole mess on cloud VMs &lt;/p&gt;

&lt;p&gt;This is written from a hands-on infrastructure perspective. &lt;/p&gt;

&lt;p&gt;What a Real Data Science Environment Needs &lt;/p&gt;

&lt;p&gt;A usable data science setup is more than “Python installed”. &lt;/p&gt;

&lt;p&gt;At minimum, you usually need: &lt;/p&gt;

&lt;p&gt;Core data &amp;amp; numerical stack &lt;/p&gt;

&lt;p&gt;NumPy &lt;br&gt;
Pandas &lt;br&gt;
SciPy &lt;/p&gt;

&lt;p&gt;Visualization &lt;/p&gt;

&lt;p&gt;Matplotlib &lt;br&gt;
Seaborn &lt;br&gt;
Plotly &lt;/p&gt;

&lt;p&gt;Machine learning &lt;/p&gt;

&lt;p&gt;Scikit-learn &lt;br&gt;
XGBoost / LightGBM / CatBoost &lt;/p&gt;

&lt;p&gt;Deep learning (CPU or GPU) &lt;/p&gt;

&lt;p&gt;PyTorch &lt;br&gt;
TensorFlow / Keras &lt;/p&gt;

&lt;p&gt;Notebooks &amp;amp; dev tools &lt;/p&gt;

&lt;p&gt;JupyterLab &lt;br&gt;
IPython &lt;br&gt;
Requests, tqdm, etc. &lt;/p&gt;

&lt;p&gt;Database connectivity &lt;/p&gt;

&lt;p&gt;Most real projects also pull data from: &lt;/p&gt;

&lt;p&gt;PostgreSQL / MySQL &lt;br&gt;
MongoDB &lt;/p&gt;

&lt;p&gt;That means you need database client libraries, not just Python itself. &lt;/p&gt;

&lt;p&gt;Missing any of these usually leads to: &lt;br&gt;
“ModuleNotFoundError” &lt;br&gt;
“Version conflict” &lt;br&gt;
“Works on my machine” &lt;/p&gt;
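
&lt;p&gt;One way to catch a missing library before it surfaces as a ModuleNotFoundError mid-notebook is to check the import names up front. The following is a minimal sketch using only the standard library. The REQUIRED list and the missing_modules helper are illustrative names, not part of any tool mentioned here, and the list assumes psycopg2 and pymongo as the database drivers. &lt;/p&gt;

```python
from importlib.util import find_spec

# Import names (not always the same as the pip package name); adjust to taste.
# psycopg2 and pymongo are assumed here as the PostgreSQL and MongoDB drivers.
REQUIRED = ["numpy", "pandas", "scipy", "matplotlib", "sklearn", "psycopg2", "pymongo"]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

&lt;p&gt;This only confirms that each module can be located, not that its version is compatible with the others, so it catches the first failure mode in the list but not version conflicts. &lt;/p&gt;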

&lt;p&gt;Why Local Setup Becomes Painful &lt;/p&gt;

&lt;p&gt;Local environments break down fast when: &lt;/p&gt;

&lt;p&gt;you switch machines &lt;br&gt;
you collaborate with others &lt;br&gt;
you need more RAM or CPU &lt;br&gt;
you reinstall your OS &lt;/p&gt;

&lt;p&gt;Conda and Docker both help, but each still comes with: &lt;/p&gt;

&lt;p&gt;a learning curve &lt;br&gt;
ongoing maintenance &lt;br&gt;
debugging broken environments &lt;/p&gt;

&lt;p&gt;For many people, the problem isn’t coding; it’s environment reliability. &lt;/p&gt;

&lt;p&gt;A Cleaner Approach: Pre-Configured Cloud VMs &lt;/p&gt;

&lt;p&gt;One approach that’s worked well for me is using a pre-configured cloud VM where: &lt;/p&gt;

&lt;p&gt;the OS is already set up &lt;br&gt;
common data science and ML libraries are pre-installed &lt;br&gt;
database connectors are ready &lt;br&gt;
SSH access works out of the box &lt;/p&gt;

&lt;p&gt;You spin it up, SSH in, and start coding. &lt;/p&gt;

&lt;p&gt;No fighting pip. &lt;br&gt;
No rebuilding environments. &lt;br&gt;
No “let me install this first”. &lt;/p&gt;

&lt;p&gt;This is especially useful when: &lt;/p&gt;

&lt;p&gt;experimenting quickly &lt;br&gt;
onboarding new teammates &lt;br&gt;
running heavier workloads than a laptop can handle &lt;/p&gt;

&lt;p&gt;What to Look for in a Data Science VM &lt;/p&gt;

&lt;p&gt;If you go this route, make sure the VM actually provides: &lt;/p&gt;

&lt;p&gt;20+ commonly used Python data science and ML libraries &lt;br&gt;
SQL and MongoDB client connectors &lt;br&gt;
SSH access with full control &lt;br&gt;
Scalable CPU and RAM &lt;br&gt;
No forced managed services you didn’t ask for &lt;/p&gt;

&lt;p&gt;GPU support is a bonus, but it only matters if you truly need it. &lt;/p&gt;
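
&lt;p&gt;After the first SSH login, a short script can confirm what the VM actually provides. This is a rough sketch, not a feature of any particular provider: vm_report is a hypothetical helper, the distribution names passed to it are examples, and the RAM reading relies on os.sysconf keys that are available on Linux but not on every platform. &lt;/p&gt;

```python
import os
from importlib import metadata


def vm_report(distributions):
    """Summarise CPU count, total RAM, and installed versions of the given distributions."""
    report = {"cpus": os.cpu_count(), "ram_gib": None, "versions": {}}
    try:
        # Linux-only: physical pages times page size gives total RAM in bytes.
        total_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
        report["ram_gib"] = round(total_bytes / 2**30, 1)
    except (AttributeError, ValueError, OSError):
        pass  # sysconf or these keys are unavailable on this platform
    for name in distributions:
        try:
            report["versions"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report["versions"][name] = None  # distribution is not installed
    return report


if __name__ == "__main__":
    # Example pip distribution names; swap in whatever your project depends on.
    print(vm_report(["numpy", "pandas", "scikit-learn", "pymongo"]))
```

&lt;p&gt;A version of None in the output means the distribution is not installed. Saving the report alongside a project also gives teammates a record of the versions the work was done against. &lt;/p&gt;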

&lt;p&gt;A Practical Example (Disclosure) &lt;/p&gt;

&lt;p&gt;I recently set this up internally as a Data Science VM with: &lt;/p&gt;

&lt;p&gt;pre-installed Python data science and machine learning stack &lt;br&gt;
SQL and MongoDB connectors &lt;br&gt;
SSH access &lt;br&gt;
scalable resources &lt;/p&gt;

&lt;p&gt;If you’re curious what that looks like in practice, here’s a reference implementation: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://manage.digirdp.com/store/data-science-vm" rel="noopener noreferrer"&gt;https://manage.digirdp.com/store/data-science-vm&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Disclosure: this is a product from my own infrastructure setup. Linking for reference, not as a requirement. &lt;/p&gt;

&lt;p&gt;Final Thoughts &lt;/p&gt;

&lt;p&gt;Tooling should get out of your way, not slow you down. &lt;/p&gt;

&lt;p&gt;Whether you: &lt;/p&gt;

&lt;p&gt;build your own base image &lt;br&gt;
use a managed platform &lt;br&gt;
or run a pre-configured VM &lt;/p&gt;

&lt;p&gt;the goal is the same: &lt;/p&gt;

&lt;p&gt;Spend time on data and models, not environment firefighting. &lt;/p&gt;

&lt;p&gt;If you’ve found cleaner ways to manage data science environments, I’d love to hear them in the comments. &lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
