<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Luca Liu</title>
    <description>The latest articles on Forem by Luca Liu (@luca1iu).</description>
    <link>https://forem.com/luca1iu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg</url>
      <title>Forem: Luca Liu</title>
      <link>https://forem.com/luca1iu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/luca1iu"/>
    <language>en</language>
    <item>
      <title>Stop Using Spark for Your Small Data - Why Azure Functions is the Right Tool for the Job</title>
      <dc:creator>Luca Liu</dc:creator>
      <pubDate>Wed, 06 May 2026 09:22:57 +0000</pubDate>
      <link>https://forem.com/luca1iu/stop-using-spark-for-your-small-data-why-azure-functions-is-the-right-tool-for-the-job-4j66</link>
      <guid>https://forem.com/luca1iu/stop-using-spark-for-your-small-data-why-azure-functions-is-the-right-tool-for-the-job-4j66</guid>
      <description>&lt;p&gt;As a data analyst, my job is to get data from A to B, cleaned and ready for use. A common workflow for my team involves users uploading Excel files to a &lt;a href="https://www.microsoft.com/de-de/microsoft-365/onedrive/online-cloud-storage?market=de" rel="noopener noreferrer"&gt;OneDrive&lt;/a&gt; folder. A &lt;a href="//microsoft.com/de-de/power-platform/products/power-automate"&gt;Power Automate&lt;/a&gt; flow then syncs these files daily to a container in our &lt;a href="https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview" rel="noopener noreferrer"&gt;Azure Storage Account&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;From there, my responsibility begins:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read the new Excel file from Blob Storage using Python.&lt;/li&gt;
&lt;li&gt;Process the data (clean, transform, apply business logic).&lt;/li&gt;
&lt;li&gt;Write the final data to an Azure SQL Database.&lt;/li&gt;
&lt;/ol&gt;
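&lt;p&gt;In outline, those three steps can be sketched as below. The &lt;code&gt;clean_rows&lt;/code&gt; helper and the names in the comments (container, table) are invented for illustration; the real Blob and SQL steps need the Azure SDKs and pandas, so they appear only as comments:&lt;/p&gt;

```python
# Illustrative sketch of the three steps; helper and names are examples only.

def clean_rows(rows):
    """Step 2: drop fully empty rows and normalise header names."""
    cleaned = []
    for row in rows:
        if all(value in (None, "") for value in row.values()):
            continue  # skip blank Excel rows
        cleaned.append({key.strip().lower().replace(" ", "_"): value
                        for key, value in row.items()})
    return cleaned

# Step 1: read the new Excel file from Blob Storage, roughly:
#   blob = BlobClient.from_connection_string(conn_str, "uploads", filename)
#   rows = pd.read_excel(io.BytesIO(blob.download_blob().readall()))
# Step 3: write the cleaned data to Azure SQL, roughly:
#   frame.to_sql("StagingTable", engine, if_exists="append", index=False)

print(clean_rows([{" Product Name": "Laptop", "Price": 999.99},
                  {"Product Name": "", "Price": None}]))
```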

&lt;p&gt;I needed this to run on two triggers: a &lt;strong&gt;time schedule&lt;/strong&gt; (e.g., every morning at 7 AM) and an &lt;strong&gt;event-driven&lt;/strong&gt; trigger (i.e., as soon as a new file lands in the container).&lt;/p&gt;

&lt;p&gt;My first thought was to use the "big data" tools I'd heard of: &lt;a href="https://azure.microsoft.com/de-de/products/databricks" rel="noopener noreferrer"&gt;&lt;strong&gt;Azure Databricks&lt;/strong&gt;&lt;/a&gt; or &lt;a href="https://azure.microsoft.com/de-de/products/synapse-analytics" rel="noopener noreferrer"&gt;&lt;strong&gt;Azure Synapse Analytics&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  The "Big Tool" Trap
&lt;/h1&gt;

&lt;p&gt;On the surface, Databricks and Synapse are perfect.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They let me write Python in a &lt;strong&gt;Notebook&lt;/strong&gt;, which I'm very comfortable with.&lt;/li&gt;
&lt;li&gt;They have easy-to-use &lt;strong&gt;trigger&lt;/strong&gt; and &lt;strong&gt;monitoring&lt;/strong&gt; tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I set up a proof of concept, and it worked. But I quickly ran into a problem: my Excel files are 10MB, not 10TB.&lt;/p&gt;

&lt;p&gt;Using a full Spark cluster (which is what both Databricks and Synapse Notebooks run on) was like &lt;strong&gt;using a sledgehammer to crack a nut&lt;/strong&gt;. I was paying for a powerful, multi-node cluster (which took 5-10 minutes to "cold start") just to run a Python script that finished in 30 seconds. The cost was going to be far too high for such a simple task.&lt;/p&gt;

&lt;h1&gt;
  
  
  The "Right Tool": Azure Functions
&lt;/h1&gt;

&lt;p&gt;After some research, I found the perfect tool for small-to-medium data tasks: &lt;strong&gt;Azure Functions&lt;/strong&gt;.&lt;br&gt;
Azure Functions, when used on a "Consumption Plan," is a true "serverless" service. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It's cheap:&lt;/strong&gt; You get a generous free grant every month, and after that, you pay &lt;em&gt;only&lt;/em&gt; for the seconds your code is actually running. For my task, the cost is practically $0.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's fast:&lt;/strong&gt; It starts in seconds (or less), not minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's perfect for triggers:&lt;/strong&gt; It has built-in triggers for exactly my needs (Timer and Blob Storage).&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  The (Small) Learning Curve
&lt;/h1&gt;

&lt;p&gt;The one trade-off is that it's &lt;em&gt;slightly&lt;/em&gt; more complex than a notebook. You can't just write and run your code in a web browser. The modern, recommended workflow is to use &lt;strong&gt;Visual Studio Code (VS Code)&lt;/strong&gt; to develop your code locally and then "deploy" (push) it to the cloud.&lt;/p&gt;

&lt;p&gt;This "local development" workflow is a best practice. It means you have a copy of your code, can use source control (like Git), and can test everything on your machine before it goes live.&lt;/p&gt;
&lt;h1&gt;
  
  
  More Than Just Timers
&lt;/h1&gt;

&lt;p&gt;My needs were simple, but Azure Functions has triggers for almost anything. The most popular ones include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Timer Trigger:&lt;/strong&gt; Runs on a schedule. Azure Functions uses six-field NCRONTAB expressions with a leading seconds field (e.g., &lt;code&gt;0 0 7 * * 1&lt;/code&gt; for 7 AM every Monday).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blob Trigger:&lt;/strong&gt; Runs when a new file is uploaded to a storage container.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP Trigger:&lt;/strong&gt; Runs when it receives a web request (creating a simple API).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queue Trigger:&lt;/strong&gt; Runs when a new message is added to a storage queue.&lt;/li&gt;
&lt;/ul&gt;
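&lt;p&gt;Each trigger is declared in the function's configuration. As a sketch, a Blob trigger watching a hypothetical &lt;code&gt;uploads&lt;/code&gt; container looks like this in &lt;code&gt;function.json&lt;/code&gt; (the container name and connection setting are placeholders):&lt;/p&gt;

```json
{
  "bindings": [
    {
      "name": "myblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "uploads/{name}",
      "connection": "AzureWebJobsStorage"
    }
  ]
}
```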

&lt;p&gt;You can see the full list on the official &lt;a href="https://learn.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings" rel="noopener noreferrer"&gt;Microsoft Azure Functions Triggers and Bindings documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Databricks and Synapse are amazing, powerful tools, but they are not the answer for everything. For our team's daily Excel processing, using them was costing us time and money.&lt;/p&gt;

&lt;p&gt;By investing a little time to learn the VS Code + Azure Functions workflow, we built a solution that is faster, more efficient, and costs a fraction of the price. &lt;strong&gt;Don't pay for a Spark cluster when all you need is a 30-second Python script.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;


&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=150,height=150,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Luca Liu&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Luca, a Business Intelligence Developer with passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/Luca_DataTeam" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Connect with me on X&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>azure</category>
      <category>dataanalyst</category>
      <category>functions</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Data Analyst: Does Your Work Actually Matter?</title>
      <dc:creator>Luca Liu</dc:creator>
      <pubDate>Wed, 06 May 2026 09:22:37 +0000</pubDate>
      <link>https://forem.com/luca1iu/data-analyst-does-your-work-actually-matter-3in2</link>
      <guid>https://forem.com/luca1iu/data-analyst-does-your-work-actually-matter-3in2</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;I recently saw a question on Reddit that stopped me in my tracks: "Do you feel your work in data analysis is valuable to the organization you work for?"&lt;/p&gt;

&lt;p&gt;It is the question that haunts every data analyst.&lt;/p&gt;

&lt;p&gt;We spend hours cleaning data and building complex dashboards. We send them out into the void. And then... silence. We wonder: Is anyone actually reading this? Does this dashboard change anything?&lt;/p&gt;

&lt;p&gt;If you are just answering ad-hoc requests, the answer is often "no."&lt;/p&gt;

&lt;h1&gt;
  
  
  The Trap of "Saving Time"
&lt;/h1&gt;

&lt;p&gt;Many analysts get stuck in the "automation trap." A colleague from another department asks you to automate their manual workflow. You do it. They are happy because they save two hours a week.&lt;/p&gt;

&lt;p&gt;You feel useful. But does the company see the value?&lt;/p&gt;

&lt;p&gt;Often, they don't. From a management perspective, that colleague’s salary is already paid. Unless that saved time is directly used to generate new revenue, your automation didn't change the company's bottom line. You just made someone's life easier.&lt;/p&gt;

&lt;p&gt;That is nice, but it isn't necessarily &lt;em&gt;valuable&lt;/em&gt; in a way leaders notice.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Shift: Stop Doing Projects, Start Building Products
&lt;/h1&gt;

&lt;p&gt;If you want your work to matter, you need to stop acting like an IT support desk and start acting like a Product Owner.&lt;/p&gt;

&lt;p&gt;What is the difference?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A Data Project&lt;/strong&gt; has a start and an end date. It is usually a one-time request. The goal is "delivery." Once you hand over the dashboard or report, you are done. It quickly becomes outdated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A Data Product&lt;/strong&gt; is a living tool. It doesn't just report the past; it helps shape future decisions. It evolves. Its goal is not "delivery," but measurable "business impact" (like saving money or reducing risk).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Real-World Example: The SpendCube
&lt;/h1&gt;

&lt;p&gt;Let’s look at a real example from my work with a purchasing department.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "Project" Approach:&lt;/strong&gt; &lt;br&gt;
The department asks for a report on last month's spending. I pull the data, send an Excel file, and close the ticket. &lt;br&gt;
&lt;em&gt;Result:&lt;/em&gt; They look at what happened. Nothing changes. The value is low.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "Product" Approach (The SpendCube Dashboard):&lt;/strong&gt; &lt;br&gt;
I build a live dashboard that doesn't just show &lt;em&gt;what&lt;/em&gt; was spent, but actively highlights &lt;em&gt;where&lt;/em&gt; we are overspending against budget in real-time. It identifies specific suppliers where we could negotiate better contracts tomorrow. &lt;br&gt;
&lt;em&gt;Result:&lt;/em&gt; The dashboard isn't just a report; it is a tool they use to actively save the company money. It contributes directly to the P&amp;amp;L (Profit and Loss).&lt;/p&gt;
&lt;h1&gt;
  
  
  How to Make Your Work Valuable
&lt;/h1&gt;

&lt;p&gt;If you are tired of wondering if your work matters, change your approach.&lt;/p&gt;

&lt;p&gt;Don't just accept tasks. When someone asks for a dashboard, ask them: "What decision will you make with this data?" If they can't answer, the dashboard probably isn't necessary.&lt;/p&gt;

&lt;p&gt;Move away from automating tasks and start building data products that solve real business problems. When your work directly helps the company save money or make money, you never have to ask if you are valuable. You already know the answer.&lt;/p&gt;



</description>
      <category>analytics</category>
      <category>career</category>
      <category>data</category>
      <category>dataanalyst</category>
    </item>
    <item>
      <title>How to Fix "command 'claude-vscode.editor.openLast' not found" in VS Code</title>
      <dc:creator>Luca Liu</dc:creator>
      <pubDate>Wed, 06 May 2026 08:06:22 +0000</pubDate>
      <link>https://forem.com/luca1iu/how-to-fix-command-claude-vscodeeditoropenlast-not-found-in-vs-code-13e9</link>
      <guid>https://forem.com/luca1iu/how-to-fix-command-claude-vscodeeditoropenlast-not-found-in-vs-code-13e9</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;When trying to use the Claude Code extension in VS Code (version 2.1.129), you might run into this error, which prevents the extension from opening:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;command 'claude-vscode.editor.openLast' not found&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;The fix is simple: you need to downgrade the extension to a specific stable version (2.1.128).&lt;/p&gt;

&lt;p&gt;Here are the exact steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Uninstall your current Claude VS Code extension.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click the Gear (Settings) icon on the Claude extension page in VS Code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select "Install Another Version..." from the dropdown menu.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Choose version 2.1.128 from the list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reload VS Code.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it! The error should be gone and Claude will work properly again.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vscode</category>
      <category>claude</category>
    </item>
    <item>
      <title>How to Store JSON and XML in SQL Databases</title>
      <dc:creator>Luca Liu</dc:creator>
      <pubDate>Fri, 13 Mar 2026 15:37:17 +0000</pubDate>
      <link>https://forem.com/luca1iu/how-to-store-json-and-xml-in-sql-databases-491m</link>
      <guid>https://forem.com/luca1iu/how-to-store-json-and-xml-in-sql-databases-491m</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the era of big data and diverse data formats, the ability to store and query semi-structured data like JSON (JavaScript Object Notation) and XML (eXtensible Markup Language) in SQL databases has become increasingly important. This article explores how to effectively store and manage JSON and XML data in SQL databases, along with the pros and cons of each approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding JSON and XML
&lt;/h2&gt;

&lt;h4&gt;
  
  
  JSON
&lt;/h4&gt;

&lt;p&gt;JSON is a lightweight data interchange format that is easy for humans to read and write, and easy for machines to parse and generate. It is often used in web applications for data exchange between clients and servers.&lt;/p&gt;

&lt;h4&gt;
  
  
  XML
&lt;/h4&gt;

&lt;p&gt;XML is a markup language that defines rules for encoding documents in a format that is both human-readable and machine-readable. It is widely used for data representation and exchange, especially in web services.&lt;/p&gt;
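&lt;p&gt;To make the comparison concrete, here is the same product record handled in both formats using only Python's standard library (a minimal illustration, independent of any database):&lt;/p&gt;

```python
import json
import xml.etree.ElementTree as ET

# Parse a JSON document into a dict.
record_from_json = json.loads('{"name": "Laptop", "price": 999.99}')

# Build the equivalent XML document in code and read the same fields back.
root = ET.Element("product")
ET.SubElement(root, "name").text = "Laptop"
ET.SubElement(root, "price").text = "999.99"
record_from_xml = {"name": root.findtext("name"),
                   "price": float(root.findtext("price"))}

print(record_from_json == record_from_xml)  # prints True
```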

&lt;h2&gt;
  
  
  Storing JSON in SQL Databases
&lt;/h2&gt;

&lt;p&gt;Many modern SQL databases, such as PostgreSQL, MySQL, and SQL Server, provide native support for JSON data types.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Store JSON
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Using JSON Data Type: Some databases allow you to define a column with a JSON data type.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;ProductID&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ProductData&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
   &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;Inserting JSON Data:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;Products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ProductID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ProductData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'{"name": "Laptop", "price": 999.99}'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Querying JSON Data
&lt;/h3&gt;

&lt;p&gt;You can use built-in functions to query JSON data.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ProductData&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ProductName&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ProductID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
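&lt;p&gt;Note that the &lt;code&gt;-&amp;gt;&amp;gt;&lt;/code&gt; operator above is PostgreSQL syntax. In SQL Server, the equivalent lookup uses the &lt;code&gt;JSON_VALUE&lt;/code&gt; function:&lt;/p&gt;

```sql
-- SQL Server equivalent of the PostgreSQL query above
SELECT JSON_VALUE(ProductData, '$.name') AS ProductName
FROM Products
WHERE ProductID = 1;
```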

&lt;h2&gt;
  
  
  Storing XML in SQL Databases
&lt;/h2&gt;

&lt;p&gt;SQL databases also support XML data types, allowing you to store and query XML documents.&lt;/p&gt;
&lt;h4&gt;
  
  
  How to Store XML
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Using XML Data Type: Define a column with an XML data type.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;OrderID&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;OrderDetails&lt;/span&gt; &lt;span class="n"&gt;xml&lt;/span&gt;
   &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;Inserting XML Data:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OrderID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OrderDetails&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;order&amp;gt;&amp;lt;item&amp;gt;Book&amp;lt;/item&amp;gt;&amp;lt;quantity&amp;gt;2&amp;lt;/quantity&amp;gt;&amp;lt;/order&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Querying XML Data
&lt;/h4&gt;

&lt;p&gt;You can use XPath and XQuery to extract data from XML columns; the &lt;code&gt;value()&lt;/code&gt; method shown below is SQL Server syntax.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;OrderDetails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'(/order/item)[1]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'varchar(100)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ItemName&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;OrderID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Pros and Cons of Storing JSON and XML
&lt;/h2&gt;
&lt;h4&gt;
  
  
  Pros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Flexibility: Both JSON and XML allow for flexible data structures, making it easy to store complex data.&lt;/li&gt;
&lt;li&gt;Interoperability: They are widely used formats, making it easier to integrate with other systems and APIs.&lt;/li&gt;
&lt;li&gt;Schema-less: You can store data without a predefined schema, which is useful for evolving data models.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Cons
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Performance: Querying semi-structured data can be slower than querying structured data, especially for large datasets.&lt;/li&gt;
&lt;li&gt;Complexity: Managing and querying JSON and XML data can add complexity to your database operations.&lt;/li&gt;
&lt;li&gt;Storage Overhead: JSON and XML formats can consume more storage space compared to traditional relational data.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Storing JSON and XML in SQL databases provides a powerful way to handle semi-structured data. By leveraging the native support for these formats in modern SQL databases, you can efficiently store, query, and manage complex data structures. Understanding the advantages and limitations of each format will help you make informed decisions about how to best utilize them in your applications.&lt;/p&gt;



</description>
      <category>sql</category>
      <category>database</category>
      <category>tutorial</category>
      <category>data</category>
    </item>
    <item>
      <title>Fixing Azure SQL Connection Errors in Azure Scheduled Python Job</title>
      <dc:creator>Luca Liu</dc:creator>
      <pubDate>Fri, 27 Feb 2026 13:37:00 +0000</pubDate>
      <link>https://forem.com/luca1iu/fixing-azure-sql-connection-errors-in-azure-scheduled-python-job-3ldk</link>
      <guid>https://forem.com/luca1iu/fixing-azure-sql-connection-errors-in-azure-scheduled-python-job-3ldk</guid>
      <description>&lt;p&gt;As a Data Analyst, I recently faced a frustrating issue while automating a daily data processing task in Azure.&lt;/p&gt;

&lt;p&gt;The goal was simple: run a scheduled job every morning to process data and sync it to an Azure SQL Database. When I ran the code manually, it worked perfectly. But when the scheduled job (via Azure Functions or Synapse) triggered at 6:00 AM, it crashed immediately.&lt;/p&gt;

&lt;p&gt;Here is how to fix the "Database not available" error without increasing your Azure bill.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Problem
&lt;/h1&gt;

&lt;p&gt;The job failed consistently with &lt;strong&gt;Error 40613&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(pyodbc.Error) ('HY000', "[HY000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Database 'xxxxxxx' on server 'xxxxxxxxxxxxxxxxxx' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of '{...}'. (40613) (SQLDriverConnect)") (Background on this error at: https://sqlalche.me/e/20/dbapi)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Why this happens
&lt;/h2&gt;

&lt;p&gt;I am using the &lt;strong&gt;Azure SQL Database Serverless&lt;/strong&gt; tier. To save costs, this tier features &lt;strong&gt;Auto-pause&lt;/strong&gt;. If no one uses the database for a set period (e.g., 1 hour), Azure puts it to sleep.&lt;/p&gt;

&lt;p&gt;When my scheduled job runs in the morning, the database is cold. It takes approximately &lt;strong&gt;60 to 90 seconds&lt;/strong&gt; for Azure to spin the compute back up. The default Python connection string gives up before the database is ready.&lt;/p&gt;
&lt;h1&gt;
  
  
  The Expensive Fix (Don't do this)
&lt;/h1&gt;

&lt;p&gt;My first instinct was to disable Auto-pause.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;Azure Portal&lt;/strong&gt; &amp;gt; &lt;strong&gt;SQL Database&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Compute + storage&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Uncheck &lt;strong&gt;Enable auto-pause&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; The error stopped, but my costs tripled. I was paying for compute 24/7 for a job that only runs for 10 minutes a day. This is not efficient.&lt;/p&gt;
&lt;h1&gt;
  
  
  The Smart Fix: Intelligent Retry Logic
&lt;/h1&gt;

&lt;p&gt;Instead of keeping the server running all night, we should write code that is patient enough to wait for the server to wake up.&lt;/p&gt;

&lt;p&gt;I wrote a custom wrapper for the SQLAlchemy engine that handles the specific behavior of Azure Serverless cold starts.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Code
&lt;/h3&gt;

&lt;p&gt;Here is the robust connection function. It attempts to connect, and if it detects the database is sleeping, it waits and retries until the server is back online.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.exc&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OperationalError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;InterfaceError&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;connect_sql_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delay_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Attempts to connect to the database. 
    If the database is in serverless pause state, it retries until it wakes up.

    max_retries: Default 10. Covers ~5 minutes of startup time.
    delay_seconds: Default 30s. Wait time between attempts.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Replace with your credentials or use Environment Variables (Recommended)
&lt;/span&gt;    &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your-server.database.windows.net&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your-database&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your-username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your-password&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; 

    &lt;span class="c1"&gt;# LoginTimeout=30 gives the driver time to negotiate the handshake
&lt;/span&gt;    &lt;span class="n"&gt;connection_string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mssql+pyodbc://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;@&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;?driver=ODBC+Driver+18+for+SQL+Server&amp;amp;LoginTimeout=30&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create the engine with connection pooling enabled
&lt;/span&gt;    &lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;connection_string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;fast_executemany&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Optimized for bulk inserts
&lt;/span&gt;        &lt;span class="n"&gt;pool_pre_ping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Checks connection health before usage
&lt;/span&gt;        &lt;span class="n"&gt;pool_recycle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1800&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attempting to connect to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Try to execute a simple query to wake the DB
&lt;/span&gt;            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; Success: Database is connected and awake!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;

        &lt;span class="nf"&gt;except &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OperationalError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;InterfaceError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attempt &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; failed. Database might be auto-paused.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error details: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Waiting &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;delay_seconds&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; seconds for wake-up...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delay_seconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# If we reach here, the database is genuinely down or credentials are wrong
&lt;/span&gt;    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; Failed to wake up the database after multiple attempts.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Loop:&lt;/strong&gt; It tries to run &lt;code&gt;SELECT 1&lt;/code&gt;. This is a lightweight query that forces Azure to trigger the resume process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Trap:&lt;/strong&gt; If it catches an &lt;code&gt;OperationalError&lt;/code&gt; (which covers the 40613 code), it pauses the script for 30 seconds using &lt;code&gt;time.sleep()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Success:&lt;/strong&gt; Once Azure allocates the compute (usually after attempt 2 or 3), the connection succeeds, and the function returns the active &lt;code&gt;engine&lt;/code&gt; object for your pipeline to use.&lt;/li&gt;
&lt;/ol&gt;
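&lt;p&gt;Stripped of the SQLAlchemy specifics, the wake-up logic above is just a bounded retry loop. Here is a minimal, self-contained sketch (the function name and the use of &lt;code&gt;ConnectionError&lt;/code&gt; are illustrative, not taken from the code above):&lt;/p&gt;

```python
import time

def wake_with_retries(probe, max_retries=5, delay_seconds=30, sleep=time.sleep):
    """Call `probe` until it succeeds or retries run out.

    `probe` stands in for connecting and running SELECT 1; `sleep` is
    injectable so the loop can be exercised without real waiting.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return probe()  # first successful call returns the live engine
        except ConnectionError:
            if attempt == max_retries:
                break
            sleep(delay_seconds)  # give Azure time to allocate compute
    raise RuntimeError("Failed to wake up the database after multiple attempts.")
```

&lt;p&gt;A paused database typically answers on the second or third attempt, so five retries with a 30-second delay leaves comfortable headroom.&lt;/p&gt;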
&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;Don't change your infrastructure to fit your code; change your code to fit the infrastructure. By handling the "cold start" in Python, you keep the cost benefits of Serverless architecture while maintaining the reliability of a Production environment.&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;


&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Luca Liu&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Luca, a Business Intelligence Developer with passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;





&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;🚀 Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/Luca_DataTeam" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;🎃 Connect with me on X&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>azure</category>
      <category>database</category>
      <category>automation</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Install Python Package in Azure Synapse for Apache Spark pools</title>
      <dc:creator>Luca Liu</dc:creator>
      <pubDate>Tue, 06 Jan 2026 21:58:00 +0000</pubDate>
      <link>https://forem.com/luca1iu/how-to-install-python-package-in-azure-synapse-for-apache-spark-pools-4pjj</link>
      <guid>https://forem.com/luca1iu/how-to-install-python-package-in-azure-synapse-for-apache-spark-pools-4pjj</guid>
      <description>&lt;h2&gt;
  
  
  Efficiently Installing Python Packages in Azure Synapse Analytics
&lt;/h2&gt;

&lt;p&gt;When working in Azure Synapse notebooks, you can use the &lt;code&gt;%pip&lt;/code&gt; magic command (e.g., &lt;code&gt;%pip install pandas&lt;/code&gt;) in a code cell to install packages. However, this method is temporary: the package is installed only for the current notebook session and must be reinstalled every time a new session starts.&lt;/p&gt;

&lt;p&gt;This repetition can lead to significant delays in notebook execution and is inefficient for frequently run jobs.&lt;/p&gt;

&lt;p&gt;A more permanent and efficient solution is to install packages directly onto the Apache Spark pool. This approach ensures the libraries are pre-installed and automatically available in every session attached to that pool.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Install Packages at the Spark Pool Level
&lt;/h2&gt;

&lt;p&gt;This method involves uploading a &lt;code&gt;requirements.txt&lt;/code&gt; file that specifies the packages and versions you need.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to your Azure Synapse workspace in the Azure portal.&lt;/li&gt;
&lt;li&gt;Navigate to the "Manage" section on the left-hand side.&lt;/li&gt;
&lt;li&gt;Select "Apache Spark pools" under the "Analytics pools" section.&lt;/li&gt;
&lt;li&gt;Choose the Spark pool where you want to install the package.&lt;/li&gt;
&lt;li&gt;Hover over the three dots on the right side of the Spark pool and click "Packages".&lt;/li&gt;
&lt;li&gt;Upload a &lt;code&gt;requirements.txt&lt;/code&gt; file containing the list of packages you want to install.&lt;/li&gt;
&lt;li&gt;Click "Apply" to save the changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futjmsqs39tv57h4az884.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futjmsqs39tv57h4az884.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Spark pool will update and automatically install the specified packages. This may take a few minutes. Once complete, all notebooks attached to this pool will have access to these libraries by default.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to generate &lt;code&gt;requirements.txt&lt;/code&gt; file
&lt;/h2&gt;

&lt;p&gt;The requirements.txt file is a simple text file that lists the packages to be installed. You can easily generate this file from your local Python environment.&lt;/p&gt;

&lt;p&gt;Open your terminal or command prompt and run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip freeze &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This command captures all packages and their exact versions from your current environment and saves them into a file named requirements.txt. Uploading this file ensures that the exact same package versions are installed in your Synapse environment, providing consistency and preventing dependency conflicts.&lt;/p&gt;
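&lt;p&gt;One caveat: &lt;code&gt;pip freeze&lt;/code&gt; pins everything in your local environment, which can make the pool update slower than necessary. As a hedged sketch (the helper name and package names are examples, not part of the workflow above), you can trim the dump down to just the packages your notebooks need:&lt;/p&gt;

```python
def trim_requirements(freeze_text, keep):
    """Keep only `name==version` lines whose package name is in `keep`.

    Editable installs and URL requirements are skipped; names are
    compared case-insensitively, as pip does.
    """
    wanted = {name.lower() for name in keep}
    kept_lines = []
    for line in freeze_text.splitlines():
        if "==" not in line:
            continue  # skip editable/URL requirements in this sketch
        name = line.split("==", 1)[0].strip().lower()
        if name in wanted:
            kept_lines.append(line.strip())
    return "\n".join(kept_lines)
```

&lt;p&gt;Run &lt;code&gt;pip freeze&lt;/code&gt; first, then pass the file's contents through a filter like this before uploading, keeping the exact pinned versions only for what your Synapse notebooks actually import.&lt;/p&gt;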


&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Luca Liu&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Luca, a Business Intelligence Developer with passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;





&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;🚀 Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/Luca_DataTeam" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;🎃 Connect with me on X&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>azure</category>
      <category>tutorial</category>
      <category>python</category>
      <category>data</category>
    </item>
    <item>
      <title>How to Calculate a Dynamic Truncated Mean in Power BI Using DAX</title>
      <dc:creator>Luca Liu</dc:creator>
      <pubDate>Tue, 06 Jan 2026 21:57:00 +0000</pubDate>
      <link>https://forem.com/luca1iu/how-to-calculate-a-dynamic-truncated-mean-in-power-bi-using-dax-gij</link>
      <guid>https://forem.com/luca1iu/how-to-calculate-a-dynamic-truncated-mean-in-power-bi-using-dax-gij</guid>
      <description>&lt;h2&gt;
  
  
  Why You Need a Truncated Mean
&lt;/h2&gt;

&lt;p&gt;In data analysis, the standard AVERAGE function is a workhorse, but it has a significant weakness: it is highly susceptible to distortion from outliers. A single extreme value, whether high or low, can skew the entire result, misrepresenting the data's true central tendency.&lt;/p&gt;

&lt;p&gt;This is where the truncated mean becomes essential. It provides a more robust measure of average by excluding a specified percentage of the smallest and largest values from the calculation.&lt;/p&gt;

&lt;p&gt;While modern Power BI models have a built-in TRIMMEAN function, this function is often unavailable when using a Live Connection to an older Analysis Services (SSAS) model. This article provides a robust, manual DAX pattern that replicates this functionality and remains fully dynamic, responding to all slicers and filters in your report.&lt;/p&gt;

&lt;h2&gt;
  
  
  The DAX Solution for a Dynamic Truncated Mean
&lt;/h2&gt;

&lt;p&gt;This measure calculates a 20% truncated mean by removing the bottom 10% and top 10% of values before averaging the remaining 80%.&lt;/p&gt;

&lt;p&gt;You can paste this code directly into the "New Measure" formula bar.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trimmed Mean (20%) = 
VAR TargetTable = 'FactTable'
VAR TargetColumn = 'FactTable'[MeasureColumn]
VAR LowerPercentile = 0.10 // Defines the bottom 10% to trim
VAR UpperPercentile = 0.90 // Defines the top 10% to trim (1.0 - 0.10)

// 1. Find the value at the 10th percentile
VAR MinThreshold =
    PERCENTILEX.INC(
        FILTER( 
            TargetTable, 
            NOT( ISBLANK( TargetColumn ) ) 
        ),
        TargetColumn,
        LowerPercentile
    )

// 2. Find the value at the 90th percentile
VAR MaxThreshold =
    PERCENTILEX.INC(
        FILTER( 
            TargetTable, 
            NOT( ISBLANK( TargetColumn ) ) 
        ),
        TargetColumn,
        UpperPercentile
    )

// 3. Calculate the average, including only values between the thresholds
RETURN
CALCULATE(
    AVERAGEX(
        FILTER(
            TargetTable,
            TargetColumn &amp;gt;= MinThreshold &amp;amp;&amp;amp;
            TargetColumn &amp;lt;= MaxThreshold
        ),
        TargetColumn
    )
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Deconstructing the DAX Logic
&lt;/h2&gt;

&lt;p&gt;This formula works in three distinct steps, all of which execute within the current filter context (e.g., whatever slicers the user has selected).&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Define Key Variables
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;'FactTable'[MeasureColumn]&lt;/code&gt;: You must change the table and column references to match your own data model.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LowerPercentile&lt;/code&gt; / &lt;code&gt;UpperPercentile&lt;/code&gt;: We define the boundaries. 0.10 and 0.90 mean we are trimming the bottom 10% and top 10%. To trim 5% from each end (a 10% total trim), you would use 0.05 and 0.95.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  2. Find the Percentile Thresholds
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;MinThreshold&lt;/code&gt; &amp;amp; &lt;code&gt;MaxThreshold&lt;/code&gt;: These variables store the actual values that correspond to our percentile boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PERCENTILEX.INC&lt;/code&gt;: We use this "iterator" function because it allows us to first FILTER the table.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FILTER(..., NOT(ISBLANK(...)))&lt;/code&gt;: This is a crucial step. We calculate the percentiles only for rows where our target column is not blank. This prevents BLANK() values from skewing the percentile calculation.&lt;/li&gt;
&lt;li&gt;The result is that &lt;code&gt;MinThreshold&lt;/code&gt; holds the value of the 10th percentile (e.g., 4.5) and &lt;code&gt;MaxThreshold&lt;/code&gt; holds the value of the 90th percentile (e.g., 88.2) for the currently visible data.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3. Calculate the Final Average
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;RETURN CALCULATE(...)&lt;/code&gt;: The CALCULATE function is the key to making the measure dynamic. It ensures the entire calculation respects the filters applied by any slicers or visuals in the report.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;AVERAGEX(FILTER(...))&lt;/code&gt;: The core of the calculation. We use AVERAGEX to iterate over a table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;FILTER(...)&lt;/code&gt;: We filter the fact table a final time. This filter is the "trim." It keeps only the rows where the measure column's value is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Greater than or equal to&lt;/strong&gt; our MinThreshold&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AND&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Less than or equal to&lt;/strong&gt; our MaxThreshold&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;AVERAGEX&lt;/code&gt; then calculates the simple average of the measure column for only the rows that passed the filter.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
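&lt;p&gt;To sanity-check the measure outside Power BI, the same trim can be replicated in plain Python. This is a sketch of the logic, assuming inclusive percentiles with linear interpolation (PERCENTILEX.INC-style), not the DAX engine's exact implementation:&lt;/p&gt;

```python
def percentile_inc(values, p):
    """Inclusive percentile with linear interpolation (PERCENTILE.INC-style)."""
    data = sorted(values)
    rank = p * (len(data) - 1)
    lower = int(rank)
    frac = rank - lower
    if lower + 1 < len(data):
        return data[lower] + frac * (data[lower + 1] - data[lower])
    return data[lower]

def trimmed_mean(values, trim=0.10):
    """Average only the values between the `trim` and `1 - trim` percentiles."""
    low = percentile_inc(values, trim)
    high = percentile_inc(values, 1 - trim)
    kept = [v for v in values if low <= v <= high]
    return sum(kept) / len(kept)
```

&lt;p&gt;For example, for the list [1, 2, ..., 9, 100] the raw mean is 14.5, while the 10% trimmed mean stays at 5.5 because the outlier falls above the 90th-percentile threshold.&lt;/p&gt;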
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By implementing this DAX pattern, you create a robust, dynamic, and outlier-resistant KPI. This measure provides a more accurate picture of your data's central tendency and will correctly re-calculate on the fly as users interact with your Power BI report.&lt;/p&gt;


&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Luca Liu&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Luca, a Business Intelligence Developer with passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;





&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;🚀 Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/Luca_DataTeam" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;🎃 Connect with me on X&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>powerbi</category>
      <category>tutorial</category>
      <category>dax</category>
      <category>data</category>
    </item>
    <item>
      <title>Data Security in SQL: Encryption, Roles, and Permissions</title>
      <dc:creator>Luca Liu</dc:creator>
      <pubDate>Tue, 09 Dec 2025 16:45:00 +0000</pubDate>
      <link>https://forem.com/luca1iu/data-security-in-sql-encryption-roles-and-permissions-17g</link>
      <guid>https://forem.com/luca1iu/data-security-in-sql-encryption-roles-and-permissions-17g</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In today's digital age, data security is paramount. SQL databases often store sensitive information, making it crucial to implement robust security measures. This article explores three key strategies for securing data in SQL: encryption, roles, and permissions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Encrypting Sensitive Columns
&lt;/h2&gt;

&lt;p&gt;Encryption is the process of converting data into a coded format to prevent unauthorized access. In SQL, encrypting sensitive columns such as passwords and credit card data is essential.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Encrypt Data in SQL
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Choose an Encryption Algorithm&lt;/strong&gt;: Common algorithms include AES (Advanced Encryption Standard) and RSA (Rivest-Shamir-Adleman).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement Column-Level Encryption&lt;/strong&gt;: Use SQL commands to encrypt specific columns. For example:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;UserID&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;Username&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
       &lt;span class="n"&gt;Password&lt;/span&gt; &lt;span class="nb"&gt;varbinary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;ENCRYPTED&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;ENCRYPTION&lt;/span&gt;
   &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Manage Encryption Keys&lt;/strong&gt;: Store and manage encryption keys securely, using a key management system.&lt;/li&gt;
&lt;/ol&gt;
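&lt;p&gt;For password columns specifically, a one-way salted hash is usually preferable to reversible encryption, because the application never needs the plaintext back. A minimal standard-library sketch (the iteration count is illustrative; tune it to your hardware and threat model):&lt;/p&gt;

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=600_000):
    """Derive a salted PBKDF2-HMAC-SHA256 digest suitable for a binary column."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest

def verify_password(password, salt, expected, iterations=600_000):
    """Re-derive the digest and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

&lt;p&gt;Both the salt and the digest would be stored (e.g., in varbinary columns); only the verification function ever touches the user-supplied password again.&lt;/p&gt;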
&lt;h2&gt;
  
  
  Using Roles and Permissions Effectively
&lt;/h2&gt;

&lt;p&gt;Roles and permissions control who can access or modify data within the database. Properly configured roles and permissions are vital for data security.&lt;/p&gt;
&lt;h4&gt;
  
  
  Setting Up Roles
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Define Roles: Identify different user roles (e.g., admin, user, guest) and their access needs.&lt;/li&gt;
&lt;li&gt;Create Roles in SQL:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="k"&gt;admin&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="k"&gt;user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Assigning Permissions
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Grant Permissions&lt;/strong&gt;: Assign specific permissions to roles. For example:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="k"&gt;user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="k"&gt;admin&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Review and Update Regularly&lt;/strong&gt;: Regularly audit permissions to ensure they align with current security policies.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Masking Sensitive Data with Views
&lt;/h2&gt;

&lt;p&gt;Data masking involves creating a version of the data that obscures sensitive information, allowing users to work with data without exposing sensitive details.&lt;/p&gt;

&lt;h4&gt;
  
  
  Implementing Data Masking
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Create Views: Use SQL views to present masked data. For example:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;MaskedUsers&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;UserID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'****'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Password&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;Control Access to Views: Ensure only authorized users can access the views.&lt;/li&gt;
&lt;/ol&gt;
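&lt;p&gt;When a view is not an option, the same masking idea can live in the application layer. A small illustrative helper (the name and defaults are assumptions, not a standard API):&lt;/p&gt;

```python
def mask(value, visible=4, mask_char="*"):
    """Mask all but the last `visible` characters of a string."""
    value = str(value)
    if len(value) <= visible:
        return mask_char * len(value)
    keep = value[-visible:] if visible > 0 else ""
    return mask_char * (len(value) - visible) + keep
```

&lt;p&gt;This keeps just enough of the value (e.g., the last four digits of a card number) for users to recognize a record without exposing the full secret.&lt;/p&gt;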
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Securing data in SQL databases requires a multi-faceted approach. By encrypting sensitive columns, using roles and permissions effectively, and masking data with views, you can significantly enhance your database's security. Implement these strategies to protect your data from unauthorized access and breaches.&lt;/p&gt;


&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Luca Liu&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Luca, a Business Intelligence Developer with passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;





&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;🚀 Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>database</category>
      <category>tutorial</category>
      <category>sql</category>
      <category>data</category>
    </item>
    <item>
      <title>Stuck in a Version Trap - How I Used Azure ML to Deploy an Azure Function</title>
      <dc:creator>Luca Liu</dc:creator>
      <pubDate>Mon, 08 Dec 2025 09:52:00 +0000</pubDate>
      <link>https://forem.com/luca1iu/stuck-in-a-version-trap-how-i-used-azure-ml-to-deploy-an-azure-function-19ke</link>
      <guid>https://forem.com/luca1iu/stuck-in-a-version-trap-how-i-used-azure-ml-to-deploy-an-azure-function-19ke</guid>
      <description>&lt;p&gt;As a developer, there is no worse feeling than being completely blocked. This is the story of how I got stuck in a "version trap" between my company PC, VS Code, and Azure... and how I used a cloud VM to escape.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; November 17, 2025&lt;/p&gt;

&lt;h1&gt;
  
  
  The Version Trap
&lt;/h1&gt;

&lt;p&gt;My goal was to create a new Azure Function in Python. I checked the Azure Portal, and I was excited to see that the Function App runtime now &lt;strong&gt;supports Python 3.13&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;My company laptop has Python 3.13 installed, so I thought this would be easy. I opened VS Code, installed the Azure Functions extension, and tried to create a new project.&lt;/p&gt;

&lt;p&gt;When the extension asked me to select my Python interpreter, I pointed it to my &lt;code&gt;Python313\python.exe&lt;/code&gt;. Immediately, I hit a wall:&lt;/p&gt;

&lt;p&gt;Error: &lt;code&gt;Python version 3.13.8 does not match supported versions...&lt;/code&gt; &lt;/p&gt;

&lt;p&gt;The problem is that the &lt;strong&gt;cloud runtime&lt;/strong&gt; (in Azure) is updated &lt;em&gt;before&lt;/em&gt; the &lt;strong&gt;local development tools&lt;/strong&gt; (the VS Code extension and Core Tools). My local tools were out of sync with the cloud and didn't recognize 3.13 as valid yet.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Real-World Constraint: The Corporate PC
&lt;/h1&gt;

&lt;p&gt;The standard solution is simple: "Just install a supported version, like Python 3.11."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My problem:&lt;/strong&gt; I can't. This is a locked-down company laptop. Installing new software requires a multi-day approval process with the IT department. (My &lt;em&gt;other&lt;/em&gt; local Python 3.11 installation was also broken and missing key modules like &lt;code&gt;pip&lt;/code&gt; and &lt;code&gt;venv&lt;/code&gt;, but I couldn't get admin rights to fix it.)&lt;/p&gt;

&lt;p&gt;I was completely blocked. I couldn't develop locally.&lt;/p&gt;

&lt;h1&gt;
  
  
  The "Aha!" Moment: Use a Cloud Dev Box
&lt;/h1&gt;

&lt;p&gt;As a Data Analyst, I already have access to an &lt;strong&gt;Azure ML (Machine Learning) Compute Instance&lt;/strong&gt;. I realized: &lt;em&gt;that compute instance is just a fully-featured Linux VM in the cloud that I control.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What if I treated my Azure ML instance as my &lt;em&gt;new&lt;/em&gt; "local" development machine?&lt;/p&gt;

&lt;h1&gt;
  
  
  The Solution: Deploying from Azure ML to Azure Functions
&lt;/h1&gt;

&lt;p&gt;This workflow completely bypassed my locked-down company PC and was surprisingly simple.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Connect VS Code to the Azure ML Instance&lt;/strong&gt; This is the most important step. In VS Code, I installed the &lt;strong&gt;Azure Machine Learning&lt;/strong&gt; extension. In its panel, I found my Compute Instance, right-clicked, and selected "Connect to Compute Instance." VS Code reloaded in a "Remote SSH" session, and my VS Code terminal was now a terminal &lt;em&gt;inside my cloud VM&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Create the Project &lt;em&gt;on the ML Instance&lt;/em&gt;&lt;/strong&gt; Now, inside this remote session, I opened a folder &lt;em&gt;on the ML instance&lt;/em&gt; and ran the &lt;code&gt;F1&lt;/code&gt; &amp;gt; &lt;code&gt;Azure Functions: Create New Project...&lt;/code&gt; command. The VM already had Python 3.10 installed, so the tools were perfectly happy. I also created my &lt;code&gt;TimerTrigger&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Set Up the Environment (The "F5" Fix)&lt;/strong&gt; My code needs &lt;code&gt;pandas&lt;/code&gt; and &lt;code&gt;pyodbc&lt;/code&gt;. I opened the VS Code terminal (which is connected to my ML instance) and ran these commands to create a virtual environment and install my packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a virtual environment using the VM's Python 3.10&lt;/span&gt;
python3.10 &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv

&lt;span class="c"&gt;# Activate it&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate

&lt;span class="c"&gt;# Install my packages&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
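&lt;p&gt;For reference, the &lt;code&gt;requirements.txt&lt;/code&gt; used above might look roughly like this. Only &lt;code&gt;pandas&lt;/code&gt; and &lt;code&gt;pyodbc&lt;/code&gt; are named in this post; the &lt;code&gt;azure-functions&lt;/code&gt; entry is the package every Python Functions project needs, and the lack of version pins is just to keep the sketch minimal:&lt;/p&gt;

```text
# requirements.txt (illustrative sketch, not the author's actual file)
azure-functions
pandas
pyodbc
```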


&lt;p&gt;&lt;strong&gt;Step 4: Debug "Remotely"&lt;/strong&gt; This is the magic part. I pressed &lt;strong&gt;F5&lt;/strong&gt;. The code &lt;em&gt;ran on the ML instance&lt;/em&gt;, but the debugger connected to my local VS Code. I could set breakpoints and inspect variables just as if it were running on my own laptop. I successfully debugged my function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Deploy from Cloud to Cloud&lt;/strong&gt; Once I was happy with my code, I clicked on the Azure extension icon (inside my remote VS Code session). I found my target Function App, right-clicked, and selected &lt;strong&gt;"Deploy to Function App..."&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;VS Code packaged all the code &lt;em&gt;from my Azure ML instance&lt;/em&gt; and deployed it directly &lt;em&gt;to my Azure Functions app&lt;/em&gt;. My local PC was just a "thin client" for the whole process.&lt;/p&gt;
&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Don't let a locked-down corporate PC block you from getting work done. If your local tools are out of date or broken, you can use any cloud VM (like an Azure ML Compute Instance) as a powerful, modern development environment. By using the VS Code Remote-SSH features, you can get the best of both worlds.&lt;/p&gt;


&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;


&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Luca Liu&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Luca, a Business Intelligence Developer with passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🚀 Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>azure</category>
      <category>dataanalyst</category>
      <category>dataengineering</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>10 Essential Data Science Algorithms &amp; Techniques</title>
      <dc:creator>Luca Liu</dc:creator>
      <pubDate>Mon, 08 Dec 2025 09:51:00 +0000</pubDate>
      <link>https://forem.com/luca1iu/10-essential-data-science-algorithms-techniques-58bp</link>
      <guid>https://forem.com/luca1iu/10-essential-data-science-algorithms-techniques-58bp</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;The world of data science can seem intimidating, filled with complex equations and advanced statistical concepts. Many aspiring data scientists feel they need to be a "math master" before even beginning. But here's a secret: while a deep understanding of the mathematical foundations of every algorithm is certainly powerful, it's not a prerequisite to becoming an effective data scientist.&lt;/p&gt;

&lt;p&gt;What truly matters is developing an intuitive understanding of what these powerful algorithms do, when to unleash them, and why one might be chosen over another. Think of it less like building an engine from scratch, and more like knowing which tool to pick from a well-stocked toolbox to get the job done right. This article will cut through the jargon and introduce you to 10 essential algorithms and techniques—the workhorses of data science—equipping you with the practical knowledge you need to start building intelligent solutions today.&lt;/p&gt;

&lt;h1&gt;
  
  
  I. Foundational Supervised Learning
&lt;/h1&gt;

&lt;p&gt;Supervised Learning is the most common type of machine learning. It's like learning with a teacher or flashcards. You give the algorithm a dataset where you already know the correct answers (called "labels").&lt;/p&gt;
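&lt;p&gt;The snippets below all reuse names like &lt;code&gt;X_train&lt;/code&gt; and &lt;code&gt;X_test&lt;/code&gt;. They come from a standard train/test split, sketched here with a tiny synthetic dataset (the data itself is purely illustrative):&lt;/p&gt;

```python
# Sketch of the train/test split assumed by every snippet in this article.
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]          # one numeric feature
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # the known "teacher's answers" (labels)

# Hold out 20% of the data so the model is evaluated on examples it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 2
```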

&lt;h2&gt;
  
  
  1. Linear Regression
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is&lt;/strong&gt;: Linear Regression is a fundamental algorithm that finds the best-fit straight line showing the relationship between variables. Its goal is to predict a continuous numerical value (e.g., a house price, a person's weight, or sales) based on one or more input features (e.g., house size, a person's height, or ad spending).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When your goal is to predict a continuous number (e.g., forecasting sales, estimating a price).&lt;/li&gt;
&lt;li&gt;When you need to understand the strength and direction of the relationship between variables (e.g., "How much does ad spending really impact sales?").&lt;/li&gt;
&lt;li&gt;As a simple, fast baseline to compare against more complex models.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of Linear Regression immediately when your primary question is "How much...?" or "What value...?" and you have a numerical target to predict. If you suspect the relationship between your inputs and output is relatively simple (e.g., "more square footage = higher house price"), and you value speed and interpretability (it's easy to explain why it made a prediction), it's your perfect starting point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features (e.g., [[square_feet, num_bedrooms]])
# y = your target (e.g., [price])
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Check the relationship (e.g., the slope of the line)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Coefficients: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;coef_&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  2. Logistic Regression
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; Despite its name, Logistic Regression is used for classification tasks. Its goal is to predict the probability that an input belongs to a specific category (e.g., spam vs. not spam, disease vs. no disease) based on input features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When your goal is to predict a category (e.g., spam/not spam, fraud/not fraud, pass/fail). This is most common for binary problems.&lt;/li&gt;
&lt;li&gt;When you need the probability of an outcome (e.g., what is the likelihood this customer will click the ad?).&lt;/li&gt;
&lt;li&gt;As a simple, fast, and highly interpretable baseline for classification.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of Logistic Regression immediately when your primary question is "Is it A or B?", "Will this happen?", or "What's the probability of...?" for a categorical outcome. It's the classification equivalent of Linear Regression—your first, most straightforward tool for the job. Its ability to provide probabilities makes it more useful than just a "yes" or "no" answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features (e.g., [[hours_studied, past_failures]])
# y = your target (e.g., [pass, fail])
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions (e.g., 'pass' or 'fail')
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Get the probabilities
&lt;/span&gt;&lt;span class="n"&gt;probabilities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  3. K-Nearest Neighbors (KNN)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; KNN is a simple and intuitive algorithm that classifies a new data point based on its 'neighbors': it finds the 'k' closest data points from the training set and makes a prediction based on their majority vote. If k=5 and 3 out of 5 neighbors are 'spam', the new point is classified as 'spam'.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For classification (and regression) tasks where the underlying data relationships are complex but "similarity" is a good predictor (e.g., "birds of a feather flock together").&lt;/li&gt;
&lt;li&gt;As a simple, "non-parametric" or "lazy" model, meaning it makes no assumptions about the underlying data distribution. It doesn't "learn" a line; it just memorizes the data.&lt;/li&gt;
&lt;li&gt;For tasks like recommendation engines (e.g., "users similar to you also liked...").&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense"&lt;/strong&gt;: You should think of KNN when your features are on a similar scale (e.g., all numbers from 1-10) and you believe the core idea "tell me who your friends are, and I'll tell you who you are" applies to your data. It's great when you have well-defined, distinct clusters in your data. It's often outperformed by more advanced models but is a fantastic, simple baseline, especially if you don't have a lot of features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.neighbors&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KNeighborsClassifier&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target classes
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model (e.g., we'll look at 5 neighbors)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KNeighborsClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_neighbors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model (it just stores the data)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
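&lt;p&gt;One practical caveat worth making concrete: because KNN relies on distances, a feature measured in the hundreds will drown out a feature measured in single digits. A minimal sketch (with made-up data) of standardizing features before KNN:&lt;/p&gt;

```python
# Scale features before KNN so both columns contribute to the distance.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Two features on very different scales (synthetic, illustrative data)
X_train = [[1, 200], [2, 180], [3, 210], [10, 900], [11, 950], [12, 880]]
y_train = [0, 0, 0, 1, 1, 1]

# The pipeline standardizes each feature, then runs KNN on the scaled values.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)
print(model.predict([[2.5, 195]]))  # lands in the first cluster -> [0]
```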

&lt;h2&gt;
  
  
  4. Support Vector Machines (SVM)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; SVM is a powerful classification algorithm that finds the optimal "hyperplane" (a boundary line) that best separates data points into different classes. Its main goal is to find the line that has the largest possible "margin" or buffer zone between the closest points of each class. These closest points are called the "support vectors."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For complex classification tasks where classes are well-defined but may not be separable by a simple straight line.&lt;/li&gt;
&lt;li&gt;In high-dimensional spaces (data with many features), such as text classification (where every word is a feature) or image recognition.&lt;/li&gt;
&lt;li&gt;When you need a model that is robust against overfitting, especially in cases with many features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of SVM when you need a highly accurate classifier and believe a clear separating boundary exists, even if it's complex. If Logistic Regression is too simple, but a Neural Network seems like overkill, SVM is your strong, sophisticated middle-ground. It's particularly powerful for text classification and other "wide" data problems (more columns/features than rows).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.svm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SVC&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target classes
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model
# (kernel='linear' is a straight line, 'rbf' is more complex)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SVC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rbf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h1&gt;
  
  
  II. Ensemble Methods (The Power-Players)
&lt;/h1&gt;

&lt;p&gt;Ensemble Methods are techniques that combine multiple machine learning models to produce one superior model. Instead of relying on a single "expert," this approach gets the "opinion" (prediction) from a diverse group of models and combines them.&lt;/p&gt;
&lt;h2&gt;
  
  
  5. Decision Trees
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A Decision Tree is an intuitive algorithm that works like a flowchart. It asks a series of sequential "if-then-else" questions about your data's features, splitting the data at each step. This process continues until it reaches a "leaf node" that provides a final prediction (either a class or a numerical value).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For both classification (e.g., "survived" or "died") and regression (e.g., "predict price") tasks.&lt;/li&gt;
&lt;li&gt;When the most important requirement is interpretability. You can visually see and explain every step the model took to reach its decision.&lt;/li&gt;
&lt;li&gt;As the fundamental building block for more powerful ensemble models like Random Forests and XGBoost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of a Decision Tree whenever a non-technical stakeholder needs to understand why a prediction is being made. It's the "white-box" model. While often not the most accurate on its own (it can easily "overfit" or memorize the data), it's the perfect tool for explaining complex relationships in a simple, visual way and serves as a great baseline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# For classification:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.tree&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DecisionTreeClassifier&lt;/span&gt;

&lt;span class="c1"&gt;# For regression:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.tree&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DecisionTreeRegressor&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target classes
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model (e.g., limit depth to prevent overfitting)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
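&lt;p&gt;The interpretability claim is easy to verify in code: scikit-learn can print the fitted tree's "flowchart" as plain-text if-then rules. A small sketch, using invented feature names and data:&lt;/p&gt;

```python
# Print the learned if-then rules of a fitted tree as plain text.
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data: [age, has_ticket] -> survived (0/1); names are illustrative
X_train = [[25, 0], [30, 1], [45, 1], [50, 0], [23, 1], [60, 1]]
y_train = [0, 1, 1, 0, 1, 1]

model = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)

# Every split and leaf becomes a readable rule you can show a stakeholder.
print(export_text(model, feature_names=["age", "has_ticket"]))
```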

&lt;h2&gt;
  
  
  6. Random Forests
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A Random Forest is an ensemble algorithm. It builds a large number of individual Decision Trees during training. For a new prediction, each tree "votes," and the Random Forest outputs the most popular class (for classification) or the average (for regression) from all the trees. It uses randomness when building the trees to ensure they are all different, which makes the combined model much more powerful and accurate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For both classification and regression tasks where you need high accuracy and robustness.&lt;/li&gt;
&lt;li&gt;When you want to prevent overfitting, which is a common problem with single Decision Trees.&lt;/li&gt;
&lt;li&gt;To get a good "out-of-the-box" model with very little tuning required.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; This is the go-to, workhorse algorithm. You should think of Random Forest when a single Decision Tree isn't accurate enough. It's the "wisdom of the crowd" approach—one tree might be wrong, but the average of 1,000 trees is highly reliable. It's almost always a strong first choice when you need a high-performance model and don't want to spend a lot of time on complex tuning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# For classification:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;

&lt;span class="c1"&gt;# For regression:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomForestRegressor&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model (e.g., build 100 trees)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  7. Gradient Boosting Machines (GBM)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; GBM is a powerful ensemble technique that builds models (typically decision trees) sequentially. Unlike Random Forest which builds trees independently, GBM builds one tree at a time, where each new tree's job is to correct the errors and weaknesses of all the trees that came before it. It's a "boosting" method because it incrementally "boosts" the model's performance by focusing on its past mistakes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For classification and regression tasks where high accuracy is the top priority.&lt;/li&gt;
&lt;li&gt;When you are willing to spend more time tuning parameters to get the best possible performance.&lt;/li&gt;
&lt;li&gt;When a Random Forest model is performing well, but you need an extra performance boost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of GBM when "good" isn't good enough and you need "great." It's the "team of experts" approach: the first tree makes a guess, the second tree corrects the first tree's mistakes, the third corrects the remaining mistakes, and so on. It's extremely powerful but can overfit if not tuned carefully (e.g., by limiting the number of trees or their depth). It's the direct predecessor to XGBoost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# For classification:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GradientBoostingClassifier&lt;/span&gt;

&lt;span class="c1"&gt;# For regression:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GradientBoostingRegressor&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model (e.g., build 100 trees sequentially)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GradientBoostingClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
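&lt;p&gt;To make the "tuning" point concrete, here is one hedged sketch of a small cross-validated grid search over two key GBM parameters. The grid values and the generated dataset are illustrative, not recommendations:&lt;/p&gt;

```python
# Tune a GBM's learning rate and tree depth with a cross-validated grid search.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic classification data, just for demonstration
X, y = make_classification(n_samples=200, random_state=42)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid={"learning_rate": [0.05, 0.1], "max_depth": [2, 3]},
    cv=3,  # 3-fold cross-validation for each parameter combination
)
grid.fit(X, y)
print(grid.best_params_)  # the combination with the best CV score
```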

&lt;h2&gt;
  
  
  8. XGBoost (Extreme Gradient Boosting)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; XGBoost is not a new algorithm, but a specific implementation of Gradient Boosting (GBM) that has been heavily optimized for speed, efficiency, and performance. Like GBM, it builds trees sequentially to correct errors, but it includes several clever tricks (like parallel processing and built-in "regularization") that make it faster and generally more accurate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When maximum predictive accuracy is the absolute top priority.&lt;/li&gt;
&lt;li&gt;On structured or tabular data (like spreadsheets or database tables).&lt;/li&gt;
&lt;li&gt;In data science competitions (like Kaggle), where it is famous for being a dominant, winning algorithm.&lt;/li&gt;
&lt;li&gt;When you need a model that's both high-performing and computationally efficient (faster than standard GBM).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of XGBoost as the default "go-to" algorithm for high-performance modeling on tabular data. It's the "race car" version of Gradient Boosting. If your Random Forest or basic GBM model is good, XGBoost is what you use to make it great. It's the first thing most data scientists try when they are serious about winning a competition or squeezing every last drop of accuracy out of their data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: It uses its own dedicated library, xgboost.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;xgboost&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;

&lt;span class="c1"&gt;# For classification:
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;XGBClassifier&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# For regression:
# model = xgb.XGBRegressor()
&lt;/span&gt;
&lt;span class="c1"&gt;# X = your features
# y = your target
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model
# (XGBoost has many tuning parameters, but defaults work well)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;XGBClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_label_encoder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eval_metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;logloss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get class predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
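&lt;p&gt;The snippets in this post assume ready-made &lt;code&gt;X_train&lt;/code&gt;, &lt;code&gt;X_test&lt;/code&gt;, &lt;code&gt;y_train&lt;/code&gt; and &lt;code&gt;y_test&lt;/code&gt; variables. A minimal sketch of producing such a split with scikit-learn's &lt;code&gt;train_test_split&lt;/code&gt; (the synthetic dataset and the 25% hold-out here are illustrative assumptions, not part of the original example):&lt;/p&gt;

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic tabular dataset (illustrative only)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Hold out 25% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)  # (150, 5) (50, 5)
```

&lt;p&gt;Fixing &lt;code&gt;random_state&lt;/code&gt; simply makes the split reproducible across runs.&lt;/p&gt;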

&lt;h1&gt;
  
  
  III. Unsupervised Learning &amp;amp; Deep Learning
&lt;/h1&gt;

&lt;p&gt;Unsupervised Learning is a type of machine learning where the algorithm is given data without any labels or correct answers. It's like "learning without a teacher."&lt;/p&gt;

&lt;p&gt;Deep Learning is a specific, advanced subfield of machine learning that uses "deep" Neural Networks&amp;mdash;networks with many layers. These layers allow the model to learn incredibly complex, hierarchical patterns directly from raw data.&lt;/p&gt;
&lt;h2&gt;
  
  
  9. K-Means Clustering
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; K-Means is the most popular unsupervised algorithm. This means it's used when you don't have a target variable or pre-defined labels. Its goal is to find hidden structures in data by automatically grouping similar data points into "K" (a number you choose) distinct clusters. It works by finding "centroids" (the center point of a cluster) and assigning each data point to the nearest one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you have unlabeled data and want to discover its natural groupings.&lt;/li&gt;
&lt;li&gt;For customer segmentation (e.g., finding different types of shoppers).&lt;/li&gt;
&lt;li&gt;For anomaly detection (points far from any cluster center can be outliers).&lt;/li&gt;
&lt;li&gt;To simplify a dataset by grouping similar items.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of K-Means immediately when your primary question is "What are the natural groups in my data?" or "How can I segment this?" It's not for predicting a known answer, but for discovering unknown patterns. It's the go-to tool for exploratory analysis when you need to understand your data's inherent structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most common tool is Scikit-learn (sklearn).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.cluster&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KMeans&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features (unlabeled data)
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model (e.g., we want to find 3 clusters)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KMeans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_clusters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Train the model (it finds the clusters)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Get the cluster labels for each data point
&lt;/span&gt;&lt;span class="n"&gt;cluster_labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;labels_&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Get the center point of each cluster
&lt;/span&gt;&lt;span class="n"&gt;centroids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster_centers_&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
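&lt;p&gt;A natural follow-up question is how to pick K. A common heuristic is the "elbow method": fit K-Means for several values of K and watch the inertia (the sum of squared distances from each point to its nearest centroid). A minimal sketch using scikit-learn with synthetic data (the dataset and the K range here are illustrative assumptions, not part of the original example):&lt;/p&gt;

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 natural groups (illustrative only)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-Means for a range of K and record the inertia
# (sum of squared distances from points to their nearest centroid)
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    inertias[k] = model.inertia_

# Inertia always shrinks as K grows; the "elbow" is where the
# improvement flattens out (for this data, around K=3)
for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

&lt;p&gt;In practice you would plot inertia against K and pick the bend in the curve by eye.&lt;/p&gt;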

&lt;h2&gt;
  
  
  10. Neural Networks
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A Neural Network is a powerful algorithm inspired by the structure of the human brain. It's built from layers of interconnected "nodes" or "neurons" that process information. "Deep Learning" simply refers to Neural Networks that have many layers ("deep" networks), allowing them to learn extremely complex, hierarchical patterns from vast amounts of data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When working with unstructured data like images (e.g., object recognition), text (e.g., translation, sentiment analysis), and audio (e.g., speech-to-text).&lt;/li&gt;
&lt;li&gt;For highly complex problems where other models (like XGBoost) are not powerful enough.&lt;/li&gt;
&lt;li&gt;When peak performance is the primary goal, and "explainability" (interpretability) is less of a concern.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Data Scientist's "Sense":&lt;/strong&gt; You should think of Neural Networks as your heavy-duty, specialized tool. While XGBoost dominates on tabular (spreadsheet) data, Deep Learning is the undisputed champion for perception and language tasks. If your problem involves "seeing" (images), "hearing" (audio), or "understanding" (text), a Neural Network is almost always the right choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Package &amp;amp; Code: The most popular libraries are Keras (often with TensorFlow) and PyTorch.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# A simple example using Keras (with TensorFlow backend)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Sequential&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.layers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dense&lt;/span&gt;

&lt;span class="c1"&gt;# X = your features
# y = your target
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Create the model (a simple, sequential stack of layers)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],)))&lt;/span&gt; &lt;span class="c1"&gt;# Input layer
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;                            &lt;span class="c1"&gt;# Hidden layer
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sigmoid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;                          &lt;span class="c1"&gt;# Output layer (for classification)
&lt;/span&gt;
&lt;span class="c1"&gt;# 2. Compile the model (set up the learning process)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;adam&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;binary_crossentropy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Train the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Get predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
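&lt;p&gt;One caveat worth knowing: with a sigmoid output layer, Keras's &lt;code&gt;model.predict&lt;/code&gt; returns probabilities between 0 and 1, not class labels. A minimal sketch of thresholding them into hard labels (pure NumPy; the probability values are made up for illustration):&lt;/p&gt;

```python
import numpy as np

# Example probabilities as returned by model.predict(X_test)
# for a sigmoid output layer (values are made up for illustration)
probabilities = np.array([[0.92], [0.07], [0.51], [0.49]])

# Threshold at 0.5 to obtain hard 0/1 class labels
class_labels = (probabilities > 0.5).astype(int)

print(class_labels.ravel())  # [1 0 1 0]
```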

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;We've journeyed through 10 essential algorithms and techniques, from the foundational simplicity of Linear Regression to the advanced power of Deep Learning. Remember, the goal isn't to become a theoretical mathematician overnight, but to cultivate a practical intuition for these tools.&lt;/p&gt;


&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;


&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Luca Liu&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Luca, a Business Intelligence Developer with passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🚀 Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>algorithms</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Essential Services for Newcomers in Germany - Personal Recommendations</title>
      <dc:creator>Luca Liu</dc:creator>
      <pubDate>Tue, 18 Nov 2025 15:44:00 +0000</pubDate>
      <link>https://forem.com/luca1iu/essential-services-for-newcomers-in-germany-personal-recommendations-4612</link>
      <guid>https://forem.com/luca1iu/essential-services-for-newcomers-in-germany-personal-recommendations-4612</guid>
      <description>&lt;p&gt;Before diving into essential services in Germany, you might want to learn how to obtain the German Opportunity Card. I've previously written a detailed guide on &lt;a href="https://blog.luca-liu.com/article/opportunity-card-germany-my-first-hand-experience-and-complete-guide" rel="noopener noreferrer"&gt;The Opportunity Card Germany: My First-Hand Experience and Complete Guide&lt;/a&gt;, which shares the application process, required documents, and experiences after arriving in Germany. If you're planning to apply for the Opportunity Card or have already been approved, the recommended services below will help you settle smoothly in Germany.&lt;/p&gt;

&lt;h1&gt;
  
  
  Essential Services for Newcomers in Germany: My Personal Recommendations
&lt;/h1&gt;

&lt;p&gt;If you've recently arrived in Germany with an Opportunity Card or are planning to come soon, you'll need to set up essential services to make your transition smoother. Based on my personal experience, I've compiled a list of recommended services that will help you get settled quickly. By using my referral links, we can both benefit from special bonuses!&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Expatrio - Your Blocked Account Solution
&lt;/h2&gt;

&lt;p&gt;When preparing for your visa application, you'll need to deposit your financial proof into a &lt;strong&gt;Blocked Account&lt;/strong&gt;. I personally used Expatrio's services, which made the process incredibly convenient. After arriving in Germany, Expatrio automatically transfers the monthly unfrozen amount to your designated account.&lt;/p&gt;

&lt;p&gt;Sign up using my link to get started: &lt;a href="https://www.expatrio.com/?f=xianjingl1" rel="noopener noreferrer"&gt;https://www.expatrio.com/?f=xianjingl1&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. N26 - Modern Digital Banking
&lt;/h2&gt;

&lt;p&gt;N26 is a leading digital bank in Germany that offers a fully mobile banking experience. Their services include a free basic account, easy international transfers, and investment options for stocks and funds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My experience:&lt;/strong&gt; N26 has been extremely convenient - you can open an account remotely via video verification, link it to Apple Pay and Google Pay, and even invest in stocks and funds. I use N26 as my salary account, and the standard plan meets all my basic needs. Transfers between friends are instant!&lt;/p&gt;

&lt;p&gt;Join N26 today using my invitation link: &lt;a href="https://n26.com/r/xianjinl1671" rel="noopener noreferrer"&gt;https://n26.com/r/xianjinl1671&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Telekom - Premium Mobile and Internet Service
&lt;/h2&gt;

&lt;p&gt;Telekom is Germany's largest telecommunications provider, offering mobile plans, home internet, and TV services with excellent coverage throughout the country.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My experience:&lt;/strong&gt; I use Telekom's network and the signal quality is outstanding. Their coverage extends to rural areas where other providers might have weak signals.&lt;/p&gt;

&lt;p&gt;Use my referral link to sign up and we can both receive a cash bonus of up to €90: &lt;a href="https://www.telekom-empfehlen.de/PcT4hPGk" rel="noopener noreferrer"&gt;https://www.telekom-empfehlen.de/PcT4hPGk&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Ostrom - Smart Green Energy Provider
&lt;/h2&gt;

&lt;p&gt;Ostrom is an innovative green energy provider offering flexible monthly electricity contracts without long-term commitments. Their smart app allows you to track your energy usage in real-time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My experience:&lt;/strong&gt; The mobile app makes it extremely convenient to monitor electricity usage, and their green energy focus aligns with my environmental values.&lt;/p&gt;

&lt;p&gt;You can save up to 35% on your electricity bill (approximately €500 per year on average) with Ostrom. Sign up with my referral code to receive a €50 bonus or €100 store credit: &lt;a href="https://join.ostrom.de/?referralCode=XIANEJCXJC" rel="noopener noreferrer"&gt;https://join.ostrom.de/?referralCode=XIANEJCXJC&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Payback - Germany's Popular Loyalty Program
&lt;/h2&gt;

&lt;p&gt;Payback is Germany's largest loyalty program, partnering with numerous retailers including supermarkets, drug stores, gas stations, and online shops. You collect points with every purchase that can be redeemed for cash or rewards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My experience:&lt;/strong&gt; This is a money-saving essential in Germany! The program includes many stores you'll visit regularly. Simply scan your Payback code after the cashier scans your items to collect points instantly. I earn about €200 cash back per year through this program.&lt;/p&gt;

&lt;p&gt;Register using my link to receive 200 bonus points: &lt;a href="https://www.payback.de/anmelden/freunde-werben?mgm-ref=c6d1ccf5-362e-4fc0-8adf-6677707797c6&amp;amp;excid=mgm&amp;amp;incid=mgm" rel="noopener noreferrer"&gt;https://www.payback.de/anmelden/freunde-werben?mgm-ref=c6d1ccf5-362e-4fc0-8adf-6677707797c6&amp;amp;excid=mgm&amp;amp;incid=mgm&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  6. American Express - Premium Credit Cards
&lt;/h2&gt;

&lt;p&gt;American Express offers various credit cards in Germany with benefits ranging from travel insurance to rewards points and exclusive offers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My experience:&lt;/strong&gt; I got the rose gold metal card after finding employment in Germany. While the €20 monthly fee isn't cheap, the points can be exchanged for airline miles and the card itself is beautifully designed.&lt;/p&gt;

&lt;p&gt;Apply through my referral link: &lt;a href="https://americanexpress.com/de-de/referral/gold?ref=xIAOYQMATA&amp;amp;XL=MIANS" rel="noopener noreferrer"&gt;https://americanexpress.com/de-de/referral/gold?ref=xIAOYQMATA&amp;amp;XL=MIANS&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  7. American Express Payback Card - No Annual Fee Option
&lt;/h2&gt;

&lt;p&gt;This American Express card is co-branded with Payback, allowing you to collect additional points on your Payback account with every purchase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My experience:&lt;/strong&gt; The biggest advantage is that this card has no annual fee. You can earn extra points when shopping at Payback partner stores: every €3 spent earns 1 point. Using my link, you can receive an additional 2,000 Payback points (equivalent to €20).&lt;/p&gt;

&lt;p&gt;Apply for the Amex Payback card using my link: &lt;a href="https://americanexpress.com/de-de/referral/payback?ref=xIANJL6aY9&amp;amp;XL=MIMNS" rel="noopener noreferrer"&gt;https://americanexpress.com/de-de/referral/payback?ref=xIANJL6aY9&amp;amp;XL=MIMNS&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Setting up these essential services will make your transition to life in Germany much smoother. Using referral links not only helps you get started quickly but also provides additional bonuses for both of us. Welcome to Germany, and I hope these recommendations help you settle in comfortably!&lt;/p&gt;




&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;


&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Luca Liu&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Luca, a Business Intelligence Developer with passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🚀 Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>germany</category>
      <category>job</category>
      <category>career</category>
    </item>
    <item>
      <title>English Speaking Companies in Germany</title>
      <dc:creator>Luca Liu</dc:creator>
      <pubDate>Tue, 18 Nov 2025 15:43:00 +0000</pubDate>
      <link>https://forem.com/luca1iu/english-speaking-companies-in-germany-2loe</link>
      <guid>https://forem.com/luca1iu/english-speaking-companies-in-germany-2loe</guid>
      <description>&lt;h2&gt;
  
  
  English-Speaking Companies in Germany: A Guide for Job Seekers
&lt;/h2&gt;

&lt;p&gt;As a foreigner who spent 8 months searching for a job in Germany, I understand the challenges of finding English-speaking opportunities in a predominantly German-speaking country. After applying to over 500 positions and interviewing with numerous companies, I've compiled this list of companies where English is the primary working language, to help fellow job seekers who aren't fluent in German.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. &lt;a href="https://jobs.sap.com/" rel="noopener noreferrer"&gt;SAP&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;SAP is one of Germany's largest software companies and a global leader in enterprise application software. With approximately 107,000 employees worldwide and headquartered in Walldorf, Baden-Württemberg, SAP has a market value of over €160 billion.&lt;/p&gt;

&lt;p&gt;My experience: During my interview process, English was used throughout. The manager explicitly mentioned that while German language skills are a plus, they are not mandatory. SAP's international environment makes it an excellent choice for non-German speakers in the tech industry.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. &lt;a href="https://www.google.com/aclk?sa=l&amp;amp;ai=DChsSEwjm4Nv60oKPAxUOmIMHHR1IDNIYACICCAEQABoCZWY&amp;amp;co=1&amp;amp;ase=2&amp;amp;gclid=Cj0KCQjwqebEBhD9ARIsAFZMbfxLCq0jSjIE5ffwcWHYV_KrZoZDeaxFkNs5k2Fa596spe5OrxdCS1EaAuVuEALw_wcB&amp;amp;category=acrcp_v1_48&amp;amp;sig=AOD64_0_7Ny472YFvJtBkUmJ9CO76aqTWA&amp;amp;q&amp;amp;nis=4&amp;amp;adurl&amp;amp;ved=2ahUKEwiAsdP60oKPAxUs6wIHHQ7CJNIQ0Qx6BAgOEAE" rel="noopener noreferrer"&gt;ALDI Süd&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;ALDI Süd is a global discount supermarket chain with its headquarters in North Rhine-Westphalia. With over 6,500 stores worldwide and approximately 155,000 employees, it's one of the largest retailers in Germany.&lt;/p&gt;

&lt;p&gt;My experience: I had two interviews with ALDI Süd, and both were conducted entirely in English. The interviewers didn't even ask if I preferred English or German, indicating their comfort with an international workforce. Their IT department particularly operates in an English-speaking environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. &lt;a href="https://kaufland-ecommerce.com/karriere/jobs/" rel="noopener noreferrer"&gt;Kaufland E-commerce&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Kaufland is a German hypermarket chain with a growing e-commerce division. Part of the Schwarz Group (which also owns Lidl), Kaufland has approximately 132,000 employees across Europe and a strong presence in the digital retail space.&lt;/p&gt;

&lt;p&gt;My experience: I met their recruiters and team members at the ITCS Tech Conference in Cologne. When I inquired about German language requirements, the manager confirmed they operate in an English-speaking environment. Although I didn't receive an interview opportunity, this information was surprising and valuable.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. &lt;a href="https://www.free-now.com/career/jobs/" rel="noopener noreferrer"&gt;Freenow&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Freenow is a mobility service provider headquartered in Hamburg. With around 1,000 employees, it's one of Europe's leading mobility platforms operating in over 100 European cities.&lt;/p&gt;

&lt;p&gt;Freenow has established itself as a technology-driven company with a diverse, international team. Their working language is English, making it accessible for international tech professionals. The company offers various roles in software development, data science, and product management.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. &lt;a href="https://www.allianzgi.com/en/our-firm/career" rel="noopener noreferrer"&gt;&lt;strong&gt;Allianz Global Investors&lt;/strong&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Allianz Global Investors (AllianzGI) is a major asset management company headquartered in Frankfurt. As part of the Allianz Group, one of the world's largest financial services providers, AllianzGI manages approximately €582 billion in assets for institutional and retail investors worldwide. With over 25 offices globally and around 2,500 employees, the company has a truly international presence.&lt;/p&gt;

&lt;p&gt;My experience: I advanced to the second round of interviews. The first round was with an HR representative located in Romania, and the second round involved three managers based in Germany, two of whom didn't speak German. This confirms their full English working environment, especially in their investment division.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. &lt;a href="https://www.holidaycheckgroup.com/karriere/" rel="noopener noreferrer"&gt;HolidayCheck&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;HolidayCheck is a leading online travel agency and review site headquartered in Munich. With approximately 300 employees, it's a significant player in the European travel tech sector.&lt;/p&gt;

&lt;p&gt;As one of Germany's most popular travel platforms, HolidayCheck maintains an English-speaking work environment to accommodate its international team. The company focuses on technology and user experience, offering various positions for software engineers, product managers, and data specialists.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. &lt;a href="https://www.uniper.energy/de/karriere/stellenangebote?gad_source=1&amp;amp;gad_campaignid=18308266038&amp;amp;gbraid=0AAAAADf56GV-Af-xtlRYYldCJn0ESg-a-&amp;amp;gclid=Cj0KCQjwqebEBhD9ARIsAFZMbfxv6ACF8orvRFkGiwehWGwIF6EarDgz3NgrczDeKC5xXQo4doj3BfUaAsm8EALw_wcB" rel="noopener noreferrer"&gt;Uniper&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Uniper is a German energy company that focuses on power generation, global energy trading, and energy services. Uniper’s headquarters is located in Düsseldorf, Germany. The company operates across multiple countries, with key markets in Europe, Russia, and other parts of the globe. Uniper employs roughly 7,000 people worldwide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Supporting Each Other in the German Job Market
&lt;/h2&gt;

&lt;p&gt;Finding a job in Germany without fluent German skills can be challenging. I hope this list helps reduce uncertainty in your job search journey.&lt;/p&gt;

&lt;p&gt;If you know other English-speaking companies in Germany, please share in the comments. Together, we can build a comprehensive resource for international job seekers navigating the German job market.&lt;/p&gt;

&lt;p&gt;Your contributions could significantly ease someone else's job search in Germany!&lt;/p&gt;




&lt;h2&gt;
  
  
  Explore more
&lt;/h2&gt;

&lt;div class="ltag__user ltag__user__id__1230121"&gt;
    &lt;a href="/luca1iu" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1230121%2F2521cc84-ad7d-458c-99e5-b4d82f625a88.jpg" alt="luca1iu image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/luca1iu"&gt;Luca Liu&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/luca1iu"&gt;Hello there! 👋 I'm Luca, a Business Intelligence Developer with passion for all things data. Proficient in Python, SQL, Power BI, Tableau&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/lucaliu-data" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;🚀 Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>germany</category>
      <category>job</category>
      <category>career</category>
    </item>
  </channel>
</rss>
