<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ivica Kolenkaš</title>
    <description>The latest articles on Forem by Ivica Kolenkaš (@ivicak).</description>
    <link>https://forem.com/ivicak</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1048060%2Fdafc17e0-8f37-43ee-ba98-257a69ae5f98.jpeg</url>
      <title>Forem: Ivica Kolenkaš</title>
      <link>https://forem.com/ivicak</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ivicak"/>
    <language>en</language>
    <item>
      <title>Unable to emit metadata to DataHub GMS with Airflow - a solution</title>
      <dc:creator>Ivica Kolenkaš</dc:creator>
      <pubDate>Thu, 14 Aug 2025 05:47:04 +0000</pubDate>
      <link>https://forem.com/ivicak/unable-to-emit-metadata-to-datahub-gms-with-airflow-a-solution-22ke</link>
      <guid>https://forem.com/ivicak/unable-to-emit-metadata-to-datahub-gms-with-airflow-a-solution-22ke</guid>
      <description>&lt;p&gt;&lt;a href="https://datahub.com/" rel="noopener noreferrer"&gt;DataHub&lt;/a&gt; is a popular open-source data catalog  and its &lt;a href="https://datahub.com/blog/data-lineage-what-it-is-and-why-it-matters/" rel="noopener noreferrer"&gt;Lineage feature&lt;/a&gt; is one of its highlights.&lt;/p&gt;

&lt;p&gt;Doing ingestion or data processing with &lt;a href="https://airflow.apache.org/" rel="noopener noreferrer"&gt;Airflow&lt;/a&gt;, a very popular open-source platform for developing and running workflows, is a fairly common setup. &lt;a href="https://docs.datahub.com/docs/lineage/airflow#automatic-lineage-extraction" rel="noopener noreferrer"&gt;DataHub's automatic lineage extraction&lt;/a&gt; works great with Airflow - provided you configure the Airflow connection to DataHub correctly.&lt;/p&gt;

&lt;p&gt;This article shows how to resolve the infamous &lt;code&gt;Unable to emit metadata to DataHub GMS&lt;/code&gt; when using the &lt;a href="https://pypi.org/project/acryl-datahub-airflow-plugin/" rel="noopener noreferrer"&gt;DataHub Airflow plugin&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;datahub.configuration.common.OperationalError: (
'Unable to emit metadata to DataHub GMS', 
{
  'message': '404 Client Error: Not Found for url: https://my-datahub-host.net/aspects?action=ingestProposal'
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;URL-encode the &lt;code&gt;host&lt;/code&gt; portion of your Airflow connection string:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;correct: &lt;code&gt;datahub-rest://my-datahub-host.net%2Fapi%2Fgms&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;incorrect: &lt;code&gt;datahub-rest://my-datahub-host.net/api/gms&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
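&lt;p&gt;A quick way to produce the encoded form is Python's standard-library &lt;code&gt;urllib.parse.quote&lt;/code&gt; with &lt;code&gt;safe=""&lt;/code&gt;, so that slashes are encoded too (the hostname below is a placeholder):&lt;/p&gt;

```python
from urllib.parse import quote

# placeholder host; substitute your own DataHub hostname
host = "my-datahub-host.net/api/gms"

# safe="" forces "/" to be percent-encoded as %2F as well
print(quote(host, safe=""))  # my-datahub-host.net%2Fapi%2Fgms
```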

&lt;h2&gt;
  
  
  The problem - 404, wrong URL
&lt;/h2&gt;

&lt;p&gt;The error message makes the problem obvious: &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/404" rel="noopener noreferrer"&gt;404 - Not Found&lt;/a&gt; indicates that the URL is wrong or does not exist.&lt;/p&gt;

&lt;p&gt;A quick glance at the DataHub API docs shows that the REST API is available at &lt;code&gt;https://my-datahub-host.net/api/gms&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Compare that to the URL reported in the error message above and it is obvious that &lt;strong&gt;&lt;code&gt;/api/gms&lt;/code&gt;&lt;/strong&gt; is missing from our Airflow connection string - woohooo!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvzurpnblf1he1opy8vme.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvzurpnblf1he1opy8vme.webp" alt="A developer celebrating a new error message" width="800" height="773"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I quickly look at HashiCorp Vault, which we use as the &lt;a href="https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/secrets-backend/index.html" rel="noopener noreferrer"&gt;external connections store&lt;/a&gt; in our Airflow deployments, and the connection string looks just fine to me - &lt;code&gt;/api/gms&lt;/code&gt; is there.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;datahub-rest://https://:TOKEN@https%3A%2F%2Fmy-datahub-host.net/api/gms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's check how Airflow "understands" the connection because that is what it will use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;airflow connections get datahub_rest_default &lt;span class="nt"&gt;--output&lt;/span&gt; yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which outputs&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# shortened for brevity&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;conn_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;datahub_rest_default&lt;/span&gt;
  &lt;span class="na"&gt;conn_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;datahub_rest&lt;/span&gt;
  &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://my-datahub-host.net&lt;/span&gt;
  &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;api/gms'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two issues pop out immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;schema&lt;/code&gt; should only have &lt;code&gt;http&lt;/code&gt; or &lt;code&gt;https&lt;/code&gt; in it, not &lt;code&gt;/api/gms&lt;/code&gt; (&lt;a href="https://airflow.apache.org/docs/apache-airflow-providers-http/stable/connections/http.html" rel="noopener noreferrer"&gt;source&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;host&lt;/code&gt; value is missing the &lt;code&gt;/api/gms&lt;/code&gt; path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To understand why the connection is not parsed properly, let's look at what &lt;a href="https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/connections.html" rel="noopener noreferrer"&gt;Connections&lt;/a&gt; look like under the hood.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anatomy of an Airflow connection
&lt;/h2&gt;

&lt;p&gt;An Airflow &lt;code&gt;Connection&lt;/code&gt; object (&lt;a href="https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/definitions/connection.py#L34" rel="noopener noreferrer"&gt;source&lt;/a&gt;) is very well documented so I won't repeat that; use the source, Luke:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Connection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    A connection to an external data source.

    :param conn_id: The connection ID.
    :param conn_type: The connection type.
    :param description: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The connection description.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    :param host: The host.
    :param login: The login.
    :param password: The password.
    :param schema: The schema.
    :param port: The port number.
    :param extra: Extra metadata. Non-standard data such as private/SSH keys can be saved here. JSON
        encoded object.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is good to know that a Connection can also be represented as a &lt;a href="https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/definitions/connection.py#L62" rel="noopener noreferrer"&gt;connection string (also called a URI)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Because we are dealing with an &lt;a href="https://airflow.apache.org/docs/apache-airflow-providers-http/stable/connections/http.html" rel="noopener noreferrer"&gt;HTTP connection&lt;/a&gt;, the &lt;code&gt;HOST&lt;/code&gt; consists of the full URL, including the path; for example, a connection URI for the Google Images page would look like &lt;code&gt;http://https://google.com/imghp&lt;/code&gt;.&lt;/p&gt;
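&lt;p&gt;Standard URI parsing shows why this matters: anything after the first unencoded &lt;code&gt;/&lt;/code&gt; in the authority part is no longer the host. A minimal sketch with Python's &lt;code&gt;urllib&lt;/code&gt; (not Airflow's actual parser, but it splits the URI along the same lines):&lt;/p&gt;

```python
from urllib.parse import urlsplit

# unencoded: the path is split away from the host
plain = urlsplit("datahub-rest://my-datahub-host.net/api/gms")
print(plain.hostname)  # my-datahub-host.net
print(plain.path)      # /api/gms

# percent-encoded: everything stays in the host component
encoded = urlsplit("datahub-rest://my-datahub-host.net%2Fapi%2Fgms")
print(repr(encoded.path))  # ''
```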

&lt;h2&gt;
  
  
  Airflow connection parsing
&lt;/h2&gt;

&lt;p&gt;DataHub's &lt;code&gt;DatahubRestHook&lt;/code&gt; (&lt;a href="https://github.com/datahub-project/datahub/blob/master/metadata-ingestion-modules/airflow-plugin/src/datahub_airflow_plugin/hooks/datahub.py#L24" rel="noopener noreferrer"&gt;source&lt;/a&gt;) is based on Airflow's &lt;code&gt;BaseHook&lt;/code&gt; (&lt;a href="https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/bases/hook.py#L30" rel="noopener noreferrer"&gt;source&lt;/a&gt;) so it inherits this method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@classmethod&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conn_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Connection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Get connection, given connection id.

    :param conn_id: connection id
    :return: connection
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# shortened for brevity
&lt;/span&gt;    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ConnectionModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_connection_from_secrets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From it, and several levels down the code path, we find a &lt;a href="https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/connection.py#L211" rel="noopener noreferrer"&gt;function that parses the URI&lt;/a&gt; to turn it into a &lt;code&gt;Connection&lt;/code&gt; object. To simplify the demo below, I've extracted a couple of functions that parse the URI into &lt;a href="https://gist.github.com/ivica-k/71bd20f8af1bd7c68c94237b23481874" rel="noopener noreferrer"&gt;this Gist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Using that code standalone (without Airflow) shows exactly what's wrong with the connection:&lt;/p&gt;

&lt;p&gt;Connection string without URL-encoding it first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow_connection_parse&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;parse_from_uri&lt;/span&gt;

&lt;span class="nf"&gt;parse_from_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;datahub-rest://https://:TOKEN@my-datahub-host.net/api/gms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;my&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;datahub&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;
&lt;span class="n"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;gms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Connection string with the URL (&lt;code&gt;host&lt;/code&gt;) being URL-encoded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow_connection_parse&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;parse_from_uri&lt;/span&gt;

&lt;span class="nf"&gt;parse_from_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;datahub-rest://https://:TOKEN@my-datahub-host.net%2Fapi%2Fgms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;my&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;datahub&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;gms&lt;/span&gt;
&lt;span class="n"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;schema&lt;/code&gt; is empty and the &lt;code&gt;host&lt;/code&gt; contains &lt;code&gt;/api/gms&lt;/code&gt;, as it should according to &lt;a href="https://docs.datahub.com/docs/lineage/airflow#configuration" rel="noopener noreferrer"&gt;DataHub's Airflow integration&lt;/a&gt; docs.&lt;/p&gt;




&lt;p&gt;URL-encode your connection strings if you create them outside of the Airflow ecosystem; the Airflow UI (not recommended for production) and the &lt;code&gt;airflow&lt;/code&gt; CLI take care of the encoding for you.&lt;/p&gt;

&lt;p&gt;In our case, all the connections are managed with Terraform and the code for it missed a simple &lt;code&gt;urlencode(MY_HOST_HERE)&lt;/code&gt; function call.&lt;/p&gt;
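&lt;p&gt;If you generate connection strings with a script instead, the same fix is a single &lt;code&gt;quote()&lt;/code&gt; call over the host before interpolating it into the URI (the token and hostname below are placeholders):&lt;/p&gt;

```python
from urllib.parse import quote

# placeholder values
token = "TOKEN"
host = "my-datahub-host.net/api/gms"

# percent-encode both pieces before assembling the connection URI
uri = f"datahub-rest://https://:{quote(token, safe='')}@{quote(host, safe='')}"
print(uri)  # datahub-rest://https://:TOKEN@my-datahub-host.net%2Fapi%2Fgms
```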

</description>
      <category>airflow</category>
      <category>dataengineering</category>
      <category>datahub</category>
      <category>data</category>
    </item>
    <item>
      <title>Data and analytics reimagined - platform architecture</title>
      <dc:creator>Ivica Kolenkaš</dc:creator>
      <pubDate>Fri, 25 Jul 2025 14:10:11 +0000</pubDate>
      <link>https://forem.com/ivicak/data-and-analytics-reimagined-platform-architecture-3f9j</link>
      <guid>https://forem.com/ivicak/data-and-analytics-reimagined-platform-architecture-3f9j</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/ivicak/data-and-analytics-reimagined-with-terraform-and-devops-principles-1g0b"&gt;The previous article&lt;/a&gt; in the series introduces the business, its data-related challenges and a vision of a data platform to tackle them.&lt;/p&gt;

&lt;p&gt;This article goes over the architecture of the platform in its most basic form - arrows and rectangles.&lt;/p&gt;




&lt;p&gt;A business analyst trying to answer "&lt;em&gt;How many white shirts do I need?&lt;/em&gt;" will have their work made easier by having all the relevant sales data in a uniform shape and in the same place. The reality is, the sales data from previous years is fragmented in several data silos.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpdgnkqg5ctib8fmmupz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpdgnkqg5ctib8fmmupz.png" alt="Relevant data - wish vs. reality" width="800" height="586"&gt;&lt;/a&gt;&lt;br&gt;Relevant data - wish vs. reality
  &lt;/p&gt;

&lt;p&gt;These data silos will have differently shaped data (databases, files, or worse), their security practices (if any) will be different and ownership may or may not be known.&lt;/p&gt;

&lt;p&gt;Having all the data be uniform and in the same place is a hard nut to crack for a large organization with highly autonomous teams. A good alternative is to have the data in a similar-enough place, and uniform-enough shape so that it appears "the same".&lt;/p&gt;

&lt;p&gt;To get it into a similar-enough place and a uniform-enough shape, we put a fence around it and it became a &lt;a href="https://www.getdbt.com/blog/data-domains" rel="noopener noreferrer"&gt;data domain&lt;/a&gt;. Very similar to &lt;a href="https://en.wikipedia.org/wiki/Containerization" rel="noopener noreferrer"&gt;containerization&lt;/a&gt; of shipped goods and &lt;a href="https://en.wikipedia.org/wiki/Containerization_(computing)" rel="noopener noreferrer"&gt;software products&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data domains
&lt;/h2&gt;

&lt;p&gt;"&lt;em&gt;What is a data domain on our platform?&lt;/em&gt;" is a question that has a different answer depending on who you ask. For a data engineer, a data domain is a grouping of similar data; for example, all sales data for the wholesale sales channel belongs to a WHOLESALE data domain.&lt;/p&gt;

&lt;p&gt;A security specialist would argue that a data domain is a security boundary, while a data platform engineer will say that a data domain is a collection of - spoiler alert - AzureAD groups and Snowflake objects.&lt;/p&gt;

&lt;p&gt;All three engineers are correct; a data domain is a grouping of data that belongs together, forms a security boundary, and is formed by, in our case, Snowflake objects and AzureAD groups.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwfmc60npg7h8rz9rnq05.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwfmc60npg7h8rz9rnq05.png" alt="Relevant data in a similar-enough place; a data domain" width="800" height="337"&gt;&lt;/a&gt;&lt;br&gt;Relevant data in a similar-enough place; a data domain
  &lt;/p&gt;

&lt;p&gt;Encapsulating data in a data domain helps us tackle the three main challenges of the existing data landscape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clear ownership&lt;/li&gt;
&lt;li&gt;defined security guidelines&lt;/li&gt;
&lt;li&gt;defined shape of data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data domain ownership
&lt;/h3&gt;

&lt;p&gt;Each data domain must have an owner and a self-sufficient data domain team behind it.&lt;/p&gt;

&lt;p&gt;Being self-sufficient means that they own and manage the data lifecycle and the domain fully. Ingestion of raw data, its transformation or serving as data products is entirely up to them.&lt;/p&gt;

&lt;p&gt;"&lt;em&gt;Want to make another data product?&lt;/em&gt;" Sure. "&lt;em&gt;Want to delete all of them?&lt;/em&gt;" Absolutely.&lt;/p&gt;

&lt;p&gt;Ownership over a data domain can be split into two:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;business (administrative) ownership&lt;/li&gt;
&lt;li&gt;technical ownership&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both types of owners share an overlapping right and responsibility: managing access to the data they own while staying aware of any sensitive data. They are expected to reject data access requests that do not conform to the rules.&lt;/p&gt;

&lt;p&gt;The main difference is that a business owner manages access for people to perform data exploration, while the technical owner will manage it for automated processes using code (more on that in the next article!).&lt;/p&gt;

&lt;p&gt;For everyone outside of the domain, domain owners serve as a point of contact regarding the data they own.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security guidelines
&lt;/h3&gt;

&lt;p&gt;One selling point of our data platform, &lt;a href="https://dev.to/ivicak/data-and-analytics-reimagined-with-terraform-and-devops-principles-1g0b"&gt;our curated collection of tools, standards and processes&lt;/a&gt;, is exactly that - standards.&lt;/p&gt;

&lt;p&gt;Applying the same security standards to every data domain makes accessing the data sets in those domains a seamless experience. No matter where the data set you need is located, getting to it is technically the same.&lt;/p&gt;

&lt;p&gt;For people accessing data, this means having one of the standardized domain roles that grant read or write permissions.&lt;/p&gt;

&lt;p&gt;For machines (think scheduled jobs, automated processes etc.) this means having a service principal that authenticates with a private key, among other things.&lt;/p&gt;

&lt;p&gt;Our security guidelines span all the systems and tools we offer on the platform. Deviations must have a very strong business case - after all, the platform should be &lt;a href="https://dev.to/ivicak/data-and-analytics-reimagined-with-terraform-and-devops-principles-1g0b#tenets"&gt;&lt;em&gt;the way&lt;/em&gt; but not in the way&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shape of data products
&lt;/h3&gt;

&lt;p&gt;This aspect of the platform is the trickiest to tackle from the platform perspective since the shape of data products is ultimately chosen by the owning data domain team.&lt;/p&gt;

&lt;p&gt;We can, however, standardize on a few basic rules. Every data product is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;managed with code ("&lt;a href="https://www.atlassian.com/git/tutorials/why-git" rel="noopener noreferrer"&gt;Why Git&lt;/a&gt;" from Atlassian)&lt;/li&gt;
&lt;li&gt;described with a data contract&lt;/li&gt;
&lt;li&gt;true to that data contract&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data products are meant to be used and they provide an interface. In the same way that you know how to operate a door by its interface - &lt;a href="https://uxdesign.cc/intro-to-ux-the-norman-door-61f8120b6086" rel="noopener noreferrer"&gt;unless they're Norman doors&lt;/a&gt; - you should know how to use a data product by seeing its interface, its contract.&lt;/p&gt;

&lt;p&gt;Data domain teams are responsible for defining data contracts and making sure their data products adhere to them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data mesh
&lt;/h2&gt;

&lt;p&gt;Data domains on their own are powerful, but their superpower is in their ability to interconnect.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxiwrgof5rjane7my4px3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxiwrgof5rjane7my4px3.png" alt="Data mesh" width="800" height="393"&gt;&lt;/a&gt;&lt;br&gt;Data mesh
  &lt;/p&gt;

&lt;p&gt;A great definition of a data mesh architecture: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A data mesh architecture is a decentralized approach that enables domain teams to perform cross-domain data analysis on their own. At its core is the domain with its responsible team and its operational and analytical data. The domain team ingests operational data and builds analytical data models as data products to perform their own analysis. It may also choose to publish data products with data contracts to serve other domains’ data needs. &lt;a href="https://www.datamesh-architecture.com/#how-to-design-a-data-mesh" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Data domains are the nodes in this mesh. What makes it a proper &lt;em&gt;mesh&lt;/em&gt; are the (data) connections between these domains.&lt;/p&gt;

&lt;p&gt;A business analyst trying to answer "&lt;em&gt;How many white shirts do I need?&lt;/em&gt;" now has a data mesh at their disposal. A data mesh made up of clearly defined data domains, each with an owner, with described and maintained data products that adhere to a contract and have the same security guidelines.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnf633hojx2ig61xbmuja.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnf633hojx2ig61xbmuja.png" alt="Business analyst using a data mesh" width="800" height="553"&gt;&lt;/a&gt;&lt;br&gt;Business analyst using a data mesh
  &lt;/p&gt;

&lt;p&gt;They know who to contact regarding data access (they can even request it themselves!), and once that access is given, the data is stored in a uniform place and is secured in a uniform way.&lt;/p&gt;

&lt;p&gt;By building data domains and by organizing them into a data mesh we have established a framework for organizing and connecting data on our platform. More importantly, we have set the groundwork for success and hugely improved our data landscape.&lt;/p&gt;




&lt;p&gt;This article went over the base architecture of our platform and certain standards applied to objects in it.&lt;/p&gt;

&lt;p&gt;But what are &lt;a href="https://diablo.fandom.com/wiki/Tools_of_the_Trade_(Quest)" rel="noopener noreferrer"&gt;the tools of the trade&lt;/a&gt;? Which processes are standardized on the platform? The third article in the series goes over the tools chosen to make up the platform and to maintain it.&lt;/p&gt;

</description>
      <category>datamesh</category>
      <category>snowflake</category>
      <category>terraform</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>Data and analytics reimagined with Terraform and DevOps principles</title>
      <dc:creator>Ivica Kolenkaš</dc:creator>
      <pubDate>Wed, 16 Jul 2025 18:19:41 +0000</pubDate>
      <link>https://forem.com/ivicak/data-and-analytics-reimagined-with-terraform-and-devops-principles-1g0b</link>
      <guid>https://forem.com/ivicak/data-and-analytics-reimagined-with-terraform-and-devops-principles-1g0b</guid>
      <description>&lt;p&gt;"&lt;em&gt;How many white shirts do I need?&lt;/em&gt;" is a very simple question to answer for you and me. Answering that same question as a demand planner in a fashion enterprise requires a data driven approach, because the consequences of being wrong are far reaching. To be data driven, you must have actionable data.&lt;/p&gt;

&lt;p&gt;The following blog series describes the path that &lt;a href="https://bestseller.com/" rel="noopener noreferrer"&gt;BESTSELLER's&lt;/a&gt; Data &amp;amp; Analytics Platform set out on 2.5 years ago in an effort to produce actionable data. It will reason about the choices we made, focus on key challenges we faced, and celebrate the big wins that we got from it.&lt;/p&gt;




&lt;p&gt;The first step on this path is to understand the business and the platform we're building for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The business
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://bestseller.com/" rel="noopener noreferrer"&gt;BESTSELLER&lt;/a&gt; is a family-owned fashion business that is a home to more than 22000 colleagues across design, logistics and tech. They work for more than 20 brands, including JACK &amp;amp; JONES, ONLY and VERO MODA, across 75 countries.&lt;/p&gt;

&lt;p&gt;Each brand in this multi-brand matrix organisation operates with a high degree of independence, which allows them to remain agile in their decision-making process. The brands' identities are different, but their operational practices remain the same. That's why shared functions, such as the Data &amp;amp; Analytics platform, provide the common building blocks for the brands to use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdeeuc037joe6yau8jgqc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdeeuc037joe6yau8jgqc.png" alt="data analytics as a shared function of the business" width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During daily operations, clothes are sold through multiple sales channels (retail, wholesale, online etc.), which produces large amounts of operational data. It makes perfect sense to use this data for analytical purposes - to understand the past and predict the future. But with a high degree of independence comes the responsibility to exercise it responsibly. Over the years, several data silos formed, each with its own governance practices, levels of data maturity and ownership structures.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1hbizb3q3h1wr0eqgka.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1hbizb3q3h1wr0eqgka.png" alt="a web of point-to-point connection" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These silos made answering the question "&lt;em&gt;How many white shirts do I need?&lt;/em&gt;" very difficult, because the data you need to answer it is scattered, possibly unavailable and under unclear ownership. Data producers and consumers started to become connected in an ever-growing web of point-to-point connections. Even worse, those connections were between clients and databases located on-premise or in the cloud, semi-accessible data stores, semi-structured files and even data stores on personal laptops.&lt;/p&gt;

&lt;p&gt;The business is ambitious and we have clear goals:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We want to open one retail store each working day in 2026.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To live up to the expectations of the business, and to answer "&lt;em&gt;How many white shirts do I need?&lt;/em&gt;" reliably, across 20 brands and across multiple sales channels, we needed a structured solution that provides actionable data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The platform
&lt;/h2&gt;

&lt;p&gt;The data &amp;amp; analytics platform (the platform) we're building has a very clear goal - to enable data and machine learning (ML) engineers to create data products and make them readily available to various parts of the business. These data products can be schemas or views in a database, (semi)structured files unloaded to blob storage, various reports compiled for executives or anything else in between.&lt;/p&gt;

&lt;p&gt;Data products are used by various departments in the company to understand the past and predict the future. When used by business analysts and decision makers, these data products help in demand planning, supply chain optimizations, understanding the environmental impact and so on. They augment the decision making process with data.&lt;/p&gt;

&lt;p&gt;When talking to prospective stakeholders, we describe the platform as a highway: it gets you from point A to point B, whether you drive a small electric car or a diesel-powered truck. The signage is uniform, the speed limits are known in advance, and the rules apply to every vehicle.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tenets
&lt;/h3&gt;

&lt;p&gt;We are building our platform to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;The&lt;/em&gt; way, but not in the way.&lt;/li&gt;
&lt;li&gt;Flexible, while having general rules. &lt;/li&gt;
&lt;li&gt;In service of those who are using it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These three tenets shape our vision, decisions and priorities.&lt;/p&gt;




&lt;p&gt;If I had to describe the platform in a single sentence, I would say that it is a curated collection of tools, standards and processes to ingest, store, transform and serve data.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/ivicak/data-and-analytics-reimagined-platform-architecture-3f9j"&gt;second article in the series&lt;/a&gt; explains the architecture chosen for the platform and what challenges it is meant to solve. &lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>snowflake</category>
      <category>terraform</category>
      <category>devops</category>
    </item>
    <item>
      <title>Validating Terraform configuration just got easier</title>
      <dc:creator>Ivica Kolenkaš</dc:creator>
      <pubDate>Mon, 30 Dec 2024 22:20:37 +0000</pubDate>
      <link>https://forem.com/ivicak/validating-terraform-configuration-just-got-easier-1ioh</link>
      <guid>https://forem.com/ivicak/validating-terraform-configuration-just-got-easier-1ioh</guid>
      <description>&lt;p&gt;Upgrading provider versions is essential for keeping your infrastructure managed with Terraform up-to-date and feature-rich.&lt;/p&gt;

&lt;p&gt;In software engineering, it is inevitable that components (functions, APIs, implementations) become deprecated and get phased out. Luckily, Terraform has a &lt;a href="https://developer.hashicorp.com/terraform/language/providers/requirements#version-constraints" rel="noopener noreferrer"&gt;robust way of managing provider versions&lt;/a&gt; and of &lt;a href="https://developer.hashicorp.com/terraform/cli/commands/validate" rel="noopener noreferrer"&gt;validating your configuration&lt;/a&gt;, so you can see which resources are currently deprecated or misconfigured.&lt;/p&gt;

&lt;p&gt;Working with the output of Terraform's &lt;code&gt;validate&lt;/code&gt; command is not always convenient, considering that it can easily be over 50000 (yes, fifty thousand) lines.&lt;/p&gt;

&lt;h2&gt;
  
  
  A bit on &lt;code&gt;terraform validate&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;I was recently in such a situation; my Terraform state had close to 3600 resource instances, 2075 of which were deprecated - a cool 57% of all resource instances 😄&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;│ This resource is deprecated and will be removed in a future major version release.
│ 
│ (and 4133 more similar warnings elsewhere)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;terraform validate&lt;/code&gt; (&lt;a href="https://developer.hashicorp.com/terraform/cli/commands/validate" rel="noopener noreferrer"&gt;docs&lt;/a&gt;) is a great tool - it shows you all the details about deprecated and misconfigured resource instances that need your attention:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "format_version": "1.0",
  "valid": true,
  "error_count": 0,
  "warning_count": 2075,
  "diagnostics": [
    {
      "severity": "warning",
      "summary": "Deprecated Resource",
      "detail": "This resource is deprecated and will be removed in a future major version release. Please use CDEF instead.",
      "address": "module.some.module.address.abcd.name",
      "range": {
        "filename": ".terraform/modules/some.address/main.tf",
        "start": {
          "line": 539,
          "column": 71,
          "byte": 13195
        },
        "end": {
          "line": 539,
          "column": 72,
          "byte": 13196
        }
      }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only issue is that the output file of &lt;code&gt;terraform validate -json&lt;/code&gt; has more than 50000 lines and is not very convenient to work with. &lt;code&gt;terraform-validate-explorer&lt;/code&gt; to the rescue!&lt;/p&gt;
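&lt;p&gt;For a quick first pass without any tooling, the same file can also be sliced with a few lines of Python. A minimal sketch, assuming the &lt;code&gt;diagnostics&lt;/code&gt; schema shown above; the resource addresses are made up for illustration:&lt;/p&gt;

```python
import json

# Shaped like `terraform validate -json` output (schema shown above);
# the addresses are hypothetical.
raw = """
{
  "format_version": "1.0",
  "valid": true,
  "error_count": 0,
  "warning_count": 2,
  "diagnostics": [
    {"severity": "warning", "summary": "Deprecated Resource",
     "address": "module.legal.snowflake_role_grants.tables_future_read"},
    {"severity": "warning", "summary": "Deprecated Resource",
     "address": "module.sales.snowflake_role_grants.tables_future_read"}
  ]
}
"""

report = json.loads(raw)

# Collect every warning and the unique resource addresses behind them.
warnings = [d for d in report["diagnostics"] if d["severity"] == "warning"]
unique_addresses = sorted({w["address"] for w in warnings})

print(len(warnings))
print(unique_addresses)
```

&lt;p&gt;Handy for a head count, but much less pleasant than a point-and-click filter once the file grows to tens of thousands of lines.&lt;/p&gt;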

&lt;h2&gt;
  
  
  terraform-validate-explorer
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;terraform-validate-explorer&lt;/code&gt; is a tool that helps you search and filter resource instances from the output of &lt;code&gt;terraform validate -json&lt;/code&gt;. Get it from this &lt;a href="https://github.com/ivica-k/terraform-validate-explorer/tree/main" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The idea for this tool came from a situation at work: the state file has many Snowflake resources, and the &lt;a href="https://registry.terraform.io/providers/Snowflake-Labs/snowflake" rel="noopener noreferrer"&gt;Terraform provider for Snowflake&lt;/a&gt; has undergone many changes in the past year, leading to plenty of deprecations.&lt;/p&gt;

&lt;p&gt;Version &lt;code&gt;1.x&lt;/code&gt; of the Snowflake provider became available and I wanted to upgrade the provider, meaning that I had to deal with 2075 resource instances that were deprecated. Some of these manage account role grants and I don't want to break those. As a matter of fact, I don't want to break anything for my stakeholders, so I decided to take things slowly.&lt;/p&gt;

&lt;h3&gt;
  
  
  "contains" filter
&lt;/h3&gt;

&lt;p&gt;Upgrading these resources one-by-one means that I have to find them first, and this is where the "contains" filter helped me:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsh01rfi9o4r4qhkro7q8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsh01rfi9o4r4qhkro7q8.jpg" alt="contains tables_future_read" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The screenshot above shows a search for all resources that have &lt;code&gt;tables_future_read&lt;/code&gt; in the name (Snowflake's &lt;a href="https://docs.snowflake.com/en/sql-reference/sql/grant-privilege#future-grants-on-database-or-schema-objects" rel="noopener noreferrer"&gt;"future" grants&lt;/a&gt; are amazing btw!)&lt;/p&gt;

&lt;h3&gt;
  
  
  "does not contain" filter
&lt;/h3&gt;

&lt;p&gt;To verify that only &lt;code&gt;snowflake_&lt;/code&gt; resources are deprecated, I filtered all the warnings that do not contain the word &lt;code&gt;snowflake&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5y0ba905v6y449ryxef.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5y0ba905v6y449ryxef.png" alt="does not contain snowflake" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No errors and no warnings - perfect!&lt;/p&gt;

&lt;h3&gt;
  
  
  "regex" filter
&lt;/h3&gt;

&lt;p&gt;If the other two filters are not cutting it for you, you can always do it with &lt;a href="https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/" rel="noopener noreferrer"&gt;one of the world's write-only languages&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Suppose you want to look for a resource instance that has &lt;code&gt;future_&lt;/code&gt; in the name, followed by a four-letter word that is at the end of the resource name:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukhd3yn6inp9819dsd6d.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukhd3yn6inp9819dsd6d.jpg" alt="regex search" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With regex, the sky's the limit! Well, that and the 255-character limit I put on that &lt;a href="https://doc.qt.io/qt-6/qlineedit.html" rel="noopener noreferrer"&gt;QLineEdit&lt;/a&gt;.&lt;/p&gt;
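&lt;p&gt;The same kind of pattern can be tried outside the app as well. A small Python sketch; the resource names below are hypothetical:&lt;/p&gt;

```python
import re

# Hypothetical resource addresses, for illustration only.
addresses = [
    "module.sales.snowflake_grant.tables_future_read",     # ends in a four-letter word
    "module.sales.snowflake_grant.views_future_read_all",  # "future_" not at the end
    "module.sales.snowflake_grant.schemas_future_grant",   # five letters after "future_"
]

# "future_" followed by a four-letter word at the end of the name.
pattern = re.compile(r"future_[a-z]{4}$")

matches = [a for a in addresses if pattern.search(a)]
print(matches)  # only the first address matches
```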

&lt;h3&gt;
  
  
  Output file validation
&lt;/h3&gt;

&lt;p&gt;If the output file of &lt;code&gt;terraform validate -json&lt;/code&gt; was somehow made invalid (with errrrm, manual edits?), &lt;code&gt;terraform-validate-explorer&lt;/code&gt; will check for that too:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fls874d9nu4om4vg403u9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fls874d9nu4om4vg403u9.png" alt="invalid file" width="800" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;terraform-validate-explorer&lt;/code&gt; is simple at the moment, with just the basic functionality. To make it more useful and more stable in the future, I plan to implement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unit tests&lt;/li&gt;
&lt;li&gt;error handling&lt;/li&gt;
&lt;li&gt;showing only unique resources&lt;/li&gt;
&lt;li&gt;filtering an already filtered dataset&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>terraform</category>
      <category>devops</category>
      <category>tooling</category>
      <category>parser</category>
    </item>
    <item>
      <title>Import AzureAD app role assignments into Terraform state</title>
      <dc:creator>Ivica Kolenkaš</dc:creator>
      <pubDate>Sat, 30 Dec 2023 18:07:35 +0000</pubDate>
      <link>https://forem.com/ivicak/import-azuread-app-role-assignments-into-terraform-state-5ckp</link>
      <guid>https://forem.com/ivicak/import-azuread-app-role-assignments-into-terraform-state-5ckp</guid>
      <description>&lt;p&gt;This article shows how to import &lt;a href="https://registry.terraform.io/providers/hashicorp/azuread/latest/docs/resources/app_role_assignment"&gt;AzureAD app role assignments&lt;/a&gt; into the Terraform state. With app role assignments, AzureAD users, groups, or service principals are assigned a role in an application. &lt;a href="https://learn.microsoft.com/en-us/graph/api/resources/approleassignment?view=graph-rest-1.0"&gt;Source&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the use case I'm writing about, AzureAD groups are assigned a role "User" in the AzureAD enterprise application for Snowflake, allowing members of those &lt;a href="https://community.snowflake.com/s/article/HOW-TO-Setup-SSO-with-Azure-AD-and-the-Snowflake-New-URL-Format-or-Privatelink"&gt;AzureAD groups to single sign-on to Snowflake&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  In short
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The problem&lt;/li&gt;
&lt;li&gt;Get Microsoft Graph API token&lt;/li&gt;
&lt;li&gt;
List existing app role assignments using the Graph API

&lt;ul&gt;
&lt;li&gt;Handling multiple assignments&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Import existing app role assignments into the Terraform state&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;Creating and managing a data platform involves, among other things, managing its infrastructure and user access. In the data platform my team is managing, we use Snowflake for analytical data warehousing, and user access is managed from &lt;a href="https://learn.microsoft.com/en-us/entra/identity/saas-apps/snowflake-tutorial"&gt;AzureAD&lt;/a&gt; (Microsoft Entra).&lt;/p&gt;

&lt;p&gt;This setup allows us to use existing AzureAD users and groups and grant them access to roles inside of Snowflake. In short, adding a group named "ECOM_SALES" will allow any member of that group to log in to Snowflake and use a &lt;a href="https://docs.snowflake.com/en/user-guide/security-access-control-considerations#aligning-object-access-with-business-functions"&gt;functional role&lt;/a&gt; named &lt;code&gt;ECOM_SALES&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Below is a screenshot of the Snowflake enterprise application showing multiple groups assigned the role "User":&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--STKSteNF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/b97b7a45wyocgm098sne.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--STKSteNF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/b97b7a45wyocgm098sne.png" alt="Snowflake enterprise application showing added groups" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Being strong believers in infrastructure-as-code, my team manages the entire platform with Terraform. However, during the very early days, some changes were done manually, such as adding AzureAD groups to the enterprise application for Snowflake single sign-on.&lt;/p&gt;

&lt;p&gt;Access management is an important part of the overall platform security and as such should be managed through code. Manual configuration can be overlooked during migrations and does not follow the &lt;a href="https://www.unido.org/overview/member-states/change-management/faq/what-four-eyes-principle"&gt;four-eyes principle&lt;/a&gt;, which can cause outages; it also does not provide a stable foundation for a data platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Now that we understand why all AzureAD app role assignments should be managed with Terraform, let's see how the manually added ones can be imported.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://registry.terraform.io/providers/hashicorp/azuread/latest/docs/resources/app_role_assignment"&gt;The docs&lt;/a&gt; state that app role assignments are imported with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform import \
  azuread_app_role_assignment.example \
  00000000-0000-0000-0000-000000000000/appRoleAssignment/aaBBcDDeFG6h5JKLMN2PQrrssTTUUvWWxxxxxyyyzzz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;00000000-0000-0000-0000-000000000000&lt;/code&gt; is the object ID of the service principal object associated with your AzureAD enterprise application for Snowflake.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;aaBBcDDeFG6h5JKLMN2PQrrssTTUUvWWxxxxxyyyzzz&lt;/code&gt; is the app role assignment ID.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I had absolutely no idea where to find the real value of &lt;code&gt;aaBBcDDeFG6h5JKLMN2PQrrssTTUUvWWxxxxxyyyzzz&lt;/code&gt; in the UI.&lt;/p&gt;

&lt;p&gt;Since Terraform providers communicate with third-party APIs (AWS, AzureAD, etc.), I figured I could do the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Microsoft Graph API token
&lt;/h2&gt;

&lt;p&gt;Microsoft's &lt;a href="https://learn.microsoft.com/en-us/graph/use-the-api"&gt;Graph API&lt;/a&gt; allows programmatic access to the Microsoft Cloud service resources.&lt;/p&gt;

&lt;p&gt;To use it, you must authenticate and obtain a &lt;a href="https://jwt.io/"&gt;JWT&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TOKEN=$(curl -d grant_type=client_credentials \
  -d client_id=$CLIENT_ID \
  -d client_secret=$CLIENT_SECRET \
  -d scope=https://graph.microsoft.com/.default \
  -d resource=https://graph.microsoft.com  \
  https://login.microsoftonline.com/$TENANT_ID/oauth2/token  |
  jq -j .access_token)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, export &lt;code&gt;CLIENT_ID&lt;/code&gt;, &lt;code&gt;CLIENT_SECRET&lt;/code&gt; and &lt;code&gt;TENANT_ID&lt;/code&gt; first.&lt;/p&gt;

&lt;h2&gt;
  
  
  List existing app role assignments using the Graph API
&lt;/h2&gt;

&lt;p&gt;Having the JWT in the &lt;code&gt;TOKEN&lt;/code&gt; variable, we can list the existing app role assignment for group &lt;code&gt;ECOM_SALES&lt;/code&gt; in the AzureAD enterprise application for Snowflake with ID &lt;code&gt;1ab2c3de-f456-7890-fghi-j12345k67lm8&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ASSIGNMENT_ID=$(curl -sX GET \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://graph.microsoft.com/v1.0/servicePrincipals/1ab2c3de-f456-7890-fghi-j12345k67lm8/appRoleAssignedTo |
  jq '.value[] | select(.principalDisplayName =="ECOM_SALES").id')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "@odata.context": "LINK_HERE",
    "@odata.nextLink": "LINK_HERE",
    "value": [
        {
            "id": "ZMW3ujNOGUuENMLa2k6-mVWFENfBHbVDlwQJg1ui848",
            "deletedDateTime": null,
            "appRoleId": "33c0d484-efdc-4e65-b6fc-470f4bcb4f46",
            "createdDateTime": "2023-02-17T08:48:17.1803369Z",
            "principalDisplayName": "ECOM_SALES",
            "principalId": "bab7c564-4e33-4b19-8434-c2dada4ebe99",
            "principalType": "Group",
            "resourceDisplayName": "Snowflake AAD",
            "resourceId": "1ab2c3de-f456-7890-fghi-j12345k67lm8"
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and our variable &lt;code&gt;ASSIGNMENT_ID&lt;/code&gt; has the value &lt;code&gt;ZMW3ujNOGUuENMLa2k6-mVWFENfBHbVDlwQJg1ui848&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;With this value we now have all the elements necessary to import this app role assignment into the Terraform state.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multiple assignments
&lt;/h3&gt;

&lt;p&gt;In a situation where you have multiple AzureAD app role assignments for groups such as &lt;code&gt;ECOM_SALES&lt;/code&gt;, &lt;code&gt;ECOM_FINANCE&lt;/code&gt; and &lt;code&gt;ECOM_LEGAL&lt;/code&gt;, you can use the following &lt;code&gt;jq&lt;/code&gt; expression to list them all:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.value[] | select(.principalDisplayName | startswith("ECOM_"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
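&lt;p&gt;The same selection can be sketched in Python, using the &lt;code&gt;appRoleAssignedTo&lt;/code&gt; response shape shown earlier; the IDs and group names below are fake:&lt;/p&gt;

```python
# A Python take on the jq "startswith" filter, run against a response
# shaped like the Graph API output above (all values are fake).
response = {
    "value": [
        {"id": "ZMW3ujNOGUuENMLa2k6-fake1", "principalDisplayName": "ECOM_SALES"},
        {"id": "AbCdEfGhIjKlMnOpQrSt-fake2", "principalDisplayName": "ECOM_FINANCE"},
        {"id": "UvWxYzAbCdEfGhIjKlMn-fake3", "principalDisplayName": "HR_PAYROLL"},
    ]
}

# Map every ECOM_* group to its app role assignment ID.
ecom_assignments = {
    a["principalDisplayName"]: a["id"]
    for a in response["value"]
    if a["principalDisplayName"].startswith("ECOM_")
}
print(ecom_assignments)
```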



&lt;h2&gt;
  
  
  Import existing app role assignments into the Terraform state
&lt;/h2&gt;

&lt;p&gt;With the app role assignment ID in the &lt;code&gt;ASSIGNMENT_ID&lt;/code&gt; variable, and the AzureAD enterprise application for Snowflake having the ID &lt;code&gt;1ab2c3de-f456-7890-fghi-j12345k67lm8&lt;/code&gt;, we can import the assignment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform import \
  azuread_app_role_assignment.example \
  1ab2c3de-f456-7890-fghi-j12345k67lm8/appRoleAssignment/$ASSIGNMENT_ID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This article provided a quick solution for importing AzureAD app role assignments into the Terraform state, addressing the risks of manual interventions in infrastructure management. Using Microsoft's Graph API and Terraform, we demonstrated a streamlined process to list and import app role assignments.&lt;/p&gt;

</description>
      <category>azuread</category>
      <category>terraform</category>
      <category>snowflake</category>
      <category>dataplatform</category>
    </item>
    <item>
      <title>Practical ECS scaling: horizontally scaling an application based on its response time</title>
      <dc:creator>Ivica Kolenkaš</dc:creator>
      <pubDate>Fri, 22 Dec 2023 17:48:09 +0000</pubDate>
      <link>https://forem.com/aws-builders/practical-ecs-scaling-horizontally-scaling-an-application-based-on-its-response-time-bap</link>
      <guid>https://forem.com/aws-builders/practical-ecs-scaling-horizontally-scaling-an-application-based-on-its-response-time-bap</guid>
      <description>&lt;p&gt;The &lt;a href="https://dev.to/aws-builders/practical-ecs-scaling-vertically-scaling-an-application-with-a-memory-leak-3bel"&gt;previous article&lt;/a&gt; looked at whether changing the performance envelope for an application with a memory leak was effective. This article answers the question: "Should you horizontally scale your application based on its response time?"&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Horizontal scaling&lt;/li&gt;
&lt;li&gt;The endpoint under test&lt;/li&gt;
&lt;li&gt;Running tests&lt;/li&gt;
&lt;li&gt;Should you horizontally scale your application based on response times?&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Horizontal scaling
&lt;/h2&gt;

&lt;p&gt;Since this is the first article that deals with horizontal scaling, a quote from Nathan reminds us what horizontal scaling is.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Horizontal scaling is when you spread your workload across a larger number of application containers. It is based on aggregate resource consumption metrics for the service. For example you can look at average CPU resource consumption metric across all copies of your container. &lt;/p&gt;

&lt;p&gt;When the aggregate average utilization breaches a high threshold you scale out by adding more copies of the container. If it breaches a low threshold you reduce the number of copies of the container. &lt;a href="https://containersonaws.com/presentations/amazon-ecs-scaling-best-practices/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The endpoint under test
&lt;/h2&gt;

&lt;p&gt;Our &lt;a href="https://dev.to/aws-builders/practical-ecs-scaling-an-introduction-4f26"&gt;mock application&lt;/a&gt; comes with this REST API endpoint:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/long_response_time&lt;/code&gt;, simulating an increasingly busy database.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When this endpoint is invoked, the application calculates the square root of &lt;code&gt;64 * 64 * 64 * 64 * 64 * 64 ** 64&lt;/code&gt; and saves the result to a database. Due to an increased load on the database, each &lt;code&gt;INSERT&lt;/code&gt; query takes longer and longer to complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running tests
&lt;/h2&gt;

&lt;p&gt;Your morning has just started and that coffee smells so good. While sipping it, you glance at your application's monitoring dashboard and notice that the average response time has gone from ±300ms to more than 2 seconds.&lt;/p&gt;

&lt;p&gt;Not good.&lt;/p&gt;

&lt;p&gt;You decide to configure application autoscaling based on the response time. The idea is that running more containers will help distribute the workload and bring down the response time to acceptable levels.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;scaling&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;auto_scale_task_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_capacity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;scaling&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scale_to_track_custom_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;responsescaling&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;target_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;target_response_time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;period&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;target_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;scale_in_cooldown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;scale_out_cooldown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above CDK code configures autoscaling for your service running on ECS. It tracks the target group's response time metric; when the value exceeds the target of 2 (seconds), additional ECS tasks are launched, with a one-minute cooldown between scaling actions, up to a maximum of 5 tasks.&lt;/p&gt;

&lt;p&gt;The configuration is applied:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6okvz1sxilfv6woxfpud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6okvz1sxilfv6woxfpud.png" alt="ECS autoscaling configuration"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and it seems to be working! The response time has dropped below 2 seconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ub26xqgaxj0b31xrx8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ub26xqgaxj0b31xrx8x.png" alt="Response time dropping with two running containers"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That cold coffee is the least of your concerns now because increasing the number of tasks helped only temporarily.&lt;/p&gt;

&lt;p&gt;The third and fourth containers start up fairly quickly, but the response time rises relentlessly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmikp0k6jvwgxn30fesgr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmikp0k6jvwgxn30fesgr.png" alt="Response time rising relentlessly"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After a few minutes, the service scales up to the maximum of 5 tasks to try and cope with the rising response times...&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sbmpyo6hxo9ajgf8n80.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sbmpyo6hxo9ajgf8n80.png" alt="Running the maximum number of containers"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;... but it is completely ineffective, as response time is still growing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdmvkn5cainbbhonuz9jv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdmvkn5cainbbhonuz9jv.png" alt="Response time still rising"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Why is that? &lt;/p&gt;

&lt;p&gt;Well, the ship is taking on water and many sailors are rushing to empty it, but there's only a handful of buckets available 🪣 🪣 🪣.&lt;/p&gt;

&lt;h2&gt;
  
  
  Should you horizontally scale your application based on response times?
&lt;/h2&gt;

&lt;p&gt;You can, but it won't do much good.&lt;/p&gt;

&lt;p&gt;The "&lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/capacity-autoscaling.html" rel="noopener noreferrer"&gt;Identifying a utilization metric&lt;/a&gt;" paragraph has a great explanation on choosing the metric to track and base the autoscaling on.&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The metric must be correlated with demand. When resources are held steady, but demand changes, the metric value must also change. The metric should increase or decrease when demand increases or decreases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The metric value must scale in proportion to capacity.&lt;/strong&gt; When demand holds constant, adding more resources must result in a proportional change in the metric value. So, doubling the number of tasks should cause the metric to decrease by 50%.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;The part in bold is important because it applies directly to our use case: the metric value is not scaling in proportion to capacity. Doubling the number of tasks (and doubling them again) did not cause the metric value to decrease by 50%.&lt;/p&gt;
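&lt;p&gt;That property is easy to eyeball with a back-of-the-envelope check. The numbers below are illustrative, not measurements from the tests above:&lt;/p&gt;

```python
def scales_in_proportion(metric_before: float, metric_after: float,
                         capacity_factor: float, tolerance: float = 0.2) -> bool:
    """True if the metric dropped roughly in inverse proportion to added capacity."""
    expected = metric_before / capacity_factor
    return abs(metric_after - expected) <= tolerance * expected

# Average CPU utilization: doubling the tasks roughly halves it -> a good scaling metric.
print(scales_in_proportion(80.0, 42.0, 2.0))  # True

# Response time against an overloaded database: doubling the tasks barely moves it.
print(scales_in_proportion(2.4, 2.2, 2.0))    # False
```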

&lt;p&gt;An overloaded database can cause response times to skyrocket, and adding more tasks won't help.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In fact, it may actually make it much worse because launching more copies of the application causes more connections to an already overloaded database server. &lt;a href="https://containersonaws.com/presentations/amazon-ecs-scaling-best-practices/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
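&lt;p&gt;The connection math behind that warning is simple to sketch (the per-task pool size here is an assumption for illustration):&lt;/p&gt;

```python
# Each task typically opens its own database connection pool, so scaling
# out multiplies the number of connections the database must handle.
POOL_SIZE_PER_TASK = 10  # assumed per-task pool size

def total_db_connections(tasks):
    return tasks * POOL_SIZE_PER_TASK

print(total_db_connections(4))   # 4 tasks: 40 connections
print(total_db_connections(16))  # 16 tasks: 160 connections, same slow DB
```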




&lt;p&gt;Dear reader, thank you for following my journey through practical ECS scaling. We looked into how a &lt;a href="https://dev.to/aws-builders/practical-ecs-scaling-vertically-scaling-a-cpu-heavy-application-105c"&gt;CPU-heavy application performs better with more CPU resources&lt;/a&gt;, how &lt;a href="https://dev.to/aws-builders/practical-ecs-scaling-vertically-scaling-an-application-with-a-memory-leak-3bel"&gt;memory leaks are like monkey wrenches in the machine&lt;/a&gt; and the futility of horizontally scaling an application when the database is overloaded (this article).&lt;/p&gt;

</description>
      <category>aws</category>
      <category>containers</category>
      <category>scaling</category>
      <category>cdk</category>
    </item>
    <item>
      <title>Practical ECS scaling: vertically scaling an application with a memory leak</title>
      <dc:creator>Ivica Kolenkaš</dc:creator>
      <pubDate>Wed, 20 Dec 2023 18:59:37 +0000</pubDate>
      <link>https://forem.com/aws-builders/practical-ecs-scaling-vertically-scaling-an-application-with-a-memory-leak-3bel</link>
      <guid>https://forem.com/aws-builders/practical-ecs-scaling-vertically-scaling-an-application-with-a-memory-leak-3bel</guid>
      <description>&lt;p&gt;The &lt;a href="https://dev.to/aws-builders/practical-ecs-scaling-vertically-scaling-a-cpu-heavy-application-105c"&gt;previous article&lt;/a&gt; looked at how changing the performance envelope for a CPU-heavy application affected its performance. This article shows whether vertically scaling an application with a memory leak is effective.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The endpoint under test&lt;/li&gt;
&lt;li&gt;Running tests&lt;/li&gt;
&lt;li&gt;
Results

&lt;ul&gt;
&lt;li&gt;Container 1 (1GB of memory)&lt;/li&gt;
&lt;li&gt;Container 2 (2GB of memory)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;How effective is it to vertically scale an application that has a memory leak?&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  The endpoint under test
&lt;/h2&gt;

&lt;p&gt;Our &lt;a href="https://dev.to/ivicak/practical-ecs-scaling-2b05-temp-slug-3558989?preview=609977daf9ecaef438eaca84344f22a40049ed64d0c4f120e2451035ac73faf886c11952d19be6aacd95fc4ed114f985bc762efbd07d1406ead73497#meet-the-app"&gt;mock application&lt;/a&gt; comes with this REST API endpoint:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/memory_leak&lt;/code&gt;, simulating a &lt;a href="https://en.wikipedia.org/wiki/Memory_leak" rel="noopener noreferrer"&gt;memory leak&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When this endpoint is invoked, the application calculates the square root of &lt;code&gt;64 * 64 * 64 * 64 * 64 * 64 ** 64&lt;/code&gt; and returns the result. Due to a bad code merge, 1MB of memory is added to a Python list on each request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;leaked_memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/memory_leak&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;memory_leaky_task&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;leaked_memory&lt;/span&gt;

    &lt;span class="c1"&gt;# appends approx. 1MB of data to a list on each
&lt;/span&gt;    &lt;span class="c1"&gt;# request, creating a memory leak
&lt;/span&gt;    &lt;span class="n"&gt;leaked_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;bytearray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_build_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;do_sqrt&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running tests
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;It is better to be roughly right than precisely wrong.&lt;br&gt;
  — Alan Greenspan&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To load-test this application, I used &lt;a href="https://github.com/rakyll/hey" rel="noopener noreferrer"&gt;hey&lt;/a&gt; to invoke the endpoint with 5 requests per second for 30 minutes using &lt;code&gt;hey -z 30m -q 1 -c 5 $URL/memory_leak&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;To be able to compare results, I ran the same application in two containers with different hardware configurations:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;CPUs&lt;/th&gt;
&lt;th&gt;Memory (GB)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Container 1&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Container 2&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Container 1 (1GB of memory)
&lt;/h3&gt;

&lt;p&gt;Looking at the summary of &lt;code&gt;hey&lt;/code&gt;, we notice that not all requests were successful:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Summary:
  Total:        1800.0255 secs
  Slowest:      10.1465 secs
  Fastest:      0.0088 secs
  Average:      0.1860 secs
  Requests/sec: 4.3499

Status code distribution:
  [200] 6148 responses
  [502] 341 responses
  [503] 1211 responses
  [504] 130 responses
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Roughly 21% of all requests had a non-200 status code. 😞 This is not a great user experience.&lt;/p&gt;
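&lt;p&gt;A quick way to double-check that figure from &lt;code&gt;hey&lt;/code&gt;'s status code distribution:&lt;/p&gt;

```python
# Error rate computed from hey's status code distribution above
responses = {200: 6148, 502: 341, 503: 1211, 504: 130}

total = sum(responses.values())
errors = total - responses[200]
error_rate = errors / total

print(f"{errors} of {total} requests failed ({error_rate:.1%})")
```

&lt;p&gt;That works out to 1682 failed requests out of 7830, i.e. about 21.5%.&lt;/p&gt;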

&lt;p&gt;Looking at the ECS task details, we notice that there are 7 tasks in total, only one of which is running at the moment - the other 6 are stopped.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4nwwt1bv12d88xhemhr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4nwwt1bv12d88xhemhr.png" alt="Several stopped tasks" width="800" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To get more details, we can describe one of the stopped tasks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ecs describe-tasks \
  --cluster ecs-scaling-cluster \
  --tasks 7f0872485e6e421e8f83a062a3704303 |
  jq -r '.tasks[0].containers[0].reason'

OutOfMemoryError: Container killed due to memory usage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;OutOfMemoryError: Container killed due to memory usage&lt;/strong&gt; becomes obvious when looking at the memory utilization metric:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy09c6wo0pkrkn9265m2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy09c6wo0pkrkn9265m2g.png" alt="Memory utilization" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The sawtooth pattern reveals the problem: our application is exceeding the performance envelope for the "Memory" dimension! Each request leaks approx. 1MB of memory, and because the container is given 1GB of memory, serving roughly 1,000 requests leads to the container running out of memory. At that point the ECS service forcefully stops the container and starts a fresh one.&lt;/p&gt;
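&lt;p&gt;A back-of-the-envelope estimate (assuming a clean 1MB leak per request and ignoring the app's baseline memory usage) matches what we observed:&lt;/p&gt;

```python
LEAK_PER_REQUEST_MB = 1

def requests_until_oom(container_memory_gb):
    # memory available to leak into, divided by the per-request leak
    return container_memory_gb * 1024 // LEAK_PER_REQUEST_MB

print(requests_until_oom(1))  # 1024 requests before the OOM kill
print(requests_until_oom(2))  # 2048 requests: more memory only delays it
```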

&lt;h3&gt;
  
  
  Container 2 (2GB of memory)
&lt;/h3&gt;

&lt;p&gt;Running another container with double the memory (1GB → 2GB) and load testing it in the same way, we get very similar results, with approx. 7% of all requests having a &lt;code&gt;5xx&lt;/code&gt; status code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Summary:
  Total:        1800.0240 secs
  Slowest:      10.0976 secs
  Fastest:      0.0119 secs
  Average:      0.0850 secs
  Requests/sec: 4.7249

Status code distribution:
  [200] 7868 responses
  [502] 177 responses
  [503] 405 responses
  [504] 55 responses
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this instance only 4 tasks were started, 3 of which were forcefully stopped:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn05ickg2ux71563zr6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn05ickg2ux71563zr6a.png" alt="Several stopped tasks" width="800" height="319"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ecs describe-tasks \
  --cluster ecs-scaling-cluster \
  --tasks c35b7029c38c4383b26e768aec3c77f2 |
  jq -r '.tasks[0].containers[0].reason'

OutOfMemoryError: Container killed due to memory usage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And again, the sawtooth memory utilization reveals that we have a memory leak:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq396e5gr08051h79qh4r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq396e5gr08051h79qh4r.png" alt="Sawtooth memory utilization" width="800" height="257"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How effective is it to vertically scale an application that has a memory leak?
&lt;/h2&gt;

&lt;p&gt;Not at all.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A memory leak can not be fixed by scaling. You can’t vertically or horizontally scale yourself out of a memory leak. The only way to fix this is to fix the application code. You cannot have scalability with a memory leak. &lt;a href="https://containersonaws.com/presentations/amazon-ecs-scaling-best-practices/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Regardless of its memory configuration, a container with an application that has a memory leak will sooner or later run out of memory.&lt;/p&gt;

&lt;p&gt;AWS re:Post has an article on &lt;a href="https://repost.aws/knowledge-center/ecs-resolve-outofmemory-errors" rel="noopener noreferrer"&gt;troubleshooting OutOfMemory errors&lt;/a&gt;. &lt;a href="https://aws.amazon.com/blogs/containers/how-amazon-ecs-manages-cpu-and-memory-resources/" rel="noopener noreferrer"&gt;This blog post&lt;/a&gt; explains how containers (in general, and those running on ECS) consume CPU and memory.&lt;/p&gt;




&lt;p&gt;Next up: Should you horizontally scale your application based on response times?&lt;/p&gt;

&lt;p&gt;You also might want to check how &lt;a href="https://dev.to/aws-builders/practical-ecs-scaling-vertically-scaling-a-cpu-heavy-application-105c"&gt;changing the performance envelope for a CPU-heavy application affects its performance&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>containers</category>
      <category>cdk</category>
      <category>scaling</category>
    </item>
    <item>
      <title>Practical ECS scaling: vertically scaling a CPU-heavy application</title>
      <dc:creator>Ivica Kolenkaš</dc:creator>
      <pubDate>Sun, 17 Dec 2023 15:35:52 +0000</pubDate>
      <link>https://forem.com/aws-builders/practical-ecs-scaling-vertically-scaling-a-cpu-heavy-application-105c</link>
      <guid>https://forem.com/aws-builders/practical-ecs-scaling-vertically-scaling-a-cpu-heavy-application-105c</guid>
      <description>&lt;p&gt;The &lt;a href="https://dev.to/ivicak/practical-ecs-scaling-an-introduction-4f26"&gt;introductory article&lt;/a&gt; defined the performance envelope, and this one looks at how changing the performance envelope for a CPU-heavy application affects its performance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The endpoint under test&lt;/li&gt;
&lt;li&gt;Running tests&lt;/li&gt;
&lt;li&gt;
Results 

&lt;ul&gt;
&lt;li&gt;Container 1 (0.25CPU)&lt;/li&gt;
&lt;li&gt;Container 2 (0.5CPU)&lt;/li&gt;
&lt;li&gt;Container 3 (1CPU)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Can a CPU-heavy application perform better with more CPU resources?&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  The endpoint under test
&lt;/h2&gt;

&lt;p&gt;Our &lt;a href="https://dev.to/ivicak/practical-ecs-scaling-2b05-temp-slug-3558989?preview=609977daf9ecaef438eaca84344f22a40049ed64d0c4f120e2451035ac73faf886c11952d19be6aacd95fc4ed114f985bc762efbd07d1406ead73497#meet-the-app"&gt;mock application&lt;/a&gt; is built in Flask and has several REST API endpoints, one of which is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/cpu_intensive&lt;/code&gt;, simulating a CPU-intensive task. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When this endpoint is invoked, the application calculates the square root of &lt;code&gt;64 * 64 * 64 * 64 * 64 * 64 ** 64&lt;/code&gt; and returns the result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ http  http://ALB.eu-central-1.elb.amazonaws.com/cpu_intensive

HTTP/1.1 200 OK
Connection: close
Content-Length: 36
Content-Type: application/json
Date: Sun, 26 Nov 2023 15:52:46 GMT
Server: Werkzeug/2.3.7 Python/3.9.6

{
    "result": "2.0568806966515076e+62"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
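&lt;p&gt;The math checks out: the expression equals 2^414, so its square root is exactly 2^207.&lt;/p&gt;

```python
import math

# 64^5 * 64^64 = 2^30 * 2^384 = 2^414, whose square root is exactly 2^207
value = 64 * 64 * 64 * 64 * 64 * 64 ** 64
print(math.sqrt(value))  # 2.0568806966515076e+62, as in the response above
```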



&lt;h2&gt;
  
  
  Running tests
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;It is better to be roughly right than precisely wrong.&lt;br&gt;
  — Alan Greenspan&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To load-test this application, I used &lt;a href="https://github.com/rakyll/hey" rel="noopener noreferrer"&gt;hey&lt;/a&gt; to invoke the endpoint with 5 requests per second for 30 minutes using &lt;code&gt;hey -z 30m -q 1 -c 5 $URL/cpu_intensive&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;To be able to compare results, I ran the same application in three containers, each with different hardware constraints:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;CPUs&lt;/th&gt;
&lt;th&gt;Memory (GB)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Container 1&lt;/td&gt;
&lt;td&gt;0.25&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Container 2&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Container 3&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Container 1 (0.25CPU)
&lt;/h3&gt;

&lt;p&gt;As expected, Container 1 performed the worst, averaging 3.13 requests per second. Containers 2 and 3 were both able to serve 4.99 requests per second.&lt;/p&gt;

&lt;p&gt;One of the &lt;a href="https://containersonaws.com/presentations/amazon-ecs-scaling-best-practices/files/slides/21.svg" rel="noopener noreferrer"&gt;graphs&lt;/a&gt; from Nathan's article shows a CPU load peaking and staying at 100% for the duration of the load test. I was able to achieve the same results with container 1 in my test.&lt;/p&gt;

&lt;p&gt;Container 1 is clearly on its knees, with average CPU utilization at 100% for the duration of the test:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoyy7xrkf57cjzlrure2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoyy7xrkf57cjzlrure2.png" alt="A quarter of CPU and half-gig of memory is not enough"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In this graph you can see CPU and memory utilization over time as the load test ramps up. The CPU metric is much higher than the memory metric, and it flattens out around 100%.&lt;/p&gt;

&lt;p&gt;This means that the application ran out of CPU resource first. The workload is primarily CPU bound. This is quite normal, as most workloads run out of CPU before they run out of memory. As the application runs out of CPU, the quality of the service suffers before it actually runs out of memory.&lt;/p&gt;

&lt;p&gt;This tells us one micro optimization we might be able to make, is to &lt;strong&gt;modify the performance envelope to add a bit more CPU&lt;/strong&gt; and a bit less memory. &lt;a href="https://containersonaws.com/presentations/amazon-ecs-scaling-best-practices/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Container 2 (0.5CPU)
&lt;/h3&gt;

&lt;p&gt;Container 2 has double the CPU and delivers the expected performance of 5 requests per second with an average CPU utilization of around 90%:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccc3bke2cvuzhp3yll4q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccc3bke2cvuzhp3yll4q.png" alt="Half of CPU and one gig of memory is enough"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Container 3 (1CPU)
&lt;/h3&gt;

&lt;p&gt;Doubling the amount of CPU again, container 3 delivers the expected performance with average CPU utilization around 35%:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1eyds1s73ybfksivkfus.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1eyds1s73ybfksivkfus.png" alt="A full CPU core and 2 gig of memory is more than enough"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We could even say that container 3, with 1 CPU and 2GB of memory, is overprovisioned. In dollar amounts, it would cost roughly $41 per month to run. On the other hand, container 2 would cost roughly $20 while delivering the same baseline performance of 5 requests per second.&lt;/p&gt;
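&lt;p&gt;Those dollar amounts can be reproduced from Fargate's two pricing dimensions. The rates below are approximate eu-central-1 on-demand prices and are an assumption; check the current pricing page before relying on them:&lt;/p&gt;

```python
# Approximate Fargate on-demand rates (assumed; verify against current pricing)
VCPU_PER_HOUR_USD = 0.04656
GB_PER_HOUR_USD = 0.00511
HOURS_PER_MONTH = 730

def monthly_cost_usd(vcpu, memory_gb):
    return (vcpu * VCPU_PER_HOUR_USD + memory_gb * GB_PER_HOUR_USD) * HOURS_PER_MONTH

print(round(monthly_cost_usd(1.0, 2.0)))  # container 3: about $41/month
print(round(monthly_cost_usd(0.5, 1.0)))  # container 2: about half that
```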

&lt;h2&gt;
  
  
  Can a CPU-heavy application perform better with more CPU resources?
&lt;/h2&gt;

&lt;p&gt;As expected, yes. Increasing the amount of CPU from 0.25 to 0.5 allows the application container to deliver the expected performance of 5 requests per second while doing a CPU-heavy calculation.&lt;/p&gt;

&lt;p&gt;Going from 0.5CPU to 1CPU doesn't add any measurable benefit at 5 requests per second, but it would allow the application to respond more quickly and scale to more requests per second.&lt;/p&gt;

&lt;p&gt;Looking at &lt;code&gt;hey&lt;/code&gt;'s output in more detail, we can see that container 3 had response times that are almost &lt;em&gt;3 times faster&lt;/em&gt; than those from container 2.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;CPUs&lt;/th&gt;
&lt;th&gt;Memory (GB)&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Avg. response time (sec)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Container 1&lt;/td&gt;
&lt;td&gt;0.25&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;3.1384&lt;/td&gt;
&lt;td&gt;1.5909&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Container 2&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;4.9974&lt;/td&gt;
&lt;td&gt;0.8514&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Container 3&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;td&gt;4.9990&lt;/td&gt;
&lt;td&gt;0.3217&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;The end goal of all this load testing and metric analysis is to define an expected performance envelope that fits your application needs. Ideally it should also provide a little bit of extra space for occasional bursts of activity. &lt;a href="https://containersonaws.com/presentations/amazon-ecs-scaling-best-practices/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Container 2, with 0.5 CPU and 1GB of memory, provides just that. Vertically scaling a CPU-heavy application results in increased performance.&lt;/p&gt;




&lt;p&gt;Next up: Let's look at how &lt;a href="https://dev.to/aws-builders/practical-ecs-scaling-vertically-scaling-an-application-with-a-memory-leak-3bel"&gt;vertically scaling an application with a memory leak&lt;/a&gt; goes. ☠️&lt;/p&gt;

</description>
      <category>aws</category>
      <category>containers</category>
      <category>cdk</category>
      <category>scaling</category>
    </item>
    <item>
      <title>Practical ECS scaling: an introduction</title>
      <dc:creator>Ivica Kolenkaš</dc:creator>
      <pubDate>Sun, 17 Dec 2023 15:35:35 +0000</pubDate>
      <link>https://forem.com/aws-builders/practical-ecs-scaling-an-introduction-4f26</link>
      <guid>https://forem.com/aws-builders/practical-ecs-scaling-an-introduction-4f26</guid>
      <description>&lt;p&gt;How effective is it to vertically scale an application that has a memory leak? Can a CPU-heavy application perform better with more CPU resources? Should you horizontally scale your application based on response times?&lt;/p&gt;

&lt;p&gt;To show the answers to these questions in practice, I built and load tested a mock application in order to achieve results that are the same or very similar to those shown in Nathan Peck's great article on &lt;a href="https://containersonaws.com/presentations/amazon-ecs-scaling-best-practices/" rel="noopener noreferrer"&gt;Amazon ECS Scalability Best Practices&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meet the app
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falorl6q020694a03cj3w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falorl6q020694a03cj3w.png" alt="Application architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The application itself is built in Flask and its infrastructure is managed with AWS CDK for Python. The app has several REST API endpoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/cpu_intensive&lt;/code&gt;, simulating a CPU-intensive task. Calculates the square root of &lt;code&gt;64 * 64 * 64 * 64 * 64 * 64 ** 64&lt;/code&gt; on each request.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/memory_leak&lt;/code&gt;, simulating a &lt;a href="https://en.wikipedia.org/wiki/Memory_leak" rel="noopener noreferrer"&gt;memory leak&lt;/a&gt;. Leaks approx. 1MB of memory on each request.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/long_response_time&lt;/code&gt;, simulating increasingly longer responses from a busy downstream system (e.g. a database). Sleeps for 2ms for each request received since the app was started.&lt;/li&gt;
&lt;/ul&gt;
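&lt;p&gt;As an illustration, the &lt;code&gt;/long_response_time&lt;/code&gt; behaviour boils down to something like this (a sketch with assumed names, not the exact source; the real implementation lives in the repository linked below):&lt;/p&gt;

```python
import time

SLEEP_PER_REQUEST_S = 0.002  # 2ms
request_count = 0

def long_response_time():
    # each request sleeps 2ms longer than the previous one, mimicking an
    # increasingly busy downstream system such as a database
    global request_count
    request_count += 1
    delay = SLEEP_PER_REQUEST_S * request_count
    time.sleep(delay)
    return {"slept_for_seconds": delay}
```

&lt;p&gt;The 10th request sleeps 20ms, the 500th a full second - response times keep climbing no matter how healthy the container itself is.&lt;/p&gt;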

&lt;p&gt;All source code is available in &lt;a href="https://github.com/ivica-k/practical-ecs-scaling/tree/main" rel="noopener noreferrer"&gt;this repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The performance envelope
&lt;/h2&gt;

&lt;p&gt;Vertically scaling a container means giving the container more hardware resources.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When you vertically scale an application the first step is to identify what resources the application container actually needs in order to function.&lt;/p&gt;

&lt;p&gt;There are different dimensions of resources that an application needs. For example: CPU, memory, storage, and network bandwidth. Some machine learning workloads may actually require GPU as well. &lt;a href="https://containersonaws.com/presentations/amazon-ecs-scaling-best-practices/#vertical-scaling" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These resources form the &lt;em&gt;performance envelope&lt;/em&gt;: a set of hardware constraints for the container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftb0u4vhcqkgmgre4irst.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftb0u4vhcqkgmgre4irst.png" alt="Performance envelope"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;The first article in the series deals with an application that exceeds the "CPU" dimension of the performance envelope.&lt;/p&gt;

&lt;p&gt;Have a look at how &lt;a href="https://dev.to/ivicak/practical-ecs-scaling-vertically-scaling-a-cpu-heavy-application-105c"&gt;changing the performance envelope for a CPU-heavy application&lt;/a&gt; affects its performance.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>containers</category>
      <category>cdk</category>
      <category>scaling</category>
    </item>
    <item>
      <title>Manage Airflow connections with Terraform and AWS SecretsManager</title>
      <dc:creator>Ivica Kolenkaš</dc:creator>
      <pubDate>Tue, 15 Aug 2023 17:57:42 +0000</pubDate>
      <link>https://forem.com/aws-builders/manage-airflow-connections-with-terraform-4hof</link>
      <guid>https://forem.com/aws-builders/manage-airflow-connections-with-terraform-4hof</guid>
      <description>&lt;p&gt;Managing infrastructure as code brings speed, consistency and it makes the software development lifecycle more efficient and predictable. Infrastructure for your ETL/orchestration tool is managed with code - why not manage the secrets that your tool uses with code as well?&lt;/p&gt;

&lt;p&gt;This article shows a proof-of-concept implementation of how to manage Airflow secrets through Terraform and keep them committed to a code repository.&lt;/p&gt;

&lt;p&gt;Caveats/assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users (developers) on the AWS account don't have permissions to retrieve secret values from SecretsManager
&lt;/li&gt;
&lt;li&gt;Terraform is using a remote state with &lt;a href="https://developer.hashicorp.com/terraform/language/state/sensitive-data" rel="noopener noreferrer"&gt;appropriate security measures&lt;/a&gt; in place&lt;/li&gt;
&lt;li&gt;IAM role used by Terraform has relevant permissions to manage a wide range of AWS services and resources&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ivica-k/encrypted-airflow-secrets/tree/main" rel="noopener noreferrer"&gt;Repository with example code&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Architecture
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw6ezwrv8xytvh205zhcz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw6ezwrv8xytvh205zhcz.png" alt="Solution architecture diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  How it works
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzolmel5bay3ns8b2wviq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzolmel5bay3ns8b2wviq.png" alt="How it works"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Intro to Airflow Connections
&lt;/h2&gt;

&lt;p&gt;Airflow can connect to various systems, such as databases, SFTP servers or S3 buckets. To connect, it needs credentials. Connections are an Airflow concept to store those credentials. They are a great way to configure access to an external system once and use it multiple times.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/connections.html#" rel="noopener noreferrer"&gt;More on Connections&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A Connection can be expressed as a string; for example, a connection to a MySQL database may look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

mysql://username:password@hostname:3306/database


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Airflow understands this format and can use it to connect to the database for which the connection was configured.&lt;/p&gt;
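&lt;p&gt;You can see which parts Airflow extracts from such a URI by parsing it with Python's standard library:&lt;/p&gt;

```python
from urllib.parse import urlparse

parts = urlparse("mysql://username:password@hostname:3306/database")

# conn type, login and password, then host, port and schema, in Airflow terms
print(parts.scheme, parts.username, parts.password)
print(parts.hostname, parts.port, parts.path.lstrip("/"))
```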

&lt;p&gt;Connections can be configured through environment variables, in an external secrets backend (our use case) and in the internal Airflow database.&lt;/p&gt;

&lt;p&gt;A centralized way of managing connections becomes a necessity as soon as your proof-of-concept goes live or you are working in a team.&lt;/p&gt;

&lt;h2&gt;
  
  
  External secrets backends
&lt;/h2&gt;

&lt;p&gt;Airflow supports multiple &lt;a href="https://airflow.apache.org/docs/apache-airflow-providers/core-extensions/secrets-backends.html" rel="noopener noreferrer"&gt;external secrets backends&lt;/a&gt;, such as AWS SecretsManager, Azure KeyVault and Hashicorp Vault.&lt;/p&gt;

&lt;p&gt;Connection details are read from these backends when a connection is used. This keeps the sensitive part of the connection, such as a password, secure and minimizes the attack surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS SecretsManager backend
&lt;/h3&gt;

&lt;p&gt;Configuring your Airflow deployment to use AWS SecretsManager is well explained on &lt;a href="https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/secrets/secrets_manager/index.html#airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend" rel="noopener noreferrer"&gt;this page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Creating AWS SecretsManager secrets with Terraform is straightforward:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

resource "aws_secretsmanager_secret" "secret" {
  name = "my-precious"
}

resource "aws_secretsmanager_secret_version" "string" {
  secret_id     = aws_secretsmanager_secret.secret.id
  secret_string = "your secret here"
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;but committing this to a code repository is a cardinal sin! &lt;/p&gt;

&lt;p&gt;So how do you manage Airflow Connections in such a way that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the sensitive part of a connection stays hidden&lt;/li&gt;
&lt;li&gt;users can manage connections through code and commit them to a repository&lt;/li&gt;
&lt;li&gt;Airflow can use these connections when running DAGs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read on!&lt;/p&gt;

&lt;h2&gt;
  
  
  Encryption (is the) key
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Encryption is a way to conceal information by altering it so that it appears to be random data. &lt;a href="https://www.cloudflare.com/en-gb/learning/ssl/what-is-encryption/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AWS Key Management Service (KMS) allows us to create and manage encryption keys. These keys can be used to encrypt the contents of many AWS resources (buckets, disks, clusters...) but they can also be used to &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/kms/encrypt.html#examples" rel="noopener noreferrer"&gt;encrypt and decrypt&lt;/a&gt; user-provided strings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnimulmt7hp7vurqqkseb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnimulmt7hp7vurqqkseb.png" alt="User encrypts a string"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your users (developers) need a developer-friendly way of encrypting strings without having access to the KMS key. Developers love APIs. Keep your developers happy and give them an API.&lt;/p&gt;

&lt;p&gt;In this case, we have an API built with &lt;a href="https://docs.powertools.aws.dev/lambda/python/latest/" rel="noopener noreferrer"&gt;Powertools for AWS Lambda (Python)&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-urls.html" rel="noopener noreferrer"&gt;Lambda Function URLs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A custom &lt;a href="https://github.com/ivica-k/encrypted-airflow-secrets/blob/main/functions/kms_encrypt/lambda_function.py" rel="noopener noreferrer"&gt;Lambda function&lt;/a&gt; can be used to encrypt a given string, or to generate and encrypt a random one. This covers two use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an administrator of an external system created credentials for us, and we use them to create an Airflow connection&lt;/li&gt;
&lt;li&gt;we create credentials for a system we manage, and we use them to create an Airflow connection&lt;/li&gt;
&lt;/ul&gt;
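&lt;p&gt;A minimal sketch of such an encryption handler (the function and key names here are illustrative assumptions, not the exact code from the linked repository):&lt;/p&gt;

```python
import base64
import json
import secrets


def encrypt_with_kms(plaintext: str, kms_client, key_id: str) -> str:
    """Encrypt a string with KMS and return base64 text that is safe to commit."""
    response = kms_client.encrypt(KeyId=key_id, Plaintext=plaintext.encode())
    return base64.b64encode(response["CiphertextBlob"]).decode()


def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    # Either encrypt a caller-provided value, or generate and encrypt a random one.
    plaintext = body.get("encrypt_this") or secrets.token_urlsafe(32)

    import boto3  # imported lazily so the helper above can be tested offline

    encrypted = encrypt_with_kms(
        plaintext,
        boto3.client("kms"),
        key_id="alias/my-airflow-secrets",  # a hypothetical key alias
    )
    return {
        "encrypted_value": encrypted,
        "message": "Encrypted a user-provided value.",
    }
```

&lt;p&gt;Because the KMS client is passed in as an argument, the encryption helper can be exercised with a stub client; the handler only touches AWS when actually invoked.&lt;/p&gt;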

&lt;p&gt;Give a string to this Lambda&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

POST https://xxxxxx.lambda-url.eu-central-1.on.aws/encrypt
Accept: application/json

{
"encrypt_this": "mysql://username:password@hostname:3306/database"
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;and it returns something like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

AQICAHjTAGlNShkkcAYzHludk...IhvcNAQcGoIGOMIGLAgEAMIGFBg/AluidQ==


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Completely unreadable to you and me and safe to commit to a repository. (If you recognized it, yes, it is &lt;code&gt;base64&lt;/code&gt; encoded. &lt;a href="https://www.base64decode.org/" rel="noopener noreferrer"&gt;Try decoding it&lt;/a&gt;; even less readable!)&lt;/p&gt;
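&lt;p&gt;To be clear about why that is safe: &lt;code&gt;base64&lt;/code&gt; is only a transport encoding on top of the encryption. Decoding it recovers the ciphertext bytes, not the secret. A tiny sketch with made-up stand-in bytes (not real KMS output):&lt;/p&gt;

```python
import base64

# A made-up stand-in for KMS ciphertext: raw bytes with no readable structure.
ciphertext = bytes([0x01, 0x02, 0xDE, 0xAD, 0xBE, 0xEF])

# KMS output is base64-encoded so it can travel as text (JSON, Terraform, Git).
encoded = base64.b64encode(ciphertext).decode()
print(encoded)  # AQLerb7v

# Decoding the base64 only gets you back to the ciphertext, not to the secret.
assert base64.b64decode(encoded) == ciphertext
```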

&lt;h2&gt;
  
  
  Create secrets with Terraform
&lt;/h2&gt;

&lt;p&gt;That unreadable "sausage" from before can be used with Terraform, provided that Terraform has permission to decrypt with the KMS key that encrypted the original string.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

data "aws_kms_secrets" "secret" {
  secret {
    name    = "secret"
    payload = "AQICAHjTAGlNShkkcAYzHludk...IhvcNAQcGoIGOMIGLAgEAMIGFBg/AluidQ=="
  }
}

resource "aws_secretsmanager_secret" "secret" {
  name = var.name
}

resource "aws_secretsmanager_secret_version" "string" {
  secret_id     = aws_secretsmanager_secret.secret.id
  secret_string = data.aws_kms_secrets.secret.plaintext["secret"]
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The code above will happily decrypt the encrypted string using the KMS key from your AWS account and store the decrypted value in SecretsManager.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It &lt;strong&gt;will store the secret&lt;/strong&gt; in the Terraform state - take the &lt;a href="https://developer.hashicorp.com/terraform/language/state/sensitive-data" rel="noopener noreferrer"&gt;necessary precautions&lt;/a&gt; to secure it. &lt;strong&gt;Anyone with access&lt;/strong&gt; to the KMS key that encrypted the string &lt;strong&gt;can decrypt it&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Keep your Terraform state secure and your KMS keys secure-er(?).&lt;/p&gt;

&lt;h2&gt;
  
  
  Using it in practice
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Encrypt a MySQL connection string that Airflow will use:
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

http -b POST https://xxxxxx.lambda-url.eu-central-1.on.aws/encrypt encrypt_this='mysql://mysql_user:nb_6qaAI8CmkoI-FKxuK@hostname:3306/mysqldb'
{
    "encrypted_value": "AQICAHjTAGlNShkkcAYzHl8C2qXs7f...zaxroJDDw==",
    "message": "Encrypted a user-provided value."
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  Use the &lt;code&gt;"encrypted_value"&lt;/code&gt; value with a Terraform module to create a secret
&lt;/h4&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

module "db_conn" {
  source = "./modules/airflow_secret"

  name             = "airflow/connections/db"
  encrypted_string = "AQICAHjTAGlNShkkcAYzHl8C2qXs7f...zaxroJDDw=="
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;after which you get a nice AWS SecretsManager secret.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67kdz6vvp7m83moj63sp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67kdz6vvp7m83moj63sp.png" alt="AWS SecretsManager secret"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anyone with the &lt;code&gt;secretsmanager:GetSecretValue&lt;/code&gt; permission will be able to read the secret. Keep access to your AWS SecretsManager secrets secured.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Configure Airflow to use the AWS SecretsManager backend
&lt;/h2&gt;

&lt;p&gt;One of the great features of Airflow is the possibility to set (and override) configuration parameters &lt;a href="https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html" rel="noopener noreferrer"&gt;through environment variables&lt;/a&gt;. We can leverage this to configure MWAA so that it uses a different secrets backend:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

resource "aws_mwaa_environment" "this" {
  airflow_configuration_options = {
    "secrets.backend"               = "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend",
    "secrets.backend_kwargs"        = "{\"connections_prefix\": \"airflow/connections\",\"variables_prefix\": \"airflow/variables\"}"
  }
... rest of the config...
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the secret in SecretsManager and Airflow configured to use SecretsManager as the backend, we can finally reference the secret like any other Airflow connection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ivica-k/encrypted-airflow-secrets/blob/main/infra/dags/example_dag.py" rel="noopener noreferrer"&gt;Example DAG&lt;/a&gt; shows that fetching the secret works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27310rqq45buipptlxil.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27310rqq45buipptlxil.png" alt="Airflow logs showing the secret value"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;With this proof-of-concept solution we were able to achieve the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the sensitive part of a connection stays hidden&lt;/li&gt;
&lt;li&gt;users can manage connections through code and commit them to a repository&lt;/li&gt;
&lt;li&gt;Airflow can use these connections when running DAGs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One obvious downside of having encrypted strings in Git is that you can't understand what actually changed:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

diff --git a/infra/secrets.tf b/infra/secrets.tf
index fe53e5f..7e85d90 100644
--- a/infra/secrets.tf
+++ b/infra/secrets.tf
@@ -9,5 +9,5 @@ module "db_conn" {
   source = "./modules/airflow_secret"

   name             = "airflow/connections/db"
-  encrypted_string = "AQICAHjTAGlNShkkcAYzHl8C2qXs7fs5x9gByXim/PPuwt+TuwH8pYZHik8Cx0AZDM+ECML8AAAAnzCBnAYJ...XwF2a8zaxroJDDw=="
+  encrypted_string = "AQICAHjTAGlNShkkcAYzHl8C2qXs7fs5x9gByXim/PPuwt+TuwGhmhBNcePnQmhjrTgozm6rAAAAnTCmgYJK...nLU8TVWkLDUsSfDs="
 }


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This encrypted approach also doesn't work well for connections that have no secrets in them, for example AWS connections that use IAM roles:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

aws://?aws_iam_role=my_role_name&amp;amp;region_name=eu-west-1&amp;amp;aws_account_id=123456789123


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;If you would like to improve the cost-effectiveness of your MWAA setup, give &lt;a href="https://medium.com/apache-airflow/setting-up-aws-secrets-backends-with-airflow-in-a-cost-effective-way-dac2d2c43f13" rel="noopener noreferrer"&gt;this article&lt;/a&gt; by Vince Beck a read.&lt;/p&gt;

&lt;p&gt;The nice things I said in &lt;a href="https://medium.com/aws-in-plain-english/my-thoughts-on-aws-managed-workflows-for-apache-airflow-10df7dd6fb4d" rel="noopener noreferrer"&gt;my thoughts on MWAA&lt;/a&gt; 18 months ago still hold true, and the pace of development on the &lt;a href="https://github.com/aws/aws-mwaa-local-runner" rel="noopener noreferrer"&gt;aws-mwaa-local-runner&lt;/a&gt; has picked up.&lt;/p&gt;

&lt;p&gt;Until next time, keep those ETLs ET-elling!&lt;/p&gt;

</description>
      <category>airflow</category>
      <category>terraform</category>
      <category>infrastructureascode</category>
      <category>encryption</category>
    </item>
    <item>
      <title>How (not) to spend $15000 per month with AWS Athena</title>
      <dc:creator>Ivica Kolenkaš</dc:creator>
      <pubDate>Mon, 01 May 2023 15:07:29 +0000</pubDate>
      <link>https://forem.com/aws-builders/how-not-to-spend-15000-per-month-with-aws-athena-1fn8</link>
      <guid>https://forem.com/aws-builders/how-not-to-spend-15000-per-month-with-aws-athena-1fn8</guid>
      <description>&lt;p&gt;In the 1980s, traffic in downtown Boston was nearly unbearable so city planners came up with a plan to reroute the highway tunnel below downtown Boston. The project was nicknamed "the Big Dig". Construction started in 1991 and when it finished in 2007, the final price tag was around 15 billion dollars, about twice the original cost that was expected. &lt;a href="https://www.youtube.com/watch?v=dOe_6vuaR_s" rel="noopener noreferrer"&gt;Source&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is a story about a software project that ran over budget due to organizational mishaps, no oversight and no awareness of cost.&lt;/p&gt;

&lt;p&gt;Spoiler alert: &lt;em&gt;No actual query analysis is shown in the article. I am not a SQL expert; more of an observer with a passion for writing and with 20/20 hindsight.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Before we dive deeper into the money bleeding monster, here's a bit of background.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost of a project
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Total cost of ownership (TCO) is the total cost of owning software over its entire lifecycle, including the initial building price and ongoing charges, such as maintenance, human capital investments, resource allocation, and opportunity costs. &lt;a href="https://www.okteto.com/blog/total-cost-of-ownership-tco-of-building-versus-buying-software-for-development/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A software product can make financial sense if the value it provides to the organization is greater than the TCO over its lifespan.&lt;/p&gt;

&lt;p&gt;The cost of this particular software product, whose main part was the Athena query, was around $15000 per month in its operational phase (excluding the cost of making it). At one point, it was more expensive than ~40 RDS databases. Its value to the business is hard to determine because it provided data for dashboards.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2cr9tbhzvvv0mnme3gw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2cr9tbhzvvv0mnme3gw.png" alt="Cost breakdown per service for October 2022"&gt;&lt;/a&gt;&lt;br&gt;Cost breakdown per service for October 2022
  &lt;/p&gt;

&lt;h3&gt;
  
  
  The organization
&lt;/h3&gt;

&lt;p&gt;The organization and its people built this product during a difficult period, with COVID, financial insecurity and layoffs looming over everyone and everything. I'm sure they did their best under the circumstances and this article is in no way meant to criticize.&lt;/p&gt;

&lt;p&gt;Most software development in this organization is done with 2-pizza teams working in 2-week sprints to deliver peer-reviewed code that runs in the cloud configured by infrastructure-as-code. Decent operational monitoring and alerting exists and it works.&lt;/p&gt;

&lt;p&gt;Our money bleeding monster was an outlier; it was built by one developer. Read on to understand how serverless can be expensive.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Cost awareness, or lack of it
&lt;/h2&gt;

&lt;p&gt;From the initial idea for this product, through its design and creation, all the way to delivery, no one sat down to calculate how much it would cost. Architecture, design and development were done by a single developer to fulfill a business need. But it is not that developer's fault that the TCO calculation was not done.&lt;/p&gt;

&lt;p&gt;Business stakeholders and product owners usually think about TCO and whether a software product makes financial sense. A developer could have calculated, with a fair amount of certainty, the operational costs of the product. AWS Athena pricing is dead simple: $5 per terabyte of data scanned. Fourth-grade math at best.&lt;/p&gt;
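&lt;p&gt;That fourth-grade math looks like this; the scan size and schedule below are made-up but realistic numbers, not figures from this project:&lt;/p&gt;

```python
# Athena pricing: $5 per terabyte of data scanned.
price_per_tb = 5

# Assume a query that scans 3 TB per run and is scheduled hourly.
tb_per_run = 3
runs_per_day = 24
days_per_month = 30

monthly_cost = tb_per_run * runs_per_day * days_per_month * price_per_tb
print(f"${monthly_cost} per month")  # $10800 per month
```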

&lt;p&gt;So what failed then? There was no one to ask the question: "How much will this query cost?"&lt;/p&gt;

&lt;h2&gt;
  
  
  2. No systematic cost reporting
&lt;/h2&gt;

&lt;p&gt;AWS infrastructure used by the organization is managed through code and it is uniformly tagged. Some of those tags include a project name, the environment where the project is running (DEV, PROD) and owner/cost center.&lt;/p&gt;

&lt;p&gt;It would be trivially easy to create a monthly cost report using AWS Cost Explorer or any similar tool. The report could break down cost per project, environment and cost center and it could be sent to owners for review.&lt;/p&gt;

&lt;p&gt;So what failed then? There is always something more pressing to work on. No one cares about these cost reports so the platform team never prioritized them.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Managed service misuse
&lt;/h2&gt;

&lt;p&gt;AWS Athena is a serverless analytics service with the capability to query structured data from AWS S3 and other data sources using SQL syntax.&lt;/p&gt;

&lt;p&gt;In our case, data was uploaded to an AWS S3 bucket from Kafka. Kafka is the backbone for all microservices and a large chunk of business data flows through it. All that data ends up in an S3 bucket and is partitioned to support &lt;code&gt;WHERE&lt;/code&gt; clauses in queries. A &lt;code&gt;WHERE&lt;/code&gt; clause tells Athena to scan only the data partitions that match it. The partitioning schema follows a &lt;code&gt;YEAR/MONTH/DAY&lt;/code&gt; pattern, so an example query can look like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

SELECT style_id FROM schema.SALES
WHERE month = 11 AND day = 20 AND color = 'blue'


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This would return the style IDs of all blue clothing items that were sold on the 20th of November. So far so good.&lt;/p&gt;

&lt;p&gt;The real query did not use any &lt;code&gt;WHERE&lt;/code&gt; clause. That might be acceptable in the DEV environment (personally, I'd at least add &lt;code&gt;LIMIT 10&lt;/code&gt;), but not in the PROD environment, especially when the volume of data will only ever go up. The ever-growing amount of data in PROD meant that every time the query ran:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it scanned more and more data&lt;/li&gt;
&lt;li&gt;it scanned all the data; even data it does not need&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr03n5gtxuzuddd5mtx9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr03n5gtxuzuddd5mtx9z.png" alt="Cost report for AWS Athena from June to November 2022"&gt;&lt;/a&gt;&lt;br&gt;Cost report for AWS Athena from June to November 2022
  &lt;/p&gt;

&lt;p&gt;Athena is truly serverless and it will happily scan, scale and charge you for what it scanned. "Scan everything? On it boss!". "Scan some more? Don't mind if I do".&lt;/p&gt;

&lt;p&gt;Using a &lt;code&gt;WHERE&lt;/code&gt; clause can drastically reduce the amount of data scanned which directly correlates to incurred costs. It will also shorten the query execution time.&lt;/p&gt;

&lt;p&gt;So what failed then? There was no one to ask the question: "Can this query be improved?".&lt;/p&gt;

&lt;h2&gt;
  
  
  4. One person "team"
&lt;/h2&gt;

&lt;p&gt;Architecture, design and development on this product were done by a single developer. He had no help, no one to discuss ideas with, no one to review his code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhpoa0tagjwghvdhtfenk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhpoa0tagjwghvdhtfenk.png" alt="Commits made and merged by the same person"&gt;&lt;/a&gt;&lt;br&gt;Commits made and merged by the same person
  &lt;/p&gt;

&lt;p&gt;Every developer has the responsibility to write good software. Every human being has the right to make mistakes.&lt;/p&gt;

&lt;p&gt;So what failed then? The organization should not have allowed this. The developer was not set up for success from day one. One person is not a team.&lt;/p&gt;




&lt;h2&gt;
  
  
  Learnings
&lt;/h2&gt;

&lt;p&gt;I hope it goes without saying but: do the opposite of what the points above illustrate. Also, here are some learnings we acquired over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do the math
&lt;/h3&gt;

&lt;p&gt;The operations phase is an important phase of every product or service: that is when it generates value for the organization. This phase is hopefully also the longest. Because of these two facts, understanding the operational costs of a product is very important. Calculate them early.&lt;/p&gt;

&lt;h3&gt;
  
  
  Own it
&lt;/h3&gt;

&lt;p&gt;And I really mean OWN IT. All of it. A product's lifecycle is only in its infancy when you ship the code - it doesn't end there. All the logs, metrics, alerts, bills etc. that the product creates must be owned by someone.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inform and be informed
&lt;/h3&gt;

&lt;p&gt;If I tell you that on average, an ice cream costs $0.40 where I live, that's data. But if I add that the ice cream truck passes my house every day, that's information (and temptation!) that you can use to buy cheap ice cream every day.&lt;/p&gt;

&lt;p&gt;1TB of data scanned with Athena costs $5 and that's a fact. Knowing that your query will scan close to 100TB each time it runs is valuable information. Be informed and inform business stakeholders too.&lt;/p&gt;

&lt;h3&gt;
  
  
  Friends or foes
&lt;/h3&gt;

&lt;p&gt;Managed services can be great friends. With all the heavy lifting they do and a tendency to &lt;a href="https://aws.amazon.com/blogs/aws/category/price-reduction/" rel="noopener noreferrer"&gt;reduce prices over time&lt;/a&gt;, one would be hard-pressed to not use them.&lt;/p&gt;

&lt;p&gt;You can make sure they stay your friends by owning your product and creating appropriate billing alerts. Even hardcore serverless teams experience runaway costs - but they catch them early!&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1635244161778737152-395" src="https://platform.twitter.com/embed/Tweet.html?id=1635244161778737152"&gt;
&lt;/iframe&gt;




 &lt;/p&gt;




&lt;p&gt;In case you're wondering, the query was improved and it now costs a fraction of what it cost before. And if you're thinking that the money that was wasted on a poor query  could have been used to hire more people, I agree with you. We humans learn and evolve our whole lives, but so do organizations.&lt;/p&gt;

&lt;p&gt;Learning from our own mistakes is the most difficult but those lessons tend to stick the longest. I can only hope that this organization has enough knowledge management capacity to prevent scenarios like this one in the future. Otherwise, the myth that serverless is expensive will live on. High cost comes from sloppy code, bad development practices and processes in the organization that allow those to happen.&lt;/p&gt;

</description>
      <category>awsathena</category>
      <category>query</category>
      <category>costoptimization</category>
      <category>finops</category>
    </item>
    <item>
      <title>How working with AWS open-source tools made me a better developer</title>
      <dc:creator>Ivica Kolenkaš</dc:creator>
      <pubDate>Mon, 10 Apr 2023 14:14:48 +0000</pubDate>
      <link>https://forem.com/aws-builders/how-working-with-aws-open-source-tools-made-me-a-better-developer-343c</link>
      <guid>https://forem.com/aws-builders/how-working-with-aws-open-source-tools-made-me-a-better-developer-343c</guid>
      <description>&lt;h2&gt;
  
  
  In the beginning
&lt;/h2&gt;

&lt;p&gt;In the beginning God created the heaven and the earth. The earth began to cool, the autotrophs began to drool, Neanderthals developed tools and to-make-a-long-story-short I started learning Python. Much like our early ancestors &lt;a href="https://www.goodreads.com/book/show/11148989-catching-fire"&gt;benefited from using fire to prepare food&lt;/a&gt;, I benefited from using (and sporadically contributing to) AWS open-source tools.&lt;/p&gt;

&lt;p&gt;This is the story of how I went from a self-taught Python developer with a profound dislike of type hints to tolerating and even using them. All it took to convince me was a multi-billion dollar company with tens of thousands of developers.&lt;/p&gt;

&lt;h2&gt;
  
  
  I am not stubborn and I'm not a duck
&lt;/h2&gt;

&lt;p&gt;Contrary to what the intro paragraph led you to believe, I am not a stubborn person. I'd say I'm opinionated, and my opinion about using data types was formed in my teen years. I was a rebel without a pause. It was the time of Python 2.6, which of course has multiple data types, but they are dynamic. This means that you don't have to declare a type for a variable -- the type is inferred by the interpreter at runtime.&lt;/p&gt;

&lt;p&gt;You could just re-declare a variable with a different type (&lt;code&gt;int&lt;/code&gt; to a &lt;code&gt;float&lt;/code&gt;) and that was fine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12000&lt;/span&gt;
&lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;12000.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I was amazed. It was &lt;em&gt;so&lt;/em&gt; easy. No &lt;code&gt;static final void warranty&amp;lt;List&amp;gt;&lt;/code&gt; that college professors tried to teach me. You just sit down (sitting is optional), type Python and watch magic happen in front of you. An occasional &lt;code&gt;TypeError&lt;/code&gt; wasn't gonna stop me. Trying to iterate over an integer? Been there, done that. Neither of those convinced me to spend some time and learn about type hints in Python.&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1638201600995934210-342" src="https://platform.twitter.com/embed/Tweet.html?id=1638201600995934210"&gt;
&lt;/iframe&gt;




&lt;/p&gt;

&lt;p&gt;I won't blame you if by reading this you assume I'm a bad developer. I never said I was good :)&lt;/p&gt;

&lt;h2&gt;
  
  
  Don't believe the hype
&lt;/h2&gt;

&lt;p&gt;Python and I grew older together, and with Python version 3.5 came &lt;a href="https://peps.python.org/pep-0484/"&gt;type hints&lt;/a&gt;. As their name suggests, they are just hints; Python is still a dynamically typed language. They are not evaluated and they don't affect your application at runtime.&lt;/p&gt;
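&lt;p&gt;A two-line demonstration of that: the annotation below is plainly violated, yet Python runs the code without complaint.&lt;/p&gt;

```python
def double(x: int) -> int:
    return x * 2

# The hint promises an int, but Python never checks it at runtime:
# passing a string "works" and silently repeats it instead.
result = double("ab")
print(result)  # abab
```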

&lt;p&gt;Those hints do help though; modern IDEs understand them and suggest better auto-complete results. They also warn when incorrect types are used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I was aware of these Python advancements but I wasn't aboard the hype train. Who is this Guido person and why is he bringing the noise? Python's syntax was so clean before type hints. My colleagues, who were wiser than me, did get on the hype train and soon I was looking at code that resembled this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salaries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AverageSalary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# perform calculations
&lt;/span&gt;
&lt;span class="n"&gt;salaries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;12000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;13000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;
&lt;span class="n"&gt;avg_salary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AverageSalary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;calculate_avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salaries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It was &lt;em&gt;awful&lt;/em&gt;. It was noisy, it was unreadable, it was sacrilegious.&lt;/p&gt;

&lt;h2&gt;
  
  
  The big melt
&lt;/h2&gt;

&lt;p&gt;Some time ago I was introduced to &lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/work-with-cdk-python.html"&gt;AWS CDK for Python&lt;/a&gt; and it was a breath of fresh air after years of Terraform (I still &amp;lt;3 Terraform, don't get me wrong). Then I started noticing that CDK code is strongly typed in all its language variants, and for very good reasons. I liked CDK but did I like it enough?&lt;/p&gt;

&lt;p&gt;I gulped, bit the bullet and added types where I had to. I also had a brief adventure with CDK for Typescript and...and... I started &lt;strong&gt;liking&lt;/strong&gt; strict typing. It made the code strict, and well defined, and you knew what to expect as a return value. It all made &lt;strong&gt;sense&lt;/strong&gt;! Not to mention how it &lt;a href="https://instagram-engineering.com/static-analysis-at-scale-an-instagram-story-8f498ab71a0c"&gt;improves collaboration, makes large code-bases more maintainable and prevents silly bugs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Peasants put down their pitchforks because the &lt;a href="https://artpassions.net/wilde/selfish_giant.html"&gt;giant's heart has melted&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  From "I hate this" to...
&lt;/h2&gt;

&lt;p&gt;I truly disliked type hints. Working with CDK started to change that for the better, and then I discovered &lt;a href="https://awslabs.github.io/aws-lambda-powertools-python/"&gt;AWS Lambda Powertools&lt;/a&gt;. It combines multiple things I am passionate about, so I became passionate about using the library and improving it. Lambda Powertools uses static typing for several of its main features (&lt;a href="https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/parser/"&gt;Parser&lt;/a&gt;, &lt;a href="https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/data_classes/"&gt;Event Sources&lt;/a&gt; and &lt;a href="https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/typing/"&gt;Typing&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;I was keen to contribute to this open-source project, which meant I had to jump into static typing and contribute code that would blend in and be accepted. Using static types felt natural and made a lot of sense, especially in a codebase that was unknown to me. Instead of being noise, those type hints were real &lt;strong&gt;hints&lt;/strong&gt; that helped me understand a codebase I had never seen before.&lt;/p&gt;
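&lt;p&gt;A toy example of my own (not Powertools code) shows what those hints buy you when reading unfamiliar code: the signature alone documents what goes in and what comes out.&lt;/p&gt;

```python
from typing import Dict, List

def group_by_city(records: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group employee records by their "city" field.

    Without the annotations you would have to read the body (or the docs)
    to learn what `records` holds and what shape the result has.
    """
    grouped: Dict[str, List[Dict[str, str]]] = {}
    for record in records:
        grouped.setdefault(record["city"], []).append(record)
    return grouped

people = [
    {"name": "Ana", "city": "Zagreb"},
    {"name": "Ben", "city": "Berlin"},
    {"name": "Ceca", "city": "Zagreb"},
]
print(sorted(group_by_city(people)))  # ['Berlin', 'Zagreb']
```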

&lt;p&gt;My stomach now stays perfectly calm while I'm reading or writing code like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;log_level&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Union&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
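&lt;p&gt;That annotation says the level may arrive as an int (for example &lt;code&gt;logging.DEBUG&lt;/code&gt;), as a string (&lt;code&gt;"DEBUG"&lt;/code&gt;), or not at all. Here is a small sketch of my own (not taken from any particular library) of how such a parameter is typically consumed:&lt;/p&gt;

```python
import logging
from typing import Optional, Union

def resolve_level(log_level: Optional[Union[int, str]] = None) -> int:
    """Accept a level as an int (10), a name ("DEBUG"), or nothing at all."""
    if log_level is None:
        return logging.WARNING  # fall back to a default
    if isinstance(log_level, str):
        return logging.getLevelName(log_level.upper())  # "debug" -> 10
    return log_level

print(resolve_level())         # 30
print(resolve_level("debug"))  # 10
print(resolve_level(40))       # 40
```

&lt;p&gt;On Python 3.10+ the same annotation can also be written as &lt;code&gt;int | str | None&lt;/code&gt; (PEP 604).&lt;/p&gt;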



&lt;p&gt;Using CDK and contributing to Lambda Powertools made me do a full 180. I went from "I hate this..." to "type hints are great!". Using them made me understand why type hints and static typing are necessary and helpful. I even use them sporadically in my &lt;a href="https://github.com/ivica-k"&gt;pet projects&lt;/a&gt; while trying to become a better developer.&lt;/p&gt;




&lt;p&gt;Our ancestors cooked food, which allowed their digestive tracts to become smaller, leaving more energy for brain growth. Something similar happened to me: using type hints allowed me to understand unfamiliar codebases and improve my coding skills. My humble contributions are a way of giving back to the open-source community and thanking the contributors of &lt;a href="https://awslabs.github.io/aws-lambda-powertools-python/"&gt;Lambda Powertools for Python&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A good way to end this article is with another Public Enemy reference: it took a &lt;a href="https://en.wikipedia.org/wiki/It_Takes_a_Nation_of_Millions_to_Hold_Us_Back"&gt;multi-billion-dollar company with tens of thousands of developers&lt;/a&gt; just a few months to convert this not-so-stubborn dynamic typer into a type hints aficionado.&lt;/p&gt;

&lt;p&gt;No statically typed languages were hurt during the writing of this article.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>programming</category>
      <category>python</category>
      <category>cdk</category>
    </item>
  </channel>
</rss>
