<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: FreshBooks</title>
    <description>The latest articles on Forem by FreshBooks (@freshbooks).</description>
    <link>https://forem.com/freshbooks</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F4471%2F225edf10-3473-468c-ae57-04f5e9f4a313.png</url>
      <title>Forem: FreshBooks</title>
      <link>https://forem.com/freshbooks</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/freshbooks"/>
    <language>en</language>
    <item>
      <title>Manipulating Complex Structures in BigQuery: A Guide to DDL Operations</title>
      <dc:creator>iamtodor</dc:creator>
      <pubDate>Fri, 26 May 2023 07:41:17 +0000</pubDate>
      <link>https://forem.com/freshbooks/manipulating-complex-structures-in-bigquery-a-guide-to-ddl-operations-dhm</link>
      <guid>https://forem.com/freshbooks/manipulating-complex-structures-in-bigquery-a-guide-to-ddl-operations-dhm</guid>
      <description>&lt;p&gt;This guide aims to provide a comprehensive understanding of handling changes in complex structures within BigQuery using Data Definition Language (DDL) statements. It explores scenarios involving top-level columns as well as nested columns, addressing limitations with the existing &lt;code&gt;on_schema_change&lt;/code&gt; configuration in dbt for BigQuery.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;
How-to

&lt;ul&gt;
&lt;li&gt;Define schema&lt;/li&gt;
&lt;li&gt;Add records&lt;/li&gt;
&lt;li&gt;Add top-level field&lt;/li&gt;
&lt;li&gt;
Change top-level field type

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CAST&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ALTER COLUMN&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Juggle with STRUCT

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tmp table&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;update STRUCT using SET&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CREATE OR REPLACE TABLE&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

Bonus notes

&lt;ul&gt;
&lt;li&gt;Create a regular table&lt;/li&gt;
&lt;li&gt;Create an external table&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Contact info&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Currently, dbt's &lt;code&gt;on_schema_change&lt;/code&gt; configuration only tracks schema changes related to top-level columns in BigQuery. Nested column changes, such as adding, removing, or modifying a field inside a &lt;code&gt;STRUCT&lt;/code&gt;, are not captured. This guide delves into extending the functionality of &lt;code&gt;on_schema_change&lt;/code&gt; to encompass nested columns, enabling a more comprehensive schema change tracking mechanism. The sections that follow show what &lt;code&gt;top-level&lt;/code&gt; and &lt;code&gt;nested&lt;/code&gt; columns are.&lt;/p&gt;

&lt;p&gt;Moreover, the BigQuery documentation explicitly states on its &lt;a href="https://cloud.google.com/bigquery/docs/managing-table-schemas#add_a_nested_column_to_a_record_column" rel="noopener noreferrer"&gt;Add a nested column to a RECORD column&lt;/a&gt; page:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Adding a new nested field to an existing RECORD column by using a SQL DDL statement is not supported.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As for dropping one or more columns, the &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#alter_table_drop_column_statement" rel="noopener noreferrer"&gt;ALTER TABLE DROP COLUMN statement&lt;/a&gt; page says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You cannot use this statement to drop the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Partitioned columns&lt;/li&gt;
&lt;li&gt;Clustered columns&lt;/li&gt;
&lt;li&gt;Nested columns inside existing RECORD fields&lt;/li&gt;
&lt;li&gt;Columns in a table that has row access policies&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is an ongoing proposal, with discussion, in the dbt-bigquery issue
&lt;a href="https://github.com/dbt-labs/dbt-bigquery/issues/446"&gt;on_schema_change should handle non-top-level schema changes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This limitation further highlights the need for alternative approaches to manipulate complex structures.&lt;/p&gt;

&lt;h2&gt;
  
  
  How-to
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Define schema
&lt;/h3&gt;

&lt;p&gt;To begin, let's dive into the SQL syntax and create the "person" table. This table will store information about individuals, including their ID, name, and address.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;INT64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;address&lt;/span&gt; &lt;span class="n"&gt;STRUCT&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;
        &lt;span class="n"&gt;country&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt; 
    &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Add records
&lt;/h3&gt;

&lt;p&gt;Add a couple of records to the table.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;STRUCT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"USA"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"New-York"&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"Jennifer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;STRUCT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"Canada"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"Toronto"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here is how the schema looks in the UI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6stdl5uw34qnnj1bg0bt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6stdl5uw34qnnj1bg0bt.png" alt="table schema"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here is how the data is represented when you query it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzzu6olu1voyc8u6vgpzn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzzu6olu1voyc8u6vgpzn.png" alt="data representation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Add top-level field
&lt;/h3&gt;

&lt;p&gt;Imagine we are tasked with adding a new field, &lt;code&gt;has_car&lt;/code&gt;, of type &lt;code&gt;INT64&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;has_car&lt;/span&gt; &lt;span class="n"&gt;INT64&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- add record right away&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;has_car&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"James"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;STRUCT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"USA"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"New-York"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When you add a new column to an existing BigQuery table, the past records will have null values for that newly added column. This behavior is expected because the new column was not present at the time those records were inserted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zb0gso60q6abs3m0k57.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zb0gso60q6abs3m0k57.png" alt="null value for past records"&gt;&lt;/a&gt;&lt;/p&gt;
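
&lt;p&gt;If downstream queries should not have to deal with &lt;code&gt;NULL&lt;/code&gt;, the existing rows can be backfilled. The following is a minimal sketch that assumes &lt;code&gt;0&lt;/code&gt; (no car) is an acceptable default for the older records; that default is an assumption made for illustration, not something the table requires.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

-- backfill rows that were inserted before has_car existed
-- (assumption: 0 is a sensible default for them)
UPDATE
    dataset_name.person
SET
    has_car = 0
WHERE
    has_car IS NULL;

-- check that no NULL values are left
SELECT
    COUNT(*) AS rows_without_has_car
FROM
    dataset_name.person
WHERE
    has_car IS NULL;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;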

&lt;h3&gt;
  
  
  Change top-level field type
&lt;/h3&gt;

&lt;p&gt;Then your customer changes their mind, and now the &lt;code&gt;has_car&lt;/code&gt; column has to have a &lt;code&gt;BOOL&lt;/code&gt; type instead of &lt;code&gt;INT64&lt;/code&gt;. Here are two possible ways to tackle this task.&lt;/p&gt;

&lt;p&gt;Before diving into the possible approaches, it is worth mentioning that BigQuery has &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_rules" rel="noopener noreferrer"&gt;conversion rules&lt;/a&gt; that you need to consider. For
instance, you can cast &lt;code&gt;BOOL&lt;/code&gt; to &lt;code&gt;INT64&lt;/code&gt;, but you cannot cast &lt;code&gt;INT64&lt;/code&gt; to &lt;code&gt;DATETIME&lt;/code&gt;.&lt;/p&gt;
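
&lt;p&gt;To get a feel for these rules before touching the table, the conversions can be tried on literals first. This is a small sketch; the column aliases are made up for illustration. &lt;code&gt;SAFE_CAST&lt;/code&gt; returns &lt;code&gt;NULL&lt;/code&gt; instead of raising an error when a particular value cannot be converted.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

SELECT
    CAST(1 AS BOOL) AS one_as_bool,        -- true
    CAST(0 AS BOOL) AS zero_as_bool,       -- false
    CAST(TRUE AS INT64) AS true_as_int,    -- 1
    SAFE_CAST("abc" AS INT64) AS bad_int   -- NULL instead of an error


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keep in mind that &lt;code&gt;SAFE_CAST&lt;/code&gt; only helps with values that fail to convert. A type pair that the conversion rules do not allow at all, such as &lt;code&gt;INT64&lt;/code&gt; to &lt;code&gt;DATETIME&lt;/code&gt;, is still rejected.&lt;/p&gt;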

&lt;p&gt;In BigQuery, &lt;code&gt;CAST&lt;/code&gt; and &lt;code&gt;ALTER COLUMN&lt;/code&gt; are two different approaches for modifying the data type of a column in a table.&lt;br&gt;
Let's explore each approach:&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;CAST&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;CAST()&lt;/code&gt; function is used to convert the data type of a column or an expression in a SQL query. It allows you to convert a column from one data type to another during the query execution. However, it does not permanently modify the data type of the column in the table's schema.&lt;/p&gt;

&lt;p&gt;The following is an example of using &lt;code&gt;CAST&lt;/code&gt; to convert a column's data type in a query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;has_car&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;BOOL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;has_car&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  &lt;code&gt;ALTER COLUMN&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;ALTER COLUMN&lt;/code&gt; statement is used to modify the data type of a column in the table's schema. It allows you to permanently change the data type of a column in the table, affecting all existing and future data in that column.&lt;/p&gt;

&lt;p&gt;Here's an example of using &lt;code&gt;ALTER COLUMN&lt;/code&gt; to modify the data type of a column:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt;
    &lt;span class="n"&gt;has_car&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt;
    &lt;span class="k"&gt;DATA&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="nb"&gt;BOOL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

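&lt;p&gt;It's important to note that &lt;code&gt;ALTER COLUMN&lt;/code&gt; is a DDL statement, so it runs as its own statement rather than as part of a regular SQL query. Once the column's data type is altered, the change affects all future operations and queries performed on that table. Also keep in mind that the BigQuery documentation describes &lt;code&gt;SET DATA TYPE&lt;/code&gt; as accepting only conversions to a less restrictive type (for example, &lt;code&gt;INT64&lt;/code&gt; to &lt;code&gt;NUMERIC&lt;/code&gt;), so a change such as &lt;code&gt;INT64&lt;/code&gt; to &lt;code&gt;BOOL&lt;/code&gt; may be rejected and need a different approach.&lt;/p&gt;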

&lt;p&gt;In summary, &lt;code&gt;CAST&lt;/code&gt; is used to convert the data type of a column during query execution, while &lt;code&gt;ALTER COLUMN&lt;/code&gt; is used to permanently modify the data type of a column in the table's schema. The choice between the two depends on whether you want to temporarily convert the data type for a specific query or permanently change the data type for the column in the table.&lt;/p&gt;
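
&lt;p&gt;If &lt;code&gt;ALTER COLUMN ... SET DATA TYPE&lt;/code&gt; rejects the target type, one possible workaround is to add a new column, fill it with &lt;code&gt;CAST&lt;/code&gt;, and then swap the names. The following is only a sketch; &lt;code&gt;has_car_new&lt;/code&gt; and &lt;code&gt;has_car_past&lt;/code&gt; are hypothetical helper names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

-- 1. add a column with the desired type (hypothetical name)
ALTER TABLE
    dataset_name.person
ADD
    COLUMN IF NOT EXISTS has_car_new BOOL;

-- 2. copy the converted values; NULL stays NULL
UPDATE
    dataset_name.person
SET
    has_car_new = CAST(has_car AS BOOL)
WHERE
    TRUE;

-- 3. swap the column names and drop the old column
ALTER TABLE
    dataset_name.person RENAME COLUMN has_car TO has_car_past;

ALTER TABLE
    dataset_name.person RENAME COLUMN has_car_new TO has_car;

ALTER TABLE
    dataset_name.person DROP COLUMN has_car_past;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;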

&lt;h3&gt;
  
  
  Juggle with STRUCT
&lt;/h3&gt;

&lt;p&gt;If we want to apply changes to nested fields, such as adding, removing, or modifying fields of the &lt;code&gt;STRUCT&lt;/code&gt; itself, there are a few different ways to do so.&lt;/p&gt;

&lt;h4&gt;
  
  
  temp table
&lt;/h4&gt;

&lt;p&gt;The first, and quite simple, way is to use a &lt;code&gt;temp&lt;/code&gt; table.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person_tmp&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;INT64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;has_car&lt;/span&gt; &lt;span class="n"&gt;INT64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;address&lt;/span&gt; &lt;span class="n"&gt;STRUCT&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;
        &lt;span class="n"&gt;country&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;zip_code&lt;/span&gt; &lt;span class="n"&gt;INT64&lt;/span&gt;
    &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- fill the new zip_code field with the default 0 value&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person_tmp&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;has_car&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt;
            &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;STRUCT&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;zip_code&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="nv"&gt;`dataset_name.person`&lt;/span&gt; &lt;span class="k"&gt;RENAME&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="nv"&gt;`person_past`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="nv"&gt;`dataset_name.person_tmp`&lt;/span&gt; &lt;span class="k"&gt;RENAME&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="nv"&gt;`person`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person_past&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;However, this approach has some drawbacks and considerations to keep in mind: when modifying a BigQuery table using a temporary table, you need to create a new table with the desired modifications and then copy the data from the original table to the temporary table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Costs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This process duplicates the data, so it increases storage usage and storage costs, and it also consumes additional query processing resources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It may impact performance, especially for large tables as you have a limited amount of production resources that are shared.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity and consistency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using a temporary table to modify a BigQuery table introduces additional steps and complexity to the process. You need to write queries to create the temporary table, copy data, modify the data, overwrite the original table, and then drop the temporary table. This adds complexity to the overall workflow and may require more code and query execution time.&lt;/p&gt;

&lt;p&gt;Last, but not least, during the modification process, there might be a period where the original table is not accessible or is in an inconsistent state. If other processes or applications depend on the original table's data, this downtime or inconsistency could impact their operations.&lt;/p&gt;

&lt;p&gt;So this is not the best approach.&lt;/p&gt;
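
&lt;p&gt;One way to reduce the consistency risk is to compare the two tables before the old copy is dropped. This is a sketch that assumes the renames above have already run, so that &lt;code&gt;person&lt;/code&gt; is the new table and &lt;code&gt;person_past&lt;/code&gt; is the old one; the column aliases are made up for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

-- both counts should match before person_past is dropped
SELECT
    (SELECT COUNT(*) FROM dataset_name.person) AS new_row_count,
    (SELECT COUNT(*) FROM dataset_name.person_past) AS old_row_count;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;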

&lt;h4&gt;
  
  
  update STRUCT using SET
&lt;/h4&gt;

&lt;p&gt;Another scenario is changing the type of a nested field. Imagine we would like to change the &lt;code&gt;zip_code&lt;/code&gt; type from &lt;code&gt;INT64&lt;/code&gt; to &lt;code&gt;STRING&lt;/code&gt;, and this time we don't want to go the &lt;code&gt;tmp&lt;/code&gt; table way. The second way is to add a new &lt;code&gt;STRUCT&lt;/code&gt; column and fill it with an &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#update_statement" rel="noopener noreferrer"&gt;UPDATE&lt;/a&gt; statement using &lt;code&gt;SET&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;address_new&lt;/span&gt; &lt;span class="n"&gt;STRUCT&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;zip_code&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;UPDATE&lt;/span&gt;
    &lt;span class="nv"&gt;`dataset_name.person`&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt;
    &lt;span class="n"&gt;address_new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt;
            &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;STRUCT&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zip_code&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt; &lt;span class="k"&gt;RENAME&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;address_past&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt; &lt;span class="k"&gt;RENAME&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;address_new&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;address_past&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this case, only the &lt;code&gt;STRUCT&lt;/code&gt; field will be duplicated. That is good enough.&lt;/p&gt;
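
&lt;p&gt;To confirm the nested schema after the swap, the &lt;code&gt;INFORMATION_SCHEMA.COLUMN_FIELD_PATHS&lt;/code&gt; view lists every nested field together with its type. This is a sketch that uses the same placeholder dataset name as the rest of the article.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

-- one row per field path, e.g. address.zip_code with its data type
SELECT
    field_path,
    data_type
FROM
    dataset_name.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE
    table_name = 'person'
    AND column_name = 'address';


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;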

&lt;h4&gt;
  
  
  CREATE OR REPLACE TABLE
&lt;/h4&gt;

&lt;p&gt;The last approach is to use &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_table_statement" rel="noopener noreferrer"&gt;&lt;code&gt;CREATE OR REPLACE TABLE&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt;
&lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;has_car&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt;
            &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;STRUCT&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zip_code&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;zip_code&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the same way, we can remove nested fields: just select the fields we need and omit the ones we aren't interested in.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="nv"&gt;`dataset_name.person`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="k"&gt;SELECT&lt;/span&gt;
                &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;STRUCT&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;
            &lt;span class="k"&gt;EXCEPT&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;zip_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="nv"&gt;`dataset_name.person`&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
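
&lt;p&gt;The second &lt;code&gt;SELECT&lt;/code&gt; above only changes the query result. To persist the removal, it can be wrapped in &lt;code&gt;CREATE OR REPLACE TABLE&lt;/code&gt;, in the same way as the type change. A minimal sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

-- rebuild the table with address minus its zip_code field
CREATE OR REPLACE TABLE dataset_name.person AS
SELECT
    * REPLACE (
        (
            SELECT
                AS STRUCT address.*
            EXCEPT
                (zip_code)
        ) AS address
    )
FROM
    dataset_name.person;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This rewrites the whole table, so the cost considerations mentioned for the temp table approach apply here as well.&lt;/p&gt;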
&lt;h2&gt;
  
  
  Bonus notes
&lt;/h2&gt;

&lt;p&gt;If you need to recreate a table schema from another dataset in your own dataset, the easiest way is to use CLI commands, which is a much faster and less error-prone way to create tables.&lt;/p&gt;
&lt;h3&gt;
  
  
  Create a regular table
&lt;/h3&gt;

&lt;p&gt;Here is an example of saving a table schema to a JSON file, given the table ID, with the &lt;a href="https://cloud.google.com/bigquery/docs/reference/bq-cli-reference#bq_show" rel="noopener noreferrer"&gt;bq show&lt;/a&gt; command:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

bq show \
    --schema \
    --format=prettyjson \
    project_name:dataset_name.table_name &amp;gt; table_name.json


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now you can create a table in your dataset using the &lt;a href="https://cloud.google.com/bigquery/docs/reference/bq-cli-reference#bq_mk" rel="noopener noreferrer"&gt;bq mk&lt;/a&gt; command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

bq mk \
    --table \
    your_dataset_name.table_name \
    table_name.json


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Create an external table
&lt;/h3&gt;

&lt;p&gt;Here is an example of creating a table definition in JSON format using &lt;a href="https://cloud.google.com/bigquery/docs/reference/bq-cli-reference#bq_mkdef" rel="noopener noreferrer"&gt;bq mkdef&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

bq mkdef \
    --source_format=NEWLINE_DELIMITED_JSON \
    --autodetect=false \
    'gs://bucket_name/prefix/*.json' &amp;gt; table_def


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;mkdef&lt;/code&gt; command creates a table definition in JSON format for data stored in Cloud Storage or Google Drive. That definition file is then passed to &lt;code&gt;bq mk&lt;/code&gt; to create the external table:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

bq mk \
    --table \
    --external_table_definition=table_def \
    dataset_name.table_name \
    table_name.json

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Contact info
&lt;/h2&gt;

&lt;p&gt;If you found this article helpful, I invite you to connect with me on &lt;a href="https://www.linkedin.com/in/iamtodor/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;. I am always looking to expand my network and connect with like-minded individuals in the data industry. Additionally, you can also reach out to me for any questions or feedback on the article. I'd be more than happy to engage in a conversation and help out in any way I can. So don’t hesitate to contact me, and let’s connect and learn together.&lt;/p&gt;

</description>
      <category>bigquery</category>
    </item>
    <item>
      <title>Building CI/CD for Vertex AI pipelines: The production</title>
      <dc:creator>Oleksandr Borodavka</dc:creator>
      <pubDate>Wed, 26 Apr 2023 12:10:08 +0000</pubDate>
      <link>https://forem.com/freshbooks/building-cicd-for-vertex-ai-pipelines-the-production-4lm9</link>
      <guid>https://forem.com/freshbooks/building-cicd-for-vertex-ai-pipelines-the-production-4lm9</guid>
      <description>&lt;p&gt;Hi there! As you probably already know from the first few articles of this series, we tested some new ideas and tools with a &lt;a href="https://dev.to/freshbooks/building-cicd-for-vertex-ai-pipelines-the-first-solution-1mp5"&gt;POC&lt;/a&gt; for one simple pipeline. Today we will review a new generalized version of CI/CD for Vertex AI pipelines we have built based on that experience and some &lt;a href="https://dev.to/freshbooks/building-cicd-for-vertex-ai-pipelines-the-real-world-1o05"&gt;further investigations&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  A bit of context
&lt;/h2&gt;

&lt;p&gt;Let's recall a few key points to refresh the context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/vertex-ai" rel="noopener noreferrer"&gt;Vertex AI&lt;/a&gt; is used to build training pipelines.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.github.com/en/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt; is our CI/CD tool.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We built a declarative framework that allows us to standardize the format and operations for all our components and pipelines.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The specifications and implementations of all our components and pipelines are kept in one GitHub repository.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There are three environments: development (DEV), staging (STAGE), and production (PROD).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Stating the task
&lt;/h2&gt;

&lt;p&gt;What do we want to achieve? &lt;br&gt;
In simple words, we want to automate as much as possible. Ideally, when new changes appear in our code repository, we want them applied in production in as short a time as possible, without any manual effort. Moreover, it should be done in a stable, safe, reproducible, and effective way.&lt;/p&gt;

&lt;p&gt;Jumping ahead a little, there are three things in our CI/CD practice that we are especially happy with. In all deployment environments, our Continuous Integration system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;automatically rebuilds components and runs their unit and integration tests&lt;/li&gt;
&lt;li&gt;builds pipelines changed in Pull Requests&lt;/li&gt;
&lt;li&gt;and rebuilds dependent pipelines.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  The way of code
&lt;/h2&gt;

&lt;p&gt;Okay, how can we achieve this?&lt;br&gt;
Since developers work with the codebase in the form of Pull Requests we can use them as starting points for the workflows. There are two moments that we have to automate.&lt;/p&gt;

&lt;p&gt;The first one is when a pull request is opened (or updated). Here we want to run the more basic and faster checks on the incoming changes to give the developer quick feedback. These are generally unit tests and build jobs (just a build to check that the configurations are okay) on our DEV environment. Then, if everything is fine, we run integration tests for components and run our pipelines on the STAGE environment to be sure the changes are ready to be merged.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdslzc0exims1agyxe07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdslzc0exims1agyxe07.png" alt="The way of code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The second moment is when the PR is approved and merged. Here we have changes that have already been tested, reviewed, and merged into the main branch, so they are ready for delivery. All the processes are run again, this time on the PROD environment.&lt;/p&gt;
&lt;h2&gt;
  
  
  The implementation
&lt;/h2&gt;

&lt;p&gt;For both stages, we use GitHub Actions workflows and some CLI commands of our Python framework, since routines related to code analysis are easier to implement in the framework’s own code. It probably doesn’t make sense to review all the code: that would be too long and too specific. However, we can look at the general structure and a slightly simplified version of the workflows that has enough detail to present the idea.&lt;/p&gt;

&lt;p&gt;Here is the structure of the source code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipelines/
  pipeline1.yaml
  ...
components/
  component1/
    src/
      ...
    tests/
      unit/
          ...
      integration/
          test.yaml
    config.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is a folder with pipeline specifications and a folder with components. Each component contains source files, tests, and a configuration.&lt;/p&gt;
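
&lt;p&gt;This layout makes it cheap to map a list of changed files to component and pipeline names. Here is a minimal Python sketch of such a mapping. It is not the framework's actual code: the function names are hypothetical, and it assumes that a pipeline's name is its file name without the extension.&lt;/p&gt;

```python
import json
from pathlib import PurePosixPath


def changed_components(paths):
    """Names of components that have at least one changed file."""
    names = set()
    for raw in paths:
        parts = PurePosixPath(raw).parts
        # Expect components/NAME/FILE...; parts[2:] is empty for shorter paths
        if parts[:1] == ("components",) and parts[2:]:
            names.add(parts[1])
    return sorted(names)


def changed_pipelines(paths):
    """Names of pipelines whose specification file changed."""
    names = set()
    for raw in paths:
        path = PurePosixPath(raw)
        if path.parts[:1] == ("pipelines",) and path.suffix == ".yaml":
            names.add(path.stem)
    return sorted(names)


def to_matrix_json(names):
    """Serialize names as a JSON array, the form a job matrix can consume."""
    return json.dumps(names)
```

&lt;p&gt;Given the changed paths &lt;code&gt;components/component1/config.yaml&lt;/code&gt; and &lt;code&gt;pipelines/pipeline1.yaml&lt;/code&gt;, the first function returns &lt;code&gt;component1&lt;/code&gt; and the second returns &lt;code&gt;pipeline1&lt;/code&gt;. Sorting the names keeps the output stable between runs.&lt;/p&gt;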

&lt;p&gt;And this is what the GitHub Actions directory looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.github
  actions
    build_component
    build_pipeline
    run_pipeline
    test_component
    test_component_integration
  workflows
    pr_merged.yml
    pr_opened.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;actions&lt;/code&gt; is a directory with reusable &lt;a href="https://docs.github.com/en/actions/creating-actions/creating-a-composite-action" rel="noopener noreferrer"&gt;composite actions&lt;/a&gt; that perform all the operations we need on components and pipelines (build, test, and run). The &lt;code&gt;workflows&lt;/code&gt; directory contains two workflows that GitHub triggers automatically when a Pull Request is opened/updated or merged, respectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  PR is opened
&lt;/h2&gt;

&lt;p&gt;Let's take a look at the first workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MLOps Pull Request - Opened&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# The workflow will be run automatically when a PR is &lt;/span&gt;
  &lt;span class="c1"&gt;# opened to the main branch or changes to the PR are pushed&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;opened&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;synchronize&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;reopened&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipelines/**/*.yaml"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;components/**/*.yaml"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;components/**/*.py"&lt;/span&gt;

&lt;span class="c1"&gt;# Use concurrency to ensure that only a single workflow &lt;/span&gt;
&lt;span class="c1"&gt;# using the same concurrency group will run at a time&lt;/span&gt;
&lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dev_stage_environment&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;git_diff&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Get a list of changed files in the PR for the future analysis&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;diff&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.getter.outputs.diff }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;fetch-depth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get git diff&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;getter&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;GIT_DIFF="$(echo $(git diff --name-only origin/main...origin/${GITHUB_HEAD_REF}))"&lt;/span&gt;
          &lt;span class="s"&gt;echo "diff=$GIT_DIFF" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;
  &lt;span class="na"&gt;get_component_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Get a list of names for the added/changed components&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;git_diff&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;names&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.getter.outputs.names }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get list&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;getter&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;NAMES="$(make get_components --paths='${{ needs.git_diff.outputs.diff }}')"&lt;/span&gt;
          &lt;span class="s"&gt;echo "names=$NAMES" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;
  &lt;span class="na"&gt;test_and_build_components_on_dev&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Test and build changed components on the DEV environment&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get_component_list&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;development&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Use matrix strategy to run the tasks in parallel&lt;/span&gt;
      &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJson(needs.get_component_list.outputs.names) }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Test component&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/test_component&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;component_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build component&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/build_component&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;component_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;

  &lt;span class="na"&gt;get_pipeline_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Get a list of names for the added/changed pipelines&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;git_diff&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;test_and_build_components_on_dev&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;names&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.getter.outputs.names }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get list&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;getter&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;NAMES="$(make get_pipelines --paths='${{ needs.git_diff.outputs.diff }}')"&lt;/span&gt;
          &lt;span class="s"&gt;echo "names=$NAMES" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;
  &lt;span class="na"&gt;build_pipelines_on_dev&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Build changed pipelines on the DEV environment&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get_pipeline_list&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;development&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Use matrix strategy to run the tasks in parallel&lt;/span&gt;
      &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJson(needs.get_pipeline_list.outputs.names) }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build pipeline&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/build_pipeline&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;

  &lt;span class="na"&gt;get_indirect_pipeline_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Get a list of names for the indirectly changed pipelines&lt;/span&gt;
    &lt;span class="c1"&gt;# (when a related component was changed)&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;git_diff&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;build_pipelines_on_dev&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;names&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.getter.outputs.names }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get list&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;getter&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;NAMES="$(make get_indirect_pipelines --paths='${{ needs.git_diff.outputs.diff }}')"&lt;/span&gt;
          &lt;span class="s"&gt;echo "names=$NAMES" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;
  &lt;span class="na"&gt;build_indirect_pipelines_on_dev&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Build indirectly changed pipelines on the DEV environment&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get_indirect_pipeline_list&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;development&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Use matrix strategy to run the tasks in parallel&lt;/span&gt;
      &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJson(needs.get_indirect_pipeline_list.outputs.names) }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build pipeline&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/build_pipeline&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;

  &lt;span class="na"&gt;build_components_and_test_integration_on_stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Build changed components and run integration tests &lt;/span&gt;
    &lt;span class="c1"&gt;# for them on the STAGE environment&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;build_pipelines_on_dev&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;build_indirect_pipelines_on_dev&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;get_component_list&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;staging&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Use matrix strategy to run the tasks in parallel&lt;/span&gt;
      &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJson(needs.get_component_list.outputs.names) }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build component&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/build_component&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;component_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Test component integration&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/test_component_integration&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;component_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;

  &lt;span class="na"&gt;build_and_run_pipelines_on_stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Build changed pipelines and run them on the STAGE environment&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;build_components_and_test_integration_on_stage&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;get_pipeline_list&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;staging&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Use matrix strategy to run the tasks in parallel&lt;/span&gt;
      &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJson(needs.get_pipeline_list.outputs.names) }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build pipeline&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/build_pipeline&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run pipeline&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/run_pipeline&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;

  &lt;span class="na"&gt;build_and_run_indirect_pipelines_on_stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Build indirectly changed pipelines and run them on the STAGE environment&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;build_and_run_pipelines_on_stage&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;get_indirect_pipeline_list&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;staging&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Use matrix strategy to run the tasks in parallel&lt;/span&gt;
      &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJson(needs.get_indirect_pipeline_list.outputs.names) }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build pipeline&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/build_pipeline&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run pipeline&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/run_pipeline&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is triggered automatically when a pull request is opened or updated. Thanks to the &lt;a href="https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#concurrency" rel="noopener noreferrer"&gt;concurrency&lt;/a&gt; feature, only one run of the workflow executes at a time. However, similar jobs within a run, such as tests, are &lt;a href="https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstrategymatrix" rel="noopener noreferrer"&gt;run in parallel&lt;/a&gt;.&lt;/p&gt;
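
&lt;p&gt;As a rough sketch of the concurrency setting in its expanded form: the group name below is hypothetical, and &lt;code&gt;cancel-in-progress: false&lt;/code&gt; simply spells out the default behaviour, where a newer run waits as pending until the run in progress finishes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# Hypothetical group name: runs that share it never execute at the same time&lt;/span&gt;
  &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stage_environment&lt;/span&gt;
  &lt;span class="c1"&gt;# Let the run in progress finish instead of cancelling it for a newer one&lt;/span&gt;
  &lt;span class="na"&gt;cancel-in-progress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;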

&lt;p&gt;The logic inside is as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Analyze the changes in a pull request and find out which components and/or pipelines were affected.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build them in the predefined order.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run automated tests for the components.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the pipelines to retrain models and deliver them for the final usage.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It is universal and works with any components and pipelines, as long as they follow the framework agreements. It is also safe: it works without duplicates, runs the jobs in the right order, and parallelizes them where possible.&lt;/p&gt;

&lt;p&gt;This workflow operates on the development and staging environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  PR is merged
&lt;/h2&gt;

&lt;p&gt;The second workflow runs when a PR is merged.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MLOps Pull Request - Merged&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# The workflow will be run automatically when a PR is closed&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;closed&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipelines/**/*.yaml"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;components/**/*.yaml"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;components/**/*.py"&lt;/span&gt;

&lt;span class="c1"&gt;# Use concurrency to ensure that only a single workflow &lt;/span&gt;
&lt;span class="c1"&gt;# using the same concurrency group will run at a time&lt;/span&gt;
&lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prod_environment&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;if_merged&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# GitHub has no dedicated trigger for a merged PR &lt;/span&gt;
    &lt;span class="c1"&gt;# (the event only tells us that the PR was closed),&lt;/span&gt;
    &lt;span class="c1"&gt;# so we check the merged flag in the first job&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github.event.pull_request.merged == &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo The PR was merged&lt;/span&gt;
  &lt;span class="na"&gt;git_diff&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Get a list of changed files in the PR for the future analysis&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;if_merged&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;diff&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.getter.outputs.diff }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;fetch-depth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get git diff&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;getter&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;GIT_DIFF="$(echo $(git diff --name-only ${GITHUB_SHA}^ ${GITHUB_SHA}))"&lt;/span&gt;
          &lt;span class="s"&gt;echo "diff=$GIT_DIFF" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;
  &lt;span class="na"&gt;get_component_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Get a list of names for the added/changed components&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;git_diff&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;names&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.getter.outputs.names }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get list&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;getter&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;NAMES="$(make get_components --paths='${{ needs.git_diff.outputs.diff }}')"&lt;/span&gt;
          &lt;span class="s"&gt;echo "names=$NAMES" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;
  &lt;span class="na"&gt;test_and_build_components_on_prod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Test and build changed components on the PROD environment&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get_component_list&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Use matrix strategy to run the tasks in parallel&lt;/span&gt;
      &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJson(needs.get_component_list.outputs.names) }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Test component&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/test_component&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;component_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build component&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/build_component&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;component_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Test component integration&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/test_component_integration&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;component_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;

  &lt;span class="na"&gt;get_pipeline_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Get a list of names for the added/changed pipelines&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;git_diff&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;test_and_build_components_on_prod&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;names&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.getter.outputs.names }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get list&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;getter&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;NAMES="$(make get_pipelines --paths='${{ needs.git_diff.outputs.diff }}')"&lt;/span&gt;
          &lt;span class="s"&gt;echo "names=$NAMES" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;
  &lt;span class="na"&gt;build_and_run_pipelines_on_prod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Build and run changed pipelines on the PROD environment&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;test_and_build_components_on_prod&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;get_pipeline_list&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Use matrix strategy to run the tasks in parallel&lt;/span&gt;
      &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJson(needs.get_pipeline_list.outputs.names) }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build pipeline&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/build_pipeline&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run pipeline&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/run_pipeline&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;

  &lt;span class="na"&gt;get_indirect_pipeline_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Get a list of names for the indirectly changed pipelines&lt;/span&gt;
    &lt;span class="c1"&gt;# (when a related component was changed)&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;git_diff&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;build_and_run_pipelines_on_prod&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;names&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.getter.outputs.names }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get list&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;getter&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;NAMES="$(make get_indirect_pipelines --paths='${{ needs.git_diff.outputs.diff }}')"&lt;/span&gt;
          &lt;span class="s"&gt;echo "names=$NAMES" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;
  &lt;span class="na"&gt;build_and_run_indirect_pipelines_on_prod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Build indirectly changed pipelines and run them on the PROD environment&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;build_and_run_pipelines_on_prod&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;get_indirect_pipeline_list&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Use matrix strategy to run the tasks in parallel&lt;/span&gt;
      &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJson(needs.get_indirect_pipeline_list.outputs.names) }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build pipeline&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/build_pipeline&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run pipeline&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/run_pipeline&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.name }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This workflow is very similar to the previous one, except that it runs all the jobs in the production environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;That’s it! The presented solution has been working well for us for more than 6 months already. We have some ideas on how to make it even better and maybe we will share the results with the community in the next articles.&lt;/p&gt;

&lt;p&gt;I hope this will be useful to you in your MLOps journey and help you save some time while building your own CI/CD process for ML pipelines.&lt;/p&gt;

&lt;p&gt;Please feel free to share any thoughts, questions, or proposals in the comments.&lt;/p&gt;

&lt;p&gt;Thank you and happy coding!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>mlops</category>
      <category>cicd</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>The Practical Guide to Utilizing DBT Packages for Data Transformation</title>
      <dc:creator>iamtodor</dc:creator>
      <pubDate>Thu, 12 Jan 2023 10:18:36 +0000</pubDate>
      <link>https://forem.com/freshbooks/the-practical-guide-to-utilizing-dbt-packages-for-data-transformation-4138</link>
      <guid>https://forem.com/freshbooks/the-practical-guide-to-utilizing-dbt-packages-for-data-transformation-4138</guid>
      <description>&lt;h2&gt;
  
  
  Table of contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What are packages&lt;/li&gt;
&lt;li&gt;Why use it&lt;/li&gt;
&lt;li&gt;Local packages&lt;/li&gt;
&lt;li&gt;dbt Hub packages&lt;/li&gt;
&lt;li&gt;Verify packages are installed&lt;/li&gt;
&lt;li&gt;Macros usage&lt;/li&gt;
&lt;li&gt;Models usage&lt;/li&gt;
&lt;li&gt;dbt_modules under the hood&lt;/li&gt;
&lt;li&gt;Disclaimer&lt;/li&gt;
&lt;li&gt;Contact&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What are packages
&lt;/h2&gt;

&lt;p&gt;dbt packages are collections of macros, models, and other resources that extend the functionality of dbt. Packages can be used to share common code and resources across multiple dbt projects. They can be installed from the &lt;a href="https://hub.getdbt.com/" rel="noopener noreferrer"&gt;dbt Hub&lt;/a&gt; or from GitHub, or stored locally and installed by specifying the path to the package.&lt;/p&gt;

&lt;p&gt;In dbt, libraries like these are called packages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use it
&lt;/h2&gt;

&lt;p&gt;dbt packages are so powerful because many of the analytic problems we encounter are shared across organizations.&lt;/p&gt;

&lt;p&gt;There are a few general benefits to using packages in dbt:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Reusability: packages allow you to reuse code across multiple projects and models. This can save you a lot of time and effort, as you don't have to copy and paste the same code into multiple places.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Collaboration: packaging your models in a package allows multiple people to work on the same models at the same time. You can use version control systems like git to manage changes to the models, and use tools like the &lt;code&gt;dbt test&lt;/code&gt; command to ensure that the models are correct and reliable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sharing: packaging your models or macros in a package allows you to share them with others. You can publish your package on the &lt;a href="https://hub.getdbt.com/" rel="noopener noreferrer"&gt;dbt Hub&lt;/a&gt; or on GitHub, and others can install and use your models in their own dbt projects.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Managing: packages make it easier to manage your codebase. You can use version control to track changes to your package, and you can easily install and update packages in your dbt project.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modularity: packaging your models in a package allows you to break your data pipeline into smaller, more manageable pieces, which are easier to understand and maintain. This can streamline the development and upkeep of your dbt project over time, which is especially useful if you are working on a large project with many different models and transformations.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Overall, using packages can help you to build more efficient, maintainable, and scalable data pipelines with dbt.&lt;/p&gt;

&lt;p&gt;For example, if your aim is to extract the day of the week, there is no point in reinventing the wheel and developing this macro on your own. Instead, you can find the right package and make use of it: &lt;a href="https://github.com/calogica/dbt-date#day_of_weekdate-isoweektrue" rel="noopener noreferrer"&gt;&lt;code&gt;{{ dbt_date.day_of_week(column_name) }}&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local packages
&lt;/h2&gt;

&lt;p&gt;In dbt, you can use local packages to organize and reuse code within a single dbt project. Local packages are stored within your project directory and are only available to the models in that project. The best use case for a local package is a module that you want to keep in the same repository, next to the main project.&lt;/p&gt;

&lt;p&gt;To create a reusable local package do the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Suppose you have the following dbt project directory structure:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; tree -L 1 .
.
├── data
├── dbt_project.yml
├── macros
├── models
├── packages
├── packages.yml
├── profiles.yml
└── target
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Create a &lt;code&gt;packages&lt;/code&gt; directory. This is where we will put our first local package.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir packages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Create a &lt;code&gt;packages.yml&lt;/code&gt; file. This is where we will link our first local package.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;touch packages.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Before moving on, verify that you have the following directory structure:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; tree -L 1 .
.
├── data
├── dbt_modules
├── dbt_project.yml
├── macros
├── models
├── packages
├── packages.yml
├── profiles.yml
├── snapshots
└── target
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Go into the &lt;code&gt;packages&lt;/code&gt; directory and initialize your package with the name &lt;code&gt;local_utils&lt;/code&gt;. The package name is arbitrary.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd packages
dbt init local_utils
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It will create a package with the following structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; tree local_utils
local_utils
├── README.md
├── analysis
├── data
├── dbt_project.yml
├── macros
├── models
│   └── example
│       ├── my_first_dbt_model.sql
│       ├── my_second_dbt_model.sql
│       └── schema.yml
├── snapshots
└── tests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Next, change the project &lt;code&gt;name&lt;/code&gt; in &lt;code&gt;dbt_project.yml&lt;/code&gt; from &lt;code&gt;my_new_project&lt;/code&gt; to a meaningful, self-explanatory name. This name is used later to reference the package's macros and models. Let's call it &lt;code&gt;local_utils&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Specify our local package in the aforementioned &lt;code&gt;packages.yml&lt;/code&gt; as follows:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;packages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;local&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/opt/dbt/packages/local_utils/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure that you provide the absolute path to the package; otherwise, it will not work.&lt;/p&gt;

&lt;p&gt;Save the &lt;code&gt;packages.yml&lt;/code&gt; file and run the &lt;code&gt;dbt deps&lt;/code&gt; command to install the package. This will link the package and make it available to your dbt models.&lt;/p&gt;

&lt;p&gt;Here is an example of what the &lt;code&gt;dbt deps&lt;/code&gt; command might look like when you install your local package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; dbt deps
Running with dbt=0.21.1
Installing /opt/dbt/packages/local_utils/
  Installed from &amp;lt;local @ /opt/dbt/packages/local_utils/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can see a newly created &lt;code&gt;dbt_modules&lt;/code&gt; directory that contains a symbolic link to our local package. This means that the package is ready to be used.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; tree dbt_modules
dbt_modules
└── utils -&amp;gt; /opt/dbt/packages/local_utils/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  dbt Hub packages
&lt;/h2&gt;

&lt;p&gt;In dbt, you can use packages from the dbt Hub to share your code with others and to reuse code from other users in your own projects. The &lt;a href="https://hub.getdbt.com/" rel="noopener noreferrer"&gt;dbt Hub&lt;/a&gt; is a community-driven library of packages that you can use to extend the functionality of dbt.&lt;/p&gt;

&lt;p&gt;Probably the best examples of community-driven third-party packages are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://hub.getdbt.com/dbt-labs/dbt_utils/latest/" rel="noopener noreferrer"&gt;dbt_utils&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hub.getdbt.com/calogica/dbt_expectations/latest/" rel="noopener noreferrer"&gt;dbt_expectations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are a few benefits to using dbt Hub packages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Reusable code: dbt Hub packages allow you to reuse code that has been shared by other users, teams and companies. This can save you a lot of time and effort, as you don't have to write the same logic from scratch, test and maintain it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The major advantages of any open-source project:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Community support: When you use packages from the dbt Hub, you can benefit from the support and expertise of the dbt community. If you have questions or run into issues with a package, you can ask for help on the dbt community forums or Slack channel.&lt;/li&gt;
&lt;li&gt;Collaboration: By sharing your own packages on the dbt Hub, you can make your code available to other users. This can help to foster collaboration and improve the overall quality of your code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Overall, using dbt Hub packages can help you to build more efficient, maintainable, and scalable data pipelines with dbt, and to collaborate with others in the dbt community.&lt;/p&gt;

&lt;p&gt;To install a package from the dbt Hub in your dbt project, you will need to add the package to your packages.yml file.&lt;/p&gt;

&lt;p&gt;Here is the basic process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to the &lt;a href="https://hub.getdbt.com/" rel="noopener noreferrer"&gt;dbt Hub&lt;/a&gt; and search for the package you want to install.&lt;/li&gt;
&lt;li&gt;Click on the package to view its details.&lt;/li&gt;
&lt;li&gt;Copy the package name and version from the installation instructions.&lt;/li&gt;
&lt;li&gt;Open your &lt;code&gt;packages.yml&lt;/code&gt; file and add the package name and version to the packages list. It should look something like this:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;packages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;package&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dbt-labs/dbt_utils&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.7.6&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save the &lt;code&gt;packages.yml&lt;/code&gt; file and run the &lt;code&gt;dbt deps&lt;/code&gt; command to install the package. This will download the package and make it available to your dbt models.&lt;/p&gt;

&lt;p&gt;Here is an example of what the &lt;code&gt;dbt deps&lt;/code&gt; output might look like when you install a dbt Hub package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; dbt deps
Installing dbt-labs/dbt_utils@0.7.6
  Installed from version 0.7.6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unlike the local package, the hub package is physically downloaded into the &lt;code&gt;dbt_modules&lt;/code&gt; directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; tree -L 2 dbt_modules
dbt_modules
└── dbt_utils
    ├── CHANGELOG.md
    ├── LICENSE
    ├── README.md
    ├── RELEASE.md
    ├── dbt_project.yml
    ├── docker-compose.yml
    ├── etc
    ├── integration_tests
    ├── macros
    └── run_test.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Verify packages are installed
&lt;/h2&gt;

&lt;p&gt;To verify that a package is installed in your dbt project, you can check the &lt;code&gt;packages.yml&lt;/code&gt; file and run the &lt;code&gt;dbt deps&lt;/code&gt; command.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Check the &lt;code&gt;packages.yml&lt;/code&gt; file: This file lists all of the packages that your dbt project declares. Look for the name of the package you want to verify. If it is listed there, dbt expects it to be installed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run the &lt;code&gt;dbt deps&lt;/code&gt; command:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;This command installs every package declared in &lt;code&gt;packages.yml&lt;/code&gt; and prints each one as it goes. Look for the name of the package you want to verify. If it is listed, then it is installed.&lt;/li&gt;
&lt;li&gt;In the root directory of the dbt project, you will see a new &lt;code&gt;dbt_modules/&lt;/code&gt; directory, which contains the installed packages that are ready to be used. &lt;strong&gt;NOTE&lt;/strong&gt;: the &lt;code&gt;dbt_modules/&lt;/code&gt; directory should be added to &lt;code&gt;.gitignore&lt;/code&gt;.
&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; tree -L 1 .
.
├── data
├── dbt_modules
├── dbt_project.yml
├── macros
├── models
├── packages
├── packages.yml
├── profiles.yml
├── snapshots
└── target
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your &lt;code&gt;packages.yml&lt;/code&gt; file contains a package that is not installed, you will not be able to run any dbt command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; dbt list
Encountered an error:
Compilation Error
  dbt found 1 package(s) specified in packages.yml, but only 0 package(s) installed in dbt_modules. Run dbt deps to install package dependencies.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This check is our guarantee that we will not hit package installation issues at runtime.&lt;/p&gt;
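
&lt;p&gt;A similar presence check can be scripted, for example before deploying the project. This is a minimal sketch and not how dbt implements its own check: the &lt;code&gt;missing_packages&lt;/code&gt; helper and the hand-written list of expected directory names are assumptions made for illustration, and the sketch does not parse &lt;code&gt;packages.yml&lt;/code&gt;.&lt;/p&gt;

```python
from pathlib import Path


def missing_packages(project_dir, expected_names, modules_dir="dbt_modules"):
    """Return the expected package directories that are absent from modules_dir.

    expected_names is a hypothetical, hand-maintained list of directory names.
    """
    root = Path(project_dir) / modules_dir
    # exists() follows symlinks, so a dangling link counts as missing.
    return [name for name in expected_names if not (root / name).exists()]
```

&lt;p&gt;For the example project in this guide, calling &lt;code&gt;missing_packages(".", ["dbt_utils", "utils"])&lt;/code&gt; from the project root would be expected to return an empty list once &lt;code&gt;dbt deps&lt;/code&gt; has run.&lt;/p&gt;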

&lt;h2&gt;
  
  
  Macros usage
&lt;/h2&gt;

&lt;p&gt;In dbt, you can use packages to define custom macros that can be called from your dbt models. Macros are a convenient way to perform common data transformations.&lt;/p&gt;

&lt;p&gt;For instance, let's create the following macro in our local package under &lt;code&gt;local_utils/macros/cents_to_dollars.sql&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;macro&lt;/span&gt; &lt;span class="n"&gt;cents_to_dollars&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;column_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;precision&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;({{&lt;/span&gt; &lt;span class="k"&gt;column_name&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="nb"&gt;precision&lt;/span&gt; &lt;span class="p"&gt;}})&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endmacro&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we can call our macro as &lt;code&gt;{{ local_utils.cents_to_dollars(your_column_name) }}&lt;/code&gt;. The &lt;code&gt;local_utils&lt;/code&gt; package name comes from the &lt;code&gt;name&lt;/code&gt; in our package's &lt;code&gt;dbt_project.yml&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;Using a macro from a dbt Hub package is pretty much the same. Imagine we want to generate a surrogate key based on a few columns. The &lt;code&gt;dbt_utils&lt;/code&gt; package we previously installed provides this functionality: &lt;a href="https://github.com/dbt-labs/dbt-utils#generate_surrogate_key-source" rel="noopener noreferrer"&gt;&lt;code&gt;{{ dbt_utils.generate_surrogate_key([field_a, field_b[,...]]) }}&lt;/code&gt;&lt;/a&gt;. Note that &lt;code&gt;generate_surrogate_key&lt;/code&gt; was introduced in dbt_utils 1.0; in older releases, such as the 0.7.6 installed above, the macro is named &lt;code&gt;surrogate_key&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So the usage pattern for a macro from a third-party package is &lt;code&gt;{{ package_name.macro_name() }}&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Models usage
&lt;/h2&gt;

&lt;p&gt;As a prerequisite, we created our own local package &lt;code&gt;local_utils&lt;/code&gt;, which has the following structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; tree packages/local_utils
local_utils
├── README.md
├── analysis
├── data
├── dbt_project.yml
├── macros
├── models
│   └── example
│       ├── my_first_dbt_model.sql
│       ├── my_second_dbt_model.sql
│       └── schema.yml
├── snapshots
└── tests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most important function in dbt is &lt;code&gt;ref()&lt;/code&gt;; it's impossible to build even moderately complex models without it. &lt;code&gt;ref()&lt;/code&gt; is how you reference one model within another inside your package. Here is how this looks in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"model_a"&lt;/span&gt;&lt;span class="p"&gt;)}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So if we would like to reference &lt;code&gt;my_first_dbt_model&lt;/code&gt; from &lt;code&gt;my_second_dbt_model&lt;/code&gt; within the &lt;code&gt;local_utils&lt;/code&gt; package, we do the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"my_first_dbt_model"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we want to reference &lt;code&gt;my_first_dbt_model&lt;/code&gt; from our main project, we need to change the call slightly. The &lt;code&gt;ref&lt;/code&gt; function also has a two-argument variant: you can pass both a package name and a model name to &lt;code&gt;ref&lt;/code&gt; to avoid ambiguity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"package_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"model_name"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our particular case, this would be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"local_utils"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"my_first_dbt_model"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: The &lt;code&gt;package_name&lt;/code&gt; should only include the name of the package, not the maintainer. For example, if we use the &lt;code&gt;dbt-labs/dbt_utils&lt;/code&gt; package, type &lt;code&gt;dbt_utils&lt;/code&gt; in that argument, and not &lt;code&gt;dbt-labs/dbt_utils&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  dbt_modules under the hood
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;dbt_modules&lt;/code&gt; directory is where dbt stores packages and their models. When you install a package using the &lt;code&gt;dbt deps&lt;/code&gt; command, the package and its models are downloaded and stored in the &lt;code&gt;dbt_modules&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;dbt_modules&lt;/code&gt; directory is located in the root directory of your dbt project. It contains a subdirectory for each installed package, and each package directory contains the package's models, macros, and other resources.&lt;/p&gt;

&lt;p&gt;dbt installs &lt;code&gt;local&lt;/code&gt; and &lt;code&gt;dbt hub&lt;/code&gt; packages differently.&lt;/p&gt;

&lt;p&gt;Consider the following &lt;code&gt;packages.yml&lt;/code&gt; content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;packages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;package&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dbt-labs/dbt_utils&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.7.6&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;local&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/opt/dbt/packages/local_utils/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You would then have the following modules under the generated &lt;code&gt;dbt_modules&lt;/code&gt; directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; tree -L 2 dbt_modules
dbt_modules
├── dbt_utils
│   ├── CHANGELOG.md
│   ├── LICENSE
│   ├── README.md
│   ├── RELEASE.md
│   ├── dbt_project.yml
│   ├── docker-compose.yml
│   ├── etc
│   ├── integration_tests
│   ├── macros
│   └── run_test.sh
└── utils -&amp;gt; /opt/dbt/packages/local_utils/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As I mentioned before, dbt creates a symlink for local packages, while for a dbt Hub package it simply copies all the needed files into a folder of the same name.&lt;/p&gt;

&lt;p&gt;We use Google Cloud Composer to orchestrate all the transformation jobs. We copy our project to a GCP bucket with &lt;code&gt;gsutil -m rsync&lt;/code&gt;. Unfortunately, it does not support symbolic links:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Since gsutil rsync is intended to support data operations (like moving a data set to the cloud for computational processing) and it needs to be compatible both in the cloud and across common operating systems, there are no plans for gsutil rsync to support operating system-specific file types like symlinks.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Taken from the &lt;code&gt;gsutil rsync&lt;/code&gt; documentation: &lt;a href="https://cloud.google.com/storage/docs/gsutil/commands/rsync#be-careful-when-synchronizing-over-os-specific-file-types-symlinks,-devices,-etc" rel="noopener noreferrer"&gt;Be Careful When Synchronizing Over Os-Specific File Types (Symlinks, Devices, Etc.)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A possible solution is to compress everything locally into an archive, copy it to the bucket, and then unpack it in Composer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tar -czvf dbt-project.tar.gz dbt-project
gsutil cp dbt-project.tar.gz gs://$BUCKET/prefix/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s what those switches actually mean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-c: Create an archive.
-z: Compress the archive with gzip.
-v: Display progress in the terminal while creating the archive, also known as “verbose” mode. The v is always optional in these commands, but it’s helpful.
-f: Allows you to specify the filename of the archive.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
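
&lt;p&gt;One caveat: by default &lt;code&gt;tar&lt;/code&gt; stores a symbolic link as a link, so a local package linked by an absolute path may not resolve after unpacking on another machine. Below is a minimal sketch of the same archiving step using Python's standard &lt;code&gt;tarfile&lt;/code&gt; module, assuming you want the linked files themselves in the archive. The &lt;code&gt;pack_project&lt;/code&gt; helper name is hypothetical.&lt;/p&gt;

```python
import tarfile
from pathlib import Path


def pack_project(project_dir, archive_path):
    """Write project_dir to a gzip-compressed tar archive.

    dereference=True stores the files a symlink points to instead of the
    link itself, so a symlinked local package is archived as real files.
    """
    project = Path(project_dir)
    with tarfile.open(archive_path, "w:gz", dereference=True) as archive:
        # arcname keeps archive paths relative to the project folder name.
        archive.add(project, arcname=project.name)
    return Path(archive_path)
```

&lt;p&gt;For example, &lt;code&gt;pack_project("dbt-project", "dbt-project.tar.gz")&lt;/code&gt; would produce an archive that can be copied to the bucket as a single file. Write the archive outside the project folder so that it does not end up inside itself.&lt;/p&gt;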



&lt;p&gt;These are the pitfalls we ran into while working with dbt packages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disclaimer
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;All this experience applies to dbt v0.21.1&lt;/li&gt;
&lt;li&gt;I am aware that since v1.0 the default install directory has been &lt;a href="https://docs.getdbt.com/guides/migration/versions/upgrading-to-v1.0#breaking-changes" rel="noopener noreferrer"&gt;changed&lt;/a&gt; from &lt;code&gt;dbt_modules&lt;/code&gt; to &lt;code&gt;dbt_packages&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;I like to think that most of the guide is still applicable to the latest dbt version&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Contact
&lt;/h2&gt;

&lt;p&gt;If you found this article helpful, I invite you to connect with me on &lt;a href="https://www.linkedin.com/in/iamtodor/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;. I am always looking to expand my network and connect with like-minded individuals in the data industry. Additionally, you can also reach out to me for any questions or feedback on the article. I'd be more than happy to engage in a conversation and help out in any way I can. So don’t hesitate to contact me, and let’s connect and learn together.&lt;/p&gt;

</description>
      <category>gratitude</category>
      <category>learning</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Building CI/CD for Vertex AI pipelines: The real world</title>
      <dc:creator>Oleksandr Borodavka</dc:creator>
      <pubDate>Tue, 20 Sep 2022 15:11:50 +0000</pubDate>
      <link>https://forem.com/freshbooks/building-cicd-for-vertex-ai-pipelines-the-real-world-1o05</link>
      <guid>https://forem.com/freshbooks/building-cicd-for-vertex-ai-pipelines-the-real-world-1o05</guid>
      <description>&lt;p&gt;Our &lt;a href="https://dev.to/freshbooks/building-cicd-for-vertex-ai-pipelines-the-first-solution-1mp5"&gt;first solution&lt;/a&gt; for a CI/CD implementation was a great start for us. It works well when you do not change many things at the same time, the team is small, and updates are not frequent. However, a real workflow is usually not so simple. Let's take a look at some situations that can happen if we stay with the first implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Concurrency mess
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Issue
&lt;/h3&gt;

&lt;p&gt;Let's look at what we get if we open a pull request with several changes. For instance, we have two changed components and one pipeline, and the pipeline contains both of these components.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdk7op5nazfs8a19j0umb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdk7op5nazfs8a19j0umb.png" alt="Concurrency mess"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this case, GitHub Actions runs 3 workflows in parallel. Since the components are related to the pipeline, the pipeline is rebuilt after each component, so the same job runs 3 times. We also have no guarantee about the order of execution. As in the diagram, the &lt;em&gt;pipeline&lt;/em&gt; may first be built with &lt;em&gt;component2&lt;/em&gt; and then with &lt;em&gt;component1&lt;/em&gt;. As a result, we receive a &lt;em&gt;pipeline&lt;/em&gt; with only &lt;em&gt;component1&lt;/em&gt; updated, and none of the builds includes both updated components.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;We can avoid such situations by running the processes in a predefined order. In our case, there are no &lt;em&gt;component-to-component&lt;/em&gt; or &lt;em&gt;pipeline-to-pipeline&lt;/em&gt; relations, only &lt;em&gt;component-to-pipeline&lt;/em&gt; ones. So, if we first run all processes for components and then all processes for pipelines, we are guaranteed a correct build in the end. &lt;br&gt;
Our solution is to perform the following steps within one GHA workflow: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Analyze the list of changed files in a pull request

&lt;ul&gt;
&lt;li&gt;Find changed components&lt;/li&gt;
&lt;li&gt;Find changed pipelines&lt;/li&gt;
&lt;li&gt;Find pipelines related to changed components&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run jobs for all components&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run jobs for all pipelines&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
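
&lt;p&gt;The analysis step above could be sketched as follows. This is only an illustration, not the project's actual code: the directory layout (&lt;code&gt;components/NAME/...&lt;/code&gt; and &lt;code&gt;pipelines/NAME/...&lt;/code&gt;), the &lt;code&gt;PIPELINE_COMPONENTS&lt;/code&gt; mapping, and the &lt;code&gt;plan_builds&lt;/code&gt; helper are all assumptions.&lt;/p&gt;

```python
from pathlib import PurePosixPath

# Hypothetical mapping of each pipeline to the components it uses.
PIPELINE_COMPONENTS = {
    "pipeline1": {"component1", "component2"},
    "pipeline2": {"component3"},
}


def plan_builds(changed_files, pipeline_components=PIPELINE_COMPONENTS):
    """Return (components, pipelines) to build; components go first."""
    components, pipelines = set(), set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        # Assumed layout: components/NAME/... and pipelines/NAME/...
        if len(parts) >= 2 and parts[0] == "components":
            components.add(parts[1])
        elif len(parts) >= 2 and parts[0] == "pipelines":
            pipelines.add(parts[1])
    # A pipeline must be rebuilt when any of its components changed.
    for pipeline, used in pipeline_components.items():
        if not used.isdisjoint(components):
            pipelines.add(pipeline)
    return sorted(components), sorted(pipelines)
```

&lt;p&gt;The two sorted lists could then drive the component jobs first and the pipeline jobs second, so each pipeline is built once, after all of its changed components.&lt;/p&gt;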

&lt;p&gt;It can be done with the &lt;a href="https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs" rel="noopener noreferrer"&gt;matrix strategy&lt;/a&gt; and some additional logic in the code. The matrix strategy is a feature of GitHub Actions that lets you run many jobs in parallel for a matrix of parameters. In our case, we can extract a list of changed components/pipelines and run all necessary tasks for them with a single job definition.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghk7j391qxlgxxep1h4a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghk7j391qxlgxxep1h4a.png" alt="Run in order"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Doing all jobs in one workflow also eliminates duplicate builds. We gain one more important outcome: we no longer need to manually add and maintain workflows for every pipeline and component, as we did in the first implementation. Everything is built on the fly.&lt;/p&gt;
&lt;h2&gt;
  
  
  Many Pull Requests
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Issue
&lt;/h3&gt;

&lt;p&gt;Great, but what if we have two or more open pull requests? Within each PR, the jobs run in the required order, which the previous step already solved. However, since we store all the configurations and docker images in the cloud, they are shared resources. It means that one CI/CD process can interfere with another process, overwrite common resources, and so on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F136dhvsgxk3avj0s76a8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F136dhvsgxk3avj0s76a8.png" alt="Many Pull Requests"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here we have a mess again. In addition, if &lt;em&gt;component1&lt;/em&gt; from &lt;em&gt;PR1&lt;/em&gt; is buggy and it is built between &lt;em&gt;component1&lt;/em&gt; and &lt;em&gt;pipeline1&lt;/em&gt; from &lt;em&gt;PR2&lt;/em&gt;, then &lt;em&gt;PR2&lt;/em&gt; will fail just because of a bug in &lt;em&gt;PR1&lt;/em&gt;. Or even worse, the situation is reversed and a buggy component passes because it was overwritten.&lt;/p&gt;
&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;Probably the simplest solution here will be to deny concurrent CI/CD jobs. Just process one pull request at a time. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;Pull Request 1&lt;/code&gt; -&amp;gt; &lt;code&gt;Pull Request 2&lt;/code&gt; -&amp;gt; &lt;code&gt;...&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;With GitHub Actions, it can be done with the &lt;a href="https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#concurrency" rel="noopener noreferrer"&gt;concurrency&lt;/a&gt; feature. It allows us to define concurrency groups to ensure that only a single job or workflow using the same group runs at a time.&lt;br&gt;
Yes, it can become a bottleneck in the future, especially if the builds take significant time. But it solves the issue and it works for us now. It is also quite easy to implement, so let's go further.&lt;/p&gt;
&lt;h2&gt;
  
  
  Versioning
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Issue
&lt;/h3&gt;

&lt;p&gt;Another dangerous area is the versioning of docker images. We use &lt;a href="https://www.kubeflow.org/docs/components/pipelines/sdk/component-development/#containerize-your-components-code" rel="noopener noreferrer"&gt;containerized components&lt;/a&gt;, which lets us treat them as independent applications, increases reusability, and so on. It means that a ready-to-use component consists of two entities: a &lt;a href="https://www.kubeflow.org/docs/components/pipelines/sdk/component-development/#creating-a-component-specification" rel="noopener noreferrer"&gt;specification file&lt;/a&gt; and a docker image. &lt;br&gt;
Let's look at what can happen if we always use the &lt;code&gt;image:latest&lt;/code&gt; version of the docker image. &lt;/p&gt;
&lt;h4&gt;
  
  
  Situation 1
&lt;/h4&gt;

&lt;p&gt;For instance, we already have a pipeline specification.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pipelineSpec"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"deploymentSpec"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"component1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gcr.io/components/component1:latest"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We do not want to change anything; we just want to use it to run the pipeline. Maybe we do it automatically on a schedule. At the same time, an ongoing CI/CD job may have already rebuilt some of the components while others are still in progress. The outcome can be unpredictable.&lt;/p&gt;

&lt;h4&gt;
  
  
  Situation 2
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fto48aj89wm78h90a2o3n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fto48aj89wm78h90a2o3n.png" alt="Two developers work on the same component"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Say two developers work with the same component. Tom updates &lt;em&gt;component1&lt;/em&gt;, then works with &lt;em&gt;component2&lt;/em&gt;, and at this moment Bet updates &lt;em&gt;component1&lt;/em&gt; as well. When Tom builds &lt;em&gt;pipeline1&lt;/em&gt;, the right version of &lt;em&gt;component2&lt;/em&gt; will be used, but the latest image of &lt;em&gt;component1&lt;/em&gt; has been overwritten by Bet. Tom will receive something unexpected as a result. It is good if the difference between these parallel changes is obvious. Otherwise, it can take a lot of time to understand what is wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;By pinning concrete versions of components with &lt;code&gt;image:version&lt;/code&gt; and running only one CI/CD job at a time, we will be fine in the Staging and Production environments. The situation in the Development environment is still tricky. As an option, we can provide a unique dev environment for each developer. However, it will increase the cost and require additional work on setting up and maintaining a variety of environments.&lt;br&gt;
As a solution for the dev environment, we can add another CLI command (or extend the existing one) to rebuild the whole pipeline with all related components. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgfoqody3uoqqrkbylge.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgfoqody3uoqqrkbylge.png" alt="Cache components on local"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this way, we avoid accidental overwriting of components. Everything is built from the current code and used locally without a chance of interference. &lt;/p&gt;
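
&lt;p&gt;Replacing &lt;code&gt;image:latest&lt;/code&gt; with a concrete &lt;code&gt;image:version&lt;/code&gt; could be sketched as below. This is a minimal illustration that assumes the simplified specification shape shown earlier; the &lt;code&gt;pin_images&lt;/code&gt; helper and the example tag are hypothetical.&lt;/p&gt;

```python
import copy


def pin_images(spec, tag):
    """Return a copy of spec where every ':latest' image uses the given tag."""
    pinned = copy.deepcopy(spec)  # leave the caller's spec untouched

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                is_image = key == "image" and isinstance(value, str)
                if is_image and value.endswith(":latest"):
                    # Swap only the tag, keep the registry path.
                    node[key] = value[: -len("latest")] + tag
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(pinned)
    return pinned
```

&lt;p&gt;For instance, &lt;code&gt;pin_images(spec, "1.4.2")&lt;/code&gt; would turn &lt;code&gt;gcr.io/components/component1:latest&lt;/code&gt; into &lt;code&gt;gcr.io/components/component1:1.4.2&lt;/code&gt;, so a scheduled run keeps using the image it was built with.&lt;/p&gt;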

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Here we have considered some important aspects that are not so obvious at the start. However, ignoring them can make the CI/CD processes unstable and unpredictable.&lt;br&gt;
Next time we will apply this knowledge to build a new version of our CI/CD.&lt;/p&gt;

&lt;p&gt;To be continued...&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>mlops</category>
      <category>cicd</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Configuring python linting to be part of CI/CD using GitHub actions</title>
      <dc:creator>iamtodor</dc:creator>
      <pubDate>Thu, 15 Sep 2022 06:44:07 +0000</pubDate>
      <link>https://forem.com/freshbooks/configuring-python-linting-to-be-part-of-cicd-using-github-actions-1731</link>
      <guid>https://forem.com/freshbooks/configuring-python-linting-to-be-part-of-cicd-using-github-actions-1731</guid>
      <description>&lt;p&gt;Hello everyone, I am a DataOps Engineer at &lt;a href="https://www.freshbooks.com/" rel="noopener noreferrer"&gt;FreshBooks&lt;/a&gt;. In this article I would like to share my experience on configuration best practices for GitHub actions pipelines for linting.&lt;/p&gt;

&lt;p&gt;The FreshBooks DataOps team has a linter configuration that developers can run before submitting a PR. We had an idea to integrate lint checks into our regular CI/CD pipeline. This would catch potential errors, bugs, and stylistic issues, and would enforce a common code style across the team.&lt;/p&gt;

&lt;p&gt;FreshBooks uses GitHub as a home for our code base, so we would like to use it as much as possible. Recently I finished this configuration so the linter and its checks are now part of a GitHub actions CI/CD workflow.&lt;/p&gt;

&lt;p&gt;This article has two major parts: the first one is the linter configuration, and the second one is the GitHub workflow configuration itself. Feel free to read all the parts, or skip some and jump to the specific one you are interested in.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Linters configuration

&lt;ul&gt;
&lt;li&gt;Disable unwanted checks&lt;/li&gt;
&lt;li&gt;Documentation&lt;/li&gt;
&lt;li&gt;Import error&lt;/li&gt;
&lt;li&gt;Tweaks for airflow code&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

GitHub workflow actions CI/CD configurations

&lt;ul&gt;
&lt;li&gt;When to run it&lt;/li&gt;
&lt;li&gt;What files does it run against&lt;/li&gt;
&lt;li&gt;Run linter itself&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Conclusion&lt;/li&gt;

&lt;li&gt;Contact&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Linters configuration
&lt;/h2&gt;

&lt;p&gt;Here are the linters and checks we are going to use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://flake8.pycqa.org/en/latest/" rel="noopener noreferrer"&gt;flake8&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://flakeheaven.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;flakeheaven&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/psf/black" rel="noopener noreferrer"&gt;black&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/PyCQA/isort" rel="noopener noreferrer"&gt;isort&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: the author assumes you are familiar with the above-mentioned linters, tools, and checks.&lt;/p&gt;

&lt;p&gt;I would like to share how to configure them for a Python project. I prepared a full &lt;a href="https://github.com/iamtodor/demo-github-actions-python-linter-configuration" rel="noopener noreferrer"&gt;GitHub Actions Python configuration demo repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We use &lt;code&gt;flakeheaven&lt;/code&gt; as a &lt;code&gt;flake8&lt;/code&gt; wrapper, which is very easy to configure in a single &lt;code&gt;pyproject.toml&lt;/code&gt;. The whole &lt;code&gt;pyproject.toml&lt;/code&gt; configuration file can be found in the &lt;a href="https://github.com/iamtodor/demo-github-actions-python-linter-configuration/blob/main/pyproject.toml" rel="noopener noreferrer"&gt;demo repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fflakeheaven-pyproject-config.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fflakeheaven-pyproject-config.png%3Fraw%3Dtrue" alt="pyproject.toml"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I would say the config file is self-explanatory, so I will not dwell on it. Here are just a few notes about small tweaks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Disable unwanted checks
&lt;/h3&gt;

&lt;p&gt;A few checks that we don't want to see complaints about:&lt;/p&gt;

&lt;h4&gt;
  
  
  Documentation
&lt;/h4&gt;

&lt;p&gt;The default &lt;code&gt;flakeheaven&lt;/code&gt; configuration assumes every component is documented.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&amp;gt;&amp;gt;&amp;gt; python -m flakeheaven lint utils.py

utils.py
     1:   1 C0114 Missing module docstring (missing-module-docstring) [pylint]
  def custom_sum(first: int, second: int) -&amp;gt; int:
  ^
     1:   1 C0116 Missing function or method docstring (missing-function-docstring) [pylint]
  def custom_sum(first: int, second: int) -&amp;gt; int:
  ^
     5:   1 C0116 Missing function or method docstring (missing-function-docstring) [pylint]
  def custom_multiplication(first: int, second: int) -&amp;gt; int:


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We are fine with some modules, functions, and methods not being documented. We are not going to push documentation for documentation's sake. So we want to disable the &lt;code&gt;C0114&lt;/code&gt; and &lt;code&gt;C0116&lt;/code&gt; checks from pylint.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fflakeheaven-disable-docs.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fflakeheaven-disable-docs.png%3Fraw%3Dtrue" alt="flakeheaven disable docs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Import error
&lt;/h4&gt;

&lt;p&gt;Our linter requirements live in a separate file, and we don't want to mix them with our main production requirements. As a result, the linter environment does not have the production libraries installed, so the linter complains about their imports.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&amp;gt;&amp;gt;&amp;gt; python -m flakeheaven lint . 

dags/dummy.py
     3:   1 E0401 Unable to import 'airflow' (import-error) [pylint]
  from airflow import DAG
  ^
     4:   1 E0401 Unable to import 'airflow.operators.dummy_operator' (import-error) [pylint]
  from airflow.operators.dummy_operator import DummyOperator
  ^


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;So we need to disable the &lt;code&gt;E0401&lt;/code&gt; check from &lt;code&gt;pylint&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fflakeheaven-disable-import-checks.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fflakeheaven-disable-import-checks.png%3Fraw%3Dtrue" alt="flakeheaven disable import checks"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We assume that the developer who writes the code and imports the libraries is responsible for writing reliable tests. If a test does not pass, something is wrong with either an import or the code (logic) itself. Thus, the import check is not something we want to make part of the linter job.&lt;/p&gt;

&lt;p&gt;Alternatively, you can disable this check for individual lines by adding &lt;code&gt;# noqa: E0401&lt;/code&gt; after the import statement.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;  &lt;span class="c1"&gt;# noqa: E0401
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.operators.dummy_operator&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DummyOperator&lt;/span&gt;  &lt;span class="c1"&gt;# noqa: E0401
&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  Tweaks for airflow code
&lt;/h4&gt;

&lt;p&gt;Code for Airflow DAGs needs a few tweaks as well. Here is a dummy example, &lt;code&gt;dummy.py&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fpython-airflow-tasks-order.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fpython-airflow-tasks-order.png%3Fraw%3Dtrue" alt="python dummy DAG"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we run &lt;code&gt;flakeheaven&lt;/code&gt; with the default configuration, we see the following errors:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&amp;gt;&amp;gt;&amp;gt; python -m flakeheaven lint .                                                       

dags/dummy.py
    17:   9 W503 line break before binary operator [pycodestyle]
  &amp;gt;&amp;gt; dummy_operator_2
  ^
    18:   9 W503 line break before binary operator [pycodestyle]
  &amp;gt;&amp;gt; dummy_operator_3
  ^
    19:   9 W503 line break before binary operator [pycodestyle]
  &amp;gt;&amp;gt; [dummy_operator_4, dummy_operator_5, dummy_operator_6, dummy_operator_7]
  ^


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;However, we want to keep each task on its own line, so we need to disable &lt;code&gt;W503&lt;/code&gt; from pycodestyle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fflakeheaven-diable-line-break-W503.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fflakeheaven-diable-line-break-W503.png%3Fraw%3Dtrue" alt="disable W503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the default configuration we would also get the following warning:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&amp;gt;&amp;gt;&amp;gt; python -m flakeheaven lint .                                                       

dags/dummy.py
    15:   5 W0104 Statement seems to have no effect (pointless-statement) [pylint]
  (
  ^


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

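&lt;p&gt;This warning is caused by how we specify task order: the chain of &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; expressions is a statement whose result is not assigned to anything. The workaround here is to exclude &lt;code&gt;W0104&lt;/code&gt; from pylint.&lt;/p&gt;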

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fflakeheaven-disable-statement-no-effect-W0104.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fflakeheaven-disable-statement-no-effect-W0104.png%3Fraw%3Dtrue" alt="disable W0104"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;More information about the rules can be found on the &lt;a href="https://www.flake8rules.com/" rel="noopener noreferrer"&gt;flake8 rules page&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub workflow actions CI/CD configurations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: the author assumes you are familiar with &lt;a href="https://github.com/features/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We configure the GitHub workflow to be triggered on every PR against the main (master) branch.&lt;/p&gt;

&lt;p&gt;The whole &lt;code&gt;py_linter.yml&lt;/code&gt; config can be found in a &lt;a href="https://github.com/iamtodor/demo-github-actions-python-linter-configuration/blob/main/.github/workflows/py_linter.yml" rel="noopener noreferrer"&gt;demo repo&lt;/a&gt;. I will walk you through it step by step.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-full-v3.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-full-v3.png%3Fraw%3Dtrue" alt="py_linter.yml"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  When to run it
&lt;/h3&gt;

&lt;p&gt;We are interested in running the linter only when a PR contains &lt;code&gt;.py&lt;/code&gt; files. For instance, when we only update &lt;code&gt;README.md&lt;/code&gt;, there is no point in running a Python linter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-py-push-pr.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-py-push-pr.png%3Fraw%3Dtrue" alt="configure run workflow on PRs and push"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What files does it run against
&lt;/h3&gt;

&lt;p&gt;We are interested in running the linter only against the modified files. Taking the provided repo as an example, if I update &lt;code&gt;dags/dummy.py&lt;/code&gt;, I don't want to waste time and resources running the linter against &lt;code&gt;main.py&lt;/code&gt;. For this purpose we use the &lt;a href="https://github.com/dorny/paths-filter" rel="noopener noreferrer"&gt;Paths Filter GitHub Action&lt;/a&gt;, which is very flexible.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-paths-filter-v2.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-paths-filter-v2.png%3Fraw%3Dtrue" alt="Paths Filter GitHub Action"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we have modified a &lt;code&gt;.py&lt;/code&gt; file and any other files such as &lt;code&gt;.toml&lt;/code&gt; in one PR, we don't want to run a linter against the non-python files, so we configure filtering only for &lt;code&gt;.py&lt;/code&gt; files no matter the location: root, tests, src, etc.&lt;/p&gt;

&lt;p&gt;A changed file can have one of the following statuses: &lt;code&gt;added&lt;/code&gt;, &lt;code&gt;modified&lt;/code&gt;, or &lt;code&gt;deleted&lt;/code&gt;. There is no reason to run the linter against deleted files: the workflow would simply fail, because the file is no longer in the repo. So we need to configure which kinds of changes trigger the linter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-paths-filter-modified-v2.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-paths-filter-modified-v2.png%3Fraw%3Dtrue" alt="added|modified"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I define a variable that holds the output of the previous filter (only the &lt;code&gt;.py&lt;/code&gt; files). This variable contains the added or modified &lt;code&gt;.py&lt;/code&gt; files, which I can then pass to &lt;code&gt;flakeheaven&lt;/code&gt;, &lt;code&gt;black&lt;/code&gt;, and &lt;code&gt;isort&lt;/code&gt;. By default, file listing is disabled; "Paths Changes Filter" lets you enable it and choose the format: &lt;code&gt;csv&lt;/code&gt;, &lt;code&gt;json&lt;/code&gt;, or &lt;code&gt;shell&lt;/code&gt;. The linters accept file names separated by spaces, so our choice here is &lt;code&gt;shell&lt;/code&gt; mode.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-paths-filter-list-shell.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-paths-filter-list-shell.png%3Fraw%3Dtrue" alt="list files shell"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Run linter itself
&lt;/h3&gt;

&lt;p&gt;The next and last step is to run the linter itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-run-linter-step.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-run-linter-step.png%3Fraw%3Dtrue" alt="run linter step"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before we run the linter, we check whether the previous step actually found any changed &lt;code&gt;.py&lt;/code&gt; files.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-run-linter-check-for-changes.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-run-linter-check-for-changes.png%3Fraw%3Dtrue" alt="check if there are .py files"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, using the output variable mentioned above, we can safely pass the content of &lt;code&gt;steps.filter.outputs.py_scripts_filter_files&lt;/code&gt; to the linters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-run-linter-commands.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Fgh-config-run-linter-commands.png%3Fraw%3Dtrue" alt="linter commands"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;That's all I would like to share. I hope it is useful for you, and that you can utilize this experience and knowledge. &lt;/p&gt;

&lt;p&gt;I hope you see these successful checks every time you push your code :)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Flinter-success.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fiamtodor%2Fdemo-github-actions-python-linter-configuration%2Fblob%2Fmain%2Farticle%2Fimg%2Flinter-success.png%3Fraw%3Dtrue" alt="success linter"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you have any questions, feel free to ask in the comments section. I will do my best to provide a comprehensive answer.&lt;/p&gt;

&lt;p&gt;Question to you: do you have linter checks as a part of your CI/CD?&lt;/p&gt;

&lt;h2&gt;
  
  
  Contact
&lt;/h2&gt;

&lt;p&gt;If you found this article helpful, I invite you to connect with me on &lt;a href="https://www.linkedin.com/in/iamtodor/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;. I am always looking to expand my network and connect with like-minded individuals in the data industry. Additionally, you can also reach out to me for any questions or feedback on the article. I'd be more than happy to engage in a conversation and help out in any way I can. So don’t hesitate to contact me, and let’s connect and learn together.&lt;/p&gt;

</description>
      <category>python</category>
      <category>github</category>
      <category>cicd</category>
      <category>linter</category>
    </item>
    <item>
      <title>Building CI/CD for Vertex AI pipelines: The first solution</title>
      <dc:creator>Oleksandr Borodavka</dc:creator>
      <pubDate>Thu, 07 Jul 2022 11:21:54 +0000</pubDate>
      <link>https://forem.com/freshbooks/building-cicd-for-vertex-ai-pipelines-the-first-solution-1mp5</link>
      <guid>https://forem.com/freshbooks/building-cicd-for-vertex-ai-pipelines-the-first-solution-1mp5</guid>
      <description>&lt;p&gt;Hey everyone, I am a Senior Data Engineer (MLOps) at &lt;a href="https://www.freshbooks.com/" rel="noopener noreferrer"&gt;FreshBooks&lt;/a&gt;. We are currently working on transitioning to fully automated end-to-end ML pipelines. And I want to share some ideas, solutions, and challenges we are dealing with during this journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  A starting point
&lt;/h2&gt;

&lt;p&gt;We are already building training pipelines with &lt;a href="https://cloud.google.com/vertex-ai" rel="noopener noreferrer"&gt;Vertex AI&lt;/a&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72a1n38f4wnpvaks6ymq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72a1n38f4wnpvaks6ymq.png" alt="Example of Vertex AI pipeline from the GC blog article"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Example of Vertex AI pipeline from the GC blog &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/use-vertex-pipelines-build-automl-classification-end-end-workflow" rel="noopener noreferrer"&gt;article&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That works pretty well. We can include data preparation steps in our pipelines, train several models and choose the best one, dump metadata, send notifications, etc. Vertex AI pipelines are great in themselves, but most of the processes involved in developing and maintaining them are manual.&lt;br&gt;&lt;br&gt;
What's the issue here? Say we already have not just a couple of pipelines but a couple dozen. We probably already have a list of our favourite components which are used in most pipelines and say a bug was fixed in one of them. In this scenario we would need to rebuild all the related pipelines, check them locally, then deploy and check them in a real environment. What if we forgot to rebuild something? Or part of a pipeline was broken after the update? Things become trickier... In addition, all these routine operations take some time.&lt;/p&gt;

&lt;h2&gt;
  
  
  How can we do better?
&lt;/h2&gt;

&lt;p&gt;Okay, we definitely need some automation here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52hexskyyahrn8vbmebp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52hexskyyahrn8vbmebp.png" alt="Illustration of CI/CD"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Illustration of CI/CD from &lt;a href="https://www.synopsys.com/glossary/what-is-cicd.html" rel="noopener noreferrer"&gt;synopsys&lt;/a&gt;&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.redhat.com/en/topics/devops/what-is-ci-cd" rel="noopener noreferrer"&gt;CI/CD&lt;/a&gt; practices are widely used in typical software development. And we can use it in our case as well.&lt;br&gt;
To start doing it we need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some CI/CD tool, to build and run all the processes. Yeah, we can write some custom scripts for this goal, but there are a lot of stable solutions that we can use.&lt;/li&gt;
&lt;li&gt;Automated tests, to check if everything works fine. Spreading changes without quality control is probably not a good idea.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  GitHub Actions
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.github.com/en/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt; was a good choice for us since we already store all sources in GitHub, making it easy to start working with. There is secret storage and a &lt;a href="https://github.com/marketplace?type=actions" rel="noopener noreferrer"&gt;marketplace&lt;/a&gt; with prebuilt actions. You do not need to set up any infrastructure, everything is provided by GitHub for free (with limitations). Alongside that is a mature modern tool that provides many ways of customization, like self-hosted runners and reusable workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tests
&lt;/h3&gt;

&lt;p&gt;What about tests? Let's first decide what we want to test. A Vertex AI pipeline is a set of components organized as a DAG. Each component can be treated as a small containerized application, so we can test it like any other software, with unit tests. We can also build a simple isolated pipeline for a single target component to make sure it is configured and built correctly and works in the Vertex AI environment. These are our integration tests for components.&lt;br&gt;
For the pipeline itself, we can add components that do ML-specific things such as model validation. When we run the pipeline in a test environment and all of its stages complete successfully, we can consider it to be working properly. These are the integration tests for the pipelines.&lt;/p&gt;

&lt;h3&gt;
  
  
  Repository structure
&lt;/h3&gt;

&lt;p&gt;There is one more thing we should touch on before writing the CI/CD scripts and that is the organization of code in our repository. &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

pipelines/
  pipeline1.yaml
  ...
components/
  component1/
    src/
      ...
    tests/
      unit/
          ...
      integration/
          test.yaml
    config.yaml


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;There is a folder with pipeline specifications and a folder with components. Each component contains source files, tests, and a configuration.&lt;br&gt;
&lt;em&gt;We also built an MLOps Framework to simplify some routines, that is why you see &lt;code&gt;yaml&lt;/code&gt; files instead of the usual &lt;code&gt;json&lt;/code&gt;. We probably will publish another article about it.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  CI/CD workflows
&lt;/h2&gt;

&lt;p&gt;Looks like everything is ready to bring GitHub Actions to the scene.&lt;/p&gt;

&lt;p&gt;To build CI/CD in our case we need to be able to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run unit tests for a component&lt;/li&gt;
&lt;li&gt;build a component&lt;/li&gt;
&lt;li&gt;build a pipeline&lt;/li&gt;
&lt;li&gt;run pipelines in different environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In order to not copy-paste almost the same code, we can use the &lt;a href="https://docs.github.com/en/actions/using-workflows/reusing-workflows" rel="noopener noreferrer"&gt;reusable workflows&lt;/a&gt; feature of GHA.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;test_component.yml&lt;/code&gt; - runs unit tests for a component specified by input parameters&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Test component&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# the section defines the workflow as reusable and describes the input parameters&lt;/span&gt;
  &lt;span class="na"&gt;workflow_call&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;                              
    &lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;tests_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# run the job on a fresh ubuntu instance&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Run&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tests&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;component"&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# get the repo to the instance&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="c1"&gt;# install requirements for the framework&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install python requirements&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;make install&lt;/span&gt;
      &lt;span class="c1"&gt;# tun tests using the path to the tests from input parameters&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run tests&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;make test ${{ inputs.tests_path }}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
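
&lt;p&gt;A caller workflow could invoke this reusable workflow roughly as follows. This is a hypothetical sketch: the workflow name, the trigger paths, the job name, and the location &lt;code&gt;.github/workflows/test_component.yml&lt;/code&gt; are assumptions based on the repository layout shown earlier, not the actual FreshBooks setup.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

name: Component1 CI

on:
  pull_request:
    # assumption: run only when this component changes
    paths:
      - "components/component1/**"

jobs:
  test-component1:
    # call the reusable workflow stored in the same repository
    uses: ./.github/workflows/test_component.yml
    with:
      # value for the required tests_path input
      tests_path: components/component1/tests/unit


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each component can get its own small caller like this, while the test logic itself lives in one place.&lt;/p&gt;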

&lt;p&gt;&lt;code&gt;build_component.yml&lt;/code&gt; - builds a component specified by input parameters&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build component&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# the section defines the workflow as reusable and describes the input parameters&lt;/span&gt;
  &lt;span class="na"&gt;workflow_call&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;component_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;WORKLOAD_IDENTITY_PROVIDER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;SERVICE_ACCOUNT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# run the job on a fresh ubuntu instance&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;component"&lt;/span&gt;
    &lt;span class="c1"&gt;# set permissions necessary for google-github-actions/auth&lt;/span&gt;
    &lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read"&lt;/span&gt;
      &lt;span class="na"&gt;id-token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write"&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# get the repo to the instance&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;

      &lt;span class="c1"&gt;# Install docker&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install docker&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;make install_docker&lt;/span&gt;

      &lt;span class="c1"&gt;# auth to GCP with workload_identity&lt;/span&gt;
      &lt;span class="c1"&gt;# also create credentials_file to use it in the next steps&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth"&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authenticate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Google&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Cloud"&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google-github-actions/auth@v0"&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;token_format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;access_token"&lt;/span&gt;
          &lt;span class="na"&gt;workload_identity_provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.WORKLOAD_IDENTITY_PROVIDER }}&lt;/span&gt;
          &lt;span class="na"&gt;service_account&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SERVICE_ACCOUNT }}&lt;/span&gt;
          &lt;span class="na"&gt;create_credentials_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;export_environment_variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="c1"&gt;# login to GCR with the token from the previous step&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Login to GCR&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/login-action@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;registry&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcr.io&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;oauth2accesstoken&lt;/span&gt;
          &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.auth.outputs.access_token }}&lt;/span&gt;

      &lt;span class="c1"&gt;# install python requirements&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install requirements&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;make install&lt;/span&gt;
      &lt;span class="c1"&gt;# run build component command with input parameters&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build the component&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;make build component ${{ inputs.component_name }}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Since we &lt;a href="https://www.kubeflow.org/docs/components/pipelines/sdk/component-development/#containerize-your-components-code" rel="noopener noreferrer"&gt;build components as Docker images&lt;/a&gt;, we have to install Docker on the runner and log in to GCR. All necessary credentials are stored in the repository secrets.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;run_pipeline.yml&lt;/code&gt; - runs a pipeline specified by input parameters&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run pipeline&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# the section defines the workflow as reusable and describes the input parameters&lt;/span&gt;
  &lt;span class="na"&gt;workflow_call&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;WORKLOAD_IDENTITY_PROVIDER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;SERVICE_ACCOUNT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# run the job on a fresh ubuntu instance&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Run&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pipeline"&lt;/span&gt;
    &lt;span class="c1"&gt;# set permissions necessary for google-github-actions/auth&lt;/span&gt;
    &lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read"&lt;/span&gt;
      &lt;span class="na"&gt;id-token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write"&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# get the repo to the instance&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;

      &lt;span class="c1"&gt;# auth to GCP with workload_identity&lt;/span&gt;
      &lt;span class="c1"&gt;# also create credentials_file and export it to environment variables in order to use it in the next steps&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth"&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authenticate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Google&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Cloud"&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google-github-actions/auth@v0"&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;token_format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;access_token"&lt;/span&gt;
          &lt;span class="na"&gt;workload_identity_provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.WORKLOAD_IDENTITY_PROVIDER }}&lt;/span&gt;
          &lt;span class="na"&gt;service_account&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SERVICE_ACCOUNT }}&lt;/span&gt;
          &lt;span class="na"&gt;create_credentials_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;export_environment_variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

      &lt;span class="c1"&gt;# install python requirements for framework&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install python dependecies&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;make install&lt;/span&gt;
      &lt;span class="c1"&gt;# run "run pipeline" command with input parameters  &lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run pipeline&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;make run ${{ inputs.pipeline_name }}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;build_pipeline.yml&lt;/code&gt; looks almost the same, so we can skip it and move on.&lt;/p&gt;
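
&lt;p&gt;For reference, here is a minimal sketch of what &lt;code&gt;build_pipeline.yml&lt;/code&gt; might look like. It is inferred from &lt;code&gt;run_pipeline.yml&lt;/code&gt; above and is not the original file: the inputs and secrets match what the calling workflows pass in, but the &lt;code&gt;make build pipeline&lt;/code&gt; target is an assumption made by analogy with &lt;code&gt;make build component&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

name: Build pipeline

on:
  # the section defines the workflow as reusable and describes the input parameters
  workflow_call:
    inputs:
      pipeline_name:
        required: true
        type: string
    secrets:
      WORKLOAD_IDENTITY_PROVIDER:
        required: true
      SERVICE_ACCOUNT:
        required: true

jobs:
  build:
    runs-on: ubuntu-latest
    name: "Build pipeline"
    # set permissions necessary for google-github-actions/auth
    permissions:
      contents: "read"
      id-token: "write"
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      # same authentication step as in run_pipeline.yml
      - id: "auth"
        name: "Authenticate to Google Cloud"
        uses: "google-github-actions/auth@v0"
        with:
          token_format: "access_token"
          workload_identity_provider: ${{ secrets.WORKLOAD_IDENTITY_PROVIDER }}
          service_account: ${{ secrets.SERVICE_ACCOUNT }}
          create_credentials_file: true
          export_environment_variables: true

      - name: Install python dependencies
        run: make install
      # ASSUMPTION: the make target name is hypothetical
      - name: Build the pipeline
        run: make build pipeline ${{ inputs.pipeline_name }}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;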

&lt;p&gt;Great, we now have all the building blocks for the main operations. Now we just need to combine them so that pipelines are built and run whenever changes happen. Let's use &lt;code&gt;pipeline1&lt;/code&gt; and &lt;code&gt;component1&lt;/code&gt; from the example above.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;build_pipeline1.yml&lt;/code&gt; - builds and tests pipeline1 when it changes&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build pipeline1&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# the workflow can be run manually&lt;/span&gt;
  &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# the workflow will be run automatically when a PR is opened to the main branch&lt;/span&gt;
  &lt;span class="c1"&gt;# or changes to the PR related to the pipeline are pushed&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;opened&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;synchronize&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;reopened&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipelines/pipeline1.yaml"&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# call build_pipeline workflow with target parameters&lt;/span&gt;
    &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/workflows/build_pipeline.yml&lt;/span&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipeline1"&lt;/span&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;WORKLOAD_IDENTITY_PROVIDER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEV_WORKLOAD_IDENTITY_PROVIDER }}&lt;/span&gt;
      &lt;span class="na"&gt;SERVICE_ACCOUNT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEV_SERVICE_ACCOUNT }}&lt;/span&gt;
  &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# call run_pipeline workflow with target parameters&lt;/span&gt;
    &lt;span class="c1"&gt;# to check it in dev env&lt;/span&gt;
    &lt;span class="c1"&gt;# this step requires the previous step to be done successfully&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
    &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/workflows/run_pipeline.yml&lt;/span&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipeline1"&lt;/span&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;WORKLOAD_IDENTITY_PROVIDER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEV_WORKLOAD_IDENTITY_PROVIDER }}&lt;/span&gt;
      &lt;span class="na"&gt;SERVICE_ACCOUNT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEV_SERVICE_ACCOUNT }}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;build_component1.yml&lt;/code&gt; - builds (and tests) component1 when it changes, then rebuilds and tests the related pipeline&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;p&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build component1&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
  &lt;span class="c1"&gt;# the workflow can be run manually&lt;/span&gt;&lt;br&gt;
  &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
  &lt;span class="c1"&gt;# the workflow will be run automatically when a PR is opened to the main branch&lt;/span&gt;&lt;br&gt;
  &lt;span class="c1"&gt;# or changes to the PR related to the component1 are pushed&lt;/span&gt;&lt;br&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;opened&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;synchronize&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;reopened&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;components/component1/**"&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
  &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
    &lt;span class="c1"&gt;# call test_component workflow with target parameters&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/workflows/test_component.yml&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;tests_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;components/component1/tests&lt;/span&gt;&lt;br&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
    &lt;span class="c1"&gt;# call build_component workflow with target parameters&lt;/span&gt;&lt;br&gt;
    &lt;span class="c1"&gt;# this step requires the previous step to be done successfully&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tests&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/workflows/build_component.yml&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;component_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;component1"&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;WORKLOAD_IDENTITY_PROVIDER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEV_WORKLOAD_IDENTITY_PROVIDER }}&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;SERVICE_ACCOUNT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEV_SERVICE_ACCOUNT }}&lt;/span&gt;&lt;br&gt;
  &lt;span class="na"&gt;integration-tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
    &lt;span class="c1"&gt;# call run_pipeline workflow with target parameters&lt;/span&gt;&lt;br&gt;
    &lt;span class="c1"&gt;# to check the component in dev env&lt;/span&gt;&lt;br&gt;
    &lt;span class="c1"&gt;# this step requires the previous step to be done successfully&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/workflows/run_pipeline.yml&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;component1-test"&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;WORKLOAD_IDENTITY_PROVIDER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEV_WORKLOAD_IDENTITY_PROVIDER }}&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;SERVICE_ACCOUNT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEV_SERVICE_ACCOUNT }}&lt;/span&gt;&lt;br&gt;
  &lt;span class="na"&gt;build-pipeline1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
    &lt;span class="c1"&gt;# call build_pipeline workflow with target parameters&lt;/span&gt;&lt;br&gt;
    &lt;span class="c1"&gt;# this step requires the previous step to be done successfully&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;integration-tests&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/workflows/build_pipeline.yml&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipeline1"&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;WORKLOAD_IDENTITY_PROVIDER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEV_WORKLOAD_IDENTITY_PROVIDER }}&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;SERVICE_ACCOUNT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEV_SERVICE_ACCOUNT }}&lt;/span&gt;&lt;br&gt;
  &lt;span class="na"&gt;test-pipeline1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
    &lt;span class="c1"&gt;# call run_pipeline workflow with target parameters&lt;/span&gt;&lt;br&gt;
    &lt;span class="c1"&gt;# to check it in dev env&lt;/span&gt;&lt;br&gt;
    &lt;span class="c1"&gt;# this step requires the previous step to be done successfully&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build-pipeline1&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/workflows/run_pipeline.yml&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;pipeline_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipeline1"&lt;/span&gt;&lt;br&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;WORKLOAD_IDENTITY_PROVIDER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEV_WORKLOAD_IDENTITY_PROVIDER }}&lt;/span&gt;&lt;br&gt;
      &lt;span class="na"&gt;SERVICE_ACCOUNT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEV_SERVICE_ACCOUNT }}&lt;/span&gt;&lt;/p&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;That's it. When changes are pushed, GHA runs the related workflows: tests run, and components and pipelines are rebuilt. If everything goes well, we can merge the Pull Request and roll the changes out to Production, either by running these workflows manually or by adding another step that triggers them automatically.&lt;/p&gt;
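
&lt;p&gt;As a rough illustration of the automatic option, a workflow along these lines could rebuild the pipeline for Production once the change lands on the main branch. This is only a sketch under stated assumptions: the file name &lt;code&gt;deploy_pipeline1.yml&lt;/code&gt; and the &lt;code&gt;PROD_*&lt;/code&gt; secret names are hypothetical and would have to exist in your repository.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

# deploy_pipeline1.yml (hypothetical file name)
name: Deploy pipeline1

on:
  # the workflow can still be run manually
  workflow_dispatch:
  # merging a PR produces a push to main, which triggers the workflow
  push:
    branches:
      - "main"
    paths:
      - "pipelines/pipeline1.yaml"

jobs:
  build:
    # reuse the same building block, this time with production credentials
    uses: ./.github/workflows/build_pipeline.yml
    with:
      pipeline_name: "pipeline1"
    secrets:
      # ASSUMPTION: PROD_* secrets are hypothetical names
      WORKLOAD_IDENTITY_PROVIDER: ${{ secrets.PROD_WORKLOAD_IDENTITY_PROVIDER }}
      SERVICE_ACCOUNT: ${{ secrets.PROD_SERVICE_ACCOUNT }}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;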

&lt;p&gt;But, are we happy now?&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhm98wefjintro03ujqs5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhm98wefjintro03ujqs5.png" alt="Futurama Fry / Not Sure If | Know Your Meme"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>mlops</category>
      <category>cicd</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Automatically Send Invoices With WhatsApp</title>
      <dc:creator>lygel</dc:creator>
      <pubDate>Mon, 25 Apr 2022 16:35:25 +0000</pubDate>
      <link>https://forem.com/freshbooks/automatically-send-invoices-with-whatsapp-fii</link>
      <guid>https://forem.com/freshbooks/automatically-send-invoices-with-whatsapp-fii</guid>
      <description>&lt;p&gt;In this tutorial we look at how to create a shareable link when an invoice is created in FreshBooks and send that link to your client over WhatsApp, so that the client can view the invoice immediately on their phone. The same concept can be applied to checkout links, expenses and more.&lt;/p&gt;

&lt;h4&gt;
  
  
  Prerequisites
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;A FreshBooks Developer account.&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://www.twilio.com/docs/whatsapp/sandbox" rel="noopener noreferrer"&gt;Twilio Sandbox account&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Basic knowledge of async/await and Node.js.&lt;/li&gt;
&lt;li&gt;A code editor (e.g. VS Code, Sublime Text, Atom).&lt;/li&gt;
&lt;li&gt;Familiarity with generating a bearer token in Postman.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Set up your Express app locally
&lt;/h4&gt;

&lt;p&gt;First we set up our Express app, which listens on port 3000 and exposes a URI at '/webhooks/ready'.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; 
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlencoded&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;extended&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Hello World 1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,()&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Get Body &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhooks/ready&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Thanks for your business POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,()&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`POST Body &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;invoice.create&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;invoice.update&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
      &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;object_id&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  
      &lt;span class="nf"&gt;sendShareLink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;object_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,()&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;listening on port 3000&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  Create a public web server
&lt;/h4&gt;

&lt;p&gt;I am using ‘ngrok’ to create a publicly accessible web server. You can download ngrok using this &lt;a href="https://ngrok.com/download" rel="noopener noreferrer"&gt;link&lt;/a&gt;. Once you have installed ngrok, start it and expose your local web server. Don’t forget to note the HTTPS URL that ngrok provides; we will use it to register a webhook. ngrok will relay incoming calls to our local server on port 3000.&lt;/p&gt;
&lt;h4&gt;
  
  
  Register for webhooks
&lt;/h4&gt;

&lt;p&gt;FreshBooks needs to notify our application when an invoice is created. To get notified, we need to register and listen to the &lt;a href="https://www.freshbooks.com/api/webhooks" rel="noopener noreferrer"&gt;FreshBooks webhook&lt;/a&gt; for the event 'invoice.create'. Register the webhook using the URI generated earlier by ngrok, e.g. &lt;code&gt;https://d7b0-213-127-111-74.ngrok.io&lt;/code&gt;. This part has not yet been built into the application at the time of writing, so for now we do this using &lt;a href="https://documenter.getpostman.com/view/3322108/S1ERwwza#fb78c21c-8c52-4acf-baa5-b10c7799415e" rel="noopener noreferrer"&gt;Postman&lt;/a&gt;. You can use the ngrok inspection UI to get the verifier code for the webhook.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can inspect calls coming in via a web UI provided by ngrok. After you start the ngrok agent, open &lt;a href="http://localhost:4040/" rel="noopener noreferrer"&gt;http://localhost:4040&lt;/a&gt; in a browser on the same machine.&lt;/p&gt;
&lt;/blockquote&gt;
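&lt;p&gt;If you would rather script the registration than click through Postman, the request can be sketched roughly as below. This is a minimal sketch, not the application's code: the endpoint path and the payload shape are my assumptions about the FreshBooks webhooks API and should be checked against the webhook documentation linked above, and &lt;code&gt;buildWebhookRegistration&lt;/code&gt;, the account id and the token are hypothetical. The function only builds the request; it does not send it.&lt;/p&gt;

```javascript
// Sketch only: the endpoint path and payload shape below are assumptions;
// confirm them against the FreshBooks webhooks documentation before use.
const API_BASE = 'https://api.freshbooks.com';

// Build (but do not send) the request that registers a webhook callback.
function buildWebhookRegistration(accountId, callbackUri, token, event = 'invoice.create') {
  return {
    url: `${API_BASE}/events/account/${accountId}/events/callbacks`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ callback: { event, uri: callbackUri } }),
    },
  };
}

// Hypothetical values for illustration.
const registration = buildWebhookRegistration(
  'ABC123',
  'https://d7b0-213-127-111-74.ngrok.io/webhooks/ready',
  'my-bearer-token'
);
console.log(registration.url);
console.log(registration.options.body);
```

&lt;p&gt;Keeping the request construction separate from the network call makes it easy to inspect what would be sent before pointing it at a real account.&lt;/p&gt;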
&lt;h4&gt;
  
  
  Getting a shareable link and client contact
&lt;/h4&gt;

&lt;p&gt;We first create a FreshBooks client using the &lt;a href="https://dev.to/freshbooks/getting-started-with-freshbooks-nodejs-sdk-expenses-invoices-a6"&gt;FreshBooks Node.js SDK&lt;/a&gt;. We initialise the client with the client ID of our app and the bearer token, both of which we provide through environment variables.&lt;/p&gt;

&lt;p&gt;When you generate an invoice using the FreshBooks UI, it triggers a webhook call to our previously registered link. When our app receives this API call, we retrieve the invoice id. The invoice id is then used to generate an invoice link using the FreshBooks client.&lt;/p&gt;
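&lt;p&gt;The filtering step in the Express handler shown earlier can be pulled out into a small function, which makes it easier to see which events are acted on. This is only a sketch: &lt;code&gt;extractInvoiceEvent&lt;/code&gt; is a hypothetical helper, and it assumes the request body has already been parsed into an object with the &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;account_id&lt;/code&gt; and &lt;code&gt;object_id&lt;/code&gt; fields the handler reads.&lt;/p&gt;

```javascript
// Event names the server reacts to (the same two the Express handler checks).
const INVOICE_EVENTS = new Set(['invoice.create', 'invoice.update']);

// Return the ids needed to build a share link, or null for any other event.
// Assumes `body` is an already-parsed object with `name`, `account_id`
// and `object_id` fields.
function extractInvoiceEvent(body) {
  if (!body || !INVOICE_EVENTS.has(body.name)) {
    return null;
  }
  return { accountId: body.account_id, invoiceId: body.object_id };
}

// Hypothetical payloads for illustration.
console.log(extractInvoiceEvent({ name: 'invoice.create', account_id: 'ABC123', object_id: '1234' }));
console.log(extractInvoiceEvent({ name: 'client.create', account_id: 'ABC123', object_id: '99' }));
```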

&lt;p&gt;To create a shareable invoice link, we call the SDK's share link API with the invoice id. We also retrieve the client's mobile number.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;postWhatsapp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./postWhatsapp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;clientId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CLIENTID&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TOKEN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;accountId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;invoiceId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;



&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;accountId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;invoiceId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@freshbooks/api&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;shareLink&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shareLink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;accountId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;invoiceId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;invoiceInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;single&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;accountId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;invoiceId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;single&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;accountId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;invoiceInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   


        &lt;span class="nf"&gt;postWhatsapp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;shareLink&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;shareLink&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mobPhone&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="p"&gt;};&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  Sending your invoice over WhatsApp
&lt;/h4&gt;

&lt;p&gt;Once we have a shareable link, we use the Twilio SDK to initialise a client with our ‘Twilio SID’ and ‘Auth Token’. Using this Twilio client, we send a WhatsApp message that includes the shareable link to the invoice.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;twilio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;twilio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;accountSid&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ACCSID&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;authToken&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AUTHTOK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;twilio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="nx"&gt;accountSid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;authToken&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;shareLink&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;mobNo&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;shareLink&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;mobNo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt; 
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
     &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Here is your share link &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;shareLink&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
     &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;whatsapp:+14155238886&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       
     &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`whatsapp:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;mobNo&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; 
   &lt;span class="p"&gt;})&lt;/span&gt; 
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sid&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;done&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;If you're looking for more information on the Twilio WhatsApp API, you can check this &lt;a href="https://www.twilio.com/docs/whatsapp/quickstart/node" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now whenever you create an invoice for a client, your server receives a notice, gets the share link, and sends it to them via WhatsApp.&lt;/p&gt;

&lt;p&gt;You can check out the entire code at my personal repo: &lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/lygel07" rel="noopener noreferrer"&gt;
        lygel07
      &lt;/a&gt; / &lt;a href="https://github.com/lygel07/freshbooks-whatsapp-link" rel="noopener noreferrer"&gt;
        freshbooks-whatsapp-link
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      
    &lt;/h3&gt;
  &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>javascript</category>
      <category>node</category>
      <category>webhooks</category>
    </item>
    <item>
      <title>API Client Design Across Languages - Part 2 - Making Requests</title>
      <dc:creator>Andrew McIntosh</dc:creator>
      <pubDate>Tue, 29 Mar 2022 18:53:59 +0000</pubDate>
      <link>https://forem.com/freshbooks/api-client-design-across-languages-part-2-making-requests-4i28</link>
      <guid>https://forem.com/freshbooks/api-client-design-across-languages-part-2-making-requests-4i28</guid>
      <description>&lt;p&gt;It's been a while since my last post (&lt;a href="https://dev.to/freshbooks/api-client-design-across-languages-part-1-5hmk"&gt;API Client Design Across Languages - Part 1&lt;/a&gt;), but life and work have gotten in the way. Regardless, I'm am finally continuing my dive into how API clients can differ in style and usage across languages while still maintaining the same features.&lt;/p&gt;

&lt;p&gt;The first post focused on the basic structure of different API clients. In this post I will go into how a client may make requests against the API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Request Libraries
&lt;/h2&gt;

&lt;p&gt;Languages vary in how well their core implementations or standard libraries support making HTTP requests. Almost by definition, languages used on the web generally have easy ways to make HTTP requests. However, there are often dedicated request libraries that can make this simpler or cleaner, and in general I recommend their use unless the language has very clear and clean support.&lt;/p&gt;

&lt;p&gt;There are reasons to not want to include a request library in an SDK as every additional dependency added is one that developers using it will have to add as well. Keeping a small dependency graph also makes it easier to maintain updates. But, letting a library do the low-level work is also good for maintenance and security, and often someone looking to use your SDK will already be using a request library.&lt;/p&gt;

&lt;p&gt;I'll show some examples of the choices FreshBooks made for different languages below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sane Request Defaults
&lt;/h2&gt;

&lt;p&gt;Both to make it easier for developers to use our SDKs, and to make our own lives easier in supporting them, we set some defaults for HTTP requests.&lt;/p&gt;

&lt;p&gt;Timeouts are especially important. If a request takes too long, it can impact both the user (slowing the response to their customer) and us (if our servers get saturated by slow requests, we stop serving our customers). Most HTTP clients have easily set timeouts, but often they are not enabled by default.&lt;/p&gt;

&lt;p&gt;Setting user-agent strings is also helpful. Including things like the SDK language and version helps FreshBooks figure out SDK usage and what languages are popular with our developers. Of course, we let users override the user-agent if desired. It can also help an API support team track down a reported error if the client has a unique user-agent string.&lt;/p&gt;
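&lt;p&gt;To make the two defaults concrete, here is a minimal sketch of how an SDK might merge them with whatever the caller passes in. It is not taken from any FreshBooks SDK: &lt;code&gt;buildRequestConfig&lt;/code&gt;, the 30 second timeout and the user-agent format are all illustrative assumptions. The resulting shape mirrors the &lt;code&gt;timeout&lt;/code&gt; and &lt;code&gt;headers&lt;/code&gt; options that axios accepts, but nothing here depends on a particular request library.&lt;/p&gt;

```javascript
// Hypothetical defaults; the values and the user-agent format are illustrative.
const SDK_VERSION = '1.0.0';
const DEFAULTS = {
  timeout: 30000, // milliseconds
  headers: { 'User-Agent': `example-sdk-node/${SDK_VERSION}` },
};

// Merge caller overrides on top of the defaults, header by header,
// so a caller can replace the user-agent without losing the timeout.
function buildRequestConfig(overrides = {}) {
  return {
    ...DEFAULTS,
    ...overrides,
    headers: { ...DEFAULTS.headers, ...(overrides.headers || {}) },
  };
}

console.log(buildRequestConfig());
console.log(buildRequestConfig({ headers: { 'User-Agent': 'my-app/2.1' } }));
```

&lt;p&gt;Merging the headers separately is the design choice worth noting: a plain top-level spread would let a caller who sets one custom header silently drop every default header.&lt;/p&gt;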

&lt;h2&gt;
  
  
  Interface
&lt;/h2&gt;

&lt;p&gt;The SDK should try to be as consistent and intuitive as possible, especially if the API itself is a fairly CRUD-heavy RESTful API versus one with more unique behaviours around resources.&lt;/p&gt;

&lt;p&gt;Try to keep things standardized, like resource pluralization (e.g. &lt;code&gt;clients&lt;/code&gt;, &lt;code&gt;invoices&lt;/code&gt; vs. &lt;code&gt;client&lt;/code&gt;, &lt;code&gt;invoices&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Try to keep method signature variables in a similar order. For instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;versus&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The more standard and intuitive the SDK is, the easier it is for developers and the fewer support tickets.&lt;/p&gt;
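&lt;p&gt;One way to get that consistency almost for free is to generate every resource from the same factory, so the argument order cannot drift between resources. The sketch below is hypothetical and not how any FreshBooks SDK is written: &lt;code&gt;makeResource&lt;/code&gt; and the &lt;code&gt;send&lt;/code&gt; transport function are names I made up, and the transport here only records calls instead of making HTTP requests.&lt;/p&gt;

```javascript
// Build a resource whose methods share one argument order:
// identifier first, then data, then optional filters.
// `send` is a hypothetical transport function (method, path, data, filters).
function makeResource(name, send) {
  return {
    get: (id, filters) => send('GET', `/${name}/${id}`, undefined, filters),
    create: (data, filters) => send('POST', `/${name}`, data, filters),
    update: (id, data, filters) => send('PUT', `/${name}/${id}`, data, filters),
  };
}

// A stand-in transport that just records what it was asked to do.
const calls = [];
const recordCall = (method, path, data, filters) => {
  calls.push({ method, path, data, filters });
  return calls.length;
};

const clients = makeResource('clients', recordCall);
const invoices = makeResource('invoices', recordCall);

// Both resources take (id, data, filters) in the same order.
clients.update(7, { organization: 'FreshBooks' });
invoices.update(12, { notes: 'Thanks for your business' });
console.log(calls.map((call) => `${call.method} ${call.path}`));
```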

&lt;h2&gt;
  
  
  FreshBooks' SDKs
&lt;/h2&gt;

&lt;p&gt;Like last time, I'll show some examples of how FreshBooks' SDKs are built in a few different languages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python
&lt;/h3&gt;

&lt;p&gt;In Python we're using the &lt;a href="https://docs.python-requests.org/en/latest/"&gt;requests&lt;/a&gt; library for simplicity. Requests is widely used (see the &lt;a href="https://github.com/stripe/stripe-python/blob/master/setup.py#L34"&gt;Stripe&lt;/a&gt; and &lt;a href="https://github.com/auth0/auth0-python/blob/master/setup.py#L30"&gt;Auth0&lt;/a&gt; SDKs), so it isn't too onerous a requirement. In fact, FreshBooks' Python SDK is &lt;a href="https://github.com/freshbooks/freshbooks-python-sdk/blob/release/1.0.0/requirements.txt"&gt;very light on dependencies&lt;/a&gt; in general.&lt;/p&gt;

&lt;p&gt;You can see where we &lt;a href="https://github.com/freshbooks/freshbooks-python-sdk/blob/release/1.0.0/freshbooks/api/resource.py#L30"&gt;instantiate a session&lt;/a&gt; (to allow for retries) and make the &lt;a href="https://github.com/freshbooks/freshbooks-python-sdk/blob/release/1.0.0/freshbooks/api/resource.py#L57-L83"&gt;HTTP requests&lt;/a&gt;. In the client we can instantiate the &lt;a href="https://github.com/freshbooks/freshbooks-python-sdk/blob/release/1.0.0/freshbooks/api/projects.py#L69-L83"&gt;shared client code&lt;/a&gt; for each &lt;a href="https://github.com/freshbooks/freshbooks-python-sdk/blob/release/1.0.0/freshbooks/client.py#L363-L365"&gt;different resource&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Usage looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;invoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;freshBooksClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;invoice_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;clients&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;freshBooksClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;organization&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"FreshBooks"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Node.js
&lt;/h3&gt;

&lt;p&gt;Like Python, our Node.js SDK is using a well-known library, &lt;a href="https://axios-http.com/"&gt;axios&lt;/a&gt;. While it is not quite as ubiquitous as Python's requests, it is very commonly used. For instance, it is used by &lt;a href="https://github.com/auth0/node-auth0/blob/v2.40.0/package.json#L37"&gt;Auth0&lt;/a&gt; (if you're looking for a different example, &lt;a href="https://github.com/MONEI/Shopify-api-node/blob/3.8.2/package.json#L19"&gt;Shopify&lt;/a&gt; makes use of &lt;a href="https://github.com/sindresorhus/got"&gt;Got&lt;/a&gt;). You can find axios &lt;a href="https://github.com/freshbooks/freshbooks-nodejs-sdk/blob/%40freshbooks/api%403.0.0/packages/api/src/APIClient.ts#L169-L177"&gt;configured here&lt;/a&gt;. The &lt;a href="https://github.com/freshbooks/freshbooks-nodejs-sdk/blob/%40freshbooks/api%403.0.0/packages/api/src/APIClient.ts#L264-L292"&gt;shared client code&lt;/a&gt; takes request and response transform functions for &lt;a href="https://github.com/freshbooks/freshbooks-nodejs-sdk/blob/%40freshbooks/api%403.0.0/packages/api/src/APIClient.ts#L350-L360"&gt;each resource&lt;/a&gt; to convert the responses to objects.&lt;/p&gt;
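&lt;p&gt;To give a feel for what a response transform does, here is a simplified sketch that renames snake_case keys from a raw API payload to camelCase properties. It is an illustration under my own assumptions, not the SDK's actual transform code (follow the links above for that): &lt;code&gt;toCamelCase&lt;/code&gt;, &lt;code&gt;transformResponse&lt;/code&gt; and the sample payload are hypothetical, and only top-level keys are handled.&lt;/p&gt;

```javascript
// Convert a snake_case key such as "mob_phone" to camelCase ("mobPhone").
const toCamelCase = (key) => key.replace(/_([a-z])/g, (_, letter) => letter.toUpperCase());

// A response transform: rename every top-level key of the raw API object.
function transformResponse(raw) {
  return Object.fromEntries(
    Object.entries(raw).map(([key, value]) => [toCamelCase(key), value])
  );
}

// Hypothetical raw payload for illustration.
const client = transformResponse({ id: 7, mob_phone: '+15555550100', organization: 'FreshBooks' });
console.log(client.mobPhone);
```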

&lt;h3&gt;
  
  
  PHP
&lt;/h3&gt;

&lt;p&gt;Like Node.js, the PHP ecosystem has quite a number of good HTTP request libraries. &lt;a href="https://github.com/guzzle/guzzle"&gt;Guzzle&lt;/a&gt; is perhaps one of the most well known, but there are &lt;a href="https://github.com/WordPress/Requests"&gt;many&lt;/a&gt; &lt;a href="https://github.com/php-http/httplug"&gt;other&lt;/a&gt; &lt;a href="https://github.com/nategood/httpful"&gt;popular&lt;/a&gt; libraries out there. Luckily, PHP also has some interface standards around HTTP clients and messages, particularly &lt;a href="https://www.php-fig.org/psr/psr-7/"&gt;PSR-7&lt;/a&gt;, &lt;a href="https://www.php-fig.org/psr/psr-17/"&gt;PSR-17&lt;/a&gt;, and &lt;a href="https://www.php-fig.org/psr/psr-18/"&gt;PSR-18&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Implementing the FreshBooks SDK to these standards means that we don't force any particular library on developers. They are free to choose any library that implements those standards.&lt;/p&gt;

&lt;p&gt;In our &lt;a href="https://github.com/amcintosh/freshbooks-php-sdk"&gt;README&lt;/a&gt; we provide an example for those who have no particular preference:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Requires a PSR-18 implementation client. If you do not already have a compatible client, you can install one with it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;composer require amcintosh/freshbooks php-http/guzzle7-adapter&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Again, &lt;a href="https://github.com/amcintosh/freshbooks-php-sdk/blob/0.4.0/src/FreshBooksClient.php#L73-L90"&gt;here is the configuration&lt;/a&gt;, and the &lt;a href="https://github.com/amcintosh/freshbooks-php-sdk/blob/0.4.0/src/FreshBooksClient.php#L204-L207"&gt;resources&lt;/a&gt; using &lt;a href="https://github.com/amcintosh/freshbooks-php-sdk/blob/0.4.0/src/Resource/AccountingResource.php#L119-L124"&gt;shared client code&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Usage looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$invoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$freshBooksClient&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$accountId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$invoiceId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nv"&gt;$clients&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$freshBooksClient&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="k"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$accountId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$clients&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;organization&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 'FreshBooks'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Up Next
&lt;/h2&gt;

&lt;p&gt;So there you can see a number of different HTTP client options, and examples of how FreshBooks' SDKs utilize them.&lt;/p&gt;

&lt;p&gt;I hope you found something interesting or useful here and I hope you can catch the next post where I plan to go into request data and response structures.&lt;/p&gt;

</description>
      <category>sdk</category>
      <category>python</category>
      <category>php</category>
      <category>node</category>
    </item>
    <item>
      <title>API Client Design Across Languages - Part 1</title>
      <dc:creator>Andrew McIntosh</dc:creator>
      <pubDate>Thu, 11 Nov 2021 17:39:03 +0000</pubDate>
      <link>https://forem.com/freshbooks/api-client-design-across-languages-part-1-5hmk</link>
      <guid>https://forem.com/freshbooks/api-client-design-across-languages-part-1-5hmk</guid>
      <description>&lt;p&gt;In my recent post &lt;a href="https://dev.to/freshbooks/some-best-practices-on-building-an-integration-1gbb"&gt;Some Best Practices On Building An Integration&lt;/a&gt;, I espoused the benefits of &lt;a href="https://dev.to/freshbooks/some-best-practices-on-building-an-integration-1gbb#use-libraries-tools-and-sdks"&gt;using API owner supplied tools and libraries&lt;/a&gt;, and mentioned areas where a well-built SDK hides complexity from, or otherwise makes things easier for, a developer.&lt;/p&gt;

&lt;p&gt;A colleague suggested that it might be useful to present examples of some of these areas, to give some pointers to anyone who needs to implement that functionality themselves, can't make use of an SDK, or is simply looking to build their own API client. So, this is part 1 of a deep dive into functionality in FreshBooks' (and some other API owners') SDKs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic Structure
&lt;/h2&gt;

&lt;p&gt;This first post won't go too much into functionality as I think it's best to start on structure.&lt;/p&gt;

&lt;p&gt;A RESTful API is language agnostic, and clients built in any number of languages must all support the same API features and resources. However, the design and usage of the client itself can, and probably should, differ from language to language. For example, a Ruby client and a Java client will still call the same API endpoint, but the form of the methods that make that call, and the form of the returned data, could look very different.&lt;/p&gt;

&lt;p&gt;I feel it's best to build an API client in a way that is natural to the specific language it's written in. This extends from the project layout, to the client initialization, the method calls themselves, and the returned data. This makes things more intuitive and easy for a developer to use.&lt;/p&gt;

&lt;p&gt;The language influences the design primarily in two ways: language capabilities, and common language conventions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capabilities
&lt;/h3&gt;

&lt;p&gt;By capabilities, I'm talking about language design and features. A statically typed language usually needs a bit more structure than a dynamically typed one. For instance, an API client in a language like PHP or Python could return JSON results as associative arrays (array and dictionary respectively), as you don't have to declare what the types of the various return values are. It would be difficult to do the same in Java with a HashMap (possible, but it would not be clean), so you're much more likely to build data objects for the responses with all the fields included and nicely typed.&lt;/p&gt;
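
&lt;p&gt;As a rough illustration of that trade-off, here is a minimal Python sketch contrasting the two styles: handing back the decoded JSON as a plain dictionary versus mapping it onto a typed data object. The response body, the Invoice class, and its fields are hypothetical and not taken from any FreshBooks SDK.&lt;/p&gt;

```python
import json
from dataclasses import dataclass

# Hypothetical API response body, for illustration only.
RAW_RESPONSE = '{"id": 123, "customer": "Acme", "amount": "41.50"}'


def get_invoice_as_dict(raw: str) -> dict:
    """Dynamic style: hand back the decoded JSON as-is."""
    return json.loads(raw)


@dataclass
class Invoice:
    """Typed style: every field is declared up front."""
    id: int
    customer: str
    amount: str


def get_invoice_as_object(raw: str) -> Invoice:
    """Map the decoded JSON onto the declared data object."""
    data = json.loads(raw)
    return Invoice(id=data["id"], customer=data["customer"], amount=data["amount"])


# Same data, two access styles: key lookup versus attribute access.
print(get_invoice_as_dict(RAW_RESPONSE)["customer"])
print(get_invoice_as_object(RAW_RESPONSE).customer)
```

&lt;p&gt;The dictionary version needs no declarations at all, while the data-object version costs a class definition but gives editors and type checkers something to work with.&lt;/p&gt;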

&lt;p&gt;Other features play in as well. How does the language handle functions with different options? Function overloading? Optional arguments? These all affect the design.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conventions
&lt;/h3&gt;

&lt;p&gt;Beyond what you &lt;em&gt;can&lt;/em&gt; do with a language, there's also what you &lt;em&gt;should&lt;/em&gt; do. You &lt;em&gt;can&lt;/em&gt; write your Python or Ruby in a very Java-like way, but it might not feel as natural to a Python or Ruby developer using your library. Of course, conventions aren't as cut-and-dried as capabilities; there are many ways to do something, and sometimes one is considered "more right" than the others, but often none is. Looking at how other libraries are implemented and getting to know a language helps inform a lot of design choices. The best advice is to try to make things clear.&lt;/p&gt;

&lt;h2&gt;
  
  
  FreshBooks' SDKs
&lt;/h2&gt;

&lt;p&gt;At the time of writing, FreshBooks has first-party Python and Node.js SDKs, and a community-supported Java one (all three are listed &lt;a href="https://www.freshbooks.com/api/libraries"&gt;here&lt;/a&gt;). As I said, I'm going to walk through some of the differences in the design, but today I'll get started with the basics of client initialization and configuration.&lt;/p&gt;

&lt;p&gt;First, let's talk about the configuration that FreshBooks' SDKs need to support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We require the clients to be initialized with their application's unique client id for the user-agent string, so that's a required parameter.&lt;/li&gt;
&lt;li&gt;Using the API requires authentication. Depending on what a developer has implemented, they'll either have a valid OAuth2 access token to initialize the client with, or they'll want to go through the authorization flow, which requires their client secret and redirect URLs. Ideally the SDK supports both.&lt;/li&gt;
&lt;li&gt;If they have an expired token, they may want to refresh it, which would require the refresh token to be supplied.&lt;/li&gt;
&lt;li&gt;The developer may want to override some of the default settings like user-agent string, timeouts, or disabling automatic retries on failures.&lt;/li&gt;
&lt;/ul&gt;
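
&lt;p&gt;The authentication requirements above can be sketched as a small check on what the developer supplied. This is only an illustration in Python: the auth_mode function, its return values, and the decision to raise an error when nothing usable is supplied are assumptions, not the behaviour of any FreshBooks SDK.&lt;/p&gt;

```python
from typing import Optional


def auth_mode(
    client_id: str,
    client_secret: Optional[str] = None,
    redirect_uri: Optional[str] = None,
    access_token: Optional[str] = None,
) -> str:
    """Work out which way of authenticating the supplied values allow."""
    if not client_id:
        raise ValueError("client_id is required")
    if access_token is not None:
        return "token"  # ready to call the API directly
    if client_secret is not None and redirect_uri is not None:
        return "authorization_flow"  # the user must authorize the app first
    # Assumption: a client with neither option is treated as misconfigured.
    raise ValueError("provide an access token, or a client secret and redirect URI")


print(auth_mode("my-client-id", access_token="abc"))
print(auth_mode("my-client-id", client_secret="s", redirect_uri="https://example.com/cb"))
```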

&lt;h3&gt;
  
  
  Java
&lt;/h3&gt;

&lt;p&gt;I'll start with the Java SDK because the features of the Java language make it a good first example to set the others against.&lt;/p&gt;

&lt;p&gt;Java supports function overloading, but with the number of possible options mentioned above, that would get very complicated combination-wise. You could just use nullable parameters, but that would be confusing and ugly. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;FreshBooksClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;clientId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;clientSecret&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;redirectUri&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;accessToken&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;userAgent&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Integer&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which could look like any of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FreshBooksClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FreshBooksClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;access_token&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FreshBooksClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;access_token&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what the &lt;a href="https://en.wikipedia.org/wiki/Builder_pattern"&gt;builder pattern&lt;/a&gt; is for. You can see the full code for &lt;a href="https://github.com/amcintosh/freshbooks-java-sdk/blob/v0.3.0/lib/src/main/java/net/amcintosh/freshbooks/FreshBooksClient.java#L73-L88"&gt;the client&lt;/a&gt; and &lt;a href="https://github.com/amcintosh/freshbooks-java-sdk/blob/v0.3.0/lib/src/main/java/net/amcintosh/freshbooks/FreshBooksClient.java#L410"&gt;the builder&lt;/a&gt; on GitHub, but essentially the client is not initialized directly. Instead, you initialize a "client builder", which has a constructor for each of the base cases (&lt;strong&gt;"client_id"&lt;/strong&gt; versus &lt;strong&gt;"client_id, secret, url"&lt;/strong&gt;) and a method for each of the various options, and the builder returns a client.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;FreshBooksClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;FreshBooksClientBuilder&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;FreshBooksClientBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;clientId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;clientSecret&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;redirectUri&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;FreshBooksClientBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;clientId&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;FreshBooksClientBuilder&lt;/span&gt; &lt;span class="nf"&gt;withAccessToken&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;accessToken&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;FreshBooksClientBuilder&lt;/span&gt; &lt;span class="nf"&gt;withReadTimeout&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows you to instantiate the client cleanly in each of the different ways:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FreshBooksClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;FreshBooksClientBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FreshBooksClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;FreshBooksClientBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withAccessToken&lt;/span&gt;&lt;span class="o"&gt;(&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FreshBooksClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;FreshBooksClientBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withAccessToken&lt;/span&gt;&lt;span class="o"&gt;(&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withReadTimeout&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This requires much more structure in the client, but allows much cleaner usage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python
&lt;/h3&gt;

&lt;p&gt;By comparison, Python allows for a much more concise implementation. Python is an object-oriented language and you could implement a builder pattern, but as Python also supports &lt;a href="https://en.wikipedia.org/wiki/Named_parameter"&gt;named parameters&lt;/a&gt;, and there actually aren't too many options for the client, we can get away with something much simpler and more in the Pythonic style (again, &lt;a href="https://github.com/freshbooks/freshbooks-python-sdk/blob/release/0.8.0/freshbooks/client.py#L33"&gt;full code on GitHub&lt;/a&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;redirect_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;refresh_token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_TIMEOUT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;auto_retry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which allows for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;redirect_uri&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;access_token&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;access_token&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the language features of Python can lead to a very different implementation and usage than Java.&lt;/p&gt;
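
&lt;p&gt;For contrast, here is roughly what the Java-style builder would look like if ported to Python. This is a hypothetical sketch: the Client and ClientBuilder classes below are made up for illustration, the default timeout is an assumed value, and none of it is part of the FreshBooks Python SDK.&lt;/p&gt;

```python
from typing import Optional


class Client:
    """Stand-in client that just records its settings."""

    def __init__(self, client_id: str, access_token: Optional[str], timeout: int):
        self.client_id = client_id
        self.access_token = access_token
        self.timeout = timeout


class ClientBuilder:
    """Java-style builder: one chained method per optional setting."""

    def __init__(self, client_id: str):
        self._client_id = client_id
        self._access_token: Optional[str] = None
        self._timeout = 30  # assumed default, in seconds

    def with_access_token(self, token: str) -> "ClientBuilder":
        self._access_token = token
        return self  # returning self is what enables chaining

    def with_read_timeout(self, timeout: int) -> "ClientBuilder":
        self._timeout = timeout
        return self

    def build(self) -> Client:
        return Client(self._client_id, self._access_token, self._timeout)


client = ClientBuilder("my-client-id").with_access_token("abc").with_read_timeout(60).build()
print(client.timeout)
```

&lt;p&gt;It works, but every option costs an extra method, which is ceremony that keyword arguments with defaults make unnecessary.&lt;/p&gt;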

&lt;h3&gt;
  
  
  Node.js
&lt;/h3&gt;

&lt;p&gt;FreshBooks' Node.js SDK is written in TypeScript. Again, there are different ways to go about implementation, but we took a fairly common JavaScript pattern and passed &lt;a href="https://github.com/freshbooks/freshbooks-nodejs-sdk/blob/%40freshbooks/api%403.0.0/packages/api/src/APIClient.ts#L1290-L1298"&gt;a configuration object&lt;/a&gt; as &lt;a href="https://github.com/freshbooks/freshbooks-nodejs-sdk/blob/%40freshbooks/api%403.0.0/packages/api/src/APIClient.ts#L137"&gt;a parameter&lt;/a&gt;. The &lt;a href="https://github.com/stripe/stripe-node"&gt;Stripe Node.js Library&lt;/a&gt; does something similar (in general, Stripe is a great place to look for answers to any "how have others done this?"-type API questions).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Options&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;clientSecret&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
    &lt;span class="nx"&gt;redirectUri&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
    &lt;span class="nx"&gt;accessToken&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
    &lt;span class="nx"&gt;refreshToken&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
    &lt;span class="nx"&gt;apiUrl&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
    &lt;span class="nx"&gt;retryOptions&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;IAxiosRetryConfig&lt;/span&gt;
    &lt;span class="nx"&gt;userAgent&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;defaultRetry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;retryDelay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;axiosRetry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exponentialDelay&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;retryCondition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;APIClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isNetworkRateLimitOrIdempotentRequestError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;clientSecret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;redirectUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;accessToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;refreshToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;apiUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;FRESHBOOKS_API_URL&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;API_BASE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;retryOptions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;defaultRetry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;with initialization looking like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;clientSecret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;client_secret&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;redirectUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;accessToken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;valid&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This also happens to be a fairly common pattern in PHP, thus a possible future FreshBooks PHP SDK would likely look similar. &lt;a href="https://github.com/auth0/auth0-PHP#sdk-initialization"&gt;auth0's PHP SDK&lt;/a&gt; has an example of this.&lt;/p&gt;
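
&lt;p&gt;The core of the configuration-object pattern is merging whatever the caller supplied over a set of defaults. A minimal Python sketch of that idea follows; the DEFAULT_OPTIONS values, the placeholder URL, and the resolve_options helper are all hypothetical and only loosely mirror the TypeScript above.&lt;/p&gt;

```python
# Hypothetical defaults; a real SDK's values may differ.
DEFAULT_OPTIONS = {
    "api_url": "https://api.example.com",  # placeholder base URL
    "retries": 10,
    "user_agent": None,
}


def resolve_options(options=None):
    """Merge caller-supplied options over the defaults."""
    supplied = dict(options or {})
    unknown = sorted(set(supplied) - set(DEFAULT_OPTIONS))
    if unknown:
        # Reject misspelled keys instead of silently ignoring them.
        raise ValueError("unknown option(s): " + ", ".join(unknown))
    return {**DEFAULT_OPTIONS, **supplied}


print(resolve_options({"retries": 3}))
```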

&lt;h2&gt;
  
  
  Up Next
&lt;/h2&gt;

&lt;p&gt;I hope you found it interesting seeing the different ways a client for the same API can look language-to-language. As I said, I'll dive a bit more into functionality differences next time, but feel free to dig around the projects and if you have any questions, please reach out.&lt;/p&gt;

</description>
      <category>sdk</category>
      <category>java</category>
      <category>python</category>
      <category>node</category>
    </item>
    <item>
      <title>Getting Started with FreshBooks NodeJS SDK - Expenses &amp; Invoices</title>
      <dc:creator>jimioniay</dc:creator>
      <pubDate>Thu, 21 Oct 2021 14:08:02 +0000</pubDate>
      <link>https://forem.com/freshbooks/getting-started-with-freshbooks-nodejs-sdk-expenses-invoices-a6</link>
      <guid>https://forem.com/freshbooks/getting-started-with-freshbooks-nodejs-sdk-expenses-invoices-a6</guid>
      <description>&lt;p&gt;Getting Started with FreshBooks NodeJS SDK - Expenses &amp;amp; Invoices&lt;br&gt;
In this tutorial, we’ll look at the FreshBooks Node.js SDK and how easy it makes creating, updating, and fetching Invoices, Expenses, Clients, Items, Payments, Projects, Time Entries, and more. We have done all the heavy lifting to make it convenient for you!&lt;/p&gt;

&lt;p&gt;We have handled HTTP calls, HTTP retries, idempotency, consistent request and response structures, and much more. This way you get to focus on your business logic rather than figuring out how the FreshBooks API works.&lt;/p&gt;

&lt;p&gt;Prerequisites &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A FreshBooks Developer account. If you don't have one, you can create one here.&lt;/li&gt;
&lt;li&gt;Authenticate yourself on the FreshBooks API using OAuth 2.0. No idea how to do that? No problem, we have an excellent tutorial here.&lt;/li&gt;
&lt;li&gt;Basic knowledge of async/await and Node.js.&lt;/li&gt;
&lt;li&gt;A code editor (e.g. VS Code, Sublime, Atom etc.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's get started!&lt;br&gt;
Install the FreshBooks Node.js SDK&lt;/p&gt;

&lt;p&gt;In your Node project directory, install the FreshBooks Node.js client via npm or yarn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install @freshbooks/api 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;yarn add @freshbooks/api
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get your FreshBooks Client ID&lt;/p&gt;

&lt;p&gt;Log in to the FreshBooks Dashboard, click on the Settings/Gear icon, then click on Developer Portal. Select your OAuth app and note the Client ID (you’ll need it). &lt;br&gt;
(By the way, this tutorial assumes you have already created an OAuth app and understand how FreshBooks authentication works. If you haven’t, follow the tutorial on how to create one first.)&lt;/p&gt;

&lt;p&gt;Instantiate the FreshBooks Client &lt;br&gt;
Using the block of code below, we can instantiate the FreshBooks Client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Client } from '@freshbooks/api';
import winston from 'winston'; // This is optional

//This logger is also optional
const logger = winston.createLogger({
   level: 'error',
   transports: [
       new winston.transports.File({ filename: 'error.log', level: 'error' }),
       new winston.transports.File({ filename: 'combined.log' }),
   ],
});


// Get CLIENT ID from STEP 2 ABOVE
const clientId = '&amp;lt;CLIENT ID&amp;gt;';

// Get token from authentication or helper function or configuration
const token = '&amp;lt;BEARER TOKEN&amp;gt;';

// Instantiate new FreshBooks API client
const freshBooksClient = new Client(token, {
   clientId
}, logger);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set a value for your Client ID and your Bearer Token. This tutorial assumes you have a helper function that generates the bearer tokens and refresh tokens from the /auth/oauth/token endpoint. If you don’t, you can check out the authentication tutorial.&lt;/p&gt;

&lt;p&gt;Confirm Instantiation &lt;br&gt;
Using the function below, we can confirm that the instantiation works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const confirmClientInstantiation = async () =&amp;gt; {
   try {
       const { data: { firstName, roles } } = await    freshBooksClient.users.me()
       accountId = roles[0].accountId;
       logger.info(`Hello ${firstName}`)
       return {
           firstName,
           accountId
       }
   } catch ({ code, message }) {
       // Handle error if API call failed
       logger.error(`Error fetching user: ${code} - ${message}`)
       return {
           error: {
               code, message
           }
       }
   }
}
console.log(await confirmClientInstantiation());
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If everything works as expected, you should see a response similar to the one below when you invoke the function. It also returns some useful information, especially the accountId, which you should store in a variable because you’ll need it in other method calls.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{ firstName: 'John', accountId: 'Zz2EMMR' }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If there is something wrong, you will receive a response that looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  error: {
    code: 'unauthenticated',
    message: 'This action requires authentication to continue.'
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create A Client&lt;br&gt;
If everything works as expected, you should be able to create a Client, an Invoice, etc. &lt;br&gt;
For simplicity, we'll create a Client. Once we do that, the same Client will appear immediately on the FreshBooks Dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const createAClient = async () =&amp;gt; {
   let client =
   {
       fName: "John",
       lName: "Doe",
       email: 'no-reply@example.com',
   }
   console.log(accountId)
   try {
       const { ok, data } = await freshBooksClient.clients.create(client, accountId)
       return ok &amp;amp;&amp;amp; { data };
   } catch ({ code, message }) {
       return {
           error: { code, message }
       }
   }
}

console.log(await createAClient())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;List Expenses&lt;br&gt;
We should also be able to list Expenses using the sample block below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;//Fetch Expenses
const fetchExpenses = async () =&amp;gt; {
   try {
       const { ok, data } = await freshBooksClient.expenses.list(accountId);
       return ok &amp;amp;&amp;amp; data
   } catch ({ code, message }) {
       console.error(`Error fetching expenses for accountId ${accountId}. The response was: ${code} - ${message}`)
   }
}

console.log(await fetchExpenses());
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If everything checks out, you should get a list of expenses. These expenses are also listed on the FreshBooks Dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  expenses: [
    {
       …
      id: '7538415',
      taxAmount2: null,
      taxAmount1: null,
      visState: 0,
      status: 0,
      vendor: 'FreshBooks Payments',
      notes: 'CC Payment Transaction Fee Invoice: #2021-09',
      updated: 2021-04-17T06:45:36.000Z,
      ...
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Conclusion&lt;br&gt;
This implementation only scratches the surface of what the Node.js SDK can do; there are many more use cases it can support.&lt;/p&gt;

</description>
      <category>freshbooks</category>
      <category>node</category>
      <category>accounting</category>
      <category>freshbooksapi</category>
    </item>
    <item>
      <title>Some Best Practices On Building An Integration</title>
      <dc:creator>Andrew McIntosh</dc:creator>
      <pubDate>Tue, 19 Oct 2021 10:57:35 +0000</pubDate>
      <link>https://forem.com/freshbooks/some-best-practices-on-building-an-integration-1gbb</link>
      <guid>https://forem.com/freshbooks/some-best-practices-on-building-an-integration-1gbb</guid>
      <description>&lt;p&gt;Hi, I’m Andrew McIntosh. I’m a software engineer at FreshBooks. I’ve moved around development teams a couple of times, but right now I’m working on our API and Integrations team, trying to make things better for developers looking to build API integrations. If that’s you, or you want that to be you, here are five bits of advice on how you can build an (or build a better) API integration:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get To Know The API&lt;/li&gt;
&lt;li&gt;Use Libraries, Tools, and SDKs&lt;/li&gt;
&lt;li&gt;Limit The Scope of Requests&lt;/li&gt;
&lt;li&gt;Properly Handle Errors&lt;/li&gt;
&lt;li&gt;Do More Asynchronous Work&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Get To Know The API
&lt;/h2&gt;

&lt;p&gt;This might seem obvious, &lt;strong&gt;but read the docs of whatever API you’re working with&lt;/strong&gt;. This is really the starting point on figuring out how you can do the thing you want to do. Reading through, you might even find a better way to do something than you initially thought. A good API is intuitive and predictable, but even good APIs can have odd, unexpected behaviour in places. Look at examples too as they can really help in cases where the documentation is dated or sparse. &lt;/p&gt;

&lt;p&gt;Keeping API documentation up to date is not always easy, so if you find some place where it’s wrong, someone will be very happy if you report it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Libraries, Tools, and SDKs
&lt;/h2&gt;

&lt;p&gt;A company with an API wants it to be used, so often they invest time and effort into making things that will make your life using it easier (like this!). &lt;strong&gt;Take advantage of the work they did for you and consider using libraries or SDKs they’ve provided&lt;/strong&gt;. If a company doesn’t have an SDK, or if what they have doesn’t fit your language or framework of choice, look for a third-party solution (often API docs will list a bunch of these in addition to company-built ones).&lt;/p&gt;

&lt;p&gt;Using an SDK or library isn’t generally required, but there are advantages to not having to reinvent the wheel. Many of the topics I’m going to cover next might already be handled by a well-built SDK! Often SDK developers know the API quite well (especially if they’re employees of the company) and they can often simplify, clarify, or even paper over those oddities or inconsistencies I mentioned in documentation.&lt;/p&gt;

&lt;p&gt;For example, for historical reasons a lot of the oldest accounting endpoints in FreshBooks’ API return dates in North American Eastern Time (US/Eastern aka EST/EDT, time zones are complicated), while all newer endpoints return dates in UTC (we’re moving everything to UTC, but it takes time). If you’re using one of our SDKs, we hide that from you and return all the dates in UTC. We’ve done the work so YOU don’t have to figure out which is which!&lt;/p&gt;

&lt;p&gt;Just like documentation, if you find a bug or missing feature in a tool, the owner would love your feedback, and if it’s open source and you’re keen, you could even try to fix it yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limit The Scope of Requests
&lt;/h2&gt;

&lt;p&gt;For your own benefit, as well as the API owner’s, you want to &lt;strong&gt;fetch data in an efficient way&lt;/strong&gt;. This means not grabbing data that you don’t care about and will just throw away. It also means not fetching too much data at once and then running into memory or performance issues while processing it, which can spill over into slow responsiveness for your users.&lt;/p&gt;

&lt;p&gt;This means that you should look at how an API handles filters (to limit returned records to only those you care about), pagination (fetch records in smaller batches so you can handle them in chunks), sorting (so those batches come in an order that works for you), and maybe even what fields are included in the response (so the record itself isn’t filled with data you don’t need). Here is FreshBooks’ &lt;a href="https://www.freshbooks.com/api/parameters"&gt;Search, Paging and Includes documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For example, one integration I was debugging would sync invoices from another service to a particular client. It would check to see if that client existed before creating it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;matchingClients&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;freshbooksClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;accountId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;matchingClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;matchingClients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;currClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;currClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;organization&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;clientOrganization&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But it was sometimes creating duplicate clients. This was because FreshBooks defaults to returning 30 records at a time with the newest ones first. This worked fine when it first created the client, but as customers used the app and made more clients, the client to sync with got bumped off of the first 30 results and was no longer found. In addition to that, the code was fetching as many clients as it could, and then filtered them in memory. It either needed a filter or pagination (or both).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;clientSearchQueryBuilder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;SearchQueryBuilder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Organization_like&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;clientOrganization&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;matchingClients&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;freshBooksApi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;currentUser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fbAccountId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;searchQueryBuilder&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;matchingClients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;matchingClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;matchingClients&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let the API do that work!&lt;/p&gt;
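
&lt;p&gt;When a filter alone isn’t enough, pagination covers the rest. Here is a minimal sketch of walking a list page by page and stopping as soon as a match turns up. It is not the SDK’s API: &lt;code&gt;fetchPage&lt;/code&gt; is a hypothetical function standing in for whatever paged list call you have, and the &lt;code&gt;items&lt;/code&gt; and &lt;code&gt;pages&lt;/code&gt; fields of its result are assumptions made for illustration.&lt;/p&gt;

```javascript
// Hypothetical helper: yields one page of records at a time.
// `fetchPage(page, perPage)` is an assumed function that resolves to
// { items: [...], pages: totalPageCount }.
async function* eachPage(fetchPage, perPage = 30) {
  let page = 1;
  let totalPages = 1;
  while (totalPages >= page) {
    const result = await fetchPage(page, perPage);
    totalPages = result.pages;
    yield result.items;
    page += 1;
  }
}

// Looks through the pages in order and stops at the first match,
// so only one page of records is held in memory at any time.
async function findFirst(fetchPage, predicate, perPage = 30) {
  for await (const items of eachPage(fetchPage, perPage)) {
    const match = items.find(predicate);
    if (match !== undefined) return match;
  }
  return undefined;
}
```

&lt;p&gt;With something like this, a client that has been pushed onto page three is still found, and the search stops without fetching any pages after it.&lt;/p&gt;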

&lt;h2&gt;
  
  
  Properly Handle Errors
&lt;/h2&gt;

&lt;p&gt;There’s a lot to proper error handling, so let's look at things in pieces.&lt;/p&gt;

&lt;h4&gt;
  
  
  API Errors
&lt;/h4&gt;

&lt;p&gt;A lot of API docs will have information on error codes, states, and messages. You don’t want your integration to break unexpectedly, so you should look to handle these. Logging response messages is really helpful when building an application to understand business rules or validation failures. When you’re up and running in production, it’s equally important to help you know why something might be failing. For example, a good API won’t just give you a &lt;code&gt;422 Unprocessable Entity&lt;/code&gt;, but might return a message like &lt;code&gt;422 - At least one field among first_name, last_name, email or organization is required&lt;/code&gt; (a &lt;a href="https://www.freshbooks.com/api/clients"&gt;FreshBooks client&lt;/a&gt; validation error message).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The API isn’t your code, and you don’t control what comes out. This is a black box, and it’s helpful to follow defensive coding practices.&lt;/strong&gt; You should check HTTP response codes, wrap your calls in try/catches, etc. Don’t assume that a response you get has an object or resource data structure, as a failure may return an error data structure instead, so you should be prepared to parse or otherwise handle either. If a REST API’s backend service is down, a company’s API gateway might not even return you JSON but instead give you an HTML error page that your JSON-expecting code will just fail to parse. In short, don’t assume that the response will always come in the form you expect, and ensure you handle cases when it doesn’t.&lt;/p&gt;
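
&lt;p&gt;As a rough sketch of that defensive approach, the function below turns a raw status code and body into one predictable shape, whether the body is a resource, an error structure, or an HTML page. The function name and the &lt;code&gt;code&lt;/code&gt; and &lt;code&gt;message&lt;/code&gt; fields on the error body are assumptions for illustration, not any particular API’s contract.&lt;/p&gt;

```javascript
// Hypothetical helper: normalizes a raw HTTP response into
// { ok: true, data } or { ok: false, error: { code, message } }.
function parseApiResponse(status, bodyText) {
  let body;
  try {
    body = JSON.parse(bodyText);
  } catch (err) {
    // The body was not JSON at all, e.g. a gateway's HTML error page.
    return {
      ok: false,
      status,
      error: { code: 'invalid_json', message: String(bodyText).slice(0, 200) },
    };
  }
  if (status >= 400) {
    // Assumed error shape: { code, message }. Fall back if fields are missing.
    return {
      ok: false,
      status,
      error: {
        code: body?.code ?? 'http_error',
        message: body?.message ?? `Request failed with status ${status}`,
      },
    };
  }
  return { ok: true, status, data: body };
}
```

&lt;p&gt;Callers then only ever branch on &lt;code&gt;ok&lt;/code&gt;, and the unexpected cases end up in your logs instead of crashing a worker.&lt;/p&gt;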

&lt;h4&gt;
  
  
  Timeouts and Connection Exceptions
&lt;/h4&gt;

&lt;p&gt;The API you're using isn’t perfect. It could be slow or offline. In these cases, your integration could run into issues if all the processors or workers are stuck waiting on the API. &lt;strong&gt;For this reason, you should configure timeouts on your calls.&lt;/strong&gt; HTTP clients have easily set timeouts but often they are not enabled by default. Again, a good SDK will have some sane defaults for you (we &lt;a href="https://github.com/freshbooks/freshbooks-python-sdk/blob/release/0.8.0/freshbooks/client.py#L51"&gt;default to 30 seconds but let you easily override&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;In general there are connect timeouts (how long you wait to establish a connection), and read timeouts (how long you wait for the response). Some clients let you set them together or individually, but you’ll want to set both to protect from an unresponsive server (connect), or a really slow response (read). &lt;/p&gt;

&lt;p&gt;Figuring out exactly what the timeout values should be is very dependent on your integration and the API you’re using, but even setting them to something high (15-30 seconds) can save you a lot of pain.&lt;/p&gt;
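
&lt;p&gt;If your HTTP client doesn’t give you timeouts out of the box, an overall deadline is easy to sketch. The helper below is a hypothetical &lt;code&gt;withTimeout&lt;/code&gt; wrapper, not part of any SDK: it rejects once the deadline passes and aborts a signal that the task can hand to a cancellable call such as &lt;code&gt;fetch&lt;/code&gt;. It assumes a Node.js version where &lt;code&gt;AbortController&lt;/code&gt; is available globally.&lt;/p&gt;

```javascript
// Hypothetical helper: runs `task(signal)` and rejects if it has not
// settled within `ms` milliseconds. The signal is aborted on timeout so
// the task can cancel its own work.
async function withTimeout(task, ms) {
  const controller = new AbortController();
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => {
      // Reject first so callers always see the timeout error,
      // then tell the task to stop.
      reject(new Error(`Timed out after ${ms} ms`));
      controller.abort();
    }, ms);
  });
  try {
    return await Promise.race([task(controller.signal), deadline]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

&lt;p&gt;Note that this is a single deadline for the whole call. Separate connect and read timeouts still have to come from the HTTP client’s own configuration.&lt;/p&gt;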

&lt;h4&gt;
  
  
  Rate Limits
&lt;/h4&gt;

&lt;p&gt;Most APIs will limit traffic to prevent a bad actor from hindering other integrations’ performance or degrading the entire system. You should build your integration to respect and accommodate these limits. Running up against them constantly doesn’t do you any good, as your work won’t get done faster, and could result in your integration being banned. &lt;/p&gt;

&lt;p&gt;As mentioned above, it’s best to build your integration with efficient calls in mind. Reducing the number of calls you need to make means you’re less likely to hit call limits. &lt;/p&gt;

&lt;p&gt;Another good practice is to properly handle rate limit errors (generally HTTP 429 errors), and then rather than retry right away, wait for a bit before making the next call. If the next call is still limited, wait even longer. This is called an exponential backoff, and again, most HTTP libraries will have a way to enable this and many SDKs will have implemented this (our SDKs do it &lt;a href="https://github.com/freshbooks/freshbooks-python-sdk/blob/release/0.8.0/freshbooks/api/resource.py#L34-L42"&gt;here&lt;/a&gt; and &lt;a href="https://github.com/freshbooks/freshbooks-nodejs-sdk/blob/%40freshbooks/api%402.0.1/packages/api/src/APIClient.ts#L115-L119"&gt;here&lt;/a&gt;). &lt;/p&gt;
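
&lt;p&gt;If you do need to roll your own, the idea fits in a few lines. This is a minimal sketch rather than how our SDKs implement it: &lt;code&gt;retryWithBackoff&lt;/code&gt; is a hypothetical helper, and it assumes the function you pass in throws an error carrying a &lt;code&gt;status&lt;/code&gt; of 429 when it is rate limited.&lt;/p&gt;

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retries `fn` when it throws an error with
// `status === 429` (an assumed error shape), doubling the wait each
// time: base, 2 * base, 4 * base, and so on.
async function retryWithBackoff(fn, options = {}) {
  const { maxRetries = 5, baseDelayMs = 500, wait = sleep } = options;
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (err) {
      const rateLimited = err?.status === 429;
      // Give up on other errors, or once the retries are used up.
      if (!rateLimited || attempt >= maxRetries) throw err;
      await wait(baseDelayMs * 2 ** attempt);
      attempt += 1;
    }
  }
}
```

&lt;p&gt;The &lt;code&gt;wait&lt;/code&gt; option exists so the delay can be swapped out in tests. In production you would likely also add some random jitter so that many workers don’t all retry at the same moment.&lt;/p&gt;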

&lt;h2&gt;
  
  
  Do More Asynchronous Work
&lt;/h2&gt;

&lt;p&gt;If your integration is handling a lot of data via an API, you should consider &lt;strong&gt;moving as much work to asynchronous tasks as possible&lt;/strong&gt;. This way you avoid blocking your users, can manage and throttle your workloads, and can easily retry failures. The above advice on handling rate limits gets a lot easier if your processing code isn’t blocking user actions while retrying. It also lets you throttle the calls in whatever queue system you’re using. &lt;/p&gt;

&lt;p&gt;Look into &lt;a href="https://aws.amazon.com/sqs/"&gt;Amazon’s SQS&lt;/a&gt;, &lt;a href="https://cloud.google.com/tasks/docs/creating-queues"&gt;Google’s Cloud Tasks queues&lt;/a&gt;, Python’s &lt;a href="https://docs.celeryproject.org/en/stable/getting-started/introduction.html"&gt;celery&lt;/a&gt; and something like &lt;a href="https://www.cloudamqp.com/"&gt;CloudAMQP&lt;/a&gt;, or &lt;a href="https://github.com/OptimalBits/bull"&gt;Bull&lt;/a&gt; with Redis. There are a lot of options out there.&lt;/p&gt;
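
&lt;p&gt;To make the throttling idea concrete, here is a tiny in-memory stand-in for one of those queue systems. It is only a sketch: &lt;code&gt;createQueue&lt;/code&gt; and its &lt;code&gt;enqueue&lt;/code&gt; method are hypothetical names, nothing is persisted, and a real queue would add retries and durability on top.&lt;/p&gt;

```javascript
// Hypothetical in-memory queue: runs `worker(payload)` for each job,
// with at most `concurrency` jobs in flight at once.
function createQueue(worker, concurrency = 2) {
  const pending = [];
  let active = 0;

  function next() {
    while (concurrency > active) {
      const job = pending.shift();
      if (job === undefined) return; // nothing left to start
      active += 1;
      Promise.resolve()
        .then(() => worker(job.payload))
        .then(job.resolve, job.reject)
        .finally(() => {
          active -= 1;
          next(); // a slot opened up, start the next job
        });
    }
  }

  return {
    // Returns a promise that settles when the job has been processed.
    enqueue(payload) {
      return new Promise((resolve, reject) => {
        pending.push({ payload, resolve, reject });
        next();
      });
    },
  };
}
```

&lt;p&gt;Lowering &lt;code&gt;concurrency&lt;/code&gt; is the throttle: however many jobs get enqueued, the API only ever sees that many calls at once.&lt;/p&gt;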

&lt;p&gt;Another good idea is to use webhooks if the API supports them (&lt;a href="https://www.freshbooks.com/api/webhooks"&gt;FreshBooks does&lt;/a&gt;). This allows you to register to receive messages when an event happens. Rather than polling an API every 5 minutes to see if a new resource has been created, you can tell the API to send you a message when that happens. This can save you a lot of calls and overhead. &lt;/p&gt;

&lt;p&gt;Going back to that old integration I was debugging, it would sync invoices with FreshBooks every 20 minutes, but the process involved gathering up all the invoices it hadn’t pushed and looping through them. However, the process was driven by a synchronous HTTP call that would time out after just a couple of minutes, killing the process. It could take days to move everything over. Calling the process more often would only help a little, as there were only so many calls the service could handle at one time and these long-running calls could block workers from handling user interaction with the app. We redesigned the whole sync process of the integration so that everything was done in small asynchronous tasks. On first signup we had an async task that would fetch each invoice (paginated), and put a message on the queue to process that invoice. The invoice processing tasks could thus be retried, scaled up, or throttled as needed. We also utilized webhooks for real-time updates. Each event received would just create another invoice process task. Instead of days, things could be processed in seconds, minutes, or perhaps hours for very large workloads. &lt;/p&gt;

&lt;h2&gt;
  
  
  Go Out And Build
&lt;/h2&gt;

&lt;p&gt;Well, I hope that gives you a few ideas on building a robust integration. If you know the API you’re using and the tools available, keep your calls efficient, handle the unexpected gracefully, and keep your time-consuming processing away from the user’s actions, you’re on a great path to succeed. If you have any questions, please reach out to me, and if you have anything related to FreshBooks’ API you can email &lt;a href="mailto:newapi@freshbooks.com"&gt;newapi@freshbooks.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>freshbooks</category>
      <category>sdk</category>
    </item>
  </channel>
</rss>
