<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: mucio</title>
    <description>The latest articles on Forem by mucio (@mucio).</description>
    <link>https://forem.com/mucio</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F216559%2F7bed524b-0196-4b4f-9581-2f84b63c4f98.jpeg</url>
      <title>Forem: mucio</title>
      <link>https://forem.com/mucio</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mucio"/>
    <language>en</language>
    <item>
      <title>AWS MWAA and AWS SES integration</title>
      <dc:creator>mucio</dc:creator>
      <pubDate>Tue, 02 Nov 2021 10:14:46 +0000</pubDate>
      <link>https://forem.com/mucio/aws-mwaa-and-aws-ses-integration-odg</link>
      <guid>https://forem.com/mucio/aws-mwaa-and-aws-ses-integration-odg</guid>
      <description>&lt;h1&gt;
  
  
  Table of Contents
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Intro&lt;/li&gt;
&lt;li&gt;The standard SMTP configuration&lt;/li&gt;
&lt;li&gt;Can we skip the credentials?&lt;/li&gt;
&lt;li&gt;A possible solution&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Intro
&lt;/h1&gt;

&lt;p&gt;This post is about integrating MWAA with SES using an IAM role instead of &lt;a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/smtp-credentials.html"&gt;SMTP credentials&lt;/a&gt;. I will try to keep it short and focused.&lt;/p&gt;

&lt;p&gt;The assumption here is that you already have an MWAA environment running Airflow 2.0.2 and its role configured as per the AWS documentation, with the &lt;code&gt;SES:*&lt;/code&gt; actions allowed on your AWS SES resources.&lt;/p&gt;

&lt;h1&gt;
  
  
  The standard SMTP configuration
&lt;/h1&gt;

&lt;p&gt;If you already have your MWAA environment configured and you are trying to send emails, you probably ended up on &lt;a href="https://docs.aws.amazon.com/mwaa/latest/userguide/configuring-env-variables.html#configuring-env-variables-reference"&gt;this documentation page&lt;/a&gt;: the default settings there should be good enough to give you an idea of what you need. &lt;/p&gt;

&lt;p&gt;The setting we are most interested in is &lt;code&gt;email.email_backend&lt;/code&gt;. As you can see in the documentation, the default value is &lt;code&gt;airflow.utils.email.send_email_smtp&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is the default Airflow SMTP integration and it works pretty well if you have SMTP credentials (it also works with SES, see the link above to create the credentials).&lt;/p&gt;

&lt;h1&gt;
  
  
  Can we skip the credentials?
&lt;/h1&gt;

&lt;p&gt;I do not like to create more credentials than I need. I wanted to be able to use AWS SES without them, just using the IAM Role assigned to my MWAA environment.&lt;/p&gt;

&lt;p&gt;What we need is a different email backend, specifically &lt;code&gt;airflow.providers.amazon.aws.utils.emailer.send_email&lt;/code&gt;. This uses a small module that ships with the Amazon provider package for Airflow.&lt;/p&gt;

&lt;p&gt;All good and dandy until you try to send the first email. &lt;/p&gt;

&lt;p&gt;Here is a quick DAG to copy and paste to test your email:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;airflow.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;airflow.operators.dummy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DummyOperator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;airflow.operators.email&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;EmailOperator&lt;/span&gt;

&lt;span class="n"&gt;my_dag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"test_email_dag"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
             &lt;span class="n"&gt;schedule_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"0 0 * * *"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DummyOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"start"&lt;/span&gt;
                     &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DummyOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"end"&lt;/span&gt;
                   &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;run_this&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EmailOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'sent_email'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'mucio@mucio.net'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"test"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;html_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"this is a test, nothing to worry"&lt;/span&gt;
                        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;run_this&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you run this DAG you will probably see the following error in the logs:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;botocore.exceptions.ParamValidationError: Parameter validation failed:&lt;br&gt;
Invalid type for parameter Source, value: None, type: &amp;lt;class 'NoneType'&amp;gt;, valid types: &amp;lt;class 'str'&amp;gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What is happening here? Remember the &lt;code&gt;email_backend&lt;/code&gt; from the Amazon provider? For some reason, the &lt;code&gt;mail_from&lt;/code&gt; parameter, which is passed as &lt;code&gt;Source&lt;/code&gt; to boto3, &lt;a href="https://github.com/apache/airflow/blob/3c08c025c5445ffc0533ac28d07ccf2e69a19ca8/airflow/providers/amazon/aws/utils/emailer.py#L40"&gt;does not contain the correct value&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;This problem was already reported in a few Airflow issues and &lt;a href="https://github.com/apache/airflow/pull/18042"&gt;PRs&lt;/a&gt;. The fix didn't make the cut for Airflow 2.2 and will probably land in version 2.3, but because we are talking about MWAA (Airflow 2.0.2), we don't really know when it will be available on AWS.&lt;/p&gt;
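&lt;p&gt;To make the error message easier to read, here is a minimal plain-Python sketch of the kind of client-side type check that rejects the sender. The function &lt;code&gt;validate_source&lt;/code&gt; and the example address are hypothetical and are not part of botocore or Airflow; the sketch only illustrates that a sender of &lt;code&gt;None&lt;/code&gt; is refused before any email leaves the environment.&lt;/p&gt;

```python
def validate_source(source):
    """Hypothetical stand-in for the client-side type check on ``Source``."""
    if not isinstance(source, str):
        raise TypeError(
            f"Invalid type for parameter Source, value: {source!r}, "
            f"type: {type(source).__name__}, valid types: str"
        )
    return source


# A sender that was never filled in is rejected before anything is sent.
try:
    validate_source(None)
except TypeError as error:
    message = str(error)

print(message)
# A proper string passes through unchanged.
print(validate_source("airflow@example.com"))
```

&lt;p&gt;In other words, the request never reaches SES: the missing sender has to be fixed on the Airflow side.&lt;/p&gt;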

&lt;h1&gt;
  
  
  A possible solution
&lt;/h1&gt;

&lt;p&gt;The solution I came up with was to rewrite the &lt;code&gt;emailer.py&lt;/code&gt; utility, deploy it in the &lt;code&gt;dags&lt;/code&gt; folder and reference it in the MWAA configuration.&lt;/p&gt;

&lt;p&gt;Here is the new emailer (I put mine in &lt;code&gt;ses_email_fix/emailer.py&lt;/code&gt;, with an empty &lt;code&gt;__init__.py&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="s"&gt;"""Airflow module fix for email backend using AWS SES"""&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Union&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;airflow.configuration&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;conf&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;airflow.providers.amazon.aws.hooks.ses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SESHook&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Union&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
               &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;html_content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;cc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Union&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;bcc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Union&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;mime_subtype&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'mixed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;mime_charset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'utf-8'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;conn_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'aws_default'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="s"&gt;"""Email backend for SES."""&lt;/span&gt;

    &lt;span class="n"&gt;hook&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SESHook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;aws_conn_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conn_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;hook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mail_from&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'smtp'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'SMTP_MAIL_FROM'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;html_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;html_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;cc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;bcc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bcc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;mime_subtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mime_subtype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;mime_charset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mime_charset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This takes the &lt;code&gt;mail_from&lt;/code&gt; value from the &lt;code&gt;smtp.smtp_mail_from&lt;/code&gt; MWAA environment setting. I also changed my &lt;code&gt;email.email_backend&lt;/code&gt; to &lt;code&gt;ses_email_fix.emailer.send_email&lt;/code&gt;.&lt;/p&gt;
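&lt;p&gt;As a rough illustration, the two MWAA configuration options can be pictured as the dictionary below. The sender address is a placeholder, and &lt;code&gt;lookup&lt;/code&gt; is a hypothetical helper (not an Airflow API) that only mimics how a section and key pair such as &lt;code&gt;('smtp', 'SMTP_MAIL_FROM')&lt;/code&gt; relates to the option name &lt;code&gt;smtp.smtp_mail_from&lt;/code&gt;, assuming the lookup ignores case.&lt;/p&gt;

```python
# Hypothetical values: replace the sender with an address you can use in SES.
mwaa_options = {
    "email.email_backend": "ses_email_fix.emailer.send_email",
    "smtp.smtp_mail_from": "airflow@example.com",
}


def lookup(options, section, key):
    """Mimic a case-insensitive section/key lookup on MWAA-style option names."""
    return options[f"{section.lower()}.{key.lower()}"]


# The emailer asks for ('smtp', 'SMTP_MAIL_FROM'); this maps to smtp.smtp_mail_from.
sender = lookup(mwaa_options, "smtp", "SMTP_MAIL_FROM")
print(sender)
```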

&lt;p&gt;One last thing: remember to add the &lt;code&gt;ses_email_fix&lt;/code&gt; folder to your &lt;code&gt;.airflowignore&lt;/code&gt; file (otherwise Airflow will try to parse it as a DAG).&lt;/p&gt;
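&lt;p&gt;A small sketch of what such an ignore entry does, assuming the regular-expression syntax that Airflow 2.0.x documents for &lt;code&gt;.airflowignore&lt;/code&gt; (one pattern per line). The function &lt;code&gt;is_ignored&lt;/code&gt; and the file paths are hypothetical; this is a simplified stand-in, not Airflow's own implementation.&lt;/p&gt;

```python
import re

# Assumed content of the .airflowignore file: one regular expression per line.
ignore_patterns = ["ses_email_fix"]


def is_ignored(path, patterns):
    """Return True when any pattern is found anywhere in the path."""
    return any(re.search(pattern, path) for pattern in patterns)


# The emailer module is skipped, the test DAG is still parsed.
print(is_ignored("dags/ses_email_fix/emailer.py", ignore_patterns))
print(is_ignored("dags/test_email_dag.py", ignore_patterns))
```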

&lt;p&gt;Now, after updating your environment, your test DAG should be able to fire emails via SES without using credentials.&lt;/p&gt;

&lt;p&gt;A final word of thanks goes to the Airflow community, who came up with a &lt;a href="https://github.com/apache/airflow/pull/16166/files/31e2f1d291e1b438d8826b9e9482c2d73abeb32c"&gt;quick workaround&lt;/a&gt; for this problem, which I simply implemented in my own MWAA environment.&lt;/p&gt;

&lt;p&gt;Credits: Cover photo by Quoc Nguyen from Pexels&lt;/p&gt;

</description>
      <category>aws</category>
      <category>mwaa</category>
      <category>airflow</category>
      <category>ses</category>
    </item>
    <item>
      <title>Airflow, Python and String Concatenation</title>
      <dc:creator>mucio</dc:creator>
      <pubDate>Mon, 26 Oct 2020 14:51:37 +0000</pubDate>
      <link>https://forem.com/mucio/part-2-string-concatenation-4pg9</link>
      <guid>https://forem.com/mucio/part-2-string-concatenation-4pg9</guid>
      <description>&lt;p&gt;This is the second part of the series about "Python concepts for people who are in need of using Apache Airflow but have little or no knowledge of Python".&lt;/p&gt;

&lt;p&gt;While Airflow concepts will be explained &lt;a href="https://en.wikipedia.org/wiki/En_passant" rel="noopener noreferrer"&gt;en passant&lt;/a&gt;, the main focus of these articles is Python concepts and techniques.&lt;/p&gt;

&lt;p&gt;In this article I will focus on string concatenation, that is, putting together a text from multiple pieces such as hard-coded strings, variables, and/or templates.&lt;/p&gt;

&lt;h1&gt;
  
  
  Why string concatenation?
&lt;/h1&gt;

&lt;p&gt;Initially I wanted to dedicate this article to Python data structures (like lists and dictionaries), but to show them in practice most of my Airflow-related examples were about strings.&lt;/p&gt;

&lt;p&gt;Feel free to skip this post, or quickly skim it, if you are familiar with Python strings.&lt;/p&gt;

&lt;p&gt;Other reasons why I wanted to focus a bit more on this topic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;over time Python introduced multiple ways to do string concatenation, so reading code written by other people can sometimes be confusing&lt;/li&gt;
&lt;li&gt;it is a topic for interview questions, especially for junior engineers or analysts&lt;/li&gt;
&lt;li&gt;it is easy to do it wrong (wrong = in a way which is hard to maintain)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some people find string concatenation confusing, and it can be. I hope that by the end of this article you will have the tools to understand string concatenation and formatting. If not, feel free to leave your feedback in the comments.&lt;/p&gt;

&lt;h1&gt;
  
  
  A basic DAG with strings
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.operators.dummy_operator&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DummyOperator&lt;/span&gt;

&lt;span class="n"&gt;dag_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string_sample&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;task1_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;task2_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;my_dag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dag_name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_dag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
             &lt;span class="n"&gt;schedule_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0 0 * * *&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
             &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;task1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DummyOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task1_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                      &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;task2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DummyOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task2_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                      &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;task1&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;task2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we look into the Airflow web UI we will see a DAG called &lt;code&gt;string_sample_dag&lt;/code&gt;, which consists of two tasks: &lt;code&gt;task_start&lt;/code&gt; and &lt;code&gt;task_end&lt;/code&gt;. The operators used are, again, the DummyOperator ones. This DAG is just an excuse to show you how to concatenate and format strings in Python.&lt;/p&gt;
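&lt;p&gt;If you want to check these names without an Airflow installation, the three string expressions from the DAG can be run on their own. This is only a sketch that reuses the variable names from the DAG above; the names &lt;code&gt;dag_id&lt;/code&gt;, &lt;code&gt;task1_id&lt;/code&gt; and &lt;code&gt;task2_id&lt;/code&gt; are introduced here for illustration.&lt;/p&gt;

```python
dag_name = "string_sample"
task1_name = "start"
task2_name = "end"

dag_id = dag_name + "_dag"                # the plus operator
task1_id = "task_{}".format(task1_name)   # the .format() method
task2_id = f"task_{task2_name}"           # an f-string

print(dag_id, task1_id, task2_id)
```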

&lt;h1&gt;
  
  
  The many ways to concatenate strings in Python
&lt;/h1&gt;

&lt;h2&gt;
  
  
  The lazy way, the &lt;code&gt;+&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The simplest way to concatenate multiple strings is to use the plus sign, &lt;code&gt;+&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;dag_name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_dag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works well when adding strings together. Here is another working example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="n"&gt;world&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here is when being lazy doesn't work that well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nc"&gt;Traceback &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;most&lt;/span&gt; &lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="n"&gt;File&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;input&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nb"&gt;TypeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;must&lt;/span&gt; &lt;span class="n"&gt;be&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;+&lt;/code&gt; operator is a simple soul: it sees values and tries to concatenate them. Sometimes it works, sometimes it doesn't. In our case, it cannot join a string with an integer. To make it work we need to convert &lt;code&gt;123&lt;/code&gt; into the string &lt;code&gt;"123"&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Better formatting with &lt;code&gt;.format()&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;In the first task of the DAG there is a second, better way to concatenate strings. The string method &lt;code&gt;.format()&lt;/code&gt; allows us to format a string that we use as a template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task1_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our previous "hello world" example the template for our greeting is &lt;code&gt;"hello {}"&lt;/code&gt;, where instead of &lt;code&gt;{}&lt;/code&gt; we want to have &lt;code&gt;world&lt;/code&gt;, &lt;code&gt;123&lt;/code&gt; or maybe &lt;code&gt;John&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello {}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="n"&gt;world&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it works with numbers too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello {}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  With multiple values
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;greeting&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{} {}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;greeting&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;world&lt;/span&gt; &lt;span class="n"&gt;hello&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Uhm, this doesn't look right. The &lt;code&gt;.format()&lt;/code&gt; method assigns the values to the &lt;code&gt;{}&lt;/code&gt; placeholders in the order they appear. Of course we could switch the order of the passed values, but what if over time we change the template?&lt;/p&gt;

&lt;p&gt;To avoid this, use positional indexes or, even better, name the placeholders. &lt;/p&gt;

&lt;p&gt;First with numbers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{1} {0}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;greeting&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="n"&gt;world&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; the first value has index 0, the second has index 1.&lt;/p&gt;

&lt;p&gt;Naming the placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{hi} {who}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;who&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hi&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;greeting&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="n"&gt;world&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using named placeholders makes our template string more meaningful (which will be nice when reviewing this code after a while), but we need to specify which value is &lt;code&gt;hi&lt;/code&gt; and which is &lt;code&gt;who&lt;/code&gt;.&lt;/p&gt;
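
&lt;p&gt;If the values already live in a dictionary, the names can be matched automatically. This is only a small sketch: the dictionary name &lt;code&gt;values&lt;/code&gt; and its content are made up for illustration.&lt;/p&gt;

```python
# Hypothetical values, collected in a dictionary for illustration.
values = {"hi": "hello", "who": "world"}

# ** unpacks the dictionary into the keyword arguments hi=... and who=...
message = "{hi} {who}".format(**values)
print(message)  # hello world
```

&lt;p&gt;The keys of the dictionary have to match the placeholder names, otherwise Python raises a &lt;code&gt;KeyError&lt;/code&gt;.&lt;/p&gt;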

&lt;p&gt;For more formatting options and examples you can take a look at the Python &lt;a href="https://docs.python.org/3/library/string.html#format-string-syntax" rel="noopener noreferrer"&gt;documentation here&lt;/a&gt;. You probably will not need them for your DAGs, but it is good to know where to start (besides using Stack Overflow).&lt;/p&gt;

&lt;h2&gt;
  
  
  The &lt;code&gt;f&lt;/code&gt; string
&lt;/h2&gt;

&lt;p&gt;Python 3.6 introduced a new way to format strings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="n"&gt;world&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;f&lt;/code&gt; stands for formatted string. But also for fast, because this formatting method is faster to write and faster to produce the final result.&lt;/p&gt;
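
&lt;p&gt;If you want to check the speed claim on your own machine, a rough sketch with the standard &lt;code&gt;timeit&lt;/code&gt; module could look like this. The number of repetitions is an arbitrary choice and the timings will differ from machine to machine, so no result is shown here.&lt;/p&gt;

```python
import timeit

text = "world"

# Run each formatting style 200,000 times (an arbitrary number of repetitions).
format_seconds = timeit.timeit(lambda: "hello {}".format(text), number=200_000)
fstring_seconds = timeit.timeit(lambda: f"hello {text}", number=200_000)

print(f".format(): {format_seconds:.3f} seconds")
print(f"f-string:  {fstring_seconds:.3f} seconds")
```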

&lt;p&gt;In a formatted string the values in between &lt;code&gt;{}&lt;/code&gt; are expressions that will be evaluated at runtime. Therefore it is possible to do things like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; hello to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;world&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, more use cases and advanced options can be found in the Python &lt;a href="https://docs.python.org/3/reference/lexical_analysis.html#f-strings" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;
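
&lt;p&gt;To give an idea of what those advanced options look like, here is a minimal sketch of two format specifiers, written after a colon inside the curly braces. The variable names and values are invented for the example.&lt;/p&gt;

```python
# Hypothetical values, only for illustration.
price = 1234.5
run_number = 7

# ",.2f" adds a thousands separator and keeps two decimals.
formatted_price = f"{price:,.2f}"
# "03d" pads the integer with zeros up to three digits.
padded_run = f"run_{run_number:03d}"

print(formatted_price)  # 1,234.50
print(padded_run)       # run_007
```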

&lt;h1&gt;
  
  
  A few additional things on Python strings
&lt;/h1&gt;

&lt;p&gt;These are probably details that you will not use when writing DAGs, but they are worth mentioning to complete the basic overview of strings.&lt;/p&gt;

&lt;p&gt;You probably noticed that formatting a string in Python relies on curly braces. So how do you format a string that itself contains curly braces? Double curly, duh:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;In {{text}} is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;world&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What about the double quote sign &lt;code&gt;"&lt;/code&gt;? First, in Python you can delimit a string with either &lt;code&gt;'&lt;/code&gt; or &lt;code&gt;"&lt;/code&gt;, as long as you close it with the same sign, so a string delimited by single quotes can contain a double quote. Another way is to escape the quote sign using a &lt;code&gt;\&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;This is a double quote: &lt;/span&gt;&lt;span class="sh"&gt;"'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;double&lt;/span&gt; &lt;span class="n"&gt;quote&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This is a double quote too: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;double&lt;/span&gt; &lt;span class="n"&gt;quote&lt;/span&gt; &lt;span class="n"&gt;too&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, how to deal with very long strings. Python allows you to break a long line of code into multiple lines using the backslash &lt;code&gt;\&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; \
&lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="n"&gt;world&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, Python concatenates the two strings and ignores the line break in the code (to get an actual new line in the output, add &lt;code&gt;\n&lt;/code&gt; to the string, as in &lt;code&gt;"hello \n"&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;But you can also create a multi-line string using &lt;code&gt;"""&lt;/code&gt; or &lt;code&gt;'''&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;hello
&lt;/span&gt;&lt;span class="gp"&gt;...&lt;/span&gt; &lt;span class="n"&gt;world&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;hello&lt;/span&gt;
&lt;span class="n"&gt;world&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  More additional things
&lt;/h2&gt;

&lt;p&gt;Actually, there is much more to say about Python strings, concatenation, and formatting, but this will require another post (or more).&lt;/p&gt;

&lt;p&gt;Personally, I believe these things are not very interesting for someone approaching Python for the first time. But I have been wrong in the past: vote unicorn to tell me that I am wrong and that you want to know more about strings (and maybe add a comment on what you find difficult with strings in Python). &lt;/p&gt;

&lt;h1&gt;
  
  
  Which one to use?
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1wufbtsbz7pyhg6a2hlm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1wufbtsbz7pyhg6a2hlm.png" alt="Use f-strings"&gt;&lt;/a&gt;&lt;br&gt;
If you use Python 3.6 or above, &lt;code&gt;f&lt;/code&gt;-strings are the way to go: they are fast to write and easy to maintain.&lt;/p&gt;

&lt;p&gt;But I am the first to admit that for quick debugging I still resort to the &lt;code&gt;+&lt;/code&gt; (when possible).&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;.format()&lt;/code&gt; method is worth mentioning because many examples in the wild use it, and you may encounter old code that requires your attention.&lt;/p&gt;

&lt;p&gt;That said, I suggest you take a look at the linked documentation to learn about the formatting possibilities Python offers; sooner or later you will need them.&lt;/p&gt;

&lt;h1&gt;
  
  
  Shameless plug
&lt;/h1&gt;

&lt;p&gt;In case you need support or assistance, feel free to reach out to me in the comments or via direct message. On Twitter you can find me under the handle &lt;a class="mentioned-user" href="https://dev.to/mucio"&gt;@mucio&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you need more structured help, the nice people at &lt;a href="https://untitleddata.company/" rel="noopener noreferrer"&gt;Untitled Data Company&lt;/a&gt; (which includes me) will be happy to help you with all your data needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Credits
&lt;/h3&gt;

&lt;p&gt;Cover photo by &lt;a href="https://unsplash.com/@jessbaileydesigns?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Jess Bailey&lt;/a&gt; on &lt;a href="https://unsplash.com/s/photos/tape-craft-scissors?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>airflow</category>
      <category>strings</category>
    </item>
    <item>
      <title>Quickly Setup Airflow for Development with Breeze</title>
      <dc:creator>mucio</dc:creator>
      <pubDate>Wed, 30 Sep 2020 19:41:54 +0000</pubDate>
      <link>https://forem.com/mucio/quickly-setup-airflow-for-development-with-breeze-d8h</link>
      <guid>https://forem.com/mucio/quickly-setup-airflow-for-development-with-breeze-d8h</guid>
      <description>&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; I have submitted the PRs for two of the Breeze features mentioned in this article (the &lt;code&gt;start-airflow&lt;/code&gt; command and the &lt;code&gt;--init-scripts&lt;/code&gt; flag). I feel responsible for your user experience using them, so if you have questions or feedback please reach out  to me.&lt;/p&gt;

&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;To have Airflow running on your machine do the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install &lt;a href="https://docs.docker.com/engine/install/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt; and &lt;a href="https://docs.docker.com/compose/install/" rel="noopener noreferrer"&gt;Docker Compose&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Clone the Airflow repository &lt;code&gt;git clone git@github.com:apache/airflow.git&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;In the Airflow folder run &lt;code&gt;./breeze start-airflow&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;On the first run, Breeze creates the folder &lt;code&gt;files/dags&lt;/code&gt; in the repo folder. DAG files added to that folder will appear in Airflow.&lt;/p&gt;

&lt;p&gt;Go to &lt;a href="http://localhost:28080" rel="noopener noreferrer"&gt;http://localhost:28080&lt;/a&gt; to see your Airflow running.&lt;/p&gt;

&lt;h1&gt;
  
  
  Intro
&lt;/h1&gt;

&lt;p&gt;If you do not like it when food recipes start with pages of blabbing, skip this part.&lt;/p&gt;

&lt;h2&gt;
  
  
  My problem
&lt;/h2&gt;

&lt;p&gt;I started to write a few blog posts for people who are approaching Python and Apache Airflow for the first time. I needed a quick way for my readers to set up their own Airflow and an even quicker way to explain how to do it.&lt;/p&gt;

&lt;p&gt;I wanted something so simple that you and I could focus only on the DAG code. Enter Breeze.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Breeze?
&lt;/h2&gt;

&lt;p&gt;Breeze is a command line tool to spin up a dockerized* Airflow instance for development or testing. It can be used to create an environment with specific properties to run tests, before deploying to production. This is pretty cool if you are into CI/CD.&lt;/p&gt;

&lt;p&gt;The first time I met Breeze I was working on automating the creation of our own environment to run tests (for a data warehouse, not for Airflow), and I was very intrigued by the idea.&lt;/p&gt;

&lt;p&gt;Therefore, when I started thinking about how to easily get a dev Airflow running, Breeze was at the top of my list. &lt;/p&gt;

&lt;p&gt;* Dockerized stands for "running in a container (think of it as a lightweight virtual machine) with no impact on your computer (called the host, while the container is the guest)." Well, no impact besides consuming CPU and RAM 😕&lt;/p&gt;

&lt;h1&gt;
  
  
  Setup
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;I run my Airflow/Breeze using WSL2 on Windows 10. People using a Mac or a Linux machine will probably have a smoother experience than me, because Breeze runs more easily on a Linux box (or a Mac), but WSL2 with Ubuntu is quite good (if you are on Windows 10, the WSL2 setup is covered &lt;a href="https://docs.microsoft.com/en-us/windows/wsl/install-win10" rel="noopener noreferrer"&gt;here&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;What do you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install &lt;a href="https://docs.docker.com/engine/install/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install &lt;a href="https://docs.docker.com/compose/install/" rel="noopener noreferrer"&gt;Docker Compose&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install git (this is usually already installed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my case I installed all these tools in my Ubuntu WSL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation and first run
&lt;/h2&gt;

&lt;p&gt;Clone the Airflow repository from GitHub with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

git clone git@github.com:apache/airflow.git


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Once the repo is downloaded go to the Airflow folder and run Breeze:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nb"&gt;cd &lt;/span&gt;airflow
./breeze start-airflow


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Breeze will download a number of Docker images and will ask you if you want to build some of them; just say "yes" when asked (you can use the flags &lt;code&gt;--assume-yes&lt;/code&gt; or &lt;code&gt;--assume-no&lt;/code&gt; if you find this annoying). The first build can take a few minutes, depending on your internet speed and machine.&lt;/p&gt;

&lt;p&gt;If everything goes as expected you should see a screen like this:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F9kro0e8k557mzp3bh93w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F9kro0e8k557mzp3bh93w.png" alt="started airflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I love it when a plan comes together"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Congratulations, your Airflow is up and running. &lt;/p&gt;

&lt;p&gt;If you go to &lt;a href="http://localhost:28080/" rel="noopener noreferrer"&gt;http://localhost:28080/&lt;/a&gt; you will see the Airflow UI. The default credentials are &lt;code&gt;admin/admin&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ft6v0c9zra8bi97x77oju.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ft6v0c9zra8bi97x77oju.png" alt="admin/admin to login"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Username: admin - Password: admin&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How to use this?
&lt;/h2&gt;

&lt;p&gt;What you see are three &lt;a href="https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/" rel="noopener noreferrer"&gt;tmux&lt;/a&gt; panes (tmux is a Linux tool to create a terminal session and split it into multiple parts, called panes). In the lower left corner you have the Airflow Scheduler, which takes care of running things; on the right, the Webserver is waiting for you to visit the Airflow Web UI. The top pane is for running additional commands.&lt;/p&gt;

&lt;p&gt;If you press &lt;code&gt;Ctrl+b&lt;/code&gt; followed by an arrow key you will be able to move between panes. There is not much you need to do in the bottom panes, although you can stop the scheduler and the webserver there with &lt;code&gt;Ctrl+C&lt;/code&gt;. The top one is for running Airflow CLI commands (run &lt;code&gt;airflow --help&lt;/code&gt; if you want to know more). &lt;/p&gt;

&lt;p&gt;To quickly exit tmux, run the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

./stop_airflow.sh


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The purpose of having these three panes is to let you observe what is happening in Airflow and, if needed, use the command line interface (although this is for more advanced use cases).&lt;/p&gt;

&lt;h1&gt;
  
  
  Developing with Breeze
&lt;/h1&gt;

&lt;p&gt;Now that Airflow is running, you can just put your DAGs in the folder &lt;code&gt;files/dags&lt;/code&gt; created in your Airflow repository folder. If the folder is not there, Breeze will create it. The DAGs can take a few minutes to appear in the web UI.&lt;/p&gt;

&lt;p&gt;If a DAG contains a syntax error, the Webserver pane shows the error.&lt;/p&gt;

&lt;p&gt;A few additional notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you run Breeze using an SQLite database as the Airflow backend (see below), that database is recreated with every run. If you want to keep Airflow configuration objects (like connections to your databases, users, etc.), use a different backend or an initialization script (again, see below).&lt;/li&gt;
&lt;li&gt;Environment variables can be entered in the file &lt;code&gt;files/airflow-breeze-config/variables.env&lt;/code&gt; (create it if it is not there); they are set while the Airflow environment is being prepared.&lt;/li&gt;
&lt;li&gt;If you want to initialize Airflow, you can put a file called &lt;code&gt;init.sh&lt;/code&gt; in the folder &lt;code&gt;files/airflow-breeze-config&lt;/code&gt;. The instructions in this file will be executed before the Airflow Scheduler and Webserver start.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Some details and recipes
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;start-airflow&lt;/code&gt; command provides a simple way to start Airflow and monitor it. Behind the scenes Breeze initializes the Airflow backend database and creates an admin user that can be used to log in to the web UI (credentials &lt;code&gt;admin/admin&lt;/code&gt;). &lt;/p&gt;

&lt;h3&gt;
  
  
  Recipe 1 - A persistent backend
&lt;/h3&gt;

&lt;p&gt;As mentioned above, the default database is recreated with every execution. If you want something more persistent you can use a different backend, for example PostgreSQL:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

./breeze start-airflow &lt;span class="nt"&gt;-b&lt;/span&gt; postgres


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This starts an additional container with a database dedicated to Airflow. Now your changes will survive a restart.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recipe 2 - A different Airflow version
&lt;/h3&gt;

&lt;p&gt;By default Breeze will start the most recent version of Airflow (currently 2.0.0dev), which is probably different from what you have in production. The good thing is that Breeze allows you to pick the version you need with another flag:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

./breeze start-airflow &lt;span class="nt"&gt;--install-airflow-version&lt;/span&gt; 1.10.10


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Of course you can combine multiple flags:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

./breeze start-airflow &lt;span class="nt"&gt;--install-airflow-version&lt;/span&gt; 1.10.10 &lt;span class="nt"&gt;-b&lt;/span&gt; postgres


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Feel free to go ahead and explore the other &lt;a href="https://github.com/apache/airflow/blob/master/BREEZE.rst#airflow-breeze-syntax" rel="noopener noreferrer"&gt;possible flags&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recipe 3 - Initialize Airflow with your own database connection
&lt;/h3&gt;

&lt;p&gt;One way to do it is to use a persistent backend (see Recipe 1): you can add your connection in the web UI and it will still be there after a restart. At least this is what I was doing when I first started using Airflow.&lt;/p&gt;

&lt;p&gt;A more interesting approach is to use the optional initialization script for Breeze to create the connection. This makes it easier to maintain the connections and other Airflow settings, plus you can store this file in your versioning tool (e.g. git).&lt;/p&gt;

&lt;p&gt;Here is an example of an &lt;code&gt;init.sh&lt;/code&gt; file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# Connections&lt;/span&gt;
airflow connections add &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--conn-login&lt;/span&gt; my_user &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--conn-password&lt;/span&gt; my_pwd &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--conn-type&lt;/span&gt; jdbc &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--conn-host&lt;/span&gt; localhost &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--conn-port&lt;/span&gt; 9457 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--conn-extra&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    my_connection

&lt;span class="c"&gt;# Variable&lt;/span&gt;
airflow variables &lt;span class="nb"&gt;set &lt;/span&gt;my_variable variable_content


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using this file will create a JDBC connection called &lt;code&gt;my_connection&lt;/code&gt; and a Variable called &lt;code&gt;my_variable&lt;/code&gt;. You can see them in the Web UI by clicking the corresponding section in the Admin menu.&lt;/p&gt;

&lt;h1&gt;
  
  
  Additional information
&lt;/h1&gt;

&lt;p&gt;The main point of Breeze was to provide an easy way to run automatic tests for the core Airflow developers, the people building Airflow itself rather than building with Airflow. Breeze's goal is to lay down the foundation to easily run Airflow, taking care of: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;starting the needed Docker containers&lt;/li&gt;
&lt;li&gt;exposing the ports for the Airflow components (e.g. webserver and backend database)&lt;/li&gt;
&lt;li&gt;providing a convenient way to run new code in Airflow (e.g. putting the DAGs in &lt;code&gt;files/dags&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;eventually running tests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These features were too interesting to leave them just to the core developers ;)&lt;/p&gt;

&lt;p&gt;But this is not everything. If you want to know more about the possibilities offered by Breeze, I suggest you take a look at this video (&lt;a href="https://www.youtube.com/watch?v=4MCTXq-oF68&amp;amp;feature=youtu.be&amp;amp;ab_channel=ApacheAirflow" rel="noopener noreferrer"&gt;Airflow Breeze - Development and Test environment for Apache Airflow&lt;/a&gt;); it will not make your DAGs better, but it will give you more ideas on how to use Breeze and your new dev environment.&lt;/p&gt;

&lt;h1&gt;
  
  
  Final words
&lt;/h1&gt;

&lt;p&gt;If you are still here, feel free to leave a comment and provide your feedback. I will be happy to assist you and answer your questions (if I am able to).&lt;/p&gt;

&lt;h1&gt;
  
  
  Shameless plug
&lt;/h1&gt;

&lt;p&gt;In case you need support or assistance, feel free to reach out to me in the comments or via direct message. On Twitter you can find me under the handle &lt;a href="https://twitter.com/mucio" rel="noopener noreferrer"&gt;@mucio&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you need more structured help, the nice people at &lt;a href="https://untitleddata.company/" rel="noopener noreferrer"&gt;Untitled Data Company&lt;/a&gt; (which includes me) will be happy to help you with all your data needs.&lt;/p&gt;

</description>
      <category>python</category>
      <category>airflow</category>
    </item>
    <item>
      <title>How much Python do you need to know to write an Airflow DAG? Part 1 - ANSWERS</title>
      <dc:creator>mucio</dc:creator>
      <pubDate>Fri, 29 May 2020 21:56:04 +0000</pubDate>
      <link>https://forem.com/mucio/how-much-python-do-you-need-to-know-to-write-an-airflow-dag-part-1-answers-1ldm</link>
      <guid>https://forem.com/mucio/how-much-python-do-you-need-to-know-to-write-an-airflow-dag-part-1-answers-1ldm</guid>
      <description>&lt;p&gt;Answers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;airflow.operators.dummy_operator&lt;/code&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nope, confusingly enough the module &lt;code&gt;datetime&lt;/code&gt; contains the class &lt;code&gt;datetime&lt;/code&gt;, so &lt;code&gt;datetime.datetime&lt;/code&gt;.  &lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To have the following DAG working:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sample_dag"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;schedule_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"0 0 * * *"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
             &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We would need to import the whole module, &lt;code&gt;import datetime&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;In the original code you can think of &lt;code&gt;from datetime import datetime&lt;/code&gt; as creating a shortcut to &lt;code&gt;datetime.datetime&lt;/code&gt;.&lt;/p&gt;
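
&lt;p&gt;A small sketch to convince yourself that both spellings reach the same class. The alias &lt;code&gt;datetime_class&lt;/code&gt; is a name I made up here, only so that the module and the class can be used side by side in the same file.&lt;/p&gt;

```python
import datetime
from datetime import datetime as datetime_class  # alias chosen for illustration

# Build the same date through the module and through the shortcut.
via_module = datetime.datetime(2020, 4, 29)
via_shortcut = datetime_class(2020, 4, 29)

print(via_module == via_shortcut)           # True
print(datetime.datetime is datetime_class)  # True
```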

&lt;ol start="3"&gt;
&lt;li&gt;&lt;p&gt;Nope, Airflow will pick up the Tasks in our DAG following the dependency tree we defined. If we remove &lt;code&gt;start &amp;gt;&amp;gt; end&lt;/code&gt; they will be executed together. With &lt;code&gt;start &amp;lt;&amp;lt; end&lt;/code&gt; we can reverse time and save the dinosaurs. Everything is possible with Python.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Yes, because it is after the DAG's &lt;code&gt;start_date&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;5.1 Nope, the first argument should be the &lt;code&gt;task_id&lt;/code&gt;, which is a string of text, not a DAG.&lt;/p&gt;

&lt;p&gt;5.2 Nope, unnamed (positional) arguments must come before named (keyword) arguments. &lt;/p&gt;
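
&lt;p&gt;The rule from 5.2 can be tried without Airflow. In this sketch &lt;code&gt;make_task&lt;/code&gt; is a hypothetical stand-in for an operator (it is not part of Airflow), and &lt;code&gt;compile&lt;/code&gt; is used only to show that Python rejects the wrong order before running anything.&lt;/p&gt;

```python
def make_task(task_id, dag=None):
    """Hypothetical stand-in for an operator; it just records its arguments."""
    return {"task_id": task_id, "dag": dag}


# Positional argument first, keyword argument after: this works.
task = make_task("start", dag="my_dag")
print(task)  # {'task_id': 'start', 'dag': 'my_dag'}

# The reverse order is rejected while Python is still reading the code.
try:
    compile('make_task(dag="my_dag", "start")', "sketch", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True
```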

&lt;p&gt;Back to the original &lt;a href="https://dev.to/mucio/how-much-python-do-you-need-to-know-to-write-an-airflow-dag-part-1-20jp"&gt;post&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>airflow</category>
      <category>python</category>
    </item>
    <item>
      <title>How much Python do you need to know to write an Airflow DAG? Part 1</title>
      <dc:creator>mucio</dc:creator>
      <pubDate>Fri, 29 May 2020 21:55:26 +0000</pubDate>
      <link>https://forem.com/mucio/how-much-python-do-you-need-to-know-to-write-an-airflow-dag-part-1-20jp</link>
      <guid>https://forem.com/mucio/how-much-python-do-you-need-to-know-to-write-an-airflow-dag-part-1-20jp</guid>
      <description>&lt;p&gt;Not sure about you, but I wonder about this a lot when I need to onboard new colleagues, colleagues who were writing mostly SQL only till the day before.&lt;/p&gt;

&lt;p&gt;This post is for people who need to start writing data pipelines (or DAGs) for &lt;a href="https://airflow.apache.org/" rel="noopener noreferrer"&gt;Apache Airflow&lt;/a&gt;, already know how to process their data, but have little or no knowledge of Python. &lt;/p&gt;

&lt;p&gt;This post will not tell you how to install Airflow on your machine; this &lt;a href="https://airflow.apache.org/docs/stable/start.html#" rel="noopener noreferrer"&gt;quick start guide&lt;/a&gt; will, but check with your colleagues how you do things in your organization. My assumption is that you work in an environment where Airflow is already used in production and you don't need to worry about that. If this is not the case, you will need something more than this post.&lt;/p&gt;

&lt;p&gt;I hope that after reading this you will be able to understand the code written by your colleagues and feel confident enough to start writing your own DAGs.&lt;/p&gt;

&lt;p&gt;I will try to keep this as practical as possible, as if you were working with me. I assume you know nothing about Python, so feel free to skip the sections you are familiar with. Also feel free to ask questions, I am always up for a chat.&lt;/p&gt;

&lt;h3&gt;
  
  
  Boring things
&lt;/h3&gt;

&lt;p&gt;I will write another post about the building blocks of Airflow, for now let me just share these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a DAG is an ETL process (for now 1 DAG == 1 file)&lt;/li&gt;
&lt;li&gt;a DAG is made of multiple Tasks&lt;/li&gt;
&lt;li&gt;a Task is an instance of an Operator&lt;/li&gt;
&lt;li&gt;an Operator does things (moves data, sends emails, writes a post on dev.to)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you need to know what DAG stands for, just click the heart icon and I will tell you. &lt;/p&gt;

&lt;h1&gt;
  
  
  A Basic DAG
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.operators.dummy_operator&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DummyOperator&lt;/span&gt;

&lt;span class="n"&gt;my_dag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sample_dag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
             &lt;span class="n"&gt;schedule_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0 0 * * *&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
             &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DummyOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                      &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DummyOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the end of this post you should be able to recognize that this DAG runs every day at midnight and does more or less nothing.&lt;/p&gt;

&lt;p&gt;But let's try to read it line by line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Imports
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo89k0xmuxf1ika5z6o1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo89k0xmuxf1ika5z6o1.png" alt="relevant xkcd"&gt;&lt;/a&gt;&lt;br&gt;
Python can do a lot of things, but to save resources not everything is available all the time. What needs to be loaded in the computer memory is left to the developers. &lt;/p&gt;

&lt;p&gt;When you want to use something that is not part of the basic Python, you can &lt;code&gt;import&lt;/code&gt; the needed Python module or library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;antigravity&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now if I want to use a method of &lt;code&gt;antigravity&lt;/code&gt; I need to call it using the module name: &lt;code&gt;antigravity.fly()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If the module name is too long, you can use an alias:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;antigravity&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ag&lt;/span&gt;

&lt;span class="n"&gt;ag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fly&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Widely used modules have standardized aliases (but this is another story).&lt;/p&gt;

&lt;p&gt;Some modules are quite big and it is possible to import only part of them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.operators.dummy_operator&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DummyOperator&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows us to avoid the module name (see below).&lt;/p&gt;
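&lt;p&gt;To compare the three import styles side by side, here is a small sketch that uses only the standard &lt;code&gt;datetime&lt;/code&gt; module (the variable names are mine; the &lt;code&gt;dt&lt;/code&gt; alias is common but not required):&lt;/p&gt;

```python
import datetime             # whole module: use datetime.datetime(...)
import datetime as dt       # alias: use dt.datetime(...)
from datetime import date   # single name: use date(...) directly

full = datetime.datetime(2020, 4, 29)
aliased = dt.datetime(2020, 4, 29)
short = date(2020, 4, 29)
```

&lt;p&gt;All three lines describe the same day; the only difference is how much of the module name you have to type.&lt;/p&gt;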

&lt;p&gt;Not all modules are installed on your machine when you install Python, but you can install more modules using a tool like &lt;code&gt;pip&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining the DAG
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sample_dag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
             &lt;span class="n"&gt;schedule_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0 0 * * *&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
             &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create our DAG, which Python will store in the variable &lt;code&gt;my_dag&lt;/code&gt; that we will use later on. &lt;/p&gt;

&lt;p&gt;This DAG will be picked up by the Airflow scheduler and executed depending on its &lt;code&gt;schedule_interval&lt;/code&gt; and &lt;code&gt;start_date&lt;/code&gt;. The &lt;code&gt;dag_id&lt;/code&gt; is the identifier used internally by Airflow, you cannot have another DAG with the same name.&lt;/p&gt;

&lt;p&gt;A DAG is created using the arguments we pass to its constructor (&lt;code&gt;DAG()&lt;/code&gt;). If this is the first time you pass arguments to a Python method, let me highlight a few things: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we pass three arguments with the format &lt;code&gt;param_name=value&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;we pass three arguments to this DAG, but the DAG class accepts more of them; the missing ones are filled with default values. You can find the full list of parameters &lt;a href="https://airflow.apache.org/docs/stable/_api/airflow/models/dag/index.html#airflow.models.dag.DAG" rel="noopener noreferrer"&gt;here&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dag_id&lt;/code&gt; is actually the only mandatory parameter. As you probably guessed mandatory parameters have no default and must be passed every time you call a function&lt;/li&gt;
&lt;li&gt;the format &lt;code&gt;param_name=value&lt;/code&gt; is not necessary, but allows us to pass values in the order we prefer. Using names allows us to pass only the needed parameters and we are not forced to change our code if the function we use adds more optional parameters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this reason you could remove &lt;code&gt;dag_id=&lt;/code&gt;, invert the named arguments, and the code will still work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sample_dag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;schedule_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0 0 * * *&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
             &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
             &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this...&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sample_dag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0 0 * * *&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
             &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
             &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;...makes me open the link above to see what the second and third parameters of the DAG constructor really are.&lt;/p&gt;
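&lt;p&gt;To see why the names matter, here is a minimal sketch with a toy function. &lt;code&gt;make_dag&lt;/code&gt;, its parameter order and its default values are assumptions made up for this example, not the Airflow API; check the link above for the real constructor.&lt;/p&gt;

```python
def make_dag(dag_id, description="", schedule_interval="@daily", start_date=None):
    """Toy stand-in for a constructor: it only records what it received."""
    return {
        "dag_id": dag_id,
        "description": description,
        "schedule_interval": schedule_interval,
        "start_date": start_date,
    }


# Named arguments can be passed in any order and the rest keep their defaults.
named = make_dag("sample_dag", start_date="2020-04-29", schedule_interval="0 0 * * *")

# Unnamed arguments are matched by position, so here the cron string
# silently ends up in "description" and the date in "schedule_interval".
positional = make_dag("sample_dag", "0 0 * * *", "2020-04-29")
```

&lt;p&gt;Python does not complain about the second call, which is exactly why it is harder to read and easier to get wrong.&lt;/p&gt;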

&lt;p&gt;Additional notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;datetime(2020, 4, 29)&lt;/code&gt; returns the 29th of April 2020 in a format that the DAG constructor understands&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;schedule_interval&lt;/code&gt; uses a format called crontab expression. This DAG will run every day at midnight. You can learn more about crontab expressions using something like &lt;a href="https://crontab.guru/" rel="noopener noreferrer"&gt;crontab.guru&lt;/a&gt; (or by clicking the Unicorn, so I will write a post about crontab expressions for you)&lt;/li&gt;
&lt;li&gt;technically speaking we are creating an instance of the DAG class (but I will not write a post about this and you should feel bad about thinking that a click will make me your object oriented writer)&lt;/li&gt;
&lt;/ul&gt;
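&lt;p&gt;If you want to poke at the first two notes yourself, here is a small sketch using only the standard library. The field names in &lt;code&gt;field_names&lt;/code&gt; are my own labels for the five positions of a crontab expression, not something Airflow defines.&lt;/p&gt;

```python
from datetime import datetime

# Year, month, day; the time part defaults to midnight.
start_date = datetime(2020, 4, 29)
as_text = start_date.isoformat()  # "2020-04-29T00:00:00"

# A crontab expression has five fields separated by spaces.
field_names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
cron_fields = dict(zip(field_names, "0 0 * * *".split()))
```

&lt;p&gt;Minute 0 of hour 0, with a &lt;code&gt;*&lt;/code&gt; (meaning "any") everywhere else, is how "every day at midnight" is spelled.&lt;/p&gt;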

&lt;p&gt;At this point we are ready to add tasks to our DAG.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tasks
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DummyOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                      &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DummyOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Task definition looks similar to the DAG definition. Here we are using a very important Airflow operator: the DummyOperator which does absolutely nothing, but allows us to focus on other things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a Task needs a unique &lt;code&gt;task_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;task_id&lt;/code&gt; is mandatory, but my convention is to pass it as a named argument anyway &lt;/li&gt;
&lt;li&gt;remember the DAG variable? A Task needs to know its parent DAG, so we use that variable here: &lt;code&gt;dag=my_dag&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finally we have the line&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you guessed that it means the &lt;code&gt;start&lt;/code&gt; execution is followed by the &lt;code&gt;end&lt;/code&gt; execution you have guessed right. We can also write it as &lt;code&gt;end &amp;lt;&amp;lt; start&lt;/code&gt;, but of course it is less immediate. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;&amp;lt;&lt;/code&gt; are called bitshift operators (you can use this new knowledge as an icebreaker at the next meetup).&lt;/p&gt;

&lt;p&gt;In case you are reading some old Airflow DAGs, you could find this ancient syntax (but we modern people are &lt;a href="https://airflow.apache.org/docs/stable/concepts.html#bitshift-composition" rel="noopener noreferrer"&gt;better&lt;/a&gt; off sticking to the bitshift operators):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_downstream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
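&lt;p&gt;If you wonder how &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; can mean "runs before", Python lets a class decide what the operator does for its own objects. Here is a minimal sketch with a toy class: &lt;code&gt;ToyTask&lt;/code&gt; is invented for this example and only records dependencies, it is not how Airflow implements its operators.&lt;/p&gt;

```python
class ToyTask:
    """Toy class, not Airflow's implementation: it only records dependencies."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def set_downstream(self, other):
        self.downstream.append(other.task_id)

    def __rshift__(self, other):
        # Python calls this method when it sees "self >> other".
        self.set_downstream(other)
        return other  # returning the right-hand side allows chains of tasks


start = ToyTask("start")
middle = ToyTask("middle")
end = ToyTask("end")

start >> middle >> end
```

&lt;p&gt;After the last line, &lt;code&gt;start&lt;/code&gt; knows that &lt;code&gt;middle&lt;/code&gt; comes next and &lt;code&gt;middle&lt;/code&gt; knows that &lt;code&gt;end&lt;/code&gt; comes next, which is the same information the old method call records.&lt;/p&gt;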



&lt;h1&gt;
  
  
  Check questions
&lt;/h1&gt;

&lt;p&gt;If you scroll up you should now be able to understand what this DAG is doing.&lt;/p&gt;

&lt;p&gt;Here are some questions to check if you got everything right:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;From which module do we import &lt;code&gt;DummyOperator&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;In the DAG definition can I replace &lt;code&gt;start_date=datetime(2020, 4, 29),&lt;/code&gt; with &lt;code&gt;start_date=datetime.datetime(2020, 4, 29),&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;Because the DummyOperator does nothing, will the tasks &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt; run together?&lt;/li&gt;
&lt;li&gt;My unbirthday this year is on the 7th of May 2020, will this DAG run that day?&lt;/li&gt;
&lt;li&gt;Will the two task definitions work?
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DummyOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                      &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DummyOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                      &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://dev.to/mucio/how-much-python-do-you-need-to-know-to-write-an-airflow-dag-part-1-answers-1ldm"&gt;Answers&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusions
&lt;/h1&gt;

&lt;p&gt;At this point you should be familiar with Python concepts like imports and how to call methods with more or fewer arguments. You should also have a general idea about the relationship between Airflow DAGs and Tasks, and how to create dependencies between Tasks.&lt;/p&gt;

&lt;p&gt;In the next episode we will focus on some fundamental Python data structures that can make our DAGs' code simpler and easier to maintain, some less-dummy operators and other nice things. Till then stay... scheduled.&lt;/p&gt;

</description>
      <category>airflow</category>
      <category>python</category>
    </item>
  </channel>
</rss>
