Airflow Docker


Running an Apache Airflow DAG with Docker. In this article, we are going to run a sample dynamic DAG using Docker. Before that, let's get a quick idea about Airflow and some of its terms. Airflow is a platform to programmatically author, schedule, and monitor workflows.


Airflow needs to know how to connect to your environment. Information such as hostname, port, login and passwords to other systems and services is handled in the Admin->Connections section of the UI. The pipeline code you will author will reference the 'conn_id' of the Connection objects.

Connections can be created and managed using either the UI or environment variables.

See the Connections Concepts documentation for more information.

Creating a Connection with the UI

Open the Admin->Connections section of the UI. Click the Create link to create a new connection.

  1. Fill in the Conn Id field with the desired connection ID. It is recommended that you use lower-case characters and separate words with underscores.

  2. Choose the connection type with the Conn Type field.

  3. Fill in the remaining fields. See Connection Types for a description of the fields belonging to the different connection types.

  4. Click the Save button to create the connection.

Editing a Connection with the UI

Open the Admin->Connections section of the UI. Click the pencil icon next to the connection you wish to edit in the connection list.

Modify the connection properties and click the Save button to save your changes.

Creating a Connection from the CLI

You may add a connection to the database from the CLI.

Obtain the URI for your connection (see Generating a Connection URI).

Then add the connection like so:
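For example, assuming the Airflow 2.x CLI syntax (the 1.10.x series used a different form of the connections command), the command could look roughly like the following, with a placeholder connection id and URI:

    airflow connections add 'my_prod_db' \
        --conn-uri 'my-conn-type://my-login:my-password@my-host:5432/my-schema?param1=val1&param2=val2'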

Alternatively you may specify each parameter individually:
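A sketch of the same command using individual parameters instead of a URI, again assuming the Airflow 2.x flag names and placeholder values:

    airflow connections add 'my_prod_db' \
        --conn-type 'my-conn-type' \
        --conn-host 'my-host' \
        --conn-port '5432' \
        --conn-login 'my-login' \
        --conn-password 'my-password' \
        --conn-schema 'my-schema' \
        --conn-extra '{"param1": "val1", "param2": "val2"}'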

Storing a Connection in Environment Variables

The environment variable naming convention is AIRFLOW_CONN_<conn_id>, all uppercase.

So if your connection id is my_prod_db then the variable name should be AIRFLOW_CONN_MY_PROD_DB.

Note

Single underscores surround CONN. This is in contrast with the way airflow.cfg parameters are stored, where double underscores surround the config section name.

The value of this environment variable must use Airflow's URI format for connections. See the section Generating a Connection URI for more details.

Using .bashrc (or similar)

If storing the environment variable in something like ~/.bashrc, add as follows:
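For example, with the placeholder connection id and URI used earlier, the line in ~/.bashrc could look like:

    export AIRFLOW_CONN_MY_PROD_DB='my-conn-type://my-login:my-password@my-host:5432/my-schema?param1=val1&param2=val2'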


Using docker .env

If using a Docker .env file, you may need to remove the single quotes.
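In that case the corresponding .env entry might look like:

    AIRFLOW_CONN_MY_PROD_DB=my-conn-type://my-login:my-password@my-host:5432/my-schema?param1=val1&param2=val2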

Alternative secrets backend

In addition to retrieving connections from environment variables or the metastore database, you can enable an alternative secrets backend to retrieve connections. For more details see Alternative secrets backend.

Connection URI format

In general, Airflow’s URI format is like so:
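An illustrative URI with placeholder values might look like:

    my-conn-type://my-login:my-password@my-host:5432/my-schema?param1=val1&param2=val2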


The above URI would produce a Connection object equivalent to the following:
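A rough Python sketch of a Connection matching the placeholder URI above (field names follow the Connection constructor; the values are illustrative only):

    import json
    from airflow.models import Connection

    conn = Connection(
        conn_type="my-conn-type",
        login="my-login",
        password="my-password",
        host="my-host",
        port=5432,
        schema="my-schema",
        extra=json.dumps({"param1": "val1", "param2": "val2"}),
    )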

You can verify a URI is parsed correctly like so:
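One way to check this, sketched with the same placeholder URI, is to construct a Connection from the URI and inspect its parsed fields:

    from airflow.models import Connection

    c = Connection(
        uri="my-conn-type://my-login:my-password@my-host:5432/my-schema?param1=val1&param2=val2"
    )
    print(c.login)     # my-login
    print(c.password)  # my-password
    print(c.host)      # my-host
    print(c.port)      # 5432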

Generating a connection URI

To make connection URI generation easier, the Connection class has a convenience method get_uri(). It can be used like so:
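A minimal sketch using made-up connection details, printing a ready-to-paste environment variable assignment:

    import json
    from airflow.models import Connection

    c = Connection(
        conn_id="some_conn",
        conn_type="mysql",
        host="myhost.com",
        login="mylogin",
        password="mypassword",
        extra=json.dumps({"this_param": "some val", "that_param": "other val"}),
    )
    print(f"AIRFLOW_CONN_{c.conn_id.upper()}='{c.get_uri()}'")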


Additionally, if you have created a connection via the UI, and you need to switch to an environment variable, you can get the URI like so:
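A sketch of this, assuming a hypothetical connection id some_conn already stored in the metastore (in Airflow 2.x the BaseHook import path is airflow.hooks.base rather than airflow.hooks.base_hook):

    from airflow.hooks.base_hook import BaseHook

    # Load the connection that was created via the UI, then emit it as an env var.
    conn = BaseHook.get_connection("some_conn")
    print(f"AIRFLOW_CONN_{conn.conn_id.upper()}='{conn.get_uri()}'")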


Connection Types