SKILL #ERNSCoding
airflow-dag-patterns
Streamline your data workflows effortlessly
Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment. Use when creating data pipelines, orchestrating workflows, or scheduling batch jobs.
the manual
Apache Airflow DAG Patterns
Production-ready patterns for Apache Airflow including DAG design, operators, sensors, testing, and deployment strategies.
When to Use This Skill
- Creating data pipeline orchestration with Airflow
- Designing DAG structures and dependencies
- Implementing custom operators and sensors
- Testing Airflow DAGs locally
- Setting up Airflow in production
- Debugging failed DAG runs
Core Concepts
1. DAG Design Principles
| Principle | Description |
|---|---|
| Idempotent | Running twice produces same result |
| Atomic | Tasks succeed or fail completely |
| Incremental | Process only new/changed data |
| Observable | Logs, metrics, alerts at every step |
2. Task Dependencies
# Linear
task1 >> task2 >> task3
# Fan-out
task1 >> [task2, task3, task4]
# Fan-in
[task1, task2, task3] >> task4
# Complex
task1 >> task2 >> task4
task1 >> task3 >> task4
Quick Start
# dags/example_dag.py
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.empty import EmptyOperator
default_args = {
'owner': 'data-team',
'depends_on_past': False,
'email_on_failure': True,
'email_on_retry': False,
'retries': 3,
'retry_delay': timedelta(minutes=5),
'retry_exponential_backoff': True,
'max_retry_delay': timedelta(hours=1),
}
with DAG(
dag_id='example_etl',
default_args=default_args,
description='Example ETL pipeline',
schedule='0 6 * * *', # Daily at 6 AM
start_date=datetime(2024, 1, 1),
catchup=False,
tags=['etl', 'example'],
max_active_runs=1,
) as dag:
start = EmptyOperator(task_id='start')
def extract_data(**context):
execution_date = context['ds']
# Extract logic here
return {'records': 1000}
extract = PythonOperator(
task_id='extract',
python_callable=extract_data,
)
end = EmptyOperator(task_id='end')
start >> extract >> end
Detailed patterns and worked examples
Detailed pattern documentation lives in references/details.md. Read that file when the navigation tier above is insufficient.
Best Practices
Do's
- Use TaskFlow API - Cleaner code, automatic XCom
- Set timeouts - Prevent zombie tasks
- Use
mode='reschedule'- For sensors, free up workers - Test DAGs - Unit tests and integration tests
- Idempotent tasks - Safe to retry
Don'ts
- Don't use
depends_on_past=True- Creates bottlenecks - Don't hardcode dates - Use
{{ ds }}macros - Don't use global state - Tasks should be stateless
- Don't skip catchup blindly - Understand implications
- Don't put heavy logic in DAG file - Import from modules









