Astro 9/19/2023, 5:19:04 AM
Deploying Airflow in production involves several steps and considerations:
• Infrastructure: Ensure you're running Airflow on an infrastructure that meets your organization's requirements. Local setups like using the Astro CLI are great for testing and DAG development, but not for running DAGs in production.
• Scalability: Consider the scalability of your setup. Depending on your needs, you might need to scale out your Airflow deployment. You can find more information on this in the Scaling out Airflow guide.
• Configuration: Configure your Airflow environment according to best practices. This includes setting up your database, adding DAG tags and owner links, setting up notifications, and more. Check the /start section for more detailed guides on these tasks.
• Images and Charts: Use production-ready resources like the OSS Production Docker Images and the OSS Official Helm Chart.
• Managed Airflow: Consider using a managed service like Astro, which provides a lot of pre-configured settings and allows you to scale as needed.
Remember that Airflow's resource needs can vary significantly based on factors like the complexity of your DAGs, the number of tasks you're running, and more. You should be prepared to continuously monitor your system and adjust your settings as needed to ensure smooth operation.
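In production, Airflow configuration is typically set through environment variables following the `AIRFLOW__{SECTION}__{KEY}` naming convention rather than by editing `airflow.cfg` by hand. A minimal sketch of that mapping (the specific settings and values shown are illustrative placeholders, not recommendations):

```python
def airflow_env_var(section: str, key: str) -> str:
    """Map a config section/key pair to Airflow's AIRFLOW__{SECTION}__{KEY} env var name."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"

# Example production-style overrides (values are placeholders, not recommendations).
overrides = {
    airflow_env_var("core", "parallelism"): "64",
    airflow_env_var("core", "executor"): "CeleryExecutor",
    airflow_env_var("database", "sql_alchemy_conn"): "postgresql+psycopg2://user:pass@host/db",
}

for name, value in overrides.items():
    print(f"{name}={value}")
```

Setting configuration this way keeps your deployment reproducible, since the same environment variables can be injected by Docker Compose, the Helm chart, or a managed platform.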
Human 9/19/2023, 5:18:52 AM
how should I deploy airflow in production?
Sources
It is time to deploy your DAG in production. To do this, you first need to make sure that Airflow itself is production-ready. Let's see what precautions you need to take.
There are often questions about minimum requirements for Airflow in production systems, but it is not possible to give a simple answer. The resources Airflow needs depend on many factors, including (but not limited to):

- The deployment your Airflow is installed with (see the ways of installing Airflow above).
- The requirements of the deployment environment (for example Kubernetes, Docker, Helm), which are completely independent of Airflow: DNS resources, sharing of nodes and resources, and more (or fewer) pods and containers depending on your particular choice of technology, cloud, and monitoring integration.
- Technical details of the database, hardware, and network your deployment runs on.
- The complexity of the code you add to your DAGs, configuration, plugins, and settings (note that Airflow runs the code that the DAG author and Deployment Manager provide).
- The number and choice of providers you install and use (Airflow has more than 80 providers); they are installed at the Deployment Manager's discretion, and using them might require more resources.
- The choice of parameters you use when tuning Airflow. Airflow has many configuration parameters that can be fine-tuned to your needs.
- The number of DagRuns and task instances you run, taking parallel instances of each into consideration.
- How complex the tasks you run are.

These "DAG" characteristics will change over time, and even vary depending on the time of day or week, so you have to be prepared to continuously monitor the system and adjust the parameters to make it work smoothly. While we can provide specific minimum requirements for a development "quick start" (such as the `running-airflow-in-docker` quick-start guide), it is not possible to provide minimum requirements for production systems.

The best way to think about resource allocation for an Airflow instance is in terms of process control theory, where there are two types of systems:

1. Fully predictable systems, with few knobs and variables, where you can reliably set the values for the knobs and have an easy way to determine the behaviour of the system.
2. Complex systems with multiple variables, which are hard to predict and where you need to monitor the system and adjust the knobs continuously to make sure it is running smoothly.

Airflow (like most modern systems running on cloud services, with multiple layers responsible for resources as well as multiple parameters controlling their behaviour) is a complex system and falls much more into the second category. If you decide to run Airflow in production on your own, you should be prepared for the monitor/observe/adjust feedback loop to keep the system running smoothly. A good monitoring system that lets you observe the system and adjust its parameters is a must to put that into practice.

There are a few guidelines you can use to optimize your resource usage as well. The `fine-tuning-scheduler` guide is a good starting point for tuning your scheduler, and you can also follow the `best_practice` guide to make sure you are using Airflow in the most efficient way.

One of the important things that Managed Services for Airflow provide is that they make a lot of opinionated choices and fine-tune the system for you, so you don't have to worry about it too much. With such managed services there are usually far fewer knobs to turn and choices to make; part of what you pay for is that the Managed Service provider manages the system for you, provides paid support, and allows you to scale the system and allocate the right resources as needed, following the choices made for the kinds of deployment you have.
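The monitor/observe/adjust feedback loop described above can be sketched as a small control function. This is a toy illustration only: the function name, the `tasks_per_worker` capacity assumption, and the thresholds are hypothetical; in practice you would read metrics from your monitoring system (e.g. StatsD or Prometheus) and change the knob through your deployment tooling, not in-process.

```python
def adjust_worker_count(current_workers: int, queued_tasks: int,
                        running_tasks: int, tasks_per_worker: int = 16) -> int:
    """Return a new worker count based on observed load (toy control loop)."""
    demand = queued_tasks + running_tasks
    needed = -(-demand // tasks_per_worker)  # ceiling division
    # Scale up immediately to meet demand, but scale down by at most one
    # worker per cycle to avoid oscillation (a crude form of damping).
    if needed > current_workers:
        return needed
    if needed < current_workers:
        return current_workers - 1
    return current_workers

# 130 observed tasks at 16 tasks/worker -> scale 4 workers up to 9.
print(adjust_worker_count(current_workers=4, queued_tasks=100, running_tasks=30))
```

The asymmetry between scaling up and scaling down mirrors the point made above: a complex system needs continuous, damped adjustment rather than a one-time setting of the knobs.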
All Airflow components should be run on an infrastructure that is appropriate for the requirements of your organization. For example, using the Astro CLI to run Airflow on a local computer can be helpful when testing and for DAG development, but it is insufficient to support running DAGs in production. The following resources can help you manage Airflow components:

- OSS [Production Docker Images](https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html#using-production-docker-images)
- OSS [Official Helm Chart](https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html#using-official-airflow-helm-chart)
- Managed Airflow on Astro

Scalability is also an important consideration when setting up your production Airflow environment. See Scaling out Airflow.
Setting up the sandbox in the /start section was easy; building a production-grade environment requires a bit more work! These how-to guides will step you through common tasks in using and configuring an Airflow environment: Using the CLI, add-dag-tags, add-owner-links, notifications, set-config, set-up-database, operator/index, timetable, custom-view-plugin, listener-plugin, customize-ui, custom-operator, create-custom-decorator, export-more-env-vars, connection, variable, setup-and-teardown, run-behind-proxy, run-with-systemd, define-extra-link, email-config, dynamic-dag-generation, docker-compose/index, upgrading-from-1-10/index.
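Several of the guides listed above (notifications, email-config) revolve around callbacks. Airflow failure callbacks follow the convention of receiving a single context dict; the sketch below fakes that context and the alerting backend so it runs without Airflow installed (`send_alert` and `format_failure_message` are illustrative stand-ins, not Airflow APIs, though `task_instance_key_str` and `exception` are real context keys):

```python
def send_alert(message: str) -> None:
    """Stand-in for a real notification channel (email, Slack webhook, ...)."""
    print(message)

def format_failure_message(context: dict) -> str:
    """Build an alert line from an Airflow-style context dict.

    The dict here is faked; real Airflow passes richer objects (e.g. a
    TaskInstance under 'task_instance') alongside these plain keys.
    """
    return (
        f"Task {context.get('task_instance_key_str', '?')} failed: "
        f"{context.get('exception', 'unknown error')}"
    )

def on_failure_callback(context: dict) -> None:
    """Failure callback in Airflow's convention: a single context argument."""
    send_alert(format_failure_message(context))

# Simulated invocation with a faked context dict.
on_failure_callback({
    "task_instance_key_str": "my_dag__my_task__2023-09-19",
    "exception": "boom",
})
```

In a real deployment you would attach the callback via `on_failure_callback` on a DAG or task and have `send_alert` talk to your actual notification channel.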