
Inspecting Cloud Composer - Apache Airflow

In this article, we will learn what Cloud Composer is in GCP and how we can set it up. We will also highlight some critical insights about Cloud Composer.

By Sameer Shukla · Jan. 11, 22 · Cloud Zone · Analysis

Introduction

Cloud Composer is a fully managed service built on Apache Airflow. It has built-in integration with other GCP services such as Cloud Storage, Cloud Datastore, Cloud Dataflow, BigQuery, etc.

This integration matters from an Airflow scheduling and automation perspective: for example, a DAG might pull a CSV file from Cloud Storage, run a transformation using Pandas, and then upload the result back to Storage or perform some database operations, as sketched below.
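As a rough sketch of that pattern (not code from the article), the DAG below downloads a CSV from a bucket, transforms it with Pandas, and uploads the result back. The bucket name, object paths, and the added column are placeholders chosen for illustration.

# Minimal sketch: pull a CSV from Cloud Storage, transform it with Pandas,
# and upload the result back. Bucket and object names are hypothetical.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import storage


def transform_csv():
    client = storage.Client()
    bucket = client.bucket("my-input-bucket")  # placeholder bucket name

    # Download the source file, transform it, and write the result back.
    bucket.blob("input/data.csv").download_to_filename("/tmp/data.csv")
    df = pd.read_csv("/tmp/data.csv")
    df["processed_at"] = datetime.utcnow().isoformat()  # example transformation
    df.to_csv("/tmp/data_out.csv", index=False)
    bucket.blob("output/data_out.csv").upload_from_filename("/tmp/data_out.csv")


with DAG(
    dag_id="gcs_pandas_example",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="transform_csv", python_callable=transform_csv)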

Setup

The first thing we need is a Cloud Composer instance. In the GCP console's search bar, look for "composer"; it will offer the Cloud Composer API option, and we need to enable that API.

Screenshot - 1: GCP console

Composer runs on top of Google Kubernetes Engine. Once the instance setup is done, a Kubernetes cluster is created and managed by GCP, so we don't have to deal with any Kubernetes configuration.

Step two is to create an environment and select the Composer 1 option, since walking through manual scaling makes it easier to understand a bit of the background.

Screenshot - 2: Creating an environment

Once option 1 is selected, it asks for details like Name and Location. Please note that the node count cannot be less than 3, as you can see from the error in the screenshot.

Screenshot - 3: Configuring nodes

A minimum node size of 20 GB is required. We are also asked for a Python version and a Composer image version; I selected the latest for both. The remaining configurations, like machine type, are quite standard.

Screenshot - 4: Standard configuration

The network configuration selected is 'default'. The web server configuration is for the Apache Airflow UI, and the network access control is "Allow access from all IP addresses". Select the smallest web-server machine type.

Screenshot - 5: Web server configuration

Let's create the Composer instance after selecting these properties. Instance creation takes around 30 minutes, as GCP is internally creating the entire Kubernetes cluster.

After the environment creation is done, we can see the Location, Airflow version, etc. 

Screenshot - 6: Completing environment creation

Browse to Kubernetes Engine and Clusters; the cluster details are shown, like the total number of nodes (which is 3), total vCPUs, etc.

Screenshot - 7: Kubernetes Engine and Clusters

Let's go to the Workloads section; quite a few workloads have been deployed by GCP.

Screenshot - 8: Workload section

GCP also deployed 3 services.

Screenshot - 9: GCP deployed services

In the VM section, a total of 3 VM instances are running, because we specified a node count of 3 while configuring Composer.

Screenshot - 10: VM section

Go to the Composer section and click on the Airflow webserver option.

Screenshot - 11: Composer section

After Google authentication, we are redirected to the Airflow web UI page. Through this page, we can interact with Airflow: executing a DAG, uploading a new DAG, etc. By default, one DAG is running with the name 'airflow_monitoring'.

Screenshot - 12: Airflow Web-UI page

To upload a new DAG, use the DAGs folder option on the Composer environment screen.

Screenshot - 13: Composer environment screen

On clicking the DAGs folder, we can see that a bucket has been created for us. Inside it there is a dags folder, which for now contains only one file, airflow_monitoring.py. When we upload our DAGs, they go to the same location.

Screenshot - 14: DAGs folder
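Because the DAGs folder is just a Cloud Storage bucket, a DAG file can also be copied there programmatically instead of through the console. This is a minimal sketch using the google-cloud-storage client; the bucket name is a placeholder and should be replaced with the auto-generated bucket shown for your own environment.

# Minimal sketch: copy a local DAG file into the environment's DAGs folder.
# The bucket name below is hypothetical; use the one linked from your
# environment's DAGs folder.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("us-central1-my-composer-env-bucket")  # placeholder
bucket.blob("dags/hello_world.py").upload_from_filename("hello_world.py")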

When we upload our own DAG, Airflow will automatically detect it and execute it as per the configuration in the file.

Screenshot - 15: Airflow executing DAG

Once we click on UPLOAD FILES, we can upload our own DAG. As you can see, hello_world.py is uploaded, and after that it is executed by Airflow, which can be verified on the DAGs page.
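The article doesn't show the contents of hello_world.py, but a minimal DAG along these lines (the dag_id, schedule, and task are illustrative only) is enough for Airflow to pick up the file and run it.

# Hypothetical hello_world.py: a single-task DAG that Airflow detects
# automatically once the file lands in the DAGs folder.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_world",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo 'Hello, World!'")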

Summary

One drawback is that the Airflow service cannot be stopped; the only option is to delete it. Overall, GCP has simplified working with Airflow by offering it as a separate managed service.

