Tutorial: Set Up Databricks Workflows to Run Pipelines in Databricks - Part 1
This article describes how to use Databricks Workflows to create, run, and audit a pipeline in the Databricks environment.
In May 2022, Databricks introduced Databricks Workflows, which lets users create and manage data pipelines entirely within Databricks.
If you are using the Azure cloud platform, there are a couple of advantages to using Databricks Workflows:
• Everything is centralized within Databricks. The orchestrator, the processing logic, and the alerts all reside in one place.
• Easier debugging. The workflow visualization makes it easier to understand the different tasks and how they relate to one another.
• Support for CI/CD practices. We can maintain separate development and production environments for the workflows.
If you are curious about how it works, read on.
Use Case - We will create a simple PySpark notebook in Databricks, add it to a workflow, put it on an execution schedule, and have it alert a user if it fails.
Step 1 - Create a Python notebook in Databricks
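To start, the notebook only needs a small amount of PySpark logic so there is something for the workflow to run. The sketch below is a minimal example rather than the article's exact notebook: the row count and column names are placeholder assumptions, and `spark` is the SparkSession that Databricks notebooks provide automatically.

```python
# Minimal sketch of a PySpark notebook cell for the workflow to run.
# Databricks notebooks expose `spark` (a SparkSession) automatically; the
# row count and column names here are illustrative placeholders.
from pyspark.sql import functions as F

df = spark.range(0, 1000).withColumn("value", F.rand(seed=42))

# A small aggregation so each scheduled run produces output we can check.
summary = df.agg(
    F.count("*").alias("row_count"),
    F.avg("value").alias("avg_value"),
)
summary.show()
```

Any unhandled exception raised by the notebook will mark the workflow run as failed, which is what the failure alert configured later in this tutorial relies on.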