
Tutorial: Set Up Databricks Workflows to Run Pipelines in Databricks - Part 1

This article describes how to use Databricks Workflows to create, run, and audit a pipeline in the Databricks environment.

Tech Zero
5 min read · May 14, 2023

In May 2022, Databricks introduced Databricks Workflows, a feature that lets users create and manage data pipelines entirely within Databricks.

If you are using the Azure cloud platform, there are a couple of advantages to using Databricks Workflows:

πŸ‘‰ Everything is centralized within Databricks. From the orchestrator to the processing logic to alerts, everything resides within Databricks.

πŸ‘‰ Easier debugging. The visualization of workflow makes it easier to understand the different tasks and their relation to one another.

πŸ‘‰ Support for CI/CD properties. We can maintain development and production environments for the workflows.

If you are curious about how it works, read on.

Use Case — We will create a simple PySpark notebook in Databricks, add it to a workflow, put it on an execution schedule, and configure it to alert a user if it fails.
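
To make the use case concrete, here is a minimal sketch of the same configuration expressed through the Databricks Jobs API (2.1): a notebook task, a daily schedule, and a failure alert. The workspace URL, token, notebook path, cluster id, and email address below are hypothetical placeholders; this tutorial builds the equivalent setup through the Databricks UI.

```python
import requests

# Hypothetical placeholders: substitute your own workspace URL, access token,
# notebook path, cluster id, and alert recipient.
HOST = "https://<your-workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

job_spec = {
    "name": "simple-pyspark-pipeline",
    "tasks": [
        {
            "task_key": "run_notebook",
            "notebook_task": {"notebook_path": "/Users/<me>/simple_pipeline"},
            "existing_cluster_id": "<cluster-id>",
        }
    ],
    # Run daily at 06:00 UTC (the Jobs API uses Quartz cron syntax).
    "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",
        "timezone_id": "UTC",
    },
    # Email a user when a run fails: the alerting piece of our use case.
    "email_notifications": {"on_failure": ["<me>@example.com"]},
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```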

Step 1 — Create a Python notebook in Databricks
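
As a starting point, here is a minimal sketch of what such a notebook could contain. It builds a small in-memory DataFrame so it runs without any external data; in a Databricks notebook, the `spark` session and the `display` helper are available automatically.

```python
from pyspark.sql import functions as F

# A small in-memory DataFrame so the notebook runs without external data.
df = spark.createDataFrame(
    [("2023-05-01", 120), ("2023-05-02", 95), ("2023-05-03", 143)],
    ["order_date", "order_count"],
)

# A simple transformation: flag high-volume days.
result = df.withColumn("high_volume", F.col("order_count") > 100)

# Show the output. If any cell above raises an error, the workflow task
# running this notebook fails, which is what will trigger our alert.
display(result)
```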


Written by Tech Zero

Product Manager, Data & Governance | Azure, Databricks and Snowflake stack | Here to share my knowledge with everyone
