How To Link/Modularize Code In Databricks Notebooks?

This article explains different ways to call or reuse one Databricks notebook from another notebook.

Tech Zero
5 min read · Aug 28, 2022

If you ever get asked how you modularize your notebooks in Databricks, remember this: what the interviewer is really asking is how you maintain different code modules in your Databricks workspace and how those modules get used by your notebooks.

In Databricks, there are two ways to accomplish this:

  1. Using the %run command
  2. Using the dbutils.notebook API to call one notebook from another

The %run command

This, by far, is the most commonly used technique for modularizing a Databricks workspace. It makes every function and class defined in one notebook available for reuse in another. Here's an example:


class Myclass:
    """
    A simple class to be reused in another notebook.

    Attributes
    ----------
    cloud_platform : str
        name of the cloud platform, e.g. Azure, AWS
    resource : str
        name of the resource, e.g. Synapse, Lambda

    Methods
    -------
    concat_strings():
        returns the two attributes joined with a hyphen in the middle.
    cloud_mapper():
        returns the name of the company providing the cloud platform.
    """
    def __init__(self, cloud_platform, resource):
        self.cloud_platform = cloud_platform
        self.resource = resource

    def concat_strings(self):
        # Join the two attributes with a hyphen, e.g. 'Azure-Synapse'
        return '{}-{}'.format(self.cloud_platform, self.resource)

    def cloud_mapper(self):
        # Map the platform name to its provider; fall back on unknown input
        switcher = {
            'Azure': 'Microsoft',
            'AWS': 'Amazon',
            'GCP': 'Google'
        }
        return switcher.get(self.cloud_platform, 'Invalid Entry')

I created a simple class with two methods, concat_strings and cloud_mapper. This notebook is named Lib. I'd like to invoke this class's methods from another notebook.

Below is a second notebook, My_Notebook. To get the functions from the Lib notebook, %run is placed at the top and the location of the Lib notebook is passed to it. I then call the concat_strings method in the next cell to check the output.
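Here's a minimal sketch of My_Notebook (assuming Lib sits in the same workspace folder; the object name and argument values are my own):

# Cell 1: %run must be the only code in its cell;
# the path here is relative to the current notebook
%run ./Lib

# Cell 2: instantiate the class defined in Lib and call the first method
obj = Myclass('Azure', 'Synapse')
obj.concat_strings()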

With the example values above, the output is Azure-Synapse: the two attributes joined by a hyphen.

To use the cloud_mapper method, I call it as shown below:
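Continuing the sketch above:

# Returns the company behind the platform stored on the object
obj.cloud_mapper()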

With cloud_platform set to 'Azure', the output is Microsoft.

Databricks Notebook API

Another method for modularization is the dbutils.notebook API, which runs one notebook from another as a separate, ephemeral job. The syntax is as follows (the path and parameter names here are placeholders):
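# run(path, timeout_seconds, arguments)
dbutils.notebook.run("/path/to/notebook", 60, {"param_name": "param_value"})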


How Does It Work?

Taking my two notebooks above as an example, I want to call the Lib notebook and pass parameters — cloud_platform and resource — to it. To do so, I need to revisit the Lib notebook and add a couple of cells to it, as sketched below:
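A sketch of those added cells (widget names chosen to match the incoming parameter names):

# Create two text widgets to receive the caller's parameters
dbutils.widgets.text("cloud_platform", "")
dbutils.widgets.text("resource", "")

# Read the widget values into variables with the same names
cloud_platform = dbutils.widgets.get("cloud_platform")
resource = dbutils.widgets.get("resource")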

To pass parameters using this API, the child (callee) notebook needs widgets in place to store the incoming parameters from the caller notebook (in this case, My_Notebook). In the sketch above, I created two widgets, extracted their values, and stored them in variables with the same names as the incoming parameters.

But how would I know whether the Lib notebook executed successfully?

To do so, I can return a value from the Lib notebook to My_Notebook by placing dbutils.notebook.exit("Finished") at the end of the notebook.

This is how I will call Lib from My_Notebook now:
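A sketch of that call (the timeout and parameter values are illustrative):

# Run Lib with a 1000-second timeout, passing the two parameters;
# whatever string Lib passes to dbutils.notebook.exit comes back here
returned_val = dbutils.notebook.run("Lib", 1000, {"cloud_platform": "Azure", "resource": "Synapse"})
print(returned_val)  # Finished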

In the above code, I provided the path of the notebook to be called, followed by the timeout in seconds, and finally a map of the parameters being sent to the notebook.

The returned value is then stored in the returned_val variable.

When this cell runs, Databricks creates a notebook workflow, indicated by a Notebook job id in the cell output. We can click this id to see the child notebook's execution.

Returning Multiple Values From Child Notebook

In situations where I need multiple values returned from the Lib notebook, I customize the dbutils.notebook.exit command as shown below:
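A sketch of the final cell in Lib (building the object from the widget values read earlier; dbutils.notebook.exit only accepts a string, hence json.dumps):

import json

# Instantiate the class with the incoming parameters and return both
# method outputs as a single JSON string
obj = Myclass(cloud_platform, resource)
dbutils.notebook.exit(json.dumps({
    "first_method": obj.concat_strings(),
    "second_method": obj.cloud_mapper()
}))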


In the code above, the outputs of the two class methods are returned as a single JSON string, assigned to the elements first_method and second_method respectively.

The output on the My_Notebook side looks like this:
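A sketch using the same illustrative values as before:

import json

# Run Lib, then parse the JSON string it returned
returned_val = dbutils.notebook.run("Lib", 1000, {"cloud_platform": "Azure", "resource": "Synapse"})
result = json.loads(returned_val)
print(result["first_method"])   # Azure-Synapse
print(result["second_method"])  # Microsoft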

🚩 What Happens When Time Runs Out But the Callee Notebook Is Still Running?

Since the dbutils.notebook API requires a timeout, the child notebook's run is cancelled if the time runs out before it finishes executing. This, in turn, surfaces as a failure (an exception) back in the caller notebook.
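If the caller should handle that gracefully, one option is to wrap the call (a sketch; the short timeout here is deliberate, to make the failure easy to reproduce):

# Guard the call so a timeout does not fail the caller outright
try:
    returned_val = dbutils.notebook.run("Lib", 10, {"cloud_platform": "Azure", "resource": "Synapse"})
except Exception as e:
    returned_val = None
    print("Child notebook failed or timed out:", e)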

👉 The Big Question: When To Use Which?

I have personally used both techniques. The %run command is generally used to share reusable functions that are not hardcoded to one job: think of them as the boilerplate library code your notebooks build on.

A recent case where I needed the dbutils.notebook API was a child notebook containing code meant for one specific situation; I needed it to run only inside an if-else condition. The code looked like this:

# condition_is_true is a placeholder for the actual check
if condition_is_true:
    dbutils.notebook.run("Child_notebook", 1000, {"argument_1": "value_1"})
else:
    print("Found Nothing")


Written by Tech Zero

Product Manager, Data & Governance | Azure, Databricks and Snowflake stack | Here to share my knowledge with everyone
