This article explains how to execute SQL statements on a dataframe in an Azure Databricks notebook.
As a Data Engineer, SQL has always been my first love. So naturally, when I learnt that within Databricks I could create and run SQL statements on a dataframe without needing a separate SQL environment, I jumped at the opportunity. In this article, I will explain how to run SQL commands on a dataframe.
I will start by importing Koalas — a pandas API on Apache Spark.
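In a notebook cell, the import looks like this (note that on newer Databricks runtimes Koalas has been merged into PySpark itself as pyspark.pandas, so there you would write import pyspark.pandas as ps instead):

import databricks.koalas as ks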
Next, I will create an empty dataframe. The process is similar to creating a pandas dataframe.
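A minimal sketch of what this could look like. The article's actual data isn't shown, so the column names and values below are my own hypothetical choices, and I seed a few rows (rather than leaving the dataframe truly empty) so that the SQL queries later have something to return:

# Build a Koalas dataframe exactly as you would a pandas one
# (columns and values here are hypothetical)
df = ks.DataFrame({
    'emp_id': [1, 2, 3],
    'emp_name': ['Asha', 'Bruno', 'Chen'],
    'dept_id': [10, 20, 10]
})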
A quick display of the dataframe shows us its structure:
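In a Databricks notebook you can simply evaluate the dataframe in a cell, or call head() to limit the output on larger data:

# Show the columns and the first few rows
df.head()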
OK. So, what if I want to see all the data in the dataframe but don’t want to use a Python command?
ks.sql('select * from {df}')
Koalas’ SQL feature lets me run SQL commands directly against a dataframe. The output of the command above is itself a Koalas dataframe.
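To see this for yourself, capture the result and inspect its type (a quick sketch):

# The result of ks.sql() is itself a Koalas dataframe,
# so you can keep chaining pandas-style operations on it
result = ks.sql('select * from {df}')
print(type(result))  # databricks.koalas.frame.DataFrame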
Let’s take it up a notch and create another dataframe. This time, I want to join the two dataframes using a SQL inner join, as sketched below.
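Here is a sketch of how that could look, again with hypothetical department data; Koalas resolves the {df} and {dept} placeholders to the dataframes of those names in scope:

# A second hypothetical dataframe to join against
dept = ks.DataFrame({
    'dept_id': [10, 20],
    'dept_name': ['Engineering', 'Finance']
})

# SQL inner join across the two Koalas dataframes
joined = ks.sql('''
    select e.emp_id, e.emp_name, d.dept_name
    from {df} e
    inner join {dept} d
      on e.dept_id = d.dept_id
''')
joined.head()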