
Azure Databricks: Run SQL Commands on a Dataframe

Tech Zero
3 min readFeb 19, 2022


This article explains how to execute SQL statements on a dataframe in Azure Databricks notebook.

As a Data Engineer, I will always count SQL as my first love. So naturally, when I learnt that within Databricks I can create and run SQL statements on a dataframe without needing a separate SQL environment, I jumped at the opportunity. In this article, I will explain how to run SQL commands on a dataframe.

I will start by importing Koalas, a pandas API on Apache Spark.

Next, I will create an empty dataframe. The process is similar to creating a pandas dataframe.

A quick display of the dataframe shows us its structure:

Ok. So, what if I want to see all the data in the dataframe but don’t want to use a Python command?

ks.sql('select * from {df}')
Databricks SQL command on Dataframe

Koalas, with its SQL feature, lets me run SQL commands against a dataframe. The output of the above command is itself a koalas dataframe.

Let’s take it up a notch and create another dataframe. This time, I want to join the two dataframes using a SQL inner join.

