This article explains how to execute SQL statements on a dataframe in an Azure Databricks notebook.
As a Data Engineer, SQL has always been my first love. So naturally, when I learnt that within Databricks I could create and run SQL statements on a dataframe without needing a separate SQL environment, I jumped at the opportunity. In this article, I will explain how to run SQL commands on a dataframe.
I will start by importing Koalas — a pandas API on Apache Spark.
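In a notebook cell, the import looks like this (note that on newer Databricks runtimes Koalas has been merged into PySpark itself as pyspark.pandas, so there you would write import pyspark.pandas as ps instead):

import databricks.koalas as ks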
Next, I will create an empty dataframe. The process is similar to creating a pandas dataframe.
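A minimal sketch of what this could look like. The article's actual data isn't shown, so the column names and values below are my own hypothetical choices, and I seed a few rows (rather than leaving the dataframe truly empty) so that the SQL queries later have something to return:

# Build a Koalas dataframe exactly as you would a pandas one
# (columns and values here are hypothetical)
df = ks.DataFrame({
    'emp_id': [1, 2, 3],
    'emp_name': ['Asha', 'Bruno', 'Chen'],
    'dept_id': [10, 20, 10]
})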
A quick display of the dataframe shows us its structure:
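In a Databricks notebook you can simply evaluate the dataframe in a cell, or call head() to limit the output on larger data:

# Show the columns and the first few rows
df.head()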
OK. So, what if I want to see all the data in the dataframe but don’t want to use a Python command?
ks.sql('select * from {df}')
Koalas’ SQL feature lets me run SQL commands directly against a dataframe. The output of the command above is itself a Koalas dataframe.
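To see this for yourself, capture the result and inspect its type (a quick sketch):

# The result of ks.sql() is itself a Koalas dataframe,
# so you can keep chaining pandas-style operations on it
result = ks.sql('select * from {df}')
print(type(result))  # databricks.koalas.frame.DataFrame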
Let’s take it up a notch and create another dataframe. This time, I want to join the two dataframes using a SQL inner join, as sketched below.
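Here is a sketch of how that could look, again with hypothetical department data; Koalas resolves the {df} and {dept} placeholders to the dataframes of those names in scope:

# A second hypothetical dataframe to join against
dept = ks.DataFrame({
    'dept_id': [10, 20],
    'dept_name': ['Engineering', 'Finance']
})

# SQL inner join across the two Koalas dataframes
joined = ks.sql('''
    select e.emp_id, e.emp_name, d.dept_name
    from {df} e
    inner join {dept} d
      on e.dept_id = d.dept_id
''')
joined.head()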