How to Connect Azure Data Lake To Snowflake

This article explains how to connect Azure Data Lake Storage Gen2 (ADLSG2) to a Snowflake data warehouse.


Over the past many months, I have been working with Snowflake Data Warehouse. This has definitely been an amazing learning experience and an opportunity for me to see what an absolute beast Snowflake computing power can be. So, I figured why not share the same learning here.

(Figure: Connect ADLSG2 to Snowflake)

In this article, I will provide a crash course on how to connect your Azure Data Lake to a Snowflake data warehouse. These are one-time setup steps that you need to complete before ADLSG2 can communicate with your Snowflake account.

1. Set up a Storage Integration Service

From the Snowflake official documentation,

a storage integration is a Snowflake object that stores a generated identity and access management (IAM) entity for your external cloud storage, along with an optional set of allowed or blocked storage locations (Amazon S3, Google Cloud Storage, or Microsoft Azure)

In simpler words, if Snowflake has to get access to your locked cloud data lake, the storage integration is the key. The storage integration object in Snowflake stores the Azure IAM identity, a multi-tenant app registration, that Snowflake uses to authenticate. Only an Azure admin can grant this app access to Azure resources.

The storage integration can be granted access to ADLSG2 in one of two ways, the first of which is recommended:

  1. ✅ Generate a Service Principal for the Azure storage account
  2. Grant access to Azure storage account through a SAS Key

The syntax for the recommended service principal option is provided below:

CREATE [ OR REPLACE ] STORAGE INTEGRATION [ IF NOT EXISTS ]
  <name>
  TYPE = EXTERNAL_STAGE
  cloudProviderParams
  ENABLED = { TRUE | FALSE }
  STORAGE_ALLOWED_LOCATIONS = ('<cloud>://<bucket>/<path>/', '<cloud>://<bucket>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('<cloud>://<bucket>/<path>/', '<cloud>://<bucket>/<path>/') ]
  [ COMMENT = '<string_literal>' ]

Translating the above general syntax to ADLSG2, we get the following:

CREATE OR REPLACE STORAGE INTEGRATION ADLSG2_SF_STI
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '123abc'
  STORAGE_ALLOWED_LOCATIONS = ('azure://mydatalake.blob.core.windows.net/container1/')
  COMMENT = 'This is the storage integration for container1 in my data lake';

2. Execute The Below Command in Snowflake

DESC STORAGE INTEGRATION ADLSG2_SF_STI;

In the above command, ADLSG2_SF_STI is the name I gave to the storage integration object. The output lists the integration's properties as rows.

Store the values of Azure_Consent_URL and Azure_Multi_Tenant_App_Name in a notepad.
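For orientation, the two rows you care about look roughly like this (the values here are placeholders, not real output):

property                    | property_value
----------------------------|----------------------------------------------------------------------
AZURE_CONSENT_URL           | https://login.microsoftonline.com/<tenant_id>/oauth2/authorize?client_id=...
AZURE_MULTI_TENANT_APP_NAME | SnowflakeIAM123_45454545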

2.1 Copy the value for Azure_Consent_URL and paste it into a web browser. You should see a Microsoft Permissions page.

2.2 Click Accept.

2.3 Log into the Microsoft Azure portal.

2.4 Go to the storage account in Azure that Snowflake needs to be granted access to.

2.5 On the Access Control (IAM) page, click Add role assignment.

2.6 Select one of the roles: Storage Blob Data Reader or Storage Blob Data Contributor. If you plan to use Snowflake to export files to ADLSG2, use the Storage Blob Data Contributor role.

2.7 Search for the service principal. This is the value present in Azure_Multi_Tenant_App_Name, minus the trailing underscore and digits. For example, if your Azure_Multi_Tenant_App_Name is SnowflakeIAM123_45454545, search for SnowflakeIAM123.

2.8 Click Save.

3. Create an External Stage

A stage in Snowflake indicates where the data files are located. There are two types of stages that can be created in Snowflake:

  1. Internal Stage: points to all files stored internally within Snowflake.
  2. External Stage: points to all files stored in external systems such as AWS S3 buckets or ADLSG2 containers.
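As a quick contrast, an internal stage needs no storage integration or cloud credentials at all; a minimal sketch (the stage name is hypothetical):

-- Internal stage: files live in Snowflake-managed storage
CREATE OR REPLACE STAGE my_internal_stage;

The rest of this article deals only with the external variety.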

📌 Only those with admin credentials can create a storage integration. However, other roles can create a stage, provided those roles are granted the necessary permissions to do so.

-- Create Stage privilege
GRANT CREATE STAGE ON SCHEMA DB_TEST.STG TO ROLE DB_WRITER;

The above command grants the DB_WRITER role permission to create stages in the STG schema of the DB_TEST database.

-- Grant usage on an integration
GRANT USAGE ON INTEGRATION <object_name> TO ROLE <role_name>;

-- Example
GRANT USAGE ON INTEGRATION ADLSG2_SF_STI TO ROLE DB_WRITER;

In the above command, I granted usage on my storage integration, ADLSG2_SF_STI, to those with the DB_WRITER role.
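Note that the role also needs usage on the database and schema before it can create or use a stage there. A quick sketch, reusing the DB_TEST and STG names from above:

GRANT USAGE ON DATABASE DB_TEST TO ROLE DB_WRITER;
GRANT USAGE ON SCHEMA DB_TEST.STG TO ROLE DB_WRITER;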

After the above steps are completed, we can execute the below SQL to create a stage:

USE SCHEMA DB_TEST.STG;

CREATE OR REPLACE STAGE my_az_stage
  URL = 'azure://mydatalake.blob.core.windows.net/container1/'
  STORAGE_INTEGRATION = ADLSG2_SF_STI
  DIRECTORY = (
    ENABLE = TRUE
    AUTO_REFRESH = TRUE
  )
  FILE_FORMAT = (
    TYPE = CSV
    RECORD_DELIMITER = '\n'
    FIELD_DELIMITER = ','
    SKIP_HEADER = 0
    DATE_FORMAT = AUTO
    TIME_FORMAT = AUTO
  )
  COPY_OPTIONS = (
    ON_ERROR = ABORT_STATEMENT
    RETURN_FAILED_ONLY = FALSE
    FORCE = FALSE
  )
  COMMENT = 'This is an external stage for Azure Data Lake Gen2.';

The above command creates an external stage named my_az_stage. This stage points to the files present under container1 in the ADLSG2 account. The file format indicates that the files in the ADLSG2 container are in CSV format. You can then specify multiple attributes for the CSV files, such as the number of header rows to skip and the field and record delimiters. In the above example, I only showed a sample of such options. You can find the entire list in Snowflake's official documentation.

The copy options tell Snowflake what to do when an error occurs during a data load from ADLSG2 into Snowflake.

Finally, the stage uses the ADLSG2_SF_STI storage integration to access the data files in container1.
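Because the stage was created with a directory table enabled, you can also query the file metadata directly with the DIRECTORY table function; a small sketch:

-- List the file metadata tracked by the stage's directory table
SELECT relative_path, size, last_modified
FROM DIRECTORY(@my_az_stage);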

👉 To test whether the external stage works correctly, execute the below command in Snowflake:

LIST @<stage_name>;

If your stage has been configured correctly, the output of the above command should display all the files present in the ADLSG2 container for which the stage was configured.
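From there, loading the staged files into a table is a single statement. A minimal sketch, assuming a target table MY_TABLE already exists with columns matching the CSV layout:

-- Load every CSV file from the stage into the target table
COPY INTO MY_TABLE
FROM @my_az_stage
PATTERN = '.*[.]csv';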

To conclude, I hope this article provided the information you need to connect your ADLSG2 account to Snowflake.
