
How To Use Fuzzy Matching In Azure Data Factory

Tech Zero
4 min read · Jun 19, 2022


This article explains how to use Fuzzy Matching in Azure Data Factory Data Flow.

One of the most useful features Microsoft has released for Data Flow is Fuzzy Match.

For a quick background on what a fuzzy match is, consider the scenario below:

You have two tables, both containing addresses. As a Data Engineer, you need to match the data in Table A with Table B and assign the corresponding city to the result. Looking at the data, we can clearly see that the strings are NOT exactly equal (emphasis on exactly). We know that they refer to the same address, but a computer algorithm doesn’t. If you performed a standard SQL equality join on the two address columns, you wouldn’t get any results.

This is where the art of approximate string comparison, which we call fuzzy matching, comes into the picture.
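To make the idea concrete, here is a minimal Python sketch of approximate string comparison. It uses `difflib.SequenceMatcher` from the standard library, which is a swapped-in technique for illustration and not necessarily the algorithm ADF uses. The two addresses and the `similarity` helper are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1; higher means the strings are more alike.

    Case and surrounding whitespace are ignored before comparing.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


# Hypothetical addresses that a person would read as the same place.
address_a = "123 Main Street, Apt 4"
address_b = "123 Main St. Apt #4"

# An exact comparison (what an equality join does) treats them as different.
print("Exactly equal:", address_a == address_b)

# A fuzzy comparison returns a score instead of a yes/no answer.
print("Similarity score:", round(similarity(address_a, address_b), 2))
```

The point is that a fuzzy comparison gives you a score to judge against a cut-off, instead of a strict match or no match.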

In ADF’s Data Flow activity, this feature is available when you add a Join transformation. In the image below, I have selected a .csv file as the data source and then added a Join transformation.

(Fuzzy Matching in ADF)
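If it helps to picture what a join with fuzzy matching does conceptually, the sketch below pairs each address in Table A with its most similar address in Table B and copies over the city when the score clears a threshold. This is plain Python for illustration only, not ADF code or ADF's matching logic. The table contents, the `fuzzy_join` helper and the 0.75 threshold are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Score between 0 and 1, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_join(left, right, threshold=0.75):
    """For each left address, attach the city of the best-scoring right row.

    The city is left as None when the best score is below the threshold.
    """
    results = []
    for address in left:
        best_row, best_score = None, 0.0
        for row in right:
            score = similarity(address, row["address"])
            if score > best_score:
                best_row, best_score = row, score
        matched = best_row is not None and best_score >= threshold
        results.append({
            "address": address,
            "city": best_row["city"] if matched else None,
            "score": round(best_score, 2),
        })
    return results


# Hypothetical sample data standing in for Table A and Table B.
table_a = ["123 Main Street, Apt 4", "77 Ocean Avenue", "500 Pine Court"]
table_b = [
    {"address": "123 Main St. Apt #4", "city": "Springfield"},
    {"address": "77 Ocean Ave", "city": "Riverton"},
    {"address": "9 Elm Road", "city": "Lakewood"},
]

for row in fuzzy_join(table_a, table_b):
    print(row)
```

Raising the threshold makes matching stricter (fewer false matches, more misses); lowering it does the opposite, so it is worth tuning against a sample of your own data.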


Written by Tech Zero

Product Manager, Data & Governance | Azure, Databricks and Snowflake stack | Here to share my knowledge with everyone
