How To Use Fuzzy Matching In Azure Data Factory

Fuzzy Matching in ADF
(Fuzzy Matching in ADF)
ADF Fuzzy Matching Metadata
(ADF Fuzzy Matching Metadata)
Fuzzy Matching in ADF — similarity score
(Fuzzy Matching in ADF — similarity score)

📢 Tip — In my experience of working with fuzzy string matches, the threshold is by far the most step in tagging matches as true matches. It usually is an iterative process where as a Data Engineer, I need to play around with different thresholds to ensure the algorithm is not discarding any true matches.

My recommendation is to start with a conservative similarity threshold between 60% — 70% and see the end results and compare those with the original data. it’s better if you can plot the final matches on a bar graph. This helps in visualizing where most of the match percentages lie. If a majority of the matched data is leaning towards the 80% or higher mark, your threshold is good. If it’s lower, you might want to toggle the threshold to see which baseline value gives you the true matches.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store