The challenge I often face with probabilistic matching (whether using Jaro-Winkler, FuzzyWuzzy or Levenshtein distance) is deciding the threshold above which a pair can be classified as a true match.
I have often had to go back and recalibrate the threshold to reduce false positives.
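One way to make that recalibration less ad hoc is to score a small set of hand-labeled pairs and pick the threshold from the data. The sketch below assumes you have such labels. It uses a plain normalized Levenshtein similarity as a stand-in scorer; a Jaro-Winkler or FuzzyWuzzy score would slot in the same way. It then returns the lowest cut-off whose precision on the labeled pairs meets a target. The function names, the `min_precision` parameter and the sample pairs are all hypothetical, for illustration only.

```python
from typing import List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 1 - distance / longer length."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def pick_threshold(pairs: List[Tuple[str, str, bool]],
                   min_precision: float) -> Optional[float]:
    """Hypothetical helper: lowest score cut-off whose precision on the
    labeled pairs is >= min_precision, or None if no cut-off qualifies.

    Recall can only fall as the cut-off rises, so the lowest qualifying
    cut-off keeps the most true matches while meeting the precision target.
    """
    scored = [(similarity(a, b), is_match) for a, b, is_match in pairs]
    for t in sorted({s for s, _ in scored}):
        predicted = [m for s, m in scored if s >= t]  # pairs called "match"
        true_pos = sum(predicted)
        if predicted and true_pos / len(predicted) >= min_precision:
            return t
    return None


# Hypothetical hand-labeled sample: (left, right, is_true_match)
labeled_pairs = [
    ("jonathan smith", "jonathon smith", True),
    ("acme corp", "acme corporation", True),
    ("john smith", "jane smith", False),
    ("mark", "marc", False),
]

if __name__ == "__main__":
    for a, b, m in labeled_pairs:
        print(f"{similarity(a, b):.3f}  {m}  {a!r} vs {b!r}")
    print("threshold for 100% precision:", pick_threshold(labeled_pairs, 1.0))
    print("threshold for 50% precision:", pick_threshold(labeled_pairs, 0.5))
```

On this toy sample, demanding zero false positives pushes the threshold up to about 0.93, which also discards the true "acme corp" match (score 0.5625). That is the same precision/recall trade-off that keeps forcing the recalibration. Relaxing the target to 50% precision drops the threshold to 0.5625.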