What do you do when you need to perform lookups in Excel and the data only sort-of matches? Use Fuzzy Lookup, of course! Here’s how:
You can download the file here to follow along. If you see a preview, look for the download arrow in the upper right hand corner.
Here are two lists I have where I want to extract the ID from the first list and attach it to the second. Normally, I would use a lookup function to accomplish this – either LOOKUP or VLOOKUP. However, the problem here is that the information I would use for a match, first or last names, do have some discrepancies. I’ve highlighted those in matching colors:
If I try to use VLOOKUP with either a TRUE or FALSE match criteria, I get inaccurate data:
So, since that won’t work, we need a way to do a lookup when the data is a bit “fuzzy”. What better than using Fuzzy Lookup! You can download that add-in from Microsoft at the following link:
Once done (you may have to shut down and restart Excel) you should see an additional tab on your ribbon:
Fuzzy Lookup will only work with tables, so you will need to make sure you’ve converted your data ranges into tables, and it is probably best that you name them. Also, you can only do this with two tables, no more.
Once done, click on the Fuzzy Lookup icon on the Fuzzy Lookup tab in the ribbon. A pop-up dialog box will appear allowing you to identify several aspects of the process:
At the top you can identify the tables you want to use. Below that you can choose fields that are to be used for matching between the tables. In the bottom section, you can identify what columns you want to appear in the output. Finally, with the Similarity Threshold sliding bar, you can adjust the level of match sensitivity. Even with Fuzzy Lookup, sometimes the variations are different enough that you need to “relax” the sensitivity.
Once I click GO, here is my result (note, I’ve checked all the options in the Output Columns field to show what can result):
Notice the similarity column added at the end, identifying the percentage of match.
This can be a very useful tool when you have data that isn’t quite identical, such as states written out vs. their abbreviations, or streets that use Ave. vs. Avenue or St. vs. Street.
Hopefully this tool can clear up that “fuzzy” data!