Better autocompletes with Fuzzy Matching
đ Published: ⢠Thomas Queste
This article is an introduction to Fuzzy Matching and how it can improve an Autocomplete widget. Fuzzy Matching is used to find the most appropriate strings into a set of strings, like finding Sinatra
when you misspelled it Senatra
.
We will set up a Sinatra application displaying an Ajax autocomplete widget, which calls the backend to have the best matching results, even if the match is not strictly equal.
Fuzzy Matching ?
Fuzzy Matching, aka Approximate String matching on Wikipedia, is used mainly in spell checkers and in biology to measure the variation between DNA.
In this article, we will use the Levenshtein distance algorithm to fetch results when there would be none using standard methods. Some other matching algorithms are also popular: the DamerauâLevenshtein distance (Levenshtein with transposition of letters), the Soundex (a phonetic algorithm for indexing names by sound) and also the Bitap. Many of them can be found in Ruby, or could also be hand coded.
Using the Levenshtein algorithm, we get a distance between two strings. For example :
$ distance("sinatra", "sinatra") #= 0, equality
$ distance("sinatra", "senatra") #= 1, one permutation
$ distance("sinatra", "rails") #= 6, many permutation
This allows us to display to the user not only the strings that match the input, but also the strings that are approximately equal.
The application
The application uses the following pieces of code :
- A Sinatra application
- JQuery UI for Autocomplete
- The âtextâ gem which has an implementation of the Levenshtein distance algorithm (source: <`https://github.com/threedaymonk/text/blob/master/lib/text/levenshtein.rb>)
The source code is available on: https://github.com/tomsquest/better-autocompletes-with-fuzzy-matching.
Frontend
The JQuery Autocomplete widget is simple to set up. The source is defined to call the âcountriesâ URL, and it will send it the input like countries?term=my_input
.
1. Add an input
<div class="ui-widget">
<label for="country">Country:</label>
<input id="country" />
</div>
2. Bind the JS
$(function () {
$("#country").autocomplete({
source: "countries",
select: function (event, ui) {
$("#results").text(
"input was: '" + this.value + "' and selection was: " + ui.item.value,
);
},
});
});
Backend
Even simpler, the backend is a simple get method that responds with JSON :
get "/countries" do
content_type :json
find_countries(params[:term]).to_json
end
The real stuff is in the âfind_countriesâ method :
def find_countries(term)
# Exact match
countries = COUNTRIES.find{|c| c.downcase == term.downcase}.to_a
# Partial match
if countries.empty?
countries = COUNTRIES.find_all{|c| c.downcase.include? term.downcase }
# Here is where we call the distance method of the text gem. It computes the Levenshtein distance and
# appends the results to the partial match done before
max_distance = 5 # Should be tweaked
countries += COUNTRIES.find_all{|c| distance(c, term) < max_distance}.sort_by{|c| distance(c, term) }
countries.uniq!
end
countries
end
The find_countries method can serve as an example. It uses exact and partial matching and uses the Levenshtein distance to add some more results. A real-world-awesome-production implementation would be different by narrowing the results (fewer results, lower distance).
Wrapping up
With a minimal mathematical background and a minimal technical setup (no indexing, no DB specific feature), we have boosted our autocomplete results.
Weâve seen how to set up a quick-and-simple sinatra app which computes on the Levenshtein distance on the backend. The Frontend was easily done using the JQuery UI Autocomplete widget.
For advanced use cases, we should improve the way we mix the Fuzzy-matching results with the strictly matching results. Switching to a better algorithm, or a set of algorithms (Longest common substring + Dameau-Levenshtein) could also be easily done.
The complete source code is available on GitHub: https://github.com/tomsquest/better-autocompletes-with-fuzzy-matching