Fuzzy Matching Algorithms

Last Updated: 2025-12-28 Status: Complete

This page covers the algorithms and techniques used for name matching in sanctions screening.

Why Exact Matching Isn't Enough

Relying on exact name matches would allow sanctions evasion through:

Typos and misspellings: "Mohammad" vs. "Muhammad"
Transliteration variations: Arabic/Cyrillic names to Latin alphabet
Name order differences: "John Smith" vs. "Smith, John"
Nicknames and aliases: "Bob" vs. "Robert"
Missing middle names or initials
Corporate name variations: "ABC Corp" vs. "ABC Corporation" vs. "ABC Inc."

Fuzzy matching algorithms calculate similarity scores between names to identify potential matches even when spelling differs.

Algorithm Comparison

Algorithm	Method	Best For	Speed	Used By
Levenshtein Distance	Counts minimum edit operations (insert/delete/substitute) to transform one string to another	Complex names, business names	Slower	Most commercial AML solutions
Jaro-Winkler	Character similarity with prefix weighting (rewards matching first characters)	Individual names, typos	Faster	OFAC Sanctions List Search tool
Soundex	Phonetic encoding (encodes names by how they sound)	Similar-sounding names	Very fast	OFAC Sanctions List Search tool
Double Metaphone	Advanced phonetic encoding with multiple encodings per name	Complex pronunciations	Fast	Advanced screening systems

Levenshtein Distance

Calculates the minimum number of single-character edits (insertions, deletions, substitutions) required to change one string into another.

Example:

"Mohammed" → "Muhammad" = 3 edits (o→u, e→a, d→d is same)
Lower distance = higher similarity

Strengths:

Handles typos well
Works for any character set
Intuitive interpretation

Weaknesses:

Slower for long strings
Doesn't account for phonetic similarity

Jaro-Winkler

Measures character-level similarity with a boost for matching prefixes (first 4 characters).

Example:

"SMITH" vs "SMYTH" = 0.96 similarity (high match)
Names with same first letters score higher

Strengths:

Fast calculation
Good for names where first letters are reliable
Standard in OFAC tools

Weaknesses:

Prefix bias may miss some variations
Less effective for names with different start

Soundex

Converts names to phonetic codes based on sound, then compares codes.

Example:

"Smith" → S530
"Smyth" → S530
Same code = potential match

Strengths:

Very fast
Handles phonetic variations
Good for English names

Weaknesses:

Limited for non-English names
Coarse matching (high false positives)

Double Metaphone

Advanced phonetic encoding that generates multiple codes per name to handle pronunciation variations.

Example:

"Schmidt" → ["XMT", "SMT"]
Matches both German and Anglicized pronunciations

Strengths:

Handles international names better
Multiple encoding options
More accurate than Soundex

Weaknesses:

More complex implementation
Still primarily English-focused

OFAC's Official Approach

The OFAC Sanctions List Search tool uses fuzzy matching algorithms (commonly reported to include Jaro-Winkler and Soundex-based approaches) to identify potential matches even when spelling differs.

Key Points:

OFAC's tool is for reference, not compliance
Commercial systems typically use multiple algorithms
PayFacs should not rely solely on OFAC's search tool

Setting Match Thresholds

Common Threshold Ranges

95-100%: Very likely match - investigate immediately
85-94%: Possible match - manual review required
70-84%: Lower probability - may review based on risk appetite
Below 70%: Typically ignored (but configurable)

Risk-Based Threshold Setting

High-risk industries: Lower thresholds (more sensitive)
High-risk geographies: Lower thresholds
Common names: May need higher thresholds (e.g., "Mohammed Ahmed")
Business names: Often need lower thresholds (more variation)

Threshold Tuning Process

Start conservative: Lower thresholds catch more potential matches
Analyze results: Track false positive rates by name type
Adjust incrementally: Raise thresholds for high false-positive categories
Document changes: Maintain audit trail of threshold decisions
Review quarterly: Re-evaluate based on new data

Threshold Tuning is Critical

Initial threshold settings will generate thousands of false positives. Continuous tuning based on review outcomes is essential to balance compliance with operational efficiency.

Multi-Algorithm Approach

Best practice is to combine multiple algorithms:

Score = (0.4 × Levenshtein) + (0.35 × Jaro-Winkler) + (0.25 × Soundex)

Benefits:

Catches different types of variations
Reduces false negatives
More robust matching

Implementation:

Most commercial vendors use proprietary combinations
Tune weights based on your merchant population
Test with known sanctions list entries

Sanctions Screening Overview - Core concepts
False Positive Management - Handling alerts
Screening Operations - Real-time vs batch

Why Exact Matching Isn't Enough​

Algorithm Comparison​

Levenshtein Distance​

Jaro-Winkler​

Soundex​

Double Metaphone​

OFAC's Official Approach​

Setting Match Thresholds​

Common Threshold Ranges​

Risk-Based Threshold Setting​

Threshold Tuning Process​

Multi-Algorithm Approach​

Related Topics​

Why Exact Matching Isn't Enough

Algorithm Comparison

Levenshtein Distance

Jaro-Winkler

Soundex

Double Metaphone

OFAC's Official Approach

Setting Match Thresholds

Common Threshold Ranges

Risk-Based Threshold Setting

Threshold Tuning Process

Multi-Algorithm Approach

Related Topics