潜在拼写列表中值得注意的一件事是,包含的列表只有3个Soundex值(如果您忽略异常值“ Kazzafi”)
G310,K310,Q310
现在,那里存在误报(“ Godby”也为G310),但是通过组合有限的变音器点击,您可以消除它们。
<?
$soundexMatch = array('G310','K310','Q310');
$metaphoneMatch = array('KTF','KTHF','FTF','KHTF','K0F');
$text = "This is a big glob of text about Mr. Gaddafi. Even using compound-Khadafy terms in here, then we might find Mr Qudhafi to be matched fairly well. For example even with apostrophes sprinkled randomly like in Kad'afi, you won't find false positives matched like godfrey, or godby, or even kabbadi";
$wordArray = preg_split('/[\s,.;-]+/',$text);
foreach ($wordArray as $item){
$rate = in_array(soundex($item),$soundexMatch) + in_array(metaphone($item),$metaphoneMatch);
if ($rate > 1){
$matches[] = $item;
}
}
$pattern = implode("|",$matches);
$text = preg_replace("/($pattern)/","<b>$1</b>",$text);
echo $text;
?>
进行一些调整,然后说西里尔字母音译,您将拥有一个相当强大的解决方案。