There is a package called fuzzywuzzy. Install it via pip:
pip install fuzzywuzzy
Simple usage:
>>> from fuzzywuzzy import fuzz
>>> fuzz.ratio("this is a test", "this is a test!")
96
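For a rough idea of where a number like that comes from, here is a minimal sketch that uses only the standard library's difflib.SequenceMatcher. The helper name simple_ratio is made up for illustration, and truncating to an integer is an assumption chosen to reproduce the 96 above; a version that rounds instead would report 97 for the same pair.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Hypothetical helper: similarity of two strings on a 0-100 scale.

    SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    matched characters and T is the combined length of both strings.
    """
    # Assumption: truncate (not round) when converting to an integer.
    return int(100 * SequenceMatcher(None, a, b).ratio())


print(simple_ratio("this is a test", "this is a test!"))
```

Here 14 characters match out of 29 in total, so the raw ratio is 28/29.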
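The package is built on top of difflib. Why not just use that, you ask? Apart from being a bit simpler to use, it offers several different matching methods (such as token-order insensitivity and partial string matching) that make it more powerful in practice. The process.extract functions are especially useful: they find the best-matching strings, along with their scores, from a collection. From their README: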
Partial Ratio
>>> fuzz.partial_ratio("this is a test", "this is a test!")
100
Token Sort Ratio
>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
90
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100
Token Set Ratio
>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100
Process
>>> from fuzzywuzzy import process
>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
[('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
('Dallas Cowboys', 90)
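If you only need the general idea of ranking candidates, it can be sketched with difflib alone: score every choice against the query and keep the top few. The name extract_sketch is hypothetical, and the plain case-insensitive ratio used here is an assumption; the library's default scorer is more elaborate, so its scores will generally differ from this sketch.

```python
from difflib import SequenceMatcher


def extract_sketch(query: str, choices: list, limit: int = 5) -> list:
    """Hypothetical helper: return the best (choice, score) pairs."""

    def score(choice: str) -> int:
        # Assumption: case-insensitive plain ratio, truncated to 0-100.
        matcher = SequenceMatcher(None, query.lower(), choice.lower())
        return int(100 * matcher.ratio())

    scored = [(choice, score(choice)) for choice in choices]
    # Highest score first; sorted() is stable, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract_sketch("new york jets", choices, limit=2))
print(extract_sketch("cowboys", choices, limit=1))
```

The standard library also ships difflib.get_close_matches, which covers the simplest cases, but it is case-sensitive and returns only the matched strings without scores.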