fuzzywuzzy: a simple string-matching tool (technical notes)

fuzzywuzzy provides a simple interface for string matching. It measures the similarity between strings using edit distance.

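fuzzywuzzy's strengths are that it is simple, easy to use and lightweight. It scores the similarity of two strings from the minimum number of edits needed to turn one into the other, which makes it useful for tasks such as spelling correction. It is largely unsuitable for semantic analysis of Chinese text or for recognizing what a text is about.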
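To make "minimum number of edits" concrete, here is a textbook dynamic-programming Levenshtein distance in plain Python. It is an illustration only: fuzzywuzzy does not expose this function, and its scores are 0-100 similarities rather than raw edit counts. The function name levenshtein is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))   # 3
print(levenshtein("recieve", "receive"))  # 2: a swapped pair costs two substitutions
```

The second call is the spelling-correction case: a small distance suggests "receive" is the intended word.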

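Installation:

pip install fuzzywuzzy

pip install python-Levenshtein

python-Levenshtein is optional: it speeds up matching, and without it fuzzywuzzy falls back to Python's difflib.SequenceMatcher and prints a warning.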

from fuzzywuzzy import fuzz, process

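The fuzz module provides several common methods for scoring the similarity of two strings (each returns an integer from 0 to 100):
ratio(): compares the two whole strings, character by character in order.
partial_ratio(): approximate substring matching; scores the shorter string against the best-matching part of the longer one.
token_sort_ratio(): ignores word order.
token_set_ratio(): ignores duplicate words (and word order).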
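The four scorers above can be sketched with only the standard library. This is an approximation, not fuzzywuzzy's code: it scores with difflib.SequenceMatcher (the pure-Python fallback fuzzywuzzy uses when python-Levenshtein is missing), the word normalization in tokens() is an assumption standing in for fuzzywuzzy's own preprocessing, and partial_ratio here tries every window of the longer string, whereas fuzzywuzzy only tries windows aligned to matching blocks. On the strings 'thank you very mach' and 'thanks you ' it gives the same four numbers as the library.

```python
import re
from difflib import SequenceMatcher


def ratio(s1: str, s2: str) -> int:
    # 2*M/T scaled to 0-100 (M = matched characters, T = combined length)
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


def partial_ratio(s1: str, s2: str) -> int:
    # Best ratio of the shorter string against every same-length window of the longer one
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0
    n = len(shorter)
    return max(ratio(shorter, longer[i:i + n]) for i in range(len(longer) - n + 1))


def tokens(s: str) -> list:
    # Assumed normalization: non-word characters -> spaces, lowercase, split on whitespace
    return re.sub(r"\W+", " ", s).lower().split()


def token_sort_ratio(s1: str, s2: str) -> int:
    # Sort the words first, so word order no longer matters
    return ratio(" ".join(sorted(tokens(s1))), " ".join(sorted(tokens(s2))))


def token_set_ratio(s1: str, s2: str) -> int:
    # Sets collapse repeated words; compare shared words plus each side's leftovers
    a, b = set(tokens(s1)), set(tokens(s2))
    common = " ".join(sorted(a & b))
    rest_a = (common + " " + " ".join(sorted(a - b))).strip()
    rest_b = (common + " " + " ".join(sorted(b - a))).strip()
    return max(ratio(common, rest_a), ratio(common, rest_b), ratio(rest_a, rest_b))


st1, st2 = "thank you very mach", "thanks you "
print(ratio(st1, st2), partial_ratio(st1, st2),
      token_sort_ratio(st1, st2), token_set_ratio(st1, st2))  # 67 91 62 62
```

Sorting the words is what makes token_sort_ratio order-insensitive, and building sets is what makes token_set_ratio ignore repeated words.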

from fuzzywuzzy import fuzz, process

if __name__ == '__main__':
    st1 = 'thank you very mach'
    st2 = 'thanks you '
    xs1 = fuzz.ratio(st1, st2)             # whole strings, in order
    xs2 = fuzz.partial_ratio(st1, st2)     # best-matching substring
    xs3 = fuzz.token_sort_ratio(st1, st2)  # word order ignored
    xs4 = fuzz.token_set_ratio(st1, st2)   # duplicate words ignored

    print(xs1, xs2, xs3, xs4)

The output, in order, is: 67 91 62 62
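Where these numbers come from: a hand check, assuming each score is 100 · 2M/T rounded to an integer, where M is the number of matched characters and T the combined length of the two compared strings (this is difflib.SequenceMatcher.ratio() scaled to 100; python-Levenshtein's ratio finds the same M for these inputs). For ratio, "thank" + " you " gives M = 10; for partial_ratio the best window of the longer string is "thank you v"; for token_sort_ratio the sorted strings are "mach thank very you" and "thanks you".

```latex
\text{score} = \operatorname{round}\left(100 \cdot \frac{2M}{|s_1| + |s_2|}\right)

\begin{aligned}
\texttt{ratio}:&\quad 100 \cdot \frac{2 \cdot 10}{19 + 11} \approx 66.7 \to 67\\
\texttt{partial\_ratio}:&\quad 100 \cdot \frac{2 \cdot 10}{11 + 11} \approx 90.9 \to 91\\
\texttt{token\_sort\_ratio}:&\quad 100 \cdot \frac{2 \cdot 9}{19 + 10} \approx 62.1 \to 62
\end{aligned}
```

token_set_ratio is the largest of three such comparisons; the winning one is "you mach thank very" (19 characters) against "you thanks" (10), again with M = 9, hence 62 as well.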

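The process module finds the strings in a list that best match a target string. Common methods:
extract() returns the top-scoring matches as a list of tuples
extractOne() returns only the single best match.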
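A rough sketch of what extract() and extractOne() do with a list of choices: score every choice against the query and keep the best. Assumptions: difflib stands in as the scorer (fuzzywuzzy's actual default scorer is fuzz.WRatio, and it preprocesses strings before scoring), and the names extract, extract_one and the fruit list are hypothetical.

```python
import heapq
from difflib import SequenceMatcher


def score(a: str, b: str) -> int:
    # Stand-in scorer: difflib similarity scaled to 0-100
    return round(100 * SequenceMatcher(None, a, b).ratio())


def extract(query, choices, limit=5):
    # Score every choice, keep the `limit` best as (choice, score) tuples, highest first
    scored = [(choice, score(query, choice)) for choice in choices]
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


def extract_one(query, choices):
    # The single best (choice, score) pair, or None when there are no choices
    best = extract(query, choices, limit=1)
    return best[0] if best else None


fruits = ["apple", "apricot", "banana", "grape"]
print(extract("appel", fruits, limit=2))  # [('apple', 80), ('grape', 60)]
print(extract_one("banan", fruits))       # ('banana', 91)
```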

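Parameters of extract and extractOne:
query: the target string
choices: the list or dictionary of strings to match against
Besides the two required arguments query and choices, you can pass processor (a function applied to the strings before matching), scorer (the matching mode, i.e. which scoring function to use), and limit (extract) or score_cutoff (extractOne). The signatures and docstrings from fuzzywuzzy's process module: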

def extract(query, choices, processor=default_processor, scorer=default_scorer, limit=5):
    """Select the best match in a list or dictionary of choices.

    Find best matches in a list or dictionary of choices, return a
    list of tuples containing the match and its score. If a dictionary
    is used, also returns the key for each match.
    """
    # (remaining docstring and implementation omitted)

def extractOne(query, choices, processor=default_processor, scorer=default_scorer, score_cutoff=0):
    """Find the single best match above a score in a list of choices.

    This is a convenience method which returns the single best choice.
    See extract() for the full arguments list.

    Args:
        query: A string to match against
        choices: A list or dictionary of choices, suitable for use with
            extract().
        processor: Optional function for transforming choices before matching.
            See extract().
        scorer: Scoring function for extract().
        score_cutoff: Optional argument for score threshold. If the best
            match is found, but it is not greater than this number, then
            return None anyway ("not a good enough match").  Defaults to 0.

    Returns:
        A tuple containing a single match and its score, if a match
        was found that was above score_cutoff. Otherwise, returns None.
    """
