Case-insensitive replace

173

What's the easiest way to do a case-insensitive string replacement in Python?

python string case-insensitive

— Adam Ernst

217

The string type doesn't support this. You're probably best off using the regular expression sub method with the re.IGNORECASE option.

>>> import re
>>> insensitive_hippo = re.compile(re.escape('hippo'), re.IGNORECASE)
>>> insensitive_hippo.sub('giraffe', 'I want a hIPpo for my birthday')
'I want a giraffe for my birthday'

— Blair Conrad

11

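If you're only doing a single replacement, or want to save lines of code, it's more efficient to use a single substitution with re.sub and the (?i) flag: re.sub('(?i)' + re.escape('hippo'), 'giraffe', 'I want a hIPpo for my birthday')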

— D Coetzee

3

Why re.escape for only a string of letters? Thanks.

— Elena, 2014

8

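@Elena, it's not needed for 'hippo', but it would be useful if the to-replace value were passed into a function, so it's there more as good practice in the example than anything else.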

— Blair Conrad
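
To illustrate the point about re.escape, here is a minimal sketch of what can go wrong once the search text is more than plain letters. The sample strings are made up for illustration; without escaping, the dot in the search text is treated as a regex wildcard.

```python
import re

needle = "3.5"               # hypothetical search text containing a regex metacharacter
text = "v3.5 and v3x5"

# Unescaped: "." matches any character, so "3x5" is replaced as well.
unescaped = re.sub(needle, "4.0", text, flags=re.IGNORECASE)

# Escaped: "." is matched literally, so only "3.5" is replaced.
escaped = re.sub(re.escape(needle), "4.0", text, flags=re.IGNORECASE)

print(unescaped)  # v4.0 and v4.0
print(escaped)    # v4.0 and v3x5
```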

2

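Besides having to re.escape the search string, there's another pitfall here that this answer fails to avoid, noted in stackoverflow.com/a/15831118/1709587: since re.sub processes escape sequences in the replacement, as noted in docs.python.org/library/re.html#re.sub, you need to either escape all backslashes in your replacement string or use a lambda.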

— Mark Amery
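
A small sketch of the pitfall described above, assuming a replacement string that happens to contain a backslash (the Windows-style path is a made-up example). It shows the naive call next to the two workarounds mentioned in the comment.

```python
import re

text = "I want a hIPpo for my birthday"
repl = r"C:\temp"            # hypothetical replacement containing a backslash

pattern = re.compile(re.escape("hippo"), re.IGNORECASE)

# Passed as a plain string, "\t" in the replacement is interpreted as a tab.
naive = pattern.sub(repl, text)

# Workaround 1: double every backslash in the replacement string.
doubled = pattern.sub(repl.replace("\\", "\\\\"), text)

# Workaround 2: pass a function; its return value is inserted verbatim.
via_lambda = pattern.sub(lambda m: repl, text)

print(repr(naive))       # 'I want a C:\temp for my birthday' (the \t is a tab)
print(repr(via_lambda))  # 'I want a C:\\temp for my birthday' (literal backslash)
```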

84

import re
pattern = re.compile("hello", re.IGNORECASE)
pattern.sub("bye", "hello HeLLo HELLO")
# 'bye bye bye'

— Unknown

17

Or as a one-liner: re.sub('hello', 'bye', 'hello HeLLo HELLO', flags=re.IGNORECASE)

— Louis Yang

Note that re.sub has only supported this flag since Python 2.7.

— fuenfundachtzig

47

In a single line:

import re
re.sub(r"(?i)hello", "bye", "hello HeLLo HELLO")       # 'bye bye bye'
re.sub(r"(?i)he\.llo", "bye", "he.llo He.LLo HE.LLO")  # 'bye bye bye'

Or, use the optional "flags" argument:

import re
re.sub(r"hello", "bye", "hello HeLLo HELLO", flags=re.I)       # 'bye bye bye'
re.sub(r"he\.llo", "bye", "he.llo He.LLo HE.LLO", flags=re.I)  # 'bye bye bye'

— Viebel

14

Continuing on bFloch's answer, this function will change not just one, but all occurrences of old to new, in a case-insensitive fashion.

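def ireplace(old, new, text):
    if not old:
        return text  # an empty search string would otherwise loop forever
    idx = 0
    while idx < len(text):
        # search a lowercased copy, but splice into the original text
        index_l = text.lower().find(old.lower(), idx)
        if index_l == -1:
            return text
        text = text[:index_l] + new + text[index_l + len(old):]
        idx = index_l + len(new)  # continue after the inserted text
    return text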

— rsmoorthy

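Very nicely done. Much better than regex; it handles all kinds of characters, whereas the regex is very fussy about anything non-alphanumeric. Preferred answer, IMHO.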

— fyngyrz

All you have to do is escape the regex: the accepted answer is much shorter and easier to read than this.

— Mad Physicist

Escaping only helps with the match pattern; backslashes in the replacement text can still mess things up.

— ideaman42

4

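Like Blair Conrad says, string.replace doesn't support this.

Use the regex re.sub, but remember to escape the string being replaced first (with re.escape). Note that there's no flags option for re.sub in 2.6, so you'll have to use the embedded modifier '(?i)' (or an RE object, see Blair Conrad's answer). Another pitfall is that sub will process backslash escapes in the replacement text if a string is given. To avoid this, you can pass in a lambda instead.

Here's a function: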

import re
def ireplace(old, repl, text):
    return re.sub('(?i)'+re.escape(old), lambda m: repl, text)

>>> ireplace('hippo?', 'giraffe!?', 'You want a hiPPO?')
'You want a giraffe!?'
>>> ireplace(r'[binfolder]', r'C:\Temp\bin', r'[BinFolder]\test.exe')
'C:\\Temp\\bin\\test.exe'

— John

4

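This function uses both the str.replace() and re.findall() functions. It replaces all occurrences of pattern in string with repl in a case-insensitive way.

import re

def replace_all(pattern, repl, string) -> str:
    # collect every case variant that matches, e.g. 'Hello', 'HELLO'
    occurrences = re.findall(pattern, string, re.IGNORECASE)
    for occurrence in occurrences:
        string = string.replace(occurrence, repl)
    return string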

— Nico Bako

3

This doesn't require regular expressions:

def ireplace(old, new, text):
    """ 
    Replace case insensitive
    Raises ValueError if string not found
    """
    index_l = text.lower().index(old.lower())
    return text[:index_l] + new + text[index_l + len(old):]

— bFloch

3

Nice one. However, this does not change all occurrences of old to new; it only changes the first occurrence.

— rsmoorthy, 2011

5

It's less readable than the regex version. No need to reinvent the wheel here.

— Johannes Bittner

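It would be interesting to do a performance comparison between this and the upvoted versions. It might be faster, which matters for some applications. Or it might be slower, because it does more work in interpreted Python.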

— D Coetzee
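
One way such a comparison could be set up is sketched below with timeit. This is only an illustration under stated assumptions: the function names ireplace_loop and ireplace_regex, the sample input, and the repeat count are all made up here, the loop version is the replace-all variant rather than the single-replacement function above, and timings will vary by machine and input, so no result is claimed.

```python
import re
import timeit

def ireplace_loop(old, new, text):
    """Loop-based replace-all using str.lower() and str.find()."""
    if not old:
        return text
    idx = 0
    while idx < len(text):
        index_l = text.lower().find(old.lower(), idx)
        if index_l == -1:
            return text
        text = text[:index_l] + new + text[index_l + len(old):]
        idx = index_l + len(new)
    return text

def ireplace_regex(old, new, text):
    """Regex version; the lambda keeps backslashes in `new` literal."""
    return re.sub(re.escape(old), lambda m: new, text, flags=re.IGNORECASE)

sample = "hello HeLLo HELLO " * 100   # made-up input; real workloads will differ

for func in (ireplace_loop, ireplace_regex):
    seconds = timeit.timeit(lambda: func("hello", "bye", sample), number=200)
    print(f"{func.__name__}: {seconds:.4f} s")
```

Note that the loop version lowercases the whole text on every iteration, so its cost grows with both the text length and the number of matches.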

2

An interesting observation about syntax details and options:

Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32

import re
old = "TREEROOT treeroot TREerOot"
re.sub(r'(?i)treeroot', 'grassroot', old)

'grassroot grassroot grassroot'

re.sub(r'treeroot', 'grassroot', old)

'TREEROOT grassroot TREerOot'

re.sub(r'treeroot', 'grassroot', old, flags=re.I)

'grassroot grassroot grassroot'

re.sub(r'treeroot', 'grassroot', old, re.I)

'TREEROOT grassroot TREerOot'

So the (?i) prefix in the match expression, or adding "flags=re.I" as a fourth argument, results in a case-insensitive match. BUT using just "re.I" as the fourth argument does not, because the fourth positional parameter of re.sub is count, not flags.

For comparison,

re.findall(r'treeroot', old, re.I)

['TREEROOT', 'treeroot', 'TREerOot']

re.findall(r'treeroot', old)

['treeroot']
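
A short sketch of why the bare positional re.I behaves differently in re.sub than in re.findall: the third positional parameter of re.findall is flags, while the fourth positional parameter of re.sub is count. The sample string is extended here (an assumption for illustration) so that the count limit becomes visible, and the equivalent keyword call is spelled out instead of repeating the positional one.

```python
import re

# Extended sample (hypothetical) with three lowercase matches.
old = "TREEROOT treeroot TREerOot treeroot treeroot"

# re.I has the integer value 2, so as a fourth positional argument
# it is read as count=2 and matching stays case-sensitive.
count_value = int(re.I)

as_count = re.sub(r'treeroot', 'grassroot', old, count=count_value)
as_flags = re.sub(r'treeroot', 'grassroot', old, flags=re.I)

print(count_value)  # 2
print(as_count)     # TREEROOT grassroot TREerOot grassroot treeroot
print(as_flags)     # grassroot grassroot grassroot grassroot grassroot
```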

— Murray

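This does not provide an answer to the question. Please edit your answer to ensure that it improves upon the other answers already present in this question.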

— hongsy

1

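I was getting \t converted into an escape sequence (scroll down a bit), so I noticed that re.sub converts backslash-escaped characters into escape sequences.

To prevent that, I wrote the following:

Replace, case-insensitive.

import re

def ireplace(findtxt, replacetxt, data):
    # findtxt is compiled as a regex; join() inserts replacetxt verbatim
    return replacetxt.join(re.compile(findtxt, flags=re.I).split(data))

Also, if you do want the backslash sequences converted into the characters they stand for, as happens in the other answers here, just decode your find and/or replace string. In Python 3, you might have to do something like .decode("unicode_escape") # python3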

findtxt = findtxt.decode('string_escape') # python2
replacetxt = replacetxt.decode('string_escape') # python2
data = ireplace(findtxt, replacetxt, data)

Tested in Python 2.7.8
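
For Python 3, a minimal sketch of the same split/join idea might look like the following. The name ireplace_split is hypothetical, and adding re.escape is an assumption on top of the code above: it treats the search text literally and rules out capturing groups, which would otherwise leak into the result of split().

```python
import re

def ireplace_split(findtxt, replacetxt, data):
    """Case-insensitive literal replace via split/join (hypothetical name)."""
    # re.escape makes findtxt literal and guarantees no capturing groups,
    # so split() returns only the text between matches.
    pieces = re.compile(re.escape(findtxt), flags=re.I).split(data)
    # str.join inserts replacetxt verbatim; no escape processing takes place.
    return replacetxt.join(pieces)

print(ireplace_split('[binfolder]', r'C:\Temp\bin', r'[BinFolder]\test.exe'))
# C:\Temp\bin\test.exe
```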

Hope that helps.

— Stan S.

0

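I've never posted an answer before, and this thread is really old, but I came up with another solution and figured I could get your response. I'm not seasoned in Python programming, so if it has obvious drawbacks, please point them out, since it's good learning :)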

i='I want a hIPpo for my birthday'
key='hippo'
swp='giraffe'

o=(i.lower().split(key))
c=0
p=0
for w in o:
    o[c]=i[p:p+len(w)]
    p=p+len(key+w)
    c+=1
print(swp.join(o))

— Andan

2

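Learning: generally, when you do a search and replace on a string, it's better not to have to turn it into an array first. That's why the first answer is probably the best. Although it uses an external module, it treats the string as one whole string. It's also a bit clearer what's happening in the process.

— isaaclw, 2012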

Learning: it's very difficult for a developer without context to read this code and decipher what it's doing :)

— Todd