How to replace multiple substrings of a string?

284

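I'd like to use the .replace function to replace multiple strings.

I currently have

string.replace("condition1", "")

but would like to have something like

string.replace("condition1", "").replace("condition2", "text")

although that doesn't feel like good syntax.

What is the proper way to do this? Something like how in grep/regex you can use \1 and \2 to replace fields with certain search strings.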

python text replace

— CQM

7

Have you tried all the solutions provided? Which one is faster?

— tommy.carstensen

I have taken the time to test all the answers in different scenarios. See stackoverflow.com/questions/59072514/…

— Pablo

1

Honestly, I prefer your chained approach. I landed here while looking for a solution, used yours, and it works fine.

— frakman1

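@frakman1 +1. No idea why this doesn't have more upvotes. All the other methods make the code much harder to read. If there were a function that took arrays of replacements, that would work. But your chained method is the clearest (at least with a static number of replacements).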

— IceFire

269

Here is a short example that should do the trick with regular expressions:

import re

text = "(condition1) and --condition2--" # the string to process
rep = {"condition1": "", "condition2": "text"} # define desired replacements here

# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.items())
# (Python 2 used rep.iteritems(); Python 3 renamed dict.iteritems to dict.items)
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)

For example:

>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'
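For readers wondering what the lambda on the last line does, here is a minimal Python 3 sketch of the same single-pass idea with the lambda written out as a named inner function. The name multi_sub is hypothetical and not part of any library. Because each escaped key only matches itself, match.group(0) can index the original dict directly, without re-escaping.

```python
import re

def multi_sub(text, rep):
    """Replace every key of `rep` found in `text` with its value, in one pass."""
    # Escape each key so characters such as "(" or "." are matched literally,
    # then join them into a single alternation: key1|key2|...
    pattern = re.compile("|".join(re.escape(k) for k in rep))

    def lookup(match):
        # match.group(0) is the exact substring that was found. Because every
        # escaped key matches only itself, it can index the original dict.
        return rep[match.group(0)]

    return pattern.sub(lookup, text)

print(multi_sub("(condition1) and --condition2--",
                {"condition1": "", "condition2": "text"}))  # () and --text--
print(multi_sub("spamham sha", {"spam": "eggs", "sha": "md5"}))  # eggsham md5
```

At each position the alternation tries the keys left to right, so if one key is a prefix of another (say "ab" and "abc"), the longer one should come first in the pattern.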

— Andrew Clark

7

Replacement is done in a single pass.

— Andrew Clark

26

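dkamins: it's not too clever; it's not even as clever as it should be (we should regex-escape the keys before joining them with "|"). Why isn't that overengineered? Because this way we do it in one pass (= fast), and we do all the replacements at the same time, avoiding clashes like "spamham sha".replace("spam", "eggs").replace("sha","md5") giving "eggmd5m md5" instead of "eggsham md5".

— flying sheep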

8

@AndrewClark I would greatly appreciate it if you could explain what is happening on the last line with the lambda.

— 矿物

11

Hi, I created a small gist with a clearer version of this snippet. It should also be slightly more efficient: gist.github.com/bgusach/a967e0587d6e01e889fd1d776c5f3729

— bgusach, 2016

15

For Python 3, use items() instead of iteritems().

— Jangari

127

You can make a nice little looping function.

def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

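where text is the complete string and dic is a dictionary: each key is a string to find, and its value is the string that will replace every match of that key.

Note: Python 2 used iteritems(); in Python 3 it has been replaced with items().

Careful: Python dictionaries only guarantee insertion order from Python 3.7 onward; on older versions the iteration order is not reliable. This solution only solves your problem if:

the order of the replacements doesn't matter
it's OK for a replacement to change the results of previous replacements

For example: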

d = { "cat": "dog", "dog": "pig"}
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, d)
print(my_sentence)

Possible output #1:

"This is my pig and this is my pig."

Possible output #2:

"This is my dog and this is my pig."

One possible fix is to use an OrderedDict.

from collections import OrderedDict
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
od = OrderedDict([("cat", "dog"), ("dog", "pig")])
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, od)
print(my_sentence)

Output:

"This is my pig and this is my pig."

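Careful #2: this is inefficient if your text string is very large or there are many pairs in the dictionary, because every pair makes a full pass over the text.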
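A minimal sketch of the ordering issue, assuming the replacements are passed as a list of (old, new) tuples instead of a dict so the order is explicit at the call site. The name replace_pairs is hypothetical; the loop is the same as replace_all above.

```python
def replace_pairs(text, pairs):
    """Apply each (old, new) pair in order; later pairs also see earlier results."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text

sentence = "This is my cat and this is my dog."
print(replace_pairs(sentence, [("cat", "dog"), ("dog", "pig")]))
# This is my pig and this is my pig.
print(replace_pairs(sentence, [("dog", "pig"), ("cat", "dog")]))
# This is my dog and this is my pig.
print(replace_pairs("AB", [("A", "B"), ("B", "C")]))
# CC  (the "B" inserted by the first pair is replaced by the second)
```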

— Joseph Hansen

37

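The order of applying the different replacements will matter, so instead of using a standard dict, consider using an OrderedDict, or a list of 2-tuples.

— slothrop, 2011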

5

This iterates over the string twice... not good for performance.

— Valentin Lorentz

6

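Performance-wise it's even worse than Valentin says: it traverses the text as many times as there are items in dic! Fine if 'text' is small, but terrible for large text.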

— JDonner

3

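This is a good solution for some cases. For example, I just want to sub 2 characters and I don't care about the order they go in, because the substitution keys don't match any values. But I do want it to be clear what's happening.

— Nathan Garabedian, 2013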

5

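Note that this can give unexpected results, because text newly inserted in the first iteration can be matched in the second iteration. For example, if we naively try to replace all 'A' with 'B' and all 'B' with 'C', the string 'AB' would be transformed into 'CC', not 'BC'.

— 2013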

105

Why not a solution like this?

s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)

#output will be:  The quick red fox jumps over the quick dog
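This tuple loop only does literal replacements. If the patterns are regular expressions, the same loop can call re.sub instead of str.replace; a short sketch, with a made-up input string and patterns chosen only for illustration:

```python
import re

s = "Hello , world . Bye ."
# Each pair is (regex pattern, replacement); re.sub applies them in order.
for pattern, repl in ((r"\s\.", "."), (r"\s,", ",")):
    s = re.sub(pattern, repl, s)
print(s)  # Hello, world. Bye.
```

Note the escaped dot in r"\s\.": an unescaped "." would match any character after the whitespace.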

— Enrico Bianchi

2

This is super useful, simple, and portable.

— 切丝

Looks nice, but it doesn't handle regular expressions, e.g.: for r in ((r'\s.', '.'), (r'\s,', ',')):

— Martin

2

Make it a one-liner: ss = [s.replace(*r) for r in (("brown", "red"), ("lazy", "quick"))][0]

— Mark K

94

Here's a variant of the first solution using reduce, in case you like being functional. :)

from functools import reduce  # a built-in in Python 2, moved to functools in Python 3

repls = {'hello' : 'goodbye', 'world' : 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.items(), s)

martineau's better version:

from functools import reduce

repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)

— Björn Lindqvist

8

It would be simpler to make repls a sequence of tuples and do away with the iteritems() call, i.e. repls = ('hello', 'goodbye'), ('world', 'earth') and reduce(lambda a, kv: a.replace(*kv), repls, s). It would also work unchanged in Python 3.

— martineau

Nice! If you use Python 3, use items instead of iteritems (now removed from dicts).

— e.arbitrio

2

@martineau: It's not true that this works unchanged in Python 3, since reduce has been removed.

— normanius

5

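@normanius: reduce still exists, but it was moved into the functools module in Python 3 (see the docs), so when I said unchanged, I meant that the same code could be run, although admittedly reduce then has to be imported, since it's no longer a built-in.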

— martineau

35

This is just a more concise recap of F.J's and MiniQuark's great answers. All you need to achieve multiple simultaneous string replacements is the following function:

import re

def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict,key=len,reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)

Usage:

>>>multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'

If you want, you can make your own dedicated replacement functions starting from this simple one.
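To make the difference between simultaneous and chained replacement concrete, here is a small sketch comparing this function with a chained reduce on one made-up input. The chained version relies on dict insertion order, which Python guarantees from 3.7 on.

```python
import re
from functools import reduce

def multiple_replace(string, rep_dict):
    # Longest keys first, so a longer key wins over a shorter one at the same position.
    keys = sorted(rep_dict, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: rep_dict[m.group(0)], string)

rep = {"but": "mut", "mutton": "lamb"}
# One pass: "but" becomes "mut", and the result is never re-scanned.
simultaneous = multiple_replace("button", rep)
# Chained: "button" -> "mutton" -> "lamb", because the second replace sees the first's output.
chained = reduce(lambda acc, kv: acc.replace(*kv), rep.items(), "button")
print(simultaneous, chained)  # mutton lamb
```

Neither result is "wrong"; which one you want depends on whether replacements should be allowed to feed into each other.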

— j

1

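Although this is a fine solution, concurrent string replacements will not give exactly the same results as performing them sequentially (chaining them), though that may not matter.

— martineau, 2013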

2

Sure: with rep_dict = {"but": "mut", "mutton": "lamb"}, the string "button" results in "mutton" with your code, but would give "lamb" if the replacements were chained one after the other.

— martineau, 2013

2

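That's the main feature of this code, not a defect. With chained replacements, it could not achieve the desired behavior of substituting two words simultaneously and reciprocally, as in my example.

— mmj, 2013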

1

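It may not seem like a great feature if you don't need it. But here we're talking about simultaneous replacements, and then it is indeed the main feature. With "chained" replacements, the output of the example would be Do you prefer cafe? No, I prefer cafe., which is not desirable at all.

— mmj, 2013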

@David write your own answer; your edit is too radical.

— UmNyobe, 2014

29

I built on F.J.'s excellent answer:

import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)

def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)

One-shot usage:

>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print(multiple_replace(u"Do you like café? No, I prefer tea.", *replacements))
Do you love tea? No, I prefer café.

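Note that, since the replacement is done in a single pass, "café" changes to "tea", but it does not change back to "café".

If you need to do the same replacement many times, you can easily create a replacement function: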

>>> my_escaper = multiple_replacer(('"','\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
...                      u'Does this work?\tYes it does',
...                      u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print(my_escaper(line))
... 
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"

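Improvements:

turned the code into a function
added multiline support
fixed a bug in escaping
made it easy to create a function for a specific multiple replacement

Enjoy! :-)

— MiniQuark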

1

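Could someone explain this step by step for Python noobs like me?

— Julian Suarez

Fellow Python noob here, so I'll take an incomplete shot at understanding it. a. Break down key_values into stuff-to-replace (the keys joined by "|") and logic (if the match is a key, return its value). b. Make a regex parser (a "pattern" that looks for the keys and uses the given logic), wrap it in a lambda function, and return it. Stuff I'm looking up now: re.M, and why a lambda is needed for the replacement logic.

— Fox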

1

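@Fox You got it. You could define a function instead of using a lambda; it's just to make the code shorter. But note that pattern.sub expects a function with only one parameter (the match object for the text to replace), so the function needs to have access to replace_dict. re.M allows multiline replacements (it's well explained in the docs: docs.python.org/2/library/re.html#re.M).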

— MiniQuark '16

22

I would like to propose using string templates. Just place the strings to be replaced in a dictionary and everything is set! Example from docs.python.org:

>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'
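The ValueError in the second example is raised because $100 is not a valid placeholder name. A short sketch, relying only on the documented string.Template escape, of writing a literal dollar sign as $$ (keep in mind that Template replaces $-placeholders only, not arbitrary substrings):

```python
from string import Template

d = dict(who="tim")
# "$$" is Template's escape for a literal "$", so "$$100" renders as "$100".
result = Template("Give $who $$100").substitute(d)
print(result)  # Give tim $100
```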

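— Fredrik Pihl

Looks good, but adding a key that isn't provided to substitute raises an exception, so be careful when getting templates from users.

— Bart

2

A drawback of this approach is that the template must contain all, and no more than all, of the $strings to be replaced, see here

— RolfBly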

17

In my case, I needed a simple replacement of unique keys with names, so I came up with this:

a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x,y in b.items():
    a = a.replace(x, y)
>>> a
'ThIS IS a teSt StrIng.'

— James Koss

3

This works as long as you have no replacement collisions. If you replaced i with s, you would get weird behavior.

— bgusach '16

1

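If order matters, instead of the dict above you can use an array: b = [ ['i', 'Z'], ['s', 'Y'] ]; for x,y in (b): a = a.replace(x, y). Then, if you are careful about how you order the array pairs, you can ensure you don't replace() recursively.

— CODE-READ, 2013

It seems dicts now maintain order as of Python 3.7.0. I tested it, and it works in order on my machine with the latest stable Python 3.

— James Koss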

15

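Starting with Python 3.8 and the introduction of assignment expressions (PEP 572) (the := operator), we can apply the replacements within a list comprehension: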

# text = "The quick brown fox jumps over the lazy dog"
# replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'
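A runnable version of the snippet above (Python 3.8+), showing that the list only collects the intermediate versions while the name text is rebound on each iteration:

```python
text = "The quick brown fox jumps over the lazy dog"
replacements = [("brown", "red"), ("lazy", "quick")]

# Each iteration rebinds `text` (strings are immutable); the list just
# collects the intermediate versions.
versions = [text := text.replace(a, b) for a, b in replacements]
print(versions[-1])  # The quick red fox jumps over the quick dog
print(text == versions[-1])  # True
```

A plain for loop does the same work without building the throwaway list.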

— Xavier Guihot

Do you know if this is more efficient than using replace in a loop? I'm testing all the answers for performance, but I don't have 3.8 yet.

— Pablo

Why do I get the output in a list?

— johnrao07

1

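@johnrao07 Well, a list comprehension builds a list. That's why in this case you get ['The quick red fox jumps over the lazy dog', 'The quick red fox jumps over the quick dog']. But the assignment expression (text := text.replace) also iteratively builds new versions of text by rebinding it. After the list comprehension, you can use the text variable, which contains the modified text.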

— Xavier Guihot

1

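If you want to return the new version of text as a one-liner, you can also use [text := text.replace(a, b) for a, b in replacements][-1] (note the [-1]), which extracts the last element of the list comprehension, i.e. the last version of text.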

— Xavier Guihot

13

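Here is my $0.02. It is based on Andrew Clark's answer, just a little bit clearer, and it also covers the case where a string to replace is a substring of another string to replace (the longer string wins).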

import re

def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.

    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str

    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against 
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)

    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))

    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)
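A small sketch of why the length sort matters, using the {'ab': 'AB', 'abc': 'ABC'} example from the comments in the code. The helper sub_with is hypothetical; it builds the same kind of alternation but in whatever key order it is given.

```python
import re

replacements = {"ab": "AB", "abc": "ABC"}

def sub_with(keys, text):
    # Build the alternation in the given key order and substitute in one pass.
    regexp = re.compile("|".join(map(re.escape, keys)))
    return regexp.sub(lambda m: replacements[m.group(0)], text)

unsorted_result = sub_with(list(replacements), "hey abc")  # "ab" is tried first
sorted_result = sub_with(sorted(replacements, key=len, reverse=True), "hey abc")
print(unsorted_result, sorted_result)  # hey ABc hey ABC
```

Python's regex alternation picks the first alternative that matches, not the longest one, which is why the keys are sorted before joining.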

It's in this gist; feel free to modify it if you have any suggestions.

— bgusach

1

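This should be the accepted answer instead, because the regex is constructed from all the keys sorted in descending order of length and joined with the | regex alternation operator. The sorting is necessary so that, when there are alternatives, the longest of all possible choices is selected.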

— Sachin S

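I agree this is the best solution, thanks to the sorting. Apart from the sorting it is identical to my original answer, so I borrowed the sorting for my solution too, to make sure nobody misses such an important feature.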

— mmj

6

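I needed a solution where the strings to be replaced can be regular expressions, for example to help normalize a long text by replacing multiple whitespace characters with a single one. Based on a chain of answers from others, including MiniQuark and mmj, this is what I came up with: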

import re

def multiple_replace(string, reps, re_flags = 0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    # Materialize as a list so pairs can be looked up by index below
    # (dict.items() returns a view, which is not indexable in Python 3).
    reps = list(reps.items()) if isinstance(reps, dict) else list(reps)
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    # lastgroup is the name "_<i>" of the alternative that matched
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)
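A small sketch of the lookup trick on the last line, assuming a two-pair pattern built the same way: each alternative is wrapped in a named group "_<index>", and Match.lastgroup reports which one matched.

```python
import re

# Two alternatives, each wrapped in a named group "_<index>" the way
# multiple_replace builds its pattern.
pattern = re.compile(r"(?P<_0>\bI\b)|(?P<_1>[\n\t ]+)")

for m in pattern.finditer("I  am"):
    # lastgroup names the alternative that matched; dropping the "_" gives
    # the index into the list of (regex, replacement) pairs.
    print(repr(m.group(0)), m.lastgroup, int(m.lastgroup[1:]))
# 'I' _0 0
# '  ' _1 1
```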

It works for the examples given in other answers, for example:

>>> multiple_replace("(condition1) and --condition2--",
...                  {"condition1": "", "condition2": "text"})
'() and --text--'

>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'

>>> multiple_replace("Do you like cafe? No, I prefer tea.",
...                  {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

The main thing for me is that you can also use regular expressions, for example to replace whole words only, or to normalize whitespace:

>>> s = "I don't want to change this name:\n  Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"

If you want to use the dictionary keys as normal strings, you can escape them before calling multiple_replace, e.g. with this function:

def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())

>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n  Philip II of Spain"

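The following function can help find erroneous regular expressions among your dictionary keys (since the error message from multiple_replace isn't very telling):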

def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))

>>> check_re_list(re_str_dict.keys())

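Note that it does not chain the replacements but performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and ensure the expected ordering of the pairs: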

>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
...                             ("but", "mut"), ("mutton", "lamb")])
'lamb'

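Very nice, thanks. Could this be improved to also allow backreferences in the substitutions? I didn't immediately figure out how to add that.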

— cmarqu

The answer to my question above is at stackoverflow.com/questions/45630940/…

— cmarqu

4

Here is a sample that is more efficient on long strings with many small replacements.

import re

source = "Here is foo, it does moo!"

replacements = {
    'is': 'was', # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}

def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys())) # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)

print(replace(source, replacements))

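The point is to avoid many concatenations of long strings. We chop the source string into fragments, replacing some of the fragments as we build the list, and then join the whole thing back into a string.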
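The same build-a-list-then-join idea can be sketched with re.finditer, which yields the non-overlapping matches left to right, so the explicit while/search loop is not needed. This is a rewrite for illustration, not the answer's original code.

```python
import re

def replace(source, replacements):
    # One alternation matching every (escaped) key.
    finder = re.compile("|".join(re.escape(k) for k in replacements))
    parts = []
    pos = 0
    for match in finder.finditer(source):
        parts.append(source[pos:match.start()])     # text before this match
        parts.append(replacements[match.group(0)])  # replacement for the match
        pos = match.end()
    parts.append(source[pos:])  # the rest after the last match
    return "".join(parts)

print(replace("Here is foo, it does moo!",
              {"is": "was", "does": "did", "!": "?"}))  # Here was foo, it did moo?
```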

— 9000

2

You really shouldn't do it this way, but I just find it way too cool:

>>> replacements = {'cond1':'text1', 'cond2':'text2'}
>>> cmd = 'answer = s'
>>> for k,v in replacements.items():
...     cmd += ".replace(%r, %r)" % (k, v)  # %r quotes the strings inside the generated code
...
>>> exec(cmd)

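Now, answer is the result of all the replacements in turn.

Again, this is very hacky and not something you should use regularly. But it's nice to know that you can do something like this if you ever need to.

— inspectorG4dget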

2

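I was struggling with this problem as well. With many substitutions, regular expressions struggle, and were about four times slower than looping over string.replace (under my experimental conditions).

You should absolutely try the Flashtext library (blog post here, Github here). In my case it was a bit over two orders of magnitude faster, from 1.8 s to 0.015 s per document (regular expressions took 7.7 s).

It's easy to find usage examples in the links above, but this is a working example: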

from flashtext import KeywordProcessor

# my_dict is your {old: new} mapping and string is the text to process
processor = KeywordProcessor(case_sensitive=False)
for k, v in my_dict.items():
    processor.add_keyword(k, v)
new_string = processor.replace_keywords(string)

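Note that Flashtext makes its substitutions in a single pass (to avoid a -> b and b -> c translating 'a' into 'c'). Flashtext also looks for whole words (so 'is' will not match 'this'). It works fine if your targets are several words (e.g. replacing 'This is' with 'Hello').

— Pablo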

How does this work if you need to replace HTML tags? E.g. replace <p> with /n. I tried your approach, but with tags flashtext doesn't seem to parse them?

— alias51

1

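I'm not sure why it's not working as you expect. One possibility is that these tags are not separated by spaces, and remember that Flashtext looks for whole words. A way around this is to use a simple replace first, so that "Hi<p>there" becomes "Hi <p> there". You'd need to be careful to remove the unwanted spaces when you're done (also with a simple replace?). Hope that helps.

— Pablo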

Thanks, can you set < and > to mark the end of a word (but still be included in the replacement)?

— alias51

1

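I believe "words" are marked by spaces only. Perhaps there are some optional parameters you can set in "KeywordProcessor". Otherwise, consider the approach above: substitute "<" with " <", apply Flashtext, then substitute back (e.g. " <" to "<" and " \n" to "\n").

— Pablo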

2

I feel this question needs a single-line recursive lambda function answer for completeness, just because. So there:

>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)

Usage:

>>> mrep('abcabc', {'a': '1', 'c': '2'})
'1b21b2'

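Notes:

This consumes the input dictionary.
Python dicts preserve key order as of Python 3.7 (3.6 in CPython, as an implementation detail); the corresponding caveats in other answers no longer apply. For backward compatibility, one could resort to a tuple-based version: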

>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.pop()), d)
>>> mrep('abcabc', [('a', '1'), ('c', '2')])

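Note: as with all recursive functions in Python, too large a recursion depth (i.e. too large a replacement dictionary) will result in an error. See e.g. here.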
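A small sketch of two consequences of the notes above, on a made-up input: passing a copy (dict(d)) keeps the caller's dict intact, and because dict.popitem() removes the last-inserted pair first, the lambda applies the replacements in reverse insertion order, which matters when one replacement feeds another.

```python
mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)

d = {"a": "b", "b": "c"}
# popitem() takes ("b", "c") first, then ("a", "b"), so "a" only becomes "b".
reverse_order = mrep("a", dict(d))  # a copy is consumed, not d itself

# A plain loop applies the pairs in insertion order: "a" -> "b" -> "c".
forward_order = "a"
for old, new in d.items():
    forward_order = forward_order.replace(old, new)

print(reverse_order, forward_order, d)  # b c {'a': 'b', 'b': 'c'}
```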

— mcsoini

I get a RecursionError when using a large dictionary!

— Pablo

@Pablo Interesting. How large? Note that this happens with all recursive functions. See for example: stackoverflow.com/questions/3323001/…

— mcsoini

My dictionary of substitutions is close to 100k terms... So far, using string.replace is by far the best approach.

— Pablo

1

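@Pablo In that case you can't use recursive functions: sys.getrecursionlimit() is typically 1000 by default. Use a loop or something similar, or try to simplify the substitutions.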

— mcsoini

Yeah, I'm afraid there's really no shortcut here.

— Pablo

1

I don't know about speed, but this is my everyday quick fix:

from functools import reduce

reduce(lambda a, b: a.replace(*b)
    , [('o','W'), ('t','X')] #iterable of pairs: (oldval, newval)
    , 'tomato' #The string from which to replace values
    )

... but I like the #1 regex answer above. Note: if one new value is a substring of another, the operation is not commutative.

— del_hol

1

You can use the pandas library and its replace function, which supports both exact matches and regex replacements. For example:

import pandas as pd

df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})

to_replace=['Billy','Rome','January|February|March|April|May|June|July|August|September|October|November|December', r'\d{2}:\d{2}', r'\d{2}/\d{2}/\d{4}']
replace_with=['name','city','month','time', 'date']

print(df.text.replace(to_replace, replace_with, regex=True))

And the modified text is:

0    name is going to visit city in month
1                      I was born in date
2                 I will be there at time

You can find an example here. Note that the replacements on the text are performed in the order they appear in the lists.

— George Pipis

1

For replacing only single characters, using translate with str.maketrans is my favorite method.

tl;dr: result_string = your_string.translate(str.maketrans(dict_mapping))

Demo

my_string = 'This is a test string.'
dict_mapping = {'i': 's', 's': 'S'}
result_good = my_string.translate(str.maketrans(dict_mapping))
result_bad = my_string
for x, y in dict_mapping.items():
    result_bad = result_bad.replace(x, y)
print(result_good)  # ThsS sS a teSt Strsng.
print(result_bad)   # ThSS SS a teSt StrSng.
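str.maketrans also accepts a dict whose keys are single characters and whose values are strings of any length or None (meaning "delete"), so translate can do a bit more than one-to-one character swaps. A short sketch with a made-up input:

```python
table = str.maketrans({"a": "xy", "b": None, "c": "a"})
# Single pass: the "a" produced for "c" is not translated again.
result = "abcabc".translate(table)
print(result)  # xyaxya
```

The keys are still limited to single characters, so for multi-character search strings one of the regex answers above is needed.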

— Carson

0

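Starting from Andrew's valuable answer, I developed a script that loads the dictionary from a file and processes all the files in the current folder to do the replacements. The script loads the mappings from an external file, in which you can set the separator. I'm a beginner, but I found this script very useful when doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It's not elegant, but it worked for me.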

import glob
import re

mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")

rep = {} # create an empty dictionary

with open(mapfile) as temprep: # load the definitions from the map file, split on the prompted separator
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val

for filename in glob.iglob(mask): # iterate over all files matching the prompted mask

    with open(filename, "r") as textfile: # load each file into the variable text
        text = textfile.read()

    # start replacement
    #rep = dict((re.escape(k), v) for k, v in rep.items()) commented out to allow re reserved characters in the map keys
    # note: rep[m.group(0)] only works when each key matches itself literally
    pattern = re.compile("|".join(rep.keys()))
    text = pattern.sub(lambda m: rep[m.group(0)], text)

    # write the output file using the prompted suffix
    with open(filename[:-4] + suff, "w") as target:
        target.write(text)

— Tommaso Sandi

0

This is my solution to the problem. I used it in a chatbot to replace different words at once.

def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if s:
            new_string+=s
            old_string = old_string[len(sk):]
        else:
            new_string+=old_string[0]
            old_string = old_string[1:]
    return new_string

print(mass_replace("The dog hunts the cat", {"dog":"cat", "cat":"dog"}))

This will print The cat hunts the dog

— emorjon2

0

Another example: input lists

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

The desired output would be

words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']

Code:

[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]]
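A more readable sketch of the same cleanup, using a small helper (the name clean is hypothetical). Unlike the one-liner, which keeps only the result of removing the first marker found, it also handles a word that contains several markers.

```python
error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

def clean(word, errors):
    # Remove every error marker from the word, one marker at a time.
    for e in errors:
        word = word.replace(e, "")
    return word

cleaned = [clean(w, error_list) for w in words]
print(cleaned)  # ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
```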

— Akhil Thayyil

-2

Or just for a quick hack:

for line in to_read:
    read_buffer = line              
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)

— Brandon H

-2

Here is yet another way of doing it with a dictionary:

listA="The cat jumped over the house".split()
modify = {word:word for number,word in enumerate(listA)}
modify["cat"],modify["jumped"]="dog","walked"
print(" ".join(modify[x] for x in listA))

— Stefan Gruenwald