Remove substring only at the end of string


68

I have a bunch of strings, some of them have ' rec'. I want to remove that only if those are the last 4 characters.

So in other words I have

somestring = 'this is some string rec'

and I want it to become

somestring = 'this is some string'

What is the Python way to approach this?



Answers:


87
def rchop(s, suffix):
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

somestring = 'this is some string rec'
rchop(somestring, ' rec')  # returns 'this is some string'

1
Note that endswith can also take a tuple of suffixes to look for. If someone passes a tuple as the suffix to this function, you'll get the wrong result: endswith checks every string in the tuple, but the slice removes len(suffix) characters, which is the number of strings in the tuple, not the length of the string that matched.
Boris
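To illustrate the comment above: with the rchop from this answer, rchop('this is some string rec', (' rec', ' tmp')) slices off 2 characters (the tuple has two items) and leaves 'this is some string r'. A minimal sketch of a variant that accepts either a single string or a tuple follows; the name rchop_any is hypothetical and not part of the original answer.

```python
def rchop_any(s, suffixes):
    """Remove the first matching suffix; suffixes is a str or a tuple of str."""
    if isinstance(suffixes, str):
        suffixes = (suffixes,)
    for suffix in suffixes:
        # Skip empty suffixes: s[:-0] would return an empty string.
        if suffix and s.endswith(suffix):
            return s[:-len(suffix)]
    return s

print(rchop_any('this is some string rec', (' rec', ' tmp')))  # 'this is some string'
```

The suffixes are tried in order, so if one candidate is a suffix of another, list the longer one first.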

24

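Since you have to get len(trailing) anyway (where trailing is the string you want to remove if it is trailing), I'd recommend avoiding the slight duplication of work that .endswith would cause in this case. Of course, the proof of the code is in the timing, so let's do some measurement (naming the functions after the respondents who proposed them):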

import re

astring = 'this is some string rec'
trailing = ' rec'

def andrew(astring=astring, trailing=trailing):
    regex = r'(.*)%s$' % re.escape(trailing)
    return re.sub(regex, r'\1', astring)

def jack0(astring=astring, trailing=trailing):
    if astring.endswith(trailing):
        return astring[:-len(trailing)]
    return astring

def jack1(astring=astring, trailing=trailing):
    regex = r'%s$' % re.escape(trailing)
    return re.sub(regex, '', astring)

def alex(astring=astring, trailing=trailing):
    thelen = len(trailing)
    if astring[-thelen:] == trailing:
        return astring[:-thelen]
    return astring

Say we've named this Python file a.py and it's in the current directory; now:

$ python2.6 -mtimeit -s'import a' 'a.andrew()'
100000 loops, best of 3: 19 usec per loop
$ python2.6 -mtimeit -s'import a' 'a.jack0()'
1000000 loops, best of 3: 0.564 usec per loop
$ python2.6 -mtimeit -s'import a' 'a.jack1()'
100000 loops, best of 3: 9.83 usec per loop
$ python2.6 -mtimeit -s'import a' 'a.alex()'
1000000 loops, best of 3: 0.479 usec per loop

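As you can see, the RE-based solutions are "hopelessly outclassed" (as often happens when one "overkills" a problem; possibly one of the reasons REs have such a bad rep in the Python community!), though the suggestion in @Jack's comment is far better than @Andrew's original. The string-based solutions, as expected, shine, with my endswith-avoiding one having a minuscule advantage over @Jack's (being just 15% faster). So both pure-string ideas are good (as well as both being concise and clear). I prefer my variant a little only because I am, by character, a frugal (some might say stingy ;-) person... "waste not, want not"!-)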


What's with the space in import a' 'a.xxx?
Blankman, 2010

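@Blankman, it's a bash command running Python: the setup (-s) is one argument, the code being timed is the other. Each is quoted so I don't have to worry about it including spaces and/or special characters. You always separate arguments with spaces in bash (and in most other shells, including Windows' own cmd.exe, so I'm quite surprised by your question!), and quoting arguments to a shell command to preserve spaces and special characters within each argument is definitely not what I'd call a peculiar, rare, or advanced usage of any shell...!-)
Alex Martelli, 2010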

Oh, I see that you bypassed endswith just as I mentioned on Jack's answer. Caching the len also avoids Python's (and C's!) dreadful call overhead.
Matt Joiner

I wonder how the regex would perform if it were compiled only once and reused many times.
Conchylicultor
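One way to look into that last question is sketched below. It is not part of the original measurements: the names TRAILING_RE, precompiled and sliced are made up for illustration, and no timings are claimed here, so run it on your own machine. Keep in mind that the re module caches recently compiled patterns internally, so precompiling mainly saves the cache lookup and the string formatting done on each call.

```python
import re
import timeit

astring = 'this is some string rec'
trailing = ' rec'

# Compile once, outside the function being timed.
TRAILING_RE = re.compile(re.escape(trailing) + r'$')

def precompiled(astring=astring):
    # Same substitution as jack1, but reusing the compiled pattern.
    return TRAILING_RE.sub('', astring)

def sliced(astring=astring, trailing=trailing):
    # Same logic as the alex function above, for comparison.
    thelen = len(trailing)
    if astring[-thelen:] == trailing:
        return astring[:-thelen]
    return astring

if __name__ == '__main__':
    for func in (precompiled, sliced):
        seconds = timeit.timeit(func, number=100000)
        print(func.__name__, seconds)
```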

20

If speed is not important, use a regex:

import re

somestring='this is some string rec'

somestring = re.sub(' rec$', '', somestring)
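Two caveats apply if the suffix is not a fixed literal like ' rec'. Regex metacharacters in the suffix need escaping, and in Python '$' also matches just before a newline at the very end of the string. A minimal sketch that handles both follows; the helper name rchop_re is hypothetical.

```python
import re

def rchop_re(s, suffix):
    # re.escape makes characters such as '.' or '$' in suffix literal;
    # \Z matches only at the very end of the string.
    return re.sub(re.escape(suffix) + r'\Z', '', s)

print(rchop_re('this is some string rec', ' rec'))  # this is some string
print(repr(re.sub(' rec$', '', 'line rec\n')))      # 'line\n'  ('$' matched before the newline)
print(repr(rchop_re('line rec\n', ' rec')))         # 'line rec\n'
```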

6

Here is a one-liner version of Jack Kelly's answer, along with its sibling:

def rchop(s, sub):
    return s[:-len(sub)] if s.endswith(sub) else s

def lchop(s, sub):
    return s[len(sub):] if s.startswith(sub) else s

This won't work if sub is an empty string. You missed the check.
Iceflower S
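The problem the comment points at is that s[:-0] is the same as s[:0], so rchop(s, '') returns an empty string instead of s. A sketch of the one-liners with that case guarded (lchop needs no guard, since s[0:] is the whole string):

```python
def rchop(s, sub):
    # `sub and` guards the empty case: s[:-0] would be ''.
    return s[:-len(sub)] if sub and s.endswith(sub) else s

def lchop(s, sub):
    # No guard needed here: s[0:] is the whole string.
    return s[len(sub):] if s.startswith(sub) else s

print(repr(rchop('this is some string rec', '')))  # 'this is some string rec'
```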


4

You could use a regular expression as well:

from re import sub

str = r"this is some string rec"
regex = r"(.*)\srec$"
print(sub(regex, r"\1", str))

10
Capturing groups are overkill here. sub(' rec$', '', str) works.
Jack Kelly, 2010

0

As a one-liner, using a generator expression with join:

test = """somestring='this is some string rec'
this is some string in the end word rec
This has not the word."""
match = 'rec'
print('\n'.join((line[:-len(match)] if line.endswith(match) else line)
      for line in test.splitlines()))
""" Output:
somestring='this is some string rec'
this is some string in the end word 
This has not the word.
"""

0

Using more_itertools, we can rstrip items that pass a predicate.

Installation

> pip install more_itertools

import more_itertools as mit


iterable = "this is some string rec".split()
" ".join(mit.rstrip(iterable, pred=lambda x: x in {"rec", " "}))
# 'this is some string'

Here we pass all the trailing items we wish to strip from the end.

See also the more_itertools docs for details.



0

def remove_trailing_string(content, trailing):
    """
    Strip trailing component `trailing` from `content` if it exists.
    """
    if content.endswith(trailing) and content != trailing:
        return content[:-len(trailing)]
    return content

-2

Use:

somestring.rsplit(' rec')[0]

2
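This does not work in general. It splits the string at every occurrence of ' rec' and returns the first piece. The l and r prefixes only make a difference if the separator string can overlap or if a maximum number of splits is specified.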
jan-glx
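A short demonstration of the failure described in the comment above, next to an endswith-based check that only touches a trailing ' rec'. The sample strings are made up for illustration. On Python 3.9 and later, the built-in str.removesuffix does the same job, so it is shown behind a version check.

```python
import sys

somestring = 'a rec b rec'
print(somestring.rsplit(' rec'))      # ['a', ' b', '']
print(somestring.rsplit(' rec')[0])   # a        (too much removed)
print('a rec b'.rsplit(' rec')[0])    # a        (no trailing ' rec' at all)

def rchop(s, suffix):
    # Only strip when the string really ends with the suffix.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

print(rchop(somestring, ' rec'))      # a rec b
print(rchop('a rec b', ' rec'))       # a rec b

if sys.version_info >= (3, 9):
    # Built-in equivalent, available from Python 3.9.
    print(somestring.removesuffix(' rec'))  # a rec b
```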
Licensed under cc by-sa 3.0 with attribution required.