Remove substring only at the end of string

68

I have a bunch of strings, some of them have ' rec'. I want to remove that only if those are the last 4 characters.

So in other words I have

somestring = 'this is some string rec'

and I want it to become

somestring = 'this is some string'

What is the Python way to approach this?

python string

— Alex Gordon
source

possible duplicate of Python Remove last 3 characters of a string

— outis

2

possible duplicate of How do I remove a substring from the end of a string in Python?

— Ciro Santilli 郝海东冠状病六四事件法轮功

87

def rchop(s, suffix):
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

somestring = 'this is some string rec'
rchop(somestring, ' rec')  # returns 'this is some string'

— Jack Kelly
source

1

Note that endswith can also take a tuple of suffixes to look for. If someone passes a tuple as the suffix to this function, you'll get the wrong result: it will check every string in the tuple but then remove as many characters as the tuple has elements, not the length of the matching string.
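One way to handle that case, as a minimal sketch: it assumes the goal is to strip whichever suffix in the tuple matches first. The name `rchop_any` is hypothetical and does not come from the answer above.

```python
def rchop_any(s, suffixes):
    """Strip the first matching suffix; `suffixes` may be a str or a tuple of str."""
    if isinstance(suffixes, str):
        suffixes = (suffixes,)
    for suffix in suffixes:
        # Skip empty suffixes: s[:-0] would return '' rather than s.
        if suffix and s.endswith(suffix):
            return s[:-len(suffix)]
    return s


print(rchop_any('this is some string rec', (' foo', ' rec')))  # this is some string
```

The length removed is always that of the suffix that actually matched, never the length of the tuple.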

— Boris

24

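Since you have to get len(trailing) anyway (where trailing is the string you want to remove if it is trailing), I'd recommend avoiding the slight duplication of work that .endswith would cause in this case. Of course, the proof of the code is in the timing, so let's do some measurement (naming the functions after the respondents who proposed them):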

import re

astring = 'this is some string rec'
trailing = ' rec'

def andrew(astring=astring, trailing=trailing):
    regex = r'(.*)%s$' % re.escape(trailing)
    return re.sub(regex, r'\1', astring)

def jack0(astring=astring, trailing=trailing):
    if astring.endswith(trailing):
        return astring[:-len(trailing)]
    return astring

def jack1(astring=astring, trailing=trailing):
    regex = r'%s$' % re.escape(trailing)
    return re.sub(regex, '', astring)

def alex(astring=astring, trailing=trailing):
    thelen = len(trailing)
    if astring[-thelen:] == trailing:
        return astring[:-thelen]
    return astring

Say we've named this Python file a.py and it's in the current directory; now:

$ python2.6 -mtimeit -s'import a' 'a.andrew()'
100000 loops, best of 3: 19 usec per loop
$ python2.6 -mtimeit -s'import a' 'a.jack0()'
1000000 loops, best of 3: 0.564 usec per loop
$ python2.6 -mtimeit -s'import a' 'a.jack1()'
100000 loops, best of 3: 9.83 usec per loop
$ python2.6 -mtimeit -s'import a' 'a.alex()'
1000000 loops, best of 3: 0.479 usec per loop

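As you can see, the RE-based solutions are "hopelessly outclassed" (as often happens when one "overkills" a problem, possibly one of the reasons REs have such a bad reputation in the Python community!), though the suggestion in @Jack's comment is far better than @Andrew's original. The string-based solutions, as expected, shine, with my endswith-avoiding one having a minuscule advantage over @Jack's (being just 15% faster). So both pure-string ideas are good (as well as both being concise and clear). I prefer my variant a little only because I am, by character, a frugal (some might say stingy ;-) person... "waste not, want not"!-)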

— Alex Martelli
source

What is the space for in import a' 'a.xxx?

— Blankman, 2010

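@Blankman, it's a bash command running Python: the setup (-s) is one argument, the code being timed is the other. Each is quoted so I don't have to worry about it including spaces and/or special characters. You always separate arguments with spaces in bash (and most other shells, including Windows' own cmd.exe, so I'm rather surprised by your question!), and quoting arguments to a shell command to preserve spaces and special characters within each argument is definitely not what I'd call a peculiar, rare or advanced usage of any shell...!-)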
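To see how the shell splits that command line into arguments, here is a small sketch using the standard-library shlex module, which follows POSIX shell quoting rules (it approximates what bash does, it is not bash itself):

```python
import shlex

command = "python2.6 -mtimeit -s'import a' 'a.andrew()'"

# shlex.split removes the quotes and keeps quoted spaces inside one argument.
args = shlex.split(command)
print(args)
# ['python2.6', '-mtimeit', '-simport a', 'a.andrew()']
```

The space between the two quoted parts separates the setup argument from the statement being timed; the space inside `'import a'` stays part of a single argument.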

— Alex Martelli, 2010

Oh, I see you sidestepped endswith here, like I mentioned in Jack's answer. Caching the len also avoids Python's (and C's!) dreadful call overhead.

— Matt Joiner

I wonder what the performance would be if the regex were compiled only once and reused many times.
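A rough way to check this yourself, as a sketch only: it reuses the `astring` and `trailing` values from the answer and compares a precompiled pattern against the slicing approach with the standard timeit module. The names `TRAILING_RE`, `precompiled` and `slicing` are made up for this example, and no timings are claimed here since they depend on the machine and Python version.

```python
import re
import timeit

astring = 'this is some string rec'
trailing = ' rec'

# Compile once, outside the function being timed.
TRAILING_RE = re.compile(re.escape(trailing) + '$')


def precompiled(astring=astring):
    return TRAILING_RE.sub('', astring)


def slicing(astring=astring, trailing=trailing):
    thelen = len(trailing)
    if astring[-thelen:] == trailing:
        return astring[:-thelen]
    return astring


if __name__ == '__main__':
    for func in (precompiled, slicing):
        seconds = timeit.timeit(func, number=100_000)
        print(f'{func.__name__}: {seconds:.3f} s per 100000 calls')
```

Keep in mind that the re module already caches recently compiled patterns, so precompiling may save less than one would expect.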

— Conchylicultor

20

If speed is not important, use a regex:

import re

somestring='this is some string rec'

somestring = re.sub(' rec$', '', somestring)
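Two details are worth keeping in mind with the regex approach: a suffix containing regex metacharacters has to be escaped, and `$` also matches just before a trailing newline. A hedged sketch that handles both (the helper name `rchop_re` is hypothetical):

```python
import re


def rchop_re(s, suffix):
    # re.escape keeps characters such as '.' or '$' in the suffix literal;
    # \Z anchors at the very end of the string, whereas '$' also matches
    # just before a trailing newline.
    return re.sub(re.escape(suffix) + r'\Z', '', s)


print(rchop_re('this is some string rec', ' rec'))  # this is some string
print(rchop_re('price in US$', '$'))                # price in US
```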

— Per Mejdal Rasmussen
source

6

Here is a one-liner version of Jack Kelly's answer, along with its sibling:

def rchop(s, sub):
    return s[:-len(sub)] if s.endswith(sub) else s

def lchop(s, sub):
    return s[len(sub):] if s.startswith(sub) else s

— 地精
source

This will not work if sub is an empty string. You are missing a check.
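A sketch of the guarded version, assuming an empty `sub` should leave the string unchanged:

```python
def rchop(s, sub):
    # `sub and ...` guards the empty string: s[:-0] is s[:0], i.e. ''.
    return s[:-len(sub)] if sub and s.endswith(sub) else s


def lchop(s, sub):
    # lchop is already safe for '' (s[0:] is s); the guard keeps it symmetric.
    return s[len(sub):] if sub and s.startswith(sub) else s


print(repr(rchop('abc', '')))   # 'abc'
print(repr('abc'[:-len('')]))   # '' (what the unguarded version returns)
```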

— Iceflower S

5

Starting with Python 3.9, you can use removesuffix:

'this is some string rec'.removesuffix(' rec')
# 'this is some string'
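For code that also has to run on older interpreters, one possible fallback is sketched below. The wrapper name `remove_suffix` is hypothetical; it uses str.removesuffix when the running Python provides it and slicing otherwise.

```python
def remove_suffix(text, suffix):
    """Use str.removesuffix where available, else fall back to slicing."""
    if hasattr(str, 'removesuffix'):  # Python 3.9+
        return text.removesuffix(suffix)
    # Fallback: guard the empty suffix, since text[:-0] would be ''.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(remove_suffix('this is some string rec', ' rec'))  # this is some string
```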

— Xavier Guihot
source

6

As a supplement, this was introduced by PEP 616 (along with str.removeprefix).

— Conchylicultor

4

You could also use a regular expression:

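from re import sub

# Note: `str` shadows the built-in type; the name is kept from the original answer.
str = r"this is some string rec"
# Capture everything before a trailing whitespace + "rec" and keep only that part.
regex = r"(.*)\srec$"
print(sub(regex, r"\1", str))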

— Andrew Hare
source

10

Capturing groups are overkill here. sub(' rec$', '', str) works.

— Jack Kelly, 2010

0

As a kind of one-liner, using a joined generator:

test = """somestring='this is some string rec'
this is some string in the end word rec
This has not the word."""
match = 'rec'
print('\n'.join((line[:-len(match)] if line.endswith(match) else line)
      for line in test.splitlines()))
""" Output:
somestring='this is some string rec'
this is some string in the end word 
This has not the word.
"""

— Tony Veijalainen
source

0

Using more_itertools, we can rstrip strings that pass a predicate.

Installation

> pip install more_itertools

Code

import more_itertools as mit


iterable = "this is some string rec".split()
" ".join(mit.rstrip(iterable, pred=lambda x: x in {"rec", " "}))
# 'this is some string'

Here we pass all trailing items we wish to strip from the end.

See also the more_itertools docs for details.

— pylang
source

0

Taking inspiration from @David Foster's answer, I would do

def _remove_suffix(text, suffix):
    # `suffix` must be non-empty as well as not None: text[:-0] would return ''.
    if text is not None and suffix:
        return text[:-len(suffix)] if text.endswith(suffix) else text
    else:
        return text

Reference: Python string slicing

— y2k-shubham
source

0


def remove_trailing_string(content, trailing):
    """
    Strip trailing component `trailing` from `content` if it exists.
    """
    if content.endswith(trailing) and content != trailing:
        return content[:-len(trailing)]
    return content

— Ehsan Ahmadi
source

-2

Use:

somestring.rsplit(' rec')[0]

— 用户名
source

2

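This does not always work. It splits the string at every occurrence of `rec` and returns the first piece. The l and r prefixes only make a difference if occurrences of the separator can overlap or a maximum number of splits is specified.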
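A short demonstration of the failure modes, wrapping the expression from the answer in a hypothetical helper `rsplit_attempt`:

```python
def rsplit_attempt(s):
    # The expression from the answer above, unchanged.
    return s.rsplit(' rec')[0]


print(rsplit_attempt('this is some string rec'))  # this is some string
print(rsplit_attempt('a rec in the middle'))      # a
print(rsplit_attempt('one rec two rec'))          # one
```

Only the first call gives the intended result; the other two lose text because the split is not restricted to the end of the string.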

— jan-glx