How do I remove a substring from the end of a string in Python?

381

I have the following code:

url = 'abcdc.com'
print(url.strip('.com'))

I expected: abcdc

I got: abcd

For now I do

url.rsplit('.com', 1)

Is there a better way?

python string

— 拉米亚
source

6

strip removes the given characters from both ends of the string; in your case it strips ".", "c", "o" and "m".

— truppo

6

It will also remove those characters from the front of the string. If you only want to remove them from the end, use rstrip().

— Andre Miller 2009

42

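Yes. str.strip doesn't do what you think it does. str.strip removes any of the specified characters from the beginning and the end of the string. So "acbacda".strip("ad") gives 'cbac'; the a at the beginning and the da at the end are stripped. Cheers.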

— scvalex

2

Also, this removes the characters in any order: "site.ocm" > "site".

— Eric O Lebigot
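
The character-set behavior described in these comments can be checked directly. This is a minimal sketch using only built-in str methods; the sample strings are the ones quoted in the thread.

```python
url = 'abcdc.com'

# The argument is a set of characters, not a substring:
# '.', 'c', 'o' and 'm' are stripped from both ends until another character is hit.
stripped = url.strip('.com')
print(stripped)                          # abcd

# The order of the characters in the argument does not matter.
print(url.strip('moc.') == stripped)     # True

# rstrip only touches the right end, but it still works on a character set.
print('test.ccom'.rstrip('.com'))        # test
```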

1

@scvalex, wow, I only just realised this after using it that way for ages. It's dangerous because the code often happens to work anyway.

— 闪光

555

strip doesn't mean "remove this substring". x.strip(y) treats y as a set of characters and strips any characters in that set from both ends of x.

Instead, you could use endswith and slicing:

url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]

Or using regular expressions:

import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)  # raw string, so \. is a literal dot

— Steef
source
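
If the suffix is not a fixed literal, the regex approach needs the suffix escaped first. The following is only a sketch: the function name strip_suffix_regex is made up for illustration, and it anchors with \Z (the very end of the string) rather than the $ used in the answer, since $ also matches just before a trailing newline.

```python
import re

def strip_suffix_regex(text, suffix):
    # re.escape turns '.' into a literal dot; \Z anchors at the very end.
    pattern = re.escape(suffix) + r'\Z'
    return re.sub(pattern, '', text)

print(strip_suffix_regex('abcdc.com', '.com'))             # abcdc
print(strip_suffix_regex('www.computerhope.com', '.com'))  # www.computerhope
print(strip_suffix_regex('abcdcxcom', '.com'))             # abcdcxcom (unchanged)
```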

4

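Yeah, I myself think that the first example, with the endswith() test, is the better one; the regex one involves some performance penalty (parsing the regex, etc.). I wouldn't go with rsplit(), but that's because I don't know exactly what you're trying to achieve. I figure it's removing the .com if and only if it appears at the end of the URL? The rsplit solution would give you trouble if you used it on domain names like "www.commercialthingie.co.uk".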

— Steef

13

url = url[:-4] if any(url.endswith(x) for x in ('.com','.net')) else url

— Burhan Khalid

1

What if I write EXAMPLE.COM? Domain names are not case sensitive. (This is a vote for the regex solution.)

— Jasen 2015
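
Case-insensitive matching does not strictly require a regex. A minimal sketch, assuming ASCII text such as domain names (for some non-ASCII characters lower() changes the string length, which would break the slice); the name remove_suffix_ci is hypothetical:

```python
def remove_suffix_ci(text, suffix):
    # Compare lower-cased copies, but slice the original so its case is kept.
    if suffix and text.lower().endswith(suffix.lower()):
        return text[:-len(suffix)]
    return text

print(remove_suffix_ci('EXAMPLE.COM', '.com'))   # EXAMPLE
print(remove_suffix_ci('example.net', '.com'))   # example.net
```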

3

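It's not a rewrite: the rsplit() solution doesn't behave the same as the endswith() one when the original string doesn't have the substring at the end but somewhere in the middle. For instance: "www.comeandsee.com".rsplit(".com",1)[0] == "www.comeandsee" but "www.comeandsee.net".rsplit(".com",1)[0] == "www"

— Steef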

1

The syntax s[:-n] has a caveat: for n = 0, it doesn't return the string with the last zero characters chopped off, but the empty string instead.

— BlenderBender
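
A quick illustration of that caveat, with n standing in for the length of an empty suffix:

```python
s = 'abcdc.com'
n = 0

print(repr(s[:-n]))           # '' because -0 == 0, so this is s[:0]
print(repr(s[:len(s) - n]))   # 'abcdc.com'
```

This is why some answers in this thread write text[:len(text)-len(suffix)] instead of text[:-len(suffix)].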

90

If you are sure that the string only appears at the end, then the simplest way is to use replace:

url = 'abcdc.com'
print(url.replace('.com',''))

— 查尔斯·科利斯
source

56

That will also replace inside URLs like www.computerhope.com. Do a check with endswith() and it should be fine.

— ghostdog74 2010

72

"www.computerhope.com".endswith(".com") is True, so it will still break!

1

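Does "if you are sure that the string only appears at the end" mean "if you are sure that the substring appears only once"? replace seems to work when the substring is in the middle too, but as the other comment suggests, it replaces every occurrence of the substring. I don't understand why it should have to be at the end.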

— idclev 463035818

49

def strip_end(text, suffix):
    if not text.endswith(suffix):
        return text
    return text[:len(text)-len(suffix)]

— 耶尔楚
source

4

If you know that the suffix is not empty (for example, when it is a constant), then: return text[:-len(suffix)]

4

Thanks. The last line could be shortened: return text[:-len(suffix)]

— Jabba 2013

3

@Jabba: Sadly, that won't work for empty suffixes, as fuenfundachtzig mentioned.

— yairchu 2013

46

Since it seems like nobody has pointed this out yet:

url = "www.example.com"
new_url = url[:url.rfind(".")]

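This should be more efficient than the methods using split(), since no new list object is created, and this solution works for strings with several dots.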

— 用户名
source

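Wow, that is a nice trick. I couldn't get it to fail, but I also had a hard time thinking up ways it might fail. I like it, but it is very "magical"; it's hard to know what it does just by looking at it. I had to mentally process each part of the line to "get it".

— DevPlayer 2015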

14

This fails if the searched-for string is not present: it wrongly removes the last character instead.

— robbat2
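
The failure comes from rfind returning -1 when nothing is found, which turns the slice into url[:-1]. A minimal guarded sketch (the function name is made up for illustration):

```python
def strip_after_last_dot(url):
    idx = url.rfind('.')
    # rfind returns -1 when there is no dot; leave the string alone in that case.
    return url if idx == -1 else url[:idx]

print(strip_after_last_dot('www.example.com'))   # www.example
print(strip_after_last_dot('localhost'))         # localhost
print('localhost'[:'localhost'.rfind('.')])      # localhos (the unguarded version)
```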

25

Depends on what you know about your URL and exactly what you're trying to do. If you know that it will always end in ".com" (or ".net" or ".org"), then

 url=url[:-4]

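is the quickest solution. If it's a more general URL, then you're probably better off looking into the urlparse library that comes with Python (urllib.parse in Python 3).

If, on the other hand, you simply want to remove everything after the final "." in a string, then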

url.rsplit('.',1)[0]

will work. Or, if you want just everything up to the first ".", then try

url.split('.',1)[0]

— 达格
source
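
For the more general URL case mentioned in this answer, here is a rough sketch with urllib.parse from the Python 3 standard library. The helper name host_without_suffix and the sample URLs are assumptions for illustration; the fallback to the raw string covers inputs like 'abcdc.com' that have no scheme, for which urlsplit reports no hostname.

```python
from urllib.parse import urlsplit

def host_without_suffix(url, suffix='.com'):
    # urlsplit only fills in the host when the URL has a scheme (or starts with //).
    host = urlsplit(url).hostname or url
    if suffix and host.endswith(suffix):
        return host[:-len(suffix)]
    return host

print(host_without_suffix('http://www.example.com/path?q=1'))   # www.example
print(host_without_suffix('abcdc.com'))                          # abcdc
```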

16

If you know it's an extension, then

url = 'abcdc.com'
...
url.rsplit('.', 1)[0]  # split at '.', starting from the right, maximum 1 split

This works equally well with abcdc.com, www.abcdc.com, or abcdc.[anything], and is more extensible.

— 约翰·梅塔
source

12

In one line:

text if not text.endswith(suffix) or len(suffix) == 0 else text[:-len(suffix)]

— 大卫·福斯特
source

8

How about url[:-4]?

— 达伦·托马斯（Daren Thomas）
source

7

For URLs (which seem to be part of the topic, given the example), one can do something like this:

import os
url = 'http://www.stackoverflow.com'
name,ext = os.path.splitext(url)
print((name, ext))

# Or:
ext = '.' + url.split('.')[-1]
name = url[:-len(ext)]
print((name, ext))

Both will output: ('http://www.stackoverflow', '.com')

This can also be combined with str.endswith(suffix) if you need to split off just ".com" or anything else specific.

— 霍尔塔
source

5

url.rsplit('.com', 1)

is not quite right.

What you actually would need to write is

url.rsplit('.com', 1)[0]

, and it looks pretty succinct, IMHO.

However, my personal preference is this option, because it uses only one parameter:

url.rpartition('.com')[0]

— winni2k
source

1

+1. partition is preferable when only one split is needed, since it always returns an answer; an IndexError won't occur.

— Gringo Suave
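
One thing to watch with rpartition: when the separator is missing, it returns ('', '', text), so taking [0] yields an empty string, and when the separator sits in the middle, [0] cuts the string there. A minimal sketch that only strips a true suffix (the function name is hypothetical, and a non-empty suffix is assumed because rpartition raises ValueError on an empty separator):

```python
def strip_suffix_rpartition(text, suffix):
    head, sep, tail = text.rpartition(suffix)
    # Strip only when the separator was found and nothing follows it.
    return head if sep and not tail else text

print(strip_suffix_rpartition('abcdc.com', '.com'))            # abcdc
print(strip_suffix_rpartition('abcdc.net', '.com'))            # abcdc.net
print(strip_suffix_rpartition('www.comeandsee.net', '.com'))   # www.comeandsee.net
print(repr('abcdc.net'.rpartition('.com')[0]))                 # '' (the unguarded version)
```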

3

Starting with Python 3.9, you can use removesuffix instead:

'abcdc.com'.removesuffix('.com')
# 'abcdc'

— Xavier Guihot
source

2

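If you need to strip some end of a string if it exists, and otherwise do nothing, these are my best solutions. You'll probably want to use one of the first two implementations; I have included the third for completeness.

For a constant suffix: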

def remove_suffix(v, s):
    # Assumes s is non-empty: v[:-0] would return ''.
    return v[:-len(s)] if v.endswith(s) else v
remove_suffix("abc.com", ".com") == 'abc'
remove_suffix("abc", ".com") == 'abc'

For a regex:

import re

def remove_suffix_compile(suffix_pattern):
    r = re.compile(f"(.*?)({suffix_pattern})?$")
    return lambda v: r.match(v)[1]
remove_domain = remove_suffix_compile(r"\.[a-zA-Z0-9]{3,}")
remove_domain("abc.com") == "abc"
remove_domain("sub.abc.net") == "sub.abc"
remove_domain("abc.") == "abc."
remove_domain("abc") == "abc"

For a collection of constant suffixes, the asymptotically fastest way for a large number of calls:

def remove_suffix_preprocess(*suffixes):
    suffixes = set(suffixes)
    try:
        suffixes.remove('')
    except KeyError:
        pass

    def helper(suffixes, pos):
        if len(suffixes) == 1:
            suf = suffixes[0]
            l = -len(suf)
            ls = slice(0, l)
            return lambda v: v[ls] if v.endswith(suf) else v
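        # Check every suffix (including the first) for an exact match at pos;
        # otherwise track the shortest remaining suffix length.
        exact = False
        ml = None
        for suf in suffixes:
            l = len(suf)
            if pos is not None and l == -pos:
                exact = True
            elif ml is None or l < ml:
                ml = l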
        ml = -ml
        suffix_dict = {}
        for suf in suffixes:
            sub = suf[ml:pos]
            if sub in suffix_dict:
                suffix_dict[sub].append(suf)
            else:
                suffix_dict[sub] = [suf]
        if exact:
            del suffix_dict['']
            for key in suffix_dict:
                suffix_dict[key] = helper([s[:pos] for s in suffix_dict[key]], None)
            return lambda v: suffix_dict.get(v[ml:pos], lambda v: v)(v[:pos])
        else:
            for key in suffix_dict:
                suffix_dict[key] = helper(suffix_dict[key], ml)
            return lambda v: suffix_dict.get(v[ml:pos], lambda v: v)(v)
    return helper(tuple(suffixes), None)
domain_remove = remove_suffix_preprocess(".com", ".net", ".edu", ".uk", '.tv', '.co.uk', '.org.uk')

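The final one is probably significantly faster in PyPy than in CPython. The regex variant is likely faster than this for virtually all cases that do not involve huge dictionaries of potential suffixes that cannot be easily represented as a regex, at least in CPython.

In PyPy the regex variant is almost certainly slower for a large number of calls or long strings, even if the re module uses a DFA-compiling regex engine, because the JIT will optimize away the vast majority of the overhead of the lambdas.

In CPython, however, the fact that the regex comparison runs as C code almost certainly outweighs the algorithmic advantages of the suffix-collection version in almost all cases.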

— 用户名
source
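
For a small collection of suffixes, a much simpler alternative to the preprocessing version is a plain loop that tries the longest suffix first. This is only a sketch under that assumption (the name remove_any_suffix is made up, and it makes no claim to the asymptotic behavior discussed in the answer):

```python
def remove_any_suffix(text, suffixes):
    # Longest first, so '.co.uk' wins over '.uk'; empty suffixes are skipped.
    for suf in sorted(suffixes, key=len, reverse=True):
        if suf and text.endswith(suf):
            return text[:-len(suf)]
    return text

suffixes = ('.com', '.net', '.edu', '.uk', '.tv', '.co.uk', '.org.uk')
print(remove_any_suffix('example.co.uk', suffixes))   # example
print(remove_any_suffix('example.uk', suffixes))      # example
print(remove_any_suffix('example.io', suffixes))      # example.io
```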

2

If you mean to strip only the extension:

'.'.join('abcdc.com'.split('.')[:-1])
# 'abcdc'

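It works with any extension, and with other dots in the filename as well. It simply splits the string on dots into a list and joins it back without the last element.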

— 直流电
source

2

import re

def rm_suffix(url='abcdc.com', suffix=r'\.com'):
    # suffix is a regex pattern, so a literal dot must be escaped
    return re.sub(suffix + '$', '', url)

I want to repeat this answer as the most expressive way to do it. Of course, the following would take less CPU time:

def rm_dotcom(url = 'abcdc.com'):
    return(url[:-4] if url.endswith('.com') else url)

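However, if CPU is the bottleneck, why write in Python?

When is CPU a bottleneck anyway? In drivers, maybe.

The advantage of using a regular expression is code reusability. What if you next want to remove '.me', which only has three characters?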

The same code would do the trick:

>>> rm_suffix('abcdc.me', r'\.me')
'abcdc'

— 用户名
source

1

In my case I needed to raise an exception, so I did:

class UnableToStripEnd(Exception):
    """An exception type to indicate that the suffix cannot be removed from the text."""

    @staticmethod
    def get_exception(text, suffix):
        return UnableToStripEnd("Could not find suffix ({0}) on text: {1}."
                                .format(suffix, text))


def strip_end(text, suffix):
    """Removes the end of a string. Otherwise fails."""
    if not text.endswith(suffix):
        raise UnableToStripEnd.get_exception(text, suffix)
    return text[:len(text)-len(suffix)]

— 胡安·伊萨扎
source

1

Here I have the simplest code.

url=url.split(".")[0]

— Anshuman Jayaprakash
source

1

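Assuming you want to remove the domain, no matter what it is (.com, .net, etc.), I recommend finding the "." and removing everything from that point on.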

url = 'abcdc.com'
dot_index = url.rfind('.')
url = url[:dot_index]

Here I'm using rfind to handle URLs like abcdc.com.net, which should be reduced to the name abcdc.com.

If you're also concerned about www.s, you should explicitly check for them:

if url.startswith("www."):
   url = url.replace("www.","", 1)

The 1 in replace is for strange edge cases like www.net.www.com

If your URL gets any wilder than that, look at the regex answers people have responded with.

— Xavier Guay
source

1

I used the built-in rstrip function to do it, as follows:

string = "test.com"
suffix = ".com"
newstring = string.rstrip(suffix)
print(newstring)
test

— 亚历克斯
source

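Bad idea. Try "test.ccom".

— Shital Shah

But this is not the point of the question. It only asked to remove a known substring from the end of another string. This works exactly as expected.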

— Alex

1

You can use split:

'abccomputer.com'.split('.com',1)[0]
# 'abccomputer'

— 卢卡斯
source

5

When a = 'www.computerbugs.com' the outcome is 'www'.

— yairchu

0

This is a perfect use for regular expressions:

>>> import re
>>> re.match(r"(.*)\.com", "hello.com").group(1)
'hello'

— 亚伦·曼帕（Aaron Maenpaa）
source

5

You should also add a $ to make sure that you're matching hostnames ending in ".com".

— Cristian Ciupitu 2009

0

Python >= 3.9:

'abcdc.com'.removesuffix('.com')

Python < 3.9:

def remove_suffix(text, suffix):
    # Guard against an empty suffix: text[:-0] would return ''.
    if suffix and text.endswith(suffix):
        text = text[:-len(suffix)]
    return text

remove_suffix('abcdc.com', '.com')

— 无限
source

1

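Your answer for Python 3.9 is a duplicate of the answer above. Your answer for previous versions has also been given many times in this thread, and it would not return anything if the string doesn't have the suffix.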

— Xavier Guihot