从C#字符串中删除字符


150

如何删除字符串中的字符?例如:"My name @is ,Wan.;'; Wan"

我想'@', ',', '.', ';', '\''从该字符串中删除字符,以使其成为"My name is Wan Wan"

Answers:


177
var str = "My name @is ,Wan.;'; Wan";
var charsToRemove = new string[] { "@", ",", ".", ";", "'" };
foreach (var c in charsToRemove)
{
    str = str.Replace(c, string.Empty);
}

但如果您想删除所有非字母字符,我可能会建议另一种方法

var str = "My name @is ,Wan.;'; Wan";
str = new string((from c in str
                  where char.IsWhiteSpace(c) || char.IsLetterOrDigit(c)
                  select c
       ).ToArray());

12
也可以这样完成:str = new string(str.Where(x => char.IsWhiteSpace(x)|| char.IsLetterOrDigit(x))。ToArray());
Adnan Bhatti

1
我必须查找一下string.Empty不会为比较创建一个字符串,因此它比“”更有效。(stackoverflow.com/questions/151472/...
汤姆Cerul

6
我是唯一一个收到“参数2:无法从'string'转换为'char'” om string.Empty的人吗?
OddDev

2
@OddDev,仅当循环遍历的数组是字符列表时,才应出现此错误。如果它们是字符串,则应该可以工作
Newteq Developer

3
另外,请注意,要使“ str.Replace”功能正常工作,如果要使用string.Empty作为第二个参数,则第一个参数必须为“ string”。如果将char(即'a')用作第一个参数,则还需要使用char作为第二个参数。否则,您将收到上述@OddDev提到的“参数2:无法从'字符串'转换为'字符'的错误”
Leo

68

简单:

String.Join("", "My name @is ,Wan.;'; Wan".Split('@', ',' ,'.' ,';', '\''));

64

听起来像RegEx的理想应用程序-一种用于快速文本操作的引擎。在这种情况下:

Regex.Replace("He\"ll,o Wo'r.ld", "[@,\\.\";'\\\\]", string.Empty)

3
似乎这样比基于迭代器的方法要有效得多,尤其是如果您可以使用编译的Regex;
阿德·米勒

这应该是公认的答案,尤其是因为像@AdeMiller所说的那样,它会更加高效。
黑曜石

14
这并不比循环快,这是一个普遍的误解,即正则表达式总是比循环快。正则表达式不是魔术,它们的核心必须在某个时候遍历字符串以执行其操作,而正则表达式本身的开销可能会慢得多。当涉及到极其复杂的操作时(它们需要数十行代码和多个循环),它们确实非常出色。在一个未经优化的简单循环中测试此正则表达式的编译版本50000次,该正则表达式慢6倍。
Tony Cheetham

内存效率如何?在分配新的字符串方面,正则表达式会不会更有效?
Marek '18

2
当我断言RegEx速度很快时,也许我弄错了。除非这是一个非常紧密的循环的中心,否则对于这样的小型操作,其他考虑因素(如可读性和可维护性)可能会在性能上占主导地位。
约翰·梅尔维尔,

21

对您的问题不太具体,可以通过在正则表达式中白列出可接受的字符来删除字符串(空格除外)中的所有标点符号:

string dirty = "My name @is ,Wan.;'; Wan";

// only space, capital A-Z, lowercase a-z, and digits 0-9 are allowed in the string
string clean = Regex.Replace(dirty, "[^A-Za-z0-9 ]", "");

请注意,在9后面有一个空格,以免从句子中删除空格。第三个参数是一个空字符串,用于替换不属于正则表达式的任何子字符串。


19

比较各种建议(以及在将单个字符替换为目标的各种大小和位置的情况下进行比较)。

在这种特殊情况下,在目标上拆分并加入替换项(在这种情况下为空字符串)最快是至少3倍。最终,性能会因替换项的数量而异,其中替换项位于源以及源的大小。#ymmv

结果

(完整结果在这里

| Test                      | Compare | Elapsed                                                            |
|---------------------------|---------|--------------------------------------------------------------------|
| SplitJoin                 | 1.00x   | 29023 ticks elapsed (2.9023 ms) [in 10K reps, 0.00029023 ms per]   |
| Replace                   | 2.77x   | 80295 ticks elapsed (8.0295 ms) [in 10K reps, 0.00080295 ms per]   |
| RegexCompiled             | 5.27x   | 152869 ticks elapsed (15.2869 ms) [in 10K reps, 0.00152869 ms per] |
| LinqSplit                 | 5.43x   | 157580 ticks elapsed (15.758 ms) [in 10K reps, 0.0015758 ms per]   |
| Regex, Uncompiled         | 5.85x   | 169667 ticks elapsed (16.9667 ms) [in 10K reps, 0.00169667 ms per] |
| Regex                     | 6.81x   | 197551 ticks elapsed (19.7551 ms) [in 10K reps, 0.00197551 ms per] |
| RegexCompiled Insensitive | 7.33x   | 212789 ticks elapsed (21.2789 ms) [in 10K reps, 0.00212789 ms per] |
| Regex Insentive           | 7.52x   | 218164 ticks elapsed (21.8164 ms) [in 10K reps, 0.00218164 ms per] |

测试线束(LinqPad)

(注意:PerfVs我编写的时间扩展

void test(string title, string sample, string target, string replacement) {
    var targets = target.ToCharArray();

    var tox = "[" + target + "]";
    var x = new Regex(tox);
    var xc = new Regex(tox, RegexOptions.Compiled);
    var xci = new Regex(tox, RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // no, don't dump the results
    var p = new Perf/*<string>*/();
        p.Add(string.Join(" ", title, "Replace"), n => targets.Aggregate(sample, (res, curr) => res.Replace(new string(curr, 1), replacement)));
        p.Add(string.Join(" ", title, "SplitJoin"), n => String.Join(replacement, sample.Split(targets)));
        p.Add(string.Join(" ", title, "LinqSplit"), n => String.Concat(sample.Select(c => targets.Contains(c) ? replacement : new string(c, 1))));
        p.Add(string.Join(" ", title, "Regex"), n => Regex.Replace(sample, tox, replacement));
        p.Add(string.Join(" ", title, "Regex Insentive"), n => Regex.Replace(sample, tox, replacement, RegexOptions.IgnoreCase));
        p.Add(string.Join(" ", title, "Regex, Uncompiled"), n => x.Replace(sample, replacement));
        p.Add(string.Join(" ", title, "RegexCompiled"), n => xc.Replace(sample, replacement));
        p.Add(string.Join(" ", title, "RegexCompiled Insensitive"), n => xci.Replace(sample, replacement));

    var trunc = 40;
    var header = sample.Length > trunc ? sample.Substring(0, trunc) + "..." : sample;

    p.Vs(header);
}

void Main()
{
    // also see /programming/7411438/remove-characters-from-c-sharp-string

    "Control".Perf(n => { var s = "*"; });


    var text = "My name @is ,Wan.;'; Wan";
    var clean = new[] { '@', ',', '.', ';', '\'' };

    test("stackoverflow", text, string.Concat(clean), string.Empty);


    var target = "o";
    var f = "x";
    var replacement = "1";

    var fillers = new Dictionary<string, string> {
        { "short", new String(f[0], 10) },
        { "med", new String(f[0], 300) },
        { "long", new String(f[0], 1000) },
        { "huge", new String(f[0], 10000) }
    };

    var formats = new Dictionary<string, string> {
        { "start", "{0}{1}{1}" },
        { "middle", "{1}{0}{1}" },
        { "end", "{1}{1}{0}" }
    };

    foreach(var filler in fillers)
    foreach(var format in formats) {
        var title = string.Join("-", filler.Key, format.Key);
        var sample = string.Format(format.Value, target, filler.Value);

        test(title, sample, target, replacement);
    }
}

1
最后一些数字!干得好@drzaus!
Marek



6

另一个简单的解决方案:

var forbiddenChars = @"@,.;'".ToCharArray();
var dirty = "My name @is ,Wan.;'; Wan";
var clean = new string(dirty.Where(c => !forbiddenChars.Contains(c)).ToArray());

5
new List<string> { "@", ",", ".", ";", "'" }.ForEach(m => str = str.Replace(m, ""));

4

字符串只是一个字符数组,因此请使用Linq进行替换(类似于上面的Albin,不同之处在于使用linq contains语句进行替换):

var resultString = new string(
        (from ch in "My name @is ,Wan.;'; Wan"
         where ! @"@,.;\'".Contains(ch)
         select ch).ToArray());

第一个字符串是替换字符的字符串,第二个字符串是包含字符的简单字符串


Albin的Linq解决方案可能更好,除非您希望过滤掉其他字符(不包括空格,字母和数字)。
alistair 2011年

3

我最好把它丢在这里。

进行扩展以从字符串中删除字符:

public static string RemoveChars(this string input, params char[] chars)
{
    var sb = new StringBuilder();
    for (int i = 0; i < input.Length; i++)
    {
        if (!chars.Contains(input[i]))
            sb.Append(input[i]);
    }
    return sb.ToString();
}

可以这样使用:

string str = "My name @is ,Wan.;'; Wan";
string cleanedUpString = str.RemoveChars('@', ',', '.', ';', '\'');

或像这样:

string str = "My name @is ,Wan.;'; Wan".RemoveChars('@', ',', '.', ';', '\'');

这是最好的解决方案,因为它分配了最少的内存。我还将原始字符串的长度设置为字符串生成器的初始容量,例如:new StringBuilder(input.Length),目的是使内存分配的数量最少。
treaschf

3

似乎最短的方法是将LINQ和string.Concat

var input = @"My name @is ,Wan.;'; Wan";
var chrs = new[] {'@', ',', '.', ';', '\''};
var result = string.Concat(input.Where(c => !chrs.Contains(c)));
// => result = "My name is Wan Wan" 

参见C#演示。请注意,这string.Concatstring.Join("", ...)

请注意,尽管人们认为正则表达式的运行速度较慢,但​​是使用正则表达式删除单个已知字符仍然可以动态构建。但是,这是构建这种动态正则表达式的一种方法(您只需要一个字符类):

var pattern = $"[{Regex.Escape(new string(chrs))}]+";
var result = Regex.Replace(input, pattern, string.Empty);

参见另一个C#演示。正则表达式看起来像[@,\.;']+(一个或多个匹配(+)的连续出现@,.;'字符),其中的点没有进行转义,但Regex.Escape有必要逃避必须转义,像其他字符\^]或者-其位置在无法预测的角色类中。


在某些情况下,LINQ方式缓慢地令人恐惧
drzaus

3

这是我写的方法,但方法略有不同。我没有指定要删除的字符,而是告诉我的方法要保留的字符-它将删除所有其他字符。

在OP的示例中,他只想保留字母字符和空格。这是调用我的方法的样子(C#演示):

var str = "My name @is ,Wan.;'; Wan";

// "My name is Wan Wan"
var result = RemoveExcept(str, alphas: true, spaces: true);

这是我的方法:

/// <summary>
/// Returns a copy of the original string containing only the set of whitelisted characters.
/// </summary>
/// <param name="value">The string that will be copied and scrubbed.</param>
/// <param name="alphas">If true, all alphabetical characters (a-zA-Z) will be preserved; otherwise, they will be removed.</param>
/// <param name="numerics">If true, all alphabetical characters (a-zA-Z) will be preserved; otherwise, they will be removed.</param>
/// <param name="dashes">If true, all alphabetical characters (a-zA-Z) will be preserved; otherwise, they will be removed.</param>
/// <param name="underlines">If true, all alphabetical characters (a-zA-Z) will be preserved; otherwise, they will be removed.</param>
/// <param name="spaces">If true, all alphabetical characters (a-zA-Z) will be preserved; otherwise, they will be removed.</param>
/// <param name="periods">If true, all decimal characters (".") will be preserved; otherwise, they will be removed.</param>
public static string RemoveExcept(string value, bool alphas = false, bool numerics = false, bool dashes = false, bool underlines = false, bool spaces = false, bool periods = false) {
    if (string.IsNullOrWhiteSpace(value)) return value;
    if (new[] { alphas, numerics, dashes, underlines, spaces, periods }.All(x => x == false)) return value;

    var whitelistChars = new HashSet<char>(string.Concat(
        alphas ? "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" : "",
        numerics ? "0123456789" : "",
        dashes ? "-" : "",
        underlines ? "_" : "",
        periods ? "." : "",
        spaces ? " " : ""
    ).ToCharArray());

    var scrubbedValue = value.Aggregate(new StringBuilder(), (sb, @char) => {
        if (whitelistChars.Contains(@char)) sb.Append(@char);
        return sb;
    }).ToString();

    return scrubbedValue;
}

很棒的答案!
edtheprogrammerguy

非常好!数字字符串有两次0。
约翰·库兹

@JohnKurtz好的收获-现在不见了。
Mass Dot Net

2

这里有很多很好的答案,这是我的补充,还有几个可以用来帮助测试正确性的单元测试,我的解决方案类似于上面的@Rianne,但是使用ISet在替换字符上提供O(1)查找时间(以及类似于@Albin Sunnanbo的Linq解决方案)。

    using System;
    using System.Collections.Generic;
    using System.Linq;

    /// <summary>
    /// Returns a string with the specified characters removed.
    /// </summary>
    /// <param name="source">The string to filter.</param>
    /// <param name="removeCharacters">The characters to remove.</param>
    /// <returns>A new <see cref="System.String"/> with the specified characters removed.</returns>
    public static string Remove(this string source, IEnumerable<char> removeCharacters)
    {
        if (source == null)
        {
            throw new  ArgumentNullException("source");
        }

        if (removeCharacters == null)
        {
            throw new ArgumentNullException("removeCharacters");
        }

        // First see if we were given a collection that supports ISet
        ISet<char> replaceChars = removeCharacters as ISet<char>;

        if (replaceChars == null)
        {
            replaceChars = new HashSet<char>(removeCharacters);
        }

        IEnumerable<char> filtered = source.Where(currentChar => !replaceChars.Contains(currentChar));

        return new string(filtered.ToArray());
    }

NUnit(2.6+)测试在这里

using System;
using System.Collections;
using System.Collections.Generic;
using NUnit.Framework;

[TestFixture]
public class StringExtensionMethodsTests
{
    [TestCaseSource(typeof(StringExtensionMethodsTests_Remove_Tests))]
    public void Remove(string targetString, IEnumerable<char> removeCharacters, string expected)
    {
        string actual = StringExtensionMethods.Remove(targetString, removeCharacters);

        Assert.That(actual, Is.EqualTo(expected));
    }

    [TestCaseSource(typeof(StringExtensionMethodsTests_Remove_ParameterValidation_Tests))]
    public void Remove_ParameterValidation(string targetString, IEnumerable<char> removeCharacters)
    {
        Assert.Throws<ArgumentNullException>(() => StringExtensionMethods.Remove(targetString, removeCharacters));
    }
}

internal class StringExtensionMethodsTests_Remove_Tests : IEnumerable
{
    public IEnumerator GetEnumerator()
    {
        yield return new TestCaseData("My name @is ,Wan.;'; Wan", new char[] { '@', ',', '.', ';', '\'' }, "My name is Wan Wan").SetName("StringUsingCharArray");
        yield return new TestCaseData("My name @is ,Wan.;'; Wan", new HashSet<char> { '@', ',', '.', ';', '\'' }, "My name is Wan Wan").SetName("StringUsingISetCollection");
        yield return new TestCaseData(string.Empty, new char[1], string.Empty).SetName("EmptyStringNoReplacementCharactersYieldsEmptyString");
        yield return new TestCaseData(string.Empty, new char[] { 'A', 'B', 'C' }, string.Empty).SetName("EmptyStringReplacementCharsYieldsEmptyString");
        yield return new TestCaseData("No replacement characters", new char[1], "No replacement characters").SetName("StringNoReplacementCharactersYieldsString");
        yield return new TestCaseData("No characters will be replaced", new char[] { 'Z' }, "No characters will be replaced").SetName("StringNonExistantReplacementCharactersYieldsString");
        yield return new TestCaseData("AaBbCc", new char[] { 'a', 'C' }, "ABbc").SetName("CaseSensitivityReplacements");
        yield return new TestCaseData("ABC", new char[] { 'A', 'B', 'C' }, string.Empty).SetName("AllCharactersRemoved");
        yield return new TestCaseData("AABBBBBBCC", new char[] { 'A', 'B', 'C' }, string.Empty).SetName("AllCharactersRemovedMultiple");
        yield return new TestCaseData("Test That They Didn't Attempt To Use .Except() which returns distinct characters", new char[] { '(', ')' }, "Test That They Didn't Attempt To Use .Except which returns distinct characters").SetName("ValidateTheStringIsNotJustDistinctCharacters");
    }
}

internal class StringExtensionMethodsTests_Remove_ParameterValidation_Tests : IEnumerable
{
    public IEnumerator GetEnumerator()
    {
        yield return new TestCaseData(null, null);
        yield return new TestCaseData("valid string", null);
        yield return new TestCaseData(null, new char[1]);
    }
}

2

我通常在相同情况下使用的强大方法:

private string Normalize(string text)
{
        return string.Join("",
            from ch in text
            where char.IsLetterOrDigit(ch) || char.IsWhiteSpace(ch)
            select ch);
}

请享用...


1

原学校复制/踩脚:

  private static string RemoveDirtyCharsFromString(string in_string)
     {
        int index = 0;
        int removed = 0;

        byte[] in_array = Encoding.UTF8.GetBytes(in_string);

        foreach (byte element in in_array)
        {
           if ((element == ' ') ||
               (element == '-') ||
               (element == ':'))
           {
              removed++;
           }
           else
           {
              in_array[index] = element;
              index++;
           }
        }

        Array.Resize<byte>(ref in_array, (in_array.Length - removed));
        return(System.Text.Encoding.UTF8.GetString(in_array, 0, in_array.Length));
     }

不确定其他方法的效率(即,所有函数调用和实例化的开销都是C#执行中的副作用)。


1

我将其设为扩展方法并使用字符串数组,我认为string[]它比char[]因为char也可以是字符串更有用:

public static class Helper
{
    public static string RemoverStrs(this string str, string[] removeStrs)
    {
        foreach (var removeStr in removeStrs)
            str = str.Replace(removeStr, "");
        return str;
    }
}

那么您可以在任何地方使用它:

string myname = "My name @is ,Wan.;'; Wan";
string result = myname.RemoveStrs(new[]{ "@", ",", ".", ";", "\\"});

1

我需要从XML文件中删除特殊字符。这是我的方法。char.ToString()是此代码中的英雄。

string item = "<item type="line" />"
char DC4 = (char)0x14;
string fixed = item.Replace(DC4.ToString(), string.Empty);

1
new[] { ',', '.', ';', '\'', '@' }
.Aggregate("My name @is ,Wan.;'; Wan", (s, c) => s.Replace(c.ToString(), string.Empty)); 

1

从@drzaus获得性能数据,这是使用最快算法的扩展方法。

public static class StringEx
{
    public static string RemoveCharacters(this string s, params char[] unwantedCharacters) 
        => s == null ? null : string.Join(string.Empty, s.Split(unwantedCharacters));
}

用法

var name = "edward woodward!";
var removeDs = name.RemoveCharacters('d', '!');
Assert.Equal("ewar woowar", removeDs); // old joke
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.