How do I replace multiple spaces with a single space?


108

Suppose I have a string such as:

"Hello     how are   you           doing?"

I want a function that turns multiple spaces into a single space.

So I would get:

"Hello how are you doing?"

I know I could use a regular expression, or call

string s = "Hello     how are   you           doing?".Replace("  ", " ");

but I would have to call it multiple times to make sure that every run of consecutive spaces is replaced with just one.

Is there already a built-in method for this?


Could you clarify: are you only dealing with spaces, or with "all" whitespace?
Jon Skeet

And do you want any non-space whitespace to be converted into spaces?
Jon Skeet

I just meant that every run of whitespace should be reduced to at most 1 character
Matt Matt


2 things to consider: 1. char.IsWhiteSpace includes carriage return, line feed, etc. 2. "Whitespace" is probably more accurately tested with Char.GetUnicodeCategory(ch) = Globalization.UnicodeCategory.SpaceSeparator
smirkingman
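
The distinction drawn in the last comment can be checked directly. The following is a minimal sketch: the class name WhitespaceKinds and the helper IsSpaceSeparator are made up for illustration, while char.IsWhiteSpace and char.GetUnicodeCategory are the framework calls the comment refers to.

```csharp
using System;
using System.Globalization;

static class WhitespaceKinds
{
    // Hypothetical helper: true only for characters in the Unicode
    // "space separator" (Zs) category.
    public static bool IsSpaceSeparator(char ch) =>
        char.GetUnicodeCategory(ch) == UnicodeCategory.SpaceSeparator;

    static void Main()
    {
        // Space, no-break space, tab, line feed, carriage return.
        foreach (char ch in new[] { ' ', '\u00A0', '\t', '\n', '\r' })
        {
            Console.WriteLine("U+{0:X4}: IsWhiteSpace={1}, SpaceSeparator={2}",
                (int)ch, char.IsWhiteSpace(ch), IsSpaceSeparator(ch));
        }
    }
}
```

Tab, line feed and carriage return count as whitespace for char.IsWhiteSpace but belong to the Control category, so a SpaceSeparator test leaves them alone.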

Answers:


195
string cleanedString = System.Text.RegularExpressions.Regex.Replace(dirtyString,@"\s+"," ");

40
IMO, avoiding regex when you're comfortable with it is premature optimization
Tim Hoolihan, 2009

8
If your application isn't time critical, it can afford the 1 microsecond of processing overhead.
Daniel, 2009

16
Note that "\s" replaces not only spaces but also newline characters.
Bart Kiers

12
Good catch. If you only want spaces, switch the pattern to "[ ]+"
Tim Hoolihan

9
Shouldn't you use '{2,}' instead of '+' to avoid replacing single whitespace characters?
angularsen, 2011
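
The three patterns mentioned in these comments behave differently on mixed input. Here is a minimal sketch for comparing them; the class and method names are invented for illustration, and only Regex.Replace is framework API.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternComparison
{
    // Any run of whitespace (spaces, tabs, newlines, ...) becomes one space.
    public static string CollapseAnyWhitespace(string s) => Regex.Replace(s, @"\s+", " ");

    // Only runs of the space character are touched; a lone space is
    // "replaced" by itself.
    public static string CollapseSpacesOnly(string s) => Regex.Replace(s, "[ ]+", " ");

    // Only runs of two or more spaces are touched.
    public static string CollapseSpaceRuns(string s) => Regex.Replace(s, " {2,}", " ");

    static void Main()
    {
        string input = "a  b\tc\n\nd";
        Console.WriteLine(CollapseAnyWhitespace(input) == "a b c d");      // True
        Console.WriteLine(CollapseSpacesOnly(input) == "a b\tc\n\nd");     // True
        Console.WriteLine(CollapseSpaceRuns(input) == "a b\tc\n\nd");      // True
    }
}
```

"[ ]+" and " {2,}" produce the same string; the second simply skips the no-op replacement of a lone space.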

52

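This question isn't as simple as other posters have made it out to be (and as I originally believed it to be), because the question isn't quite as precise as it needs to be.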

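There's a difference between "space" and "whitespace". If you only mean spaces, then you should use a regex of " {2,}". If you mean any whitespace, that's a different matter. Should all whitespace be converted to spaces? What should happen to spaces at the start and end?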

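For the benchmark below, I've assumed that you only care about spaces, and that you don't want to do anything to single spaces, even at the start and end.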

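Note that correctness is almost always more important than performance. The fact that the Split/Join solution removes any leading/trailing whitespace (even just single spaces) is incorrect as far as your specified requirements go (which may be incomplete, of course).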
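
To make that difference concrete, here is a small sketch that runs both approaches on a string with one leading and one trailing space. The class and method names are invented for illustration; the two method bodies mirror the calls used in the benchmark.

```csharp
using System;
using System.Text.RegularExpressions;

static class EdgeSpaces
{
    // Collapses runs of two or more spaces; single spaces are left alone.
    public static string WithRegex(string s) => Regex.Replace(s, " {2,}", " ");

    // Splits on spaces, drops the empty entries and joins with one space.
    public static string WithSplitJoin(string s) =>
        string.Join(" ", s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

    static void Main()
    {
        string input = " Hello   world ";
        Console.WriteLine("[" + WithRegex(input) + "]");      // [ Hello world ]
        Console.WriteLine("[" + WithSplitJoin(input) + "]");  // [Hello world]
    }
}
```

The regex keeps the single leading and trailing space, whereas Split/Join silently trims them.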

The benchmark uses MiniBench.

using System;
using System.Text.RegularExpressions;
using MiniBench;

internal class Program
{
    public static void Main(string[] args)
    {

        int size = int.Parse(args[0]);
        int gapBetweenExtraSpaces = int.Parse(args[1]);

        char[] chars = new char[size];
        for (int i=0; i < size/2; i += 2)
        {
            // Make sure there actually *is* something to do
            chars[i*2] = (i % gapBetweenExtraSpaces == 1) ? ' ' : 'x';
            chars[i*2 + 1] = ' ';
        }
        // Just to make sure we don't have a \0 at the end
        // for odd sizes
        chars[chars.Length-1] = 'y';

        string bigString = new string(chars);
        // Assume that one form works :)
        string normalized = NormalizeWithSplitAndJoin(bigString);


        var suite = new TestSuite<string, string>("Normalize")
            .Plus(NormalizeWithSplitAndJoin)
            .Plus(NormalizeWithRegex)
            .RunTests(bigString, normalized);

        suite.Display(ResultColumns.All, suite.FindBest());
    }

    private static readonly Regex MultipleSpaces = 
        new Regex(@" {2,}", RegexOptions.Compiled);

    static string NormalizeWithRegex(string input)
    {
        return MultipleSpaces.Replace(input, " ");
    }

    // Guessing as the post doesn't specify what to use
    private static readonly char[] Whitespace =
        new char[] { ' ' };

    static string NormalizeWithSplitAndJoin(string input)
    {
        string[] split = input.Split
            (Whitespace, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", split);
    }
}

A few test runs:

c:\Users\Jon\Test>test 1000 50
============ Normalize ============
NormalizeWithSplitAndJoin  1159091 0:30.258 22.93
NormalizeWithRegex        26378882 0:30.025  1.00

c:\Users\Jon\Test>test 1000 5
============ Normalize ============
NormalizeWithSplitAndJoin  947540 0:30.013 1.07
NormalizeWithRegex        1003862 0:29.610 1.00


c:\Users\Jon\Test>test 1000 1001
============ Normalize ============
NormalizeWithSplitAndJoin  1156299 0:29.898 21.99
NormalizeWithRegex        23243802 0:27.335  1.00

Here the first number is the number of iterations, the second is the time taken, and the third is a scaled score with 1.0 being the best.

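That shows that in at least some cases (including this one) the regular expression can outperform the Split/Join solution, sometimes by a very significant margin.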

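However, if you change to an "all whitespace" requirement, then Split/Join does appear to win. As is so often the case, the devil is in the detail...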


1
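Great analysis. So it appears that we were both correct to varying degrees. The code in my answer was taken from a larger function that can normalize all whitespace and/or control characters within a string as well as at its beginning and end.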
Scott Dorman

1
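With just the whitespace characters you specified, the regex and Split/Join were about equal in most of my tests. S/J had a tiny benefit, at the cost of correctness and complexity. For those reasons, I'd normally prefer the regex. Don't get me wrong: I'm far from a regex fanboy, but I don't like writing more complex code for the sake of performance without really testing the performance first.
Jon Skeet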

NormalizeWithSplitAndJoin creates a lot more garbage, so it is hard to tell whether a real program would be hit by more GC time than the benchmark shows.
Ian Ringrose

@IanRingrose What sort of garbage can be created?
Dronz

18

A regular expression would be the easiest way. If you write the regex correctly, you won't need multiple calls.

Change it to this:

s = System.Text.RegularExpressions.Regex.Replace(s, @"\s{2,}", " ");

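My one problem with @"\s{2,}" is that it fails to replace single tabs and other Unicode space characters with a space. If you are going to replace 2 tabs with a space, then you should probably replace 1 tab with a space. @"\s+" will do that for you.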
David Specht

17

While the existing answers are fine, I'd like to point out one approach that doesn't work:

public static string DontUseThisToCollapseSpaces(string text)
{
    while (text.IndexOf("  ") != -1)
    {
        text = text.Replace("  ", " ");
    }
    return text;
}

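This can loop forever. Anyone care to guess why? (I only came across this when it was asked as a newsgroup question a few years ago... someone actually ran into it as a problem.)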


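I think I remember this question being asked a while back. IndexOf ignores certain characters that Replace doesn't. So the double space was always there, just never removed.
Brandon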

19
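It is because IndexOf ignores some Unicode characters; the specific culprit in this case was some Asian character, IIRC. Hmm, the zero-width non-joiner, according to Google.
ahawker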


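I learned that the hard way, specifically with two zero-width non-joiners (\u200C\u200C). IndexOf returns the index of this "double space", but Replace does not replace it. I think that is because IndexOf needs StringComparison.Ordinal specified in order to behave the same way as Replace. That way, neither of the two will locate the "double space". More about StringComparison: docs.microsoft.com/en-us/dotnet/api/...
Martin Brabec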
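
Following the suggestion in the last comment, one way to keep the loop from spinning is to make the search ordinal, so that it matches exactly what the string-based Replace overload (which is itself ordinal) removes. This is a sketch only; the class and method names are invented for illustration.

```csharp
using System;

static class OrdinalCollapse
{
    // Same loop as in the answer, but the search is forced to be ordinal,
    // so IndexOf and Replace agree on what counts as two spaces.
    public static string CollapseSpaces(string text)
    {
        while (text.IndexOf("  ", StringComparison.Ordinal) != -1)
        {
            text = text.Replace("  ", " ");
        }
        return text;
    }

    static void Main()
    {
        Console.WriteLine(CollapseSpaces("Hello     how are   you           doing?"));
        // Zero-width non-joiners are neither found nor replaced; the loop exits at once.
        Console.WriteLine(CollapseSpaces("a\u200C\u200Cb").Length);
    }
}
```

Each pass still allocates a new string, so this fixes the termination problem but not the cost of repeated calls.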

4

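As already pointed out, this is easily done with a regular expression. I'll just add that you might want to add a .Trim() to it to get rid of leading/trailing whitespace.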
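
A minimal sketch of that combination, assuming "whitespace" means anything matched by \s (the class and method names are invented for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

static class TrimAndCollapse
{
    // Collapse every whitespace run to one space, then drop whatever
    // single space is left at either end.
    public static string Normalize(string s) => Regex.Replace(s, @"\s+", " ").Trim();

    static void Main()
    {
        Console.WriteLine("[" + Normalize("  Hello   how are\tyou  ") + "]"); // [Hello how are you]
    }
}
```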


4

Here is the solution I work with. No RegEx and no String.Split.

public static string TrimWhiteSpace(this string Value)
{
    StringBuilder sbOut = new StringBuilder();
    if (!string.IsNullOrEmpty(Value))
    {
        bool IsWhiteSpace = false;
        for (int i = 0; i < Value.Length; i++)
        {
            if (char.IsWhiteSpace(Value[i])) //Comparison with WhiteSpace
            {
                if (!IsWhiteSpace) //Comparison with previous Char
                {
                    sbOut.Append(Value[i]);
                    IsWhiteSpace = true;
                }
            }
            else
            {
                IsWhiteSpace = false;
                sbOut.Append(Value[i]);
            }
        }
    }
    return sbOut.ToString();
}

So you can write:

string cleanedString = dirtyString.TrimWhiteSpace();

4

A fast extra-whitespace remover... This is the fastest one and is based on Felipe Machado's in-place copy.

static string InPlaceCharArray(string str)
{
    var len = str.Length;
    var src = str.ToCharArray();
    int dstIdx = 0;
    bool lastWasWS = false;
    for (int i = 0; i < len; i++)
    {
        var ch = src[i];
        if (src[i] == '\u0020')
        {
            if (lastWasWS == false)
            {
                src[dstIdx++] = ch;
                lastWasWS = true;
            }
        }
        else
        { 
            lastWasWS = false;
            src[dstIdx++] = ch;
        }
    }
    return new string(src, 0, dstIdx);
}

Benchmarks...

InPlaceCharArraySpaceOnly by Felipe Machado on CodeProject 2015, modified by Sunsetquest for multi-space removal. Time: 3.75 ticks

InPlaceCharArray by Felipe Machado 2015, slightly modified by Sunsetquest for multi-space removal. Time: 6.50 ticks (supports tabs also)

SplitAndJoinOnSpace by Jon Skeet. Time: 13.25 ticks

StringBuilder by fubo. Time: 13.5 ticks (supports tabs also)

Regex with compile by Jon Skeet. Time: 17 ticks

StringBuilder by David S 2013. Time: 30.5 ticks

Regex with non-compile by Brandon. Time: 63.25 ticks

StringBuilder by user214147. Time: 77.125 ticks

Regex with non-compile by Tim Hoolihan. Time: 147.25 ticks

The benchmark code...

using System;
using System.Text.RegularExpressions;
using System.Diagnostics;
using System.Threading;
using System.Text;

static class Program
{
    public static void Main(string[] args)
    {
    long seed = ConfigProgramForBenchmarking();

    Stopwatch sw = new Stopwatch();

    string warmup = "This is   a Warm  up function for best   benchmark results." + seed;
    string input1 = "Hello World,    how are   you           doing?" + seed;
    string input2 = "It\twas\t \tso    nice  to\t\t see you \tin 1950.  \t" + seed;
    string correctOutput1 = "Hello World, how are you doing?" + seed;
    string correctOutput2 = "It\twas\tso nice to\tsee you in 1950. " + seed;
    string output1,output2;

    //warm-up timer function
    sw.Restart();
    sw.Stop();

    sw.Restart();
    sw.Stop();
    long baseVal = sw.ElapsedTicks;

    // InPlace Replace by Felipe Machado but modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin)
    output1 = InPlaceCharArraySpaceOnly (warmup);
    sw.Restart();
    output1 = InPlaceCharArraySpaceOnly (input1);
    output2 = InPlaceCharArraySpaceOnly (input2);
    sw.Stop();
    Console.WriteLine("InPlaceCharArraySpaceOnly : " + (sw.ElapsedTicks - baseVal));
    Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
    Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

    // InPlace Replace by Felipe R. Machado and slightly modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin)
    output1 = InPlaceCharArray(warmup);
    sw.Restart();
    output1 = InPlaceCharArray(input1);
    output2 = InPlaceCharArray(input2);
    sw.Stop();
    Console.WriteLine("InPlaceCharArray: " + (sw.ElapsedTicks - baseVal));
    Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
    Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

    //Regex with non-compile Tim Hoolihan (https://stackoverflow.com/a/1279874/2352507)
    output1 = Regex.Replace(warmup, @"\s+", " ");
    sw.Restart();
    output1 = Regex.Replace(input1, @"\s+", " ");
    output2 = Regex.Replace(input2, @"\s+", " ");
    sw.Stop();
    Console.WriteLine("Regex by Tim Hoolihan: " + (sw.ElapsedTicks - baseVal));
    Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
    Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

    //Regex with compile by Jon Skeet (https://stackoverflow.com/a/1280227/2352507)
    output1 = MultipleSpaces.Replace(warmup, " ");
    sw.Restart();
    output1 = MultipleSpaces.Replace(input1, " ");
    output2 = MultipleSpaces.Replace(input2, " ");
    sw.Stop();
    Console.WriteLine("Regex with compile by Jon Skeet: " + (sw.ElapsedTicks - baseVal));
    Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
    Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

    //Split And Join by Jon Skeet (https://stackoverflow.com/a/1280227/2352507)
    output1 = SplitAndJoinOnSpace(warmup);
    sw.Restart();
    output1 = SplitAndJoinOnSpace(input1);
    output2 = SplitAndJoinOnSpace(input2);
    sw.Stop();
    Console.WriteLine("Split And Join by Jon Skeet: " + (sw.ElapsedTicks - baseVal));
    Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
    Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

    //Regex by Brandon (https://stackoverflow.com/a/1279878/2352507)
    output1 = Regex.Replace(warmup, @"\s{2,}", " ");
    sw.Restart();
    output1 = Regex.Replace(input1, @"\s{2,}", " ");
    output2 = Regex.Replace(input2, @"\s{2,}", " ");
    sw.Stop();
    Console.WriteLine("Regex by Brandon: " + (sw.ElapsedTicks - baseVal));
    Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
    Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

    //StringBuilder by user214147 (https://stackoverflow.com/a/2156660/2352507)
    output1 = user214147(warmup);
    sw.Restart();
    output1 = user214147(input1);
    output2 = user214147(input2);
    sw.Stop();
    Console.WriteLine("StringBuilder by user214147: " + (sw.ElapsedTicks - baseVal));
    Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
    Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

    //StringBuilder by fubo (https://stackoverflow.com/a/27502353/2352507)
    output1 = fubo(warmup);
    sw.Restart();
    output1 = fubo(input1);
    output2 = fubo(input2);
    sw.Stop();
    Console.WriteLine("StringBuilder by fubo: " + (sw.ElapsedTicks - baseVal));
    Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
    Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));


    //StringBuilder by David S 2013 (https://stackoverflow.com/a/16035044/2352507)
    output1 = SingleSpacedTrim(warmup);
    sw.Restart();
    output1 = SingleSpacedTrim(input1);
    output2 = SingleSpacedTrim(input2);
    sw.Stop();
    Console.WriteLine("StringBuilder(SingleSpacedTrim) by David S: " + (sw.ElapsedTicks - baseVal));
    Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
    Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));
}

// InPlace Replace by Felipe Machado and slightly modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin)
static string InPlaceCharArray(string str)
{
    var len = str.Length;
    var src = str.ToCharArray();
    int dstIdx = 0;
    bool lastWasWS = false;
    for (int i = 0; i < len; i++)
    {
        var ch = src[i];
        if (src[i] == '\u0020')
        {
            if (lastWasWS == false)
            {
                src[dstIdx++] = ch;
                lastWasWS = true;
            }
        }
        else
        { 
            lastWasWS = false;
            src[dstIdx++] = ch;
        }
    }
    return new string(src, 0, dstIdx);
}

// InPlace Replace by Felipe R. Machado but modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin)
static string InPlaceCharArraySpaceOnly (string str)
{
    var len = str.Length;
    var src = str.ToCharArray();
    int dstIdx = 0;
    bool lastWasWS = false; //Added line
    for (int i = 0; i < len; i++)
    {
        var ch = src[i];
        switch (ch)
        {
            case '\u0020': //SPACE
            case '\u00A0': //NO-BREAK SPACE
            case '\u1680': //OGHAM SPACE MARK
            case '\u2000': // EN QUAD
            case '\u2001': //EM QUAD
            case '\u2002': //EN SPACE
            case '\u2003': //EM SPACE
            case '\u2004': //THREE-PER-EM SPACE
            case '\u2005': //FOUR-PER-EM SPACE
            case '\u2006': //SIX-PER-EM SPACE
            case '\u2007': //FIGURE SPACE
            case '\u2008': //PUNCTUATION SPACE
            case '\u2009': //THIN SPACE
            case '\u200A': //HAIR SPACE
            case '\u202F': //NARROW NO-BREAK SPACE
            case '\u205F': //MEDIUM MATHEMATICAL SPACE
            case '\u3000': //IDEOGRAPHIC SPACE
            case '\u2028': //LINE SEPARATOR
            case '\u2029': //PARAGRAPH SEPARATOR
            case '\u0009': //[ASCII Tab]
            case '\u000A': //[ASCII Line Feed]
            case '\u000B': //[ASCII Vertical Tab]
            case '\u000C': //[ASCII Form Feed]
            case '\u000D': //[ASCII Carriage Return]
            case '\u0085': //NEXT LINE
                if (lastWasWS == false) //Added line
                {
                    src[dstIdx++] = ch; //Added line
                    lastWasWS = true; //Added line
                }
            continue;
            default:
                lastWasWS = false; //Added line 
                src[dstIdx++] = ch;
                break;
        }
    }
    return new string(src, 0, dstIdx);
}

static readonly Regex MultipleSpaces =
    new Regex(@" {2,}", RegexOptions.Compiled);

//Split And Join by Jon Skeet (https://stackoverflow.com/a/1280227/2352507)
static string SplitAndJoinOnSpace(string input)
{
    string[] split = input.Split(new char[] { ' '}, StringSplitOptions.RemoveEmptyEntries);
    return string.Join(" ", split);
}

//StringBuilder by user214147 (https://stackoverflow.com/a/2156660/2352507)
public static string user214147(string S)
{
    string s = S.Trim();
    bool iswhite = false;
    int iwhite;
    int sLength = s.Length;
    StringBuilder sb = new StringBuilder(sLength);
    foreach (char c in s.ToCharArray())
    {
        if (Char.IsWhiteSpace(c))
        {
            if (iswhite)
            {
                //Continuing whitespace ignore it.
                continue;
            }
            else
            {
                //New WhiteSpace

                //Replace whitespace with a single space.
                sb.Append(" ");
                //Set iswhite to True and any following whitespace will be ignored
                iswhite = true;
            }
        }
        else
        {
            sb.Append(c.ToString());
            //reset iswhitespace to false
            iswhite = false;
        }
    }
    return sb.ToString();
}

//StringBuilder by fubo (https://stackoverflow.com/a/27502353/2352507)
public static string fubo(this string Value)
{
    StringBuilder sbOut = new StringBuilder();
    if (!string.IsNullOrEmpty(Value))
    {
        bool IsWhiteSpace = false;
        for (int i = 0; i < Value.Length; i++)
        {
            if (char.IsWhiteSpace(Value[i])) //Comparison with WhiteSpace
            {
                if (!IsWhiteSpace) //Comparison with previous Char
                {
                    sbOut.Append(Value[i]);
                    IsWhiteSpace = true;
                }
            }
            else
            {
                IsWhiteSpace = false;
                sbOut.Append(Value[i]);
            }
        }
    }
    return sbOut.ToString();
}

//David S. 2013 (https://stackoverflow.com/a/16035044/2352507)
public static String SingleSpacedTrim(String inString)
{
    StringBuilder sb = new StringBuilder();
    Boolean inBlanks = false;
    foreach (Char c in inString)
    {
        switch (c)
        {
            case '\r':
            case '\n':
            case '\t':
            case ' ':
                if (!inBlanks)
                {
                    inBlanks = true;
                    sb.Append(' ');
                }
                continue;
            default:
                inBlanks = false;
                sb.Append(c);
                break;
        }
    }
    return sb.ToString().Trim();
}

/// <summary>
/// We want to run this item with max priority to lower the odds of
/// the OS doing program context switches in the middle of our code.
/// source:https://stackoverflow.com/a/16157458 
/// </summary>
/// <returns>random seed</returns>
private static long ConfigProgramForBenchmarking()
{
    //prevent the JIT compiler from optimizing function calls away
    long seed = Environment.TickCount;
    //use the second Core/Processor for the test
    Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2);
    //prevent "Normal" Processes from interrupting Threads
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
    //prevent "Normal" Threads from interrupting this thread
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    return seed;
}

}

Benchmark notes: Release mode, no debugger attached, i7 processor, average of 4 runs, only short strings tested


1
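Nice to see my article referenced here! (I'm Felipe Machado.) I'm about to update it using a proper benchmarking tool called BenchmarkDotNet! I'll try to set up runs on all runtimes (now that we have DOT NET CORE and the like)...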
Loudenvier

1
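@Loudenvier - Nice work on this. Yours is the fastest by almost 400%! .NET Core is like a free 150-200% performance boost. It's getting closer to C++ performance but is much easier to code. Thanks for the comment.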
Sunsetquest

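This only handles spaces, not other whitespace characters. Maybe you want char.IsWhiteSpace(ch) instead of src[i] == '\u0020'. I notice this has been edited by the community. Did they break it?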
Evil Pigeon
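
The variant suggested in the last comment might look like the sketch below. It is not the benchmarked code: the class name is invented, and writing a plain space for every whitespace run (rather than keeping the first character of the run) is an assumption made here for illustration.

```csharp
using System;

static class InPlaceAnyWhitespace
{
    // In-place copy over a char array, treating every char.IsWhiteSpace
    // character as part of a run.
    public static string Collapse(string str)
    {
        char[] src = str.ToCharArray();
        int dstIdx = 0;
        bool lastWasWS = false;
        for (int i = 0; i < src.Length; i++)
        {
            char ch = src[i];
            if (char.IsWhiteSpace(ch))
            {
                if (!lastWasWS)
                {
                    // Assumption: each run is written out as a plain space.
                    src[dstIdx++] = ' ';
                    lastWasWS = true;
                }
            }
            else
            {
                lastWasWS = false;
                src[dstIdx++] = ch;
            }
        }
        return new string(src, 0, dstIdx);
    }

    static void Main()
    {
        Console.WriteLine(Collapse("It\twas\t \tso    nice")); // It was so nice
    }
}
```

Because the write index never overtakes the read index, the array can be compacted without a second buffer.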

3

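I'm sharing what I use, because it appears I've come up with something different. I've been using this for a while and it is fast enough for me. I'm not sure how it stacks up against the others. I use it in a delimited file writer and run large data tables through it one field at a time.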

    public static string NormalizeWhiteSpace(string S)
    {
        string s = S.Trim();
        bool iswhite = false;
        int iwhite;
        int sLength = s.Length;
        StringBuilder sb = new StringBuilder(sLength);
        foreach(char c in s.ToCharArray())
        {
            if(Char.IsWhiteSpace(c))
            {
                if (iswhite)
                {
                    //Continuing whitespace ignore it.
                    continue;
                }
                else
                {
                    //New WhiteSpace

                    //Replace whitespace with a single space.
                    sb.Append(" ");
                    //Set iswhite to True and any following whitespace will be ignored
                    iswhite = true;
                }  
            }
            else
            {
                sb.Append(c.ToString());
                //reset iswhitespace to false
                iswhite = false;
            }
        }
        return sb.ToString();
    }

2

Using the test program that Jon Skeet posted, I tried to see if I could get a handwritten loop to run faster.
I can beat NormalizeWithSplitAndJoin every time, but I only beat NormalizeWithRegex with inputs of 1000, 5.

static string NormalizeWithLoop(string input)
{
    StringBuilder output = new StringBuilder(input.Length);

    char lastChar = '*';  // anything other than space
    for (int i = 0; i < input.Length; i++)
    {
        char thisChar = input[i];
        if (!(lastChar == ' ' && thisChar == ' '))
            output.Append(thisChar);

        lastChar = thisChar;
    }

    return output.ToString();
}

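I have not looked at the machine code the JIT produces, but I expect the problem is the time taken by the calls to StringBuilder.Append(), and doing much better would require unsafe code.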

So Regex.Replace() is very fast and hard to beat!


2

VB.NET

Linha.Split(" "c).Where(Function(x) x <> "").ToArray

C#

Linha.Split(' ').Where(x => x != "").ToArray();

Enjoy the power of LINQ =D


Exactly! For me this is the most elegant approach too. So, for the record, in C# that would be: string.Join(" ", myString.Split(' ').Where(s => s != "").ToArray())
Efrain

1
A minor improvement on the Split: to catch all whitespace and drop the Where clause, use myString.Split(null as char[], StringSplitOptions.RemoveEmptyEntries)
David
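
Spelled out as a complete helper, that suggestion might look like the following sketch (the class and method names are invented for illustration). Passing a null separator array makes String.Split treat whitespace characters as the delimiters.

```csharp
using System;

static class SplitOnAnyWhitespace
{
    // A null separator array means "split on whitespace";
    // RemoveEmptyEntries drops the empty strings between adjacent delimiters.
    public static string Normalize(string s) =>
        string.Join(" ", s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));

    static void Main()
    {
        Console.WriteLine("[" + Normalize("  a \t b\r\n c  ") + "]"); // [a b c]
    }
}
```

As with the other Split/Join variants, leading and trailing whitespace disappears as a side effect.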

1
Regex regex = new Regex(@"\W+");
string outputString = regex.Replace(inputString, " ");

This replaces all non-word characters with a space. So it would also replace things like brackets and quotes, which might not be what you want.
Herman, 2015

0

The smallest solution:

var regExp = /\s+/g, newString = oldString.replace(regExp, ' ');


0

You can try this:

    /// <summary>
    /// Remove all extra spaces and tabs between words in the specified string!
    /// </summary>
    /// <param name="str">The specified string.</param>
    public static string RemoveExtraSpaces(string str)
    {
        str = str.Trim();
        StringBuilder sb = new StringBuilder();
        bool space = false;
        foreach (char c in str)
        {
            if (char.IsWhiteSpace(c) || c == (char)9) { space = true; }
            else { if (space) { sb.Append(' '); }; sb.Append(c); space = false; };
        }
        return sb.ToString();
    }

0

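Replacement groups provide a simpler approach, which solves the problem of replacing multiple whitespace characters with a single instance of the same character: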

    public static void WhiteSpaceReduce()
    {
        string t1 = "a b   c d";
        string t2 = "a b\n\nc\nd";

        Regex whiteReduce = new Regex(@"(?<firstWS>\s)(?<repeatedWS>\k<firstWS>+)");
        Console.WriteLine("{0}", t1);
        //Console.WriteLine("{0}", whiteReduce.Replace(t1, x => x.Value.Substring(0, 1))); 
        Console.WriteLine("{0}", whiteReduce.Replace(t1, @"${firstWS}"));
        Console.WriteLine("\nNext example ---------");
        Console.WriteLine("{0}", t2);
        Console.WriteLine("{0}", whiteReduce.Replace(t2, @"${firstWS}"));
        Console.WriteLine();
    }

Notice that the second example keeps a single \n, whereas the accepted answer would replace the end of line with a space.

If you need to replace any combination of whitespace characters with the first one, just remove the back-reference \k from the pattern.
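
One possible reading of that last suggestion is sketched below: the back-reference is swapped for a plain \s+, so a run of mixed whitespace collapses to its first character. The exact pattern, the class name and the method name are assumptions made for illustration, not something stated in the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class KeepFirstWhitespace
{
    // Assumed pattern: one whitespace character (captured) followed by
    // one or more further whitespace characters of any kind.
    static readonly Regex MixedRun = new Regex(@"(?<firstWS>\s)\s+");

    // Replaces each matched run with the captured first character.
    public static string Reduce(string s) => MixedRun.Replace(s, "${firstWS}");

    static void Main()
    {
        Console.WriteLine(Reduce("a \t\nb") == "a b");    // True: the run started with a space
        Console.WriteLine(Reduce("a\n \tb") == "a\nb");   // True: the run started with a line feed
    }
}
```

A single whitespace character on its own does not match, so it is left untouched.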


-1

There is no built-in way to do this. You can try this:

private static readonly char[] whitespace = new char[] { ' ', '\n', '\t', '\r', '\f', '\v' };
public static string Normalize(string source)
{
   return String.Join(" ", source.Split(whitespace, StringSplitOptions.RemoveEmptyEntries));
}

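This will remove leading and trailing whitespace as well as collapse any internal whitespace to a single space character. If you really only want to collapse spaces, then the solutions using a regular expression are better; otherwise this solution is better. (See the analysis done by Jon Skeet.)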


7
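If the regular expression is compiled and cached, I'm not sure it has more overhead than splitting and joining, which could create loads of intermediate garbage strings. Have you carefully benchmarked both approaches before assuming that your way is faster?
Jon Skeet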

1
whitespace is not declared here
Tim Hoolihan, 2009

3
Speaking of overhead, why on earth are you calling source.ToCharArray() and then throwing away the result?
Jon Skeet

2
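And calling ToCharArray() on the result of string.Join, only to create a new string... wow, for that to be in a post complaining about overhead is just remarkable. -1.
Jon Skeet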

1
Oh, and assuming whitespace is new char[] { ' ' }, this will also give the wrong result if the input string starts or ends with a space.
Jon Skeet
Licensed under cc by-sa 3.0 with attribution required.