Splitting a string into equal-length substrings in Java

125

How can I split the string "Thequickbrownfoxjumps" into substrings of equal size in Java? For example, splitting "Thequickbrownfoxjumps" into chunks of size 4 should give the output:

["Theq","uick","brow","nfox","jump","s"]

Similar question:

Split string into equal-length substrings in Scala

— Emil

4

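What have you tried? Why didn't it work?

— Thilo, 2010

2

Do you need to use a regex for this? Just asking because of the regex tag...

— Tim Pietzcker

@Thilo: The link he posted is for Scala; he is asking the same thing for Java.

— Jaydeep Patel, 2010

@Thilo: I was asking how to do it in Java, like the answer given for Scala.

— Emil, 2010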

226

Here's the regex one-liner version:

System.out.println(Arrays.toString(
    "Thequickbrownfoxjumps".split("(?<=\\G.{4})")
));

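\G is a zero-width assertion that matches at the position where the previous match ended. If there was no previous match, it matches at the beginning of the input, the same as \A. The enclosing lookbehind matches the position that is four characters along from the end of the last match.

Both lookbehind and \G are advanced regex features that not all flavors support. Furthermore, \G is not implemented consistently across the flavors that do support it. This trick will work (for example) in Java, Perl, .NET and JGSoft, but not in PHP (PCRE), Ruby 1.9+ or TextMate (both Oniguruma). JavaScript's /y (sticky flag) isn't as flexible as \G, and couldn't be used this way even if JS did support lookbehind.

I should mention that I don't necessarily recommend this solution if you have other options. The non-regex solutions in the other answers may be longer, but they're also self-documenting; this one is just about the opposite of that. ;)

Also, this doesn't work in Android, which doesn't support the use of \G in lookbehinds.

— Alan Moore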
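As a rough illustration of the explanation above, the one-liner can be wrapped so that the chunk size becomes a parameter. The class name `RegexChunker`, the method name `splitBySize` and the argument check are assumptions made for this sketch; they are not part of the answer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class RegexChunker {
    // Hypothetical helper, not part of the answer: builds the lookbehind for any chunk size.
    public static String[] splitBySize(String text, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        // \G matches where the previous match ended (or at the start of the input),
        // so the lookbehind succeeds exactly `size` characters after the last split point.
        Pattern pattern = Pattern.compile("(?<=\\G.{" + size + "})");
        return pattern.split(text);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitBySize("Thequickbrownfoxjumps", 4)));
    }
}
```

Because the quantifier `{size}` is bounded, the lookbehind has a known maximum length, which Java's regex engine requires.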

2

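In PHP 5.2.4 the following code works: return preg_split('/(?<=\G.{'.$len.'})/u', $str, -1, PREG_SPLIT_NO_EMPTY);

— Igor, 2012

5

For the record, using String.substring() instead of a regex requires a few extra lines of code, but it will run roughly 5x faster...

— Drew Moore, 2014

2

In Java this does not work for a string containing newlines. It only checks up to the first newline, and if that newline happens to come before the split size, the string is not split at all. Or have I missed something?

— joensson, 2014

5

For the sake of completeness: splitting text that spans multiple lines requires prefixing the regex with (?s): (?s)(?<=\\G.{4}).

— bobbel, '16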
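The two comments above can be checked with a small sketch. The class and method names here are made up for the illustration; the two patterns are the ones quoted in the thread.

```java
import java.util.Arrays;

class DotallSplitDemo {
    // Without (?s), '.' does not match line terminators, so a newline inside
    // the first four characters prevents the lookbehind from ever matching.
    public static String[] splitPlain(String text) {
        return text.split("(?<=\\G.{4})");
    }

    // With (?s) (DOTALL), '.' matches line terminators as well.
    public static String[] splitDotall(String text) {
        return text.split("(?s)(?<=\\G.{4})");
    }

    public static void main(String[] args) {
        String input = "ab\ncdef";
        System.out.println(splitPlain(input).length);   // number of pieces without DOTALL
        System.out.println(splitDotall(input).length);  // number of pieces with DOTALL
    }
}
```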

1

Java rejects this outright when the pattern is compiled: java.util.regex.PatternSyntaxException: Look-behind pattern matches must have a bounded maximum length

— 拒绝了

132

Well, it's fairly easy to do this with simple arithmetic and string operations:

public static List<String> splitEqually(String text, int size) {
    // Give the list the right capacity to start with. You could use an array
    // instead if you wanted.
    List<String> ret = new ArrayList<String>((text.length() + size - 1) / size);

    for (int start = 0; start < text.length(); start += size) {
        ret.add(text.substring(start, Math.min(text.length(), start + size)));
    }
    return ret;
}

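I don't think it's really worth using a regex for this.

EDIT: My reasoning for not using a regex:

This doesn't use any of the real pattern matching of regexes. It's just counting.
I suspect the above will be more efficient, although in most cases it won't matter.
If you need to use variable sizes in different places, you've either got repetition or a helper function to build the regex itself based on a parameter - ick.
The regex provided in another answer firstly didn't compile (invalid escaping), and then didn't work. My code worked first time. That's more a testament to the usability of regexes vs plain code, IMO.

— Jon Skeet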
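To show how a counting loop of this kind behaves at the edges, here is a small sketch. The class name `SubstringChunker` and the explicit size check are assumptions added for the illustration; the loop itself follows the answer.

```java
import java.util.ArrayList;
import java.util.List;

class SubstringChunker {
    public static List<String> splitEqually(String text, int size) {
        // Assumption: reject non-positive sizes up front instead of relying on
        // the division or the loop to fail.
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> ret = new ArrayList<>((text.length() + size - 1) / size);
        for (int start = 0; start < text.length(); start += size) {
            // Math.min clamps the last chunk when the length is not a multiple of size.
            ret.add(text.substring(start, Math.min(text.length(), start + size)));
        }
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(splitEqually("Thequickbrownfoxjumps", 4)); // short final chunk
        System.out.println(splitEqually("abcdefgh", 4));              // exact multiple
        System.out.println(splitEqually("", 4));                      // empty input
    }
}
```

An empty input yields an empty list, and an input whose length is an exact multiple of the size produces no trailing empty chunk.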

8

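@Emil: Actually, you didn't ask for a regex. It's in the tags, but nothing in the question itself asks for one. You put this method in one place, and then you can split the string in a single, very readable statement anywhere in your code.

— Jon Skeet

3

Emil, this is not what a regex is for. Period.

— Chris, 2010

3

@Emil: If you want a one-liner for splitting the string, I'd recommend Guava's Splitter.fixedLength(4), as suggested by seanizer.

— ColinD, 2010

2

@Jay: Come on, you need not be that sarcastic. I'm sure it can be done with a regex in just one line; a fixed-length substring is also a pattern. What do you say about this answer? stackoverflow.com/questions/3760152/…

— Emil, 2010

4

@Emil: I didn't intend that to be rude, just whimsical. The serious part of my point was that while yes, I'm sure you could come up with a regex to do this (I see that Alan Moore has one that he claims works), it is cryptic and therefore difficult for a later programmer to understand and maintain. A substring solution can be intuitive and readable. See Jon Skeet's 4th bullet: I agree with that 100%.

— Jay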

71

It's very easy with Google Guava:

for(final String token :
    Splitter
        .fixedLength(4)
        .split("Thequickbrownfoxjumps")){
    System.out.println(token);
}

Output:

Theq
uick
brow
nfox
jump
s

Or, if you need the result as an array, you can use this code:

String[] tokens =
    Iterables.toArray(
        Splitter
            .fixedLength(4)
            .split("Thequickbrownfoxjumps"),
        String.class
    );

Note: The splitter is constructed inline above, but since Splitters are immutable and reusable, it's good practice to store them in constants:

private static final Splitter FOUR_LETTERS = Splitter.fixedLength(4);

// more code

for(final String token : FOUR_LETTERS.split("Thequickbrownfoxjumps")){
    System.out.println(token);
}

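— Sean Patrick Floyd

Thanks for the post (and for making me aware of the Guava library method). But I'll have to accept the regex answer stackoverflow.com/questions/3760152/…, since it doesn't require any third-party library and it's a one-liner.

— Emil, 2010

1

Including hundreds of KB of library code just to perform this simple task is almost certainly not the right thing to do.

— Jeffrey Blattman

2

@JeffreyBlattman: Including Guava just for this is probably overkill, true. But I use it as a general-purpose library in all my Java code anyway, so why not use this one additional piece of functionality?

— Sean Patrick Floyd

Is there any way to join the pieces back together with a separator?

— Aquarius

1

@AquariusPower: String.join(separator, arrayOrCollection)

— Holger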
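A minimal sketch of the `String.join` suggestion, assuming the chunks have already been produced by one of the approaches in this thread. The class and method names are hypothetical; `String.join` itself is available since Java 8 and accepts either varargs or an `Iterable`.

```java
import java.util.Arrays;
import java.util.List;

class JoinChunksDemo {
    // Joins already-split chunks passed as an array (varargs overload).
    public static String joinArray(String separator, String... chunks) {
        return String.join(separator, chunks);
    }

    // Joins already-split chunks passed as a collection (Iterable overload).
    public static String joinList(String separator, List<String> chunks) {
        return String.join(separator, chunks);
    }

    public static void main(String[] args) {
        String[] chunks = {"Theq", "uick", "brow", "nfox", "jump", "s"};
        System.out.println(joinArray("-", chunks));
        System.out.println(joinList(" ", Arrays.asList(chunks)));
    }
}
```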

14

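If you're using Google's Guava general-purpose libraries (and, quite honestly, any new Java project probably should be), this is trivial with the Splitter class: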

for (String substring : Splitter.fixedLength(4).split(inputString)) {
    doSomethingWith(substring);
}

That's it. Easy!

— Cowan

8

public static String[] split(String src, int len) {
    String[] result = new String[(int)Math.ceil((double)src.length()/(double)len)];
    for (int i=0; i<result.length; i++)
        result[i] = src.substring(i*len, Math.min(src.length(), (i+1)*len));
    return result;
}

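— Saul

Since src.length() and len are both ints, your call to ceil isn't accomplishing what you want. Check out how some of the other responses do it: (src.length() + len - 1) / len

— Michael Brewer-Davis

@Michael: Good point. I didn't test it with strings whose length is not a multiple of len. It's fixed now.

— Saul, 2010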
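The point about integer division can be seen in a short sketch. The class and method names are invented for the illustration; the three expressions are the variants discussed in this thread.

```java
class CeilDivisionDemo {
    // int / int truncates first, so Math.ceil receives a whole number and has nothing to round up.
    public static int truncatedCeil(int length, int size) {
        return (int) Math.ceil(length / size);
    }

    // Casting to double before dividing keeps the fractional part for Math.ceil.
    public static int floatingCeil(int length, int size) {
        return (int) Math.ceil((double) length / size);
    }

    // Pure integer arithmetic; assumes length + size - 1 does not overflow int.
    public static int integerCeil(int length, int size) {
        return (length + size - 1) / size;
    }

    public static void main(String[] args) {
        int length = "Thequickbrownfoxjumps".length(); // 21 characters
        System.out.println(truncatedCeil(length, 4));  // one chunk too few
        System.out.println(floatingCeil(length, 4));
        System.out.println(integerCeil(length, 4));
    }
}
```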

6

public String[] splitInParts(String s, int partLength)
{
    int len = s.length();

    // Number of parts
    int nparts = (len + partLength - 1) / partLength;
    String parts[] = new String[nparts];

    // Break into parts
    int offset= 0;
    int i = 0;
    while (i < nparts)
    {
        parts[i] = s.substring(offset, Math.min(offset + partLength, len));
        offset += partLength;
        i++;
    }

    return parts;
}

— Grodriguez

6

Out of interest, do you have something against for loops?

— Jon Skeet

A for loop is indeed a more "natural" choice for this :-) Thanks for pointing it out.

— Grodriguez

3

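You can use substring from String.class (and handle the exceptions yourself) or from Apache Commons Lang (which handles the exceptions for you):

static String substring(String str, int start, int end)

Put it inside a loop and you are good to go.

— pakore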

1

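What's wrong with the substring method in the standard String class?

— Grodriguez

The Commons version avoids exceptions (out of bounds and so on).

— Thilo

7

I see; I would say I prefer to "avoid exceptions" by controlling the arguments in the calling code instead.

— Grodriguez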
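To make the trade-off in this exchange concrete, here is a plain-Java stand-in for an exception-free substring. The helper `clampedSubstring` is hypothetical and only clamps indices into range; it is not the Commons Lang implementation and does not try to reproduce all of its behavior.

```java
class SafeSubstringDemo {
    // Hypothetical helper: clamps both indices into [0, length] instead of throwing.
    public static String clampedSubstring(String str, int start, int end) {
        if (str == null) {
            return null;
        }
        int len = str.length();
        int from = Math.max(0, Math.min(start, len));
        int to = Math.max(from, Math.min(end, len));
        return str.substring(from, to);
    }

    public static void main(String[] args) {
        // An end index past the string is silently shortened.
        System.out.println(clampedSubstring("jumps", 4, 8));
        // The standard method throws for the same arguments.
        try {
            "jumps".substring(4, 8);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("String.substring threw: " + e.getMessage());
        }
    }
}
```

Clamping hides out-of-range arguments, whereas the `Math.min` call used in the loop-based answers keeps the bounds handling visible in the calling code.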

2

I would rather use this simple solution:

String content = "Thequickbrownfoxjumps";
while(content.length() > 4) {
    System.out.println(content.substring(0, 4));
    content = content.substring(4);
}
System.out.println(content);

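— Cheetah Coder

Don't do this! String is immutable, so your code needs to copy the whole remaining string every 4 characters. Your snippet therefore takes quadratic rather than linear time in the size of the String.

— Tobias

@Tobias: Even if String were mutable, this snippet would still make the redundant copy you mention, unless some complex compile-time processing were involved. The only reason to use this snippet is code simplicity.

— Cheetah Coder

Did you change your code since you first posted it? The latest version doesn't actually make copies: substring() runs efficiently (constant time, at least on old versions of Java). It keeps a reference to the whole string's char[] (again, at least on old versions of Java), but that's fine in this case since you're keeping all the characters. So the latest code you have here is actually okay (modulo the fact that it prints an empty line if content starts out as the empty string, which may not be what you want).

— Tobias

@Tobias: I don't remember any change.

— Cheetah Coder

@Tobias: The substring implementation changed with Java 7 update 6 in the middle of 2012, when the offset and count fields were removed from the String class. So substring became linear long before this answer was written. For a small string like the one in the example it still runs fast enough, and for longer strings... well, this task rarely comes up in practice.

— Holger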
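The cost argument in these comments can be made concrete with a simple counting model. It assumes, as described above for Java 7 update 6 and later, that every substring call copies the characters it returns; the class and method names are invented for the sketch, and it counts characters rather than measuring time.

```java
class CopyCostDemo {
    // Models the loop that keeps reassigning content = content.substring(size):
    // each iteration copies one chunk plus the entire remaining string.
    public static long charsCopiedByRemainderLoop(int length, int size) {
        long copied = 0;
        int remaining = length;
        while (remaining > size) {
            copied += size;        // content.substring(0, size)
            remaining -= size;
            copied += remaining;   // content.substring(size)
        }
        return copied;
    }

    // Models an index-based loop: every character is copied exactly once.
    public static long charsCopiedByIndexLoop(int length, int size) {
        return length;
    }

    public static void main(String[] args) {
        for (int length : new int[] {21, 2_100, 210_000}) {
            System.out.println(length + " chars: remainder loop copies "
                    + charsCopiedByRemainderLoop(length, 4) + ", index loop copies "
                    + charsCopiedByIndexLoop(length, 4));
        }
    }
}
```

Under this model the remainder loop grows roughly with the square of the input length, which matches the "quadratic" objection, while for a 21-character string the difference is negligible.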

2

Here is a one-liner implementation using Java 8 streams:

String input = "Thequickbrownfoxjumps";
final AtomicInteger atomicInteger = new AtomicInteger(0);
Collection<String> result = input.chars()
                                    .mapToObj(c -> String.valueOf((char)c) )
                                    .collect(Collectors.groupingBy(c -> atomicInteger.getAndIncrement() / 4
                                                                ,Collectors.joining()))
                                    .values();

It gives the following output:

[Theq, uick, brow, nfox, jump, s]

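— Pankaj Singhal

1

This is a horrible solution: it fights the intent of the API, uses stateful functions, and is significantly more complicated than an ordinary loop, not to mention the boxing and string concatenation overhead. If you want a Stream solution, use something like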

String[] result = IntStream.range(0, (input.length()+3)/4) .mapToObj(i -> input.substring(i *= 4, Math.min(i + 4, input.length()))) .toArray(String[]::new);

— Holger

2

Here is a one-liner version that uses a Java 8 IntStream to determine the indexes at which the slices begin:

String x = "Thequickbrownfoxjumps";

String[] result = IntStream
                    .iterate(0, i -> i + 4)
                    .limit((int) Math.ceil(x.length() / 4.0))
                    .mapToObj(i ->
                        x.substring(i, Math.min(i + 4, x.length()))
                    )
                    .toArray(String[]::new);

— Marko Previsic

1

If you want to split the string equally backwards, i.e. from right to left (for example, to split 1010001111 into [10, 1000, 1111]), here's the code:

/**
 * @param s         the string to be split
 * @param subLen    length of the equal-length substrings.
 * @param backwards true if the splitting is from right to left, false otherwise
 * @return an array of equal-length substrings
 * @throws ArithmeticException: / by zero when subLen == 0
 */
public static String[] split(String s, int subLen, boolean backwards) {
    assert s != null;
    int groups = s.length() % subLen == 0 ? s.length() / subLen : s.length() / subLen + 1;
    String[] strs = new String[groups];
    if (backwards) {
        for (int i = 0; i < groups; i++) {
            int beginIndex = s.length() - subLen * (i + 1);
            int endIndex = beginIndex + subLen;
            if (beginIndex < 0)
                beginIndex = 0;
            strs[groups - i - 1] = s.substring(beginIndex, endIndex);
        }
    } else {
        for (int i = 0; i < groups; i++) {
            int beginIndex = subLen * i;
            int endIndex = beginIndex + subLen;
            if (endIndex > s.length())
                endIndex = s.length();
            strs[i] = s.substring(beginIndex, endIndex);
        }
    }
    return strs;
}

— Huang

1

I use the following Java 8 solution:

public static List<String> splitString(final String string, final int chunkSize) {
  final int numberOfChunks = (string.length() + chunkSize - 1) / chunkSize;
  return IntStream.range(0, numberOfChunks)
                  .mapToObj(index -> string.substring(index * chunkSize, Math.min((index + 1) * chunkSize, string.length())))
                  .collect(toList());
}

— 罗洛夫

0

A Java 8 solution (like this one, but a bit simpler):

public static List<String> partition(String string, int partSize) {
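  // Collect into an ArrayList explicitly: Collectors.toList() does not guarantee
  // a mutable list, and parts.add(...) below needs one.
  List<String> parts = IntStream.range(0, string.length() / partSize)
    .mapToObj(i -> string.substring(i * partSize, (i + 1) * partSize))
    .collect(Collectors.toCollection(ArrayList::new));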
  if ((string.length() % partSize) != 0)
    parts.add(string.substring(string.length() / partSize * partSize));
  return parts;
}

— Timofey Gorshkov

-1

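In a comment on the accepted solution, I asked @Alan Moore how strings with newlines could be handled. He suggested using DOTALL.

Using his suggestion, I created a small sample of how that works: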

public void regexDotAllExample() throws UnsupportedEncodingException {
    final String input = "The\nquick\nbrown\r\nfox\rjumps";
    final String regex = "(?<=\\G.{4})";

    Pattern splitByLengthPattern;
    String[] split;

    splitByLengthPattern = Pattern.compile(regex);
    split = splitByLengthPattern.split(input);
    System.out.println("---- Without DOTALL ----");
    for (int i = 0; i < split.length; i++) {
        byte[] s = split[i].getBytes("utf-8");
        System.out.println("[Idx: "+i+", length: "+s.length+"] - " + s);
    }
    /* Output is a single entry longer than the desired split size:
    ---- Without DOTALL ----
    [Idx: 0, length: 26] - [B@17cdc4a5
     */


    //DOTALL suggested in Alan Moores comment on SO: https://stackoverflow.com/a/3761521/1237974
    splitByLengthPattern = Pattern.compile(regex, Pattern.DOTALL);
    split = splitByLengthPattern.split(input);
    System.out.println("---- With DOTALL ----");
    for (int i = 0; i < split.length; i++) {
        byte[] s = split[i].getBytes("utf-8");
        System.out.println("[Idx: "+i+", length: "+s.length+"] - " + s);
    }
    /* Output is as desired 7 entries with each entry having a max length of 4:
    ---- With DOTALL ----
    [Idx: 0, length: 4] - [B@77b22abc
    [Idx: 1, length: 4] - [B@5213da08
    [Idx: 2, length: 4] - [B@154f6d51
    [Idx: 3, length: 4] - [B@1191ebc5
    [Idx: 4, length: 4] - [B@30ddb86
    [Idx: 5, length: 4] - [B@2c73bfb
    [Idx: 6, length: 2] - [B@6632dd29
     */

}

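But I also like @Jon Skeet's solution in https://stackoverflow.com/a/3760193/1237974. For maintainability in larger projects, where not everyone is equally experienced with regular expressions, I would probably use Jon's solution.

— joensson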

-1

Another brute-force solution could be:

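    String input = "thequickbrownfoxjumps";
    // Round up so that a shorter final chunk ("s") is not dropped.
    int n = (input.length() + 3) / 4;
    String[] num = new String[n];

    for (int i = 0, x = 0; i < n; i++, x += 4) {
        // Clamp the end index for the last chunk.
        num[i] = input.substring(x, Math.min(x + 4, input.length()));
        System.out.println(num[i]);
    }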

The code simply steps through the string, taking one substring at a time.

— 哈比

-1

import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class string123 {

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter String");
        String r = sc.nextLine();
        System.out.println("Enter length Of Sub-string");
        int l = sc.nextInt();
        int len = r.length();

        // A list grows as needed, so the number of chunks is not capped at 10.
        List<String> s = new ArrayList<>();
        for (int f = 0; f < len; f += l) {
            int last = Math.min(f + l, len); // clamp the final chunk to the string length
            s.add(r.substring(f, last));
        }
        System.out.print(s);
    }
}

Result

 Enter String
 Thequickbrownfoxjumps
 Enter length Of Sub-string
 4

 [Theq, uick, brow, nfox, jump, s]

— Ravichandra

-1

@Test
public void regexSplit() {
    String source = "Thequickbrownfoxjumps";
    // define matcher, any char, min length 1, max length 4
    Matcher matcher = Pattern.compile(".{1,4}").matcher(source);
    List<String> result = new ArrayList<>();
    while (matcher.find()) {
        result.add(source.substring(matcher.start(), matcher.end()));
    }
    String[] expected = {"Theq", "uick", "brow", "nfox", "jump", "s"};
    // JUnit's argument order is (expected, actual).
    assertArrayEquals(expected, result.toArray());
}

— Adrian-Bogdan Ionescu

-1

Here is my version, based on RegEx and Java 8 streams. It's worth mentioning that the Matcher.results() method is available since Java 9.

Test included.

public static List<String> splitString(String input, int splitSize) {
    Matcher matcher = Pattern.compile("(?:(.{" + splitSize + "}))+?").matcher(input);
    return matcher.results().map(MatchResult::group).collect(Collectors.toList());
}

@Test
public void shouldSplitStringToEqualLengthParts() {
    String anyValidString = "Split me equally!";
    String[] expectedTokens2 = {"Sp", "li", "t ", "me", " e", "qu", "al", "ly"};
    String[] expectedTokens3 = {"Spl", "it ", "me ", "equ", "all"};

    Assert.assertArrayEquals(expectedTokens2, splitString(anyValidString, 2).toArray());
    Assert.assertArrayEquals(expectedTokens3, splitString(anyValidString, 3).toArray());
}

— Itachi
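Note that the pattern in this answer only matches full-length chunks, so, as its test expects, a shorter trailing piece is dropped. If the remainder should be kept, one possible variant (a sketch, not the author's code) combines `Matcher.results()` with the `.{1,n}` quantifier used in another answer. The class and method names are assumptions, and Java 9 or later is required.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ResultsChunker {
    public static List<String> splitKeepingRemainder(String input, int splitSize) {
        // {1,n} is greedy: it takes full-size chunks while it can and then whatever is left.
        // DOTALL lets '.' match line terminators too.
        Pattern pattern = Pattern.compile(".{1," + splitSize + "}", Pattern.DOTALL);
        return pattern.matcher(input)
                .results()                 // Stream<MatchResult>, Java 9+
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingRemainder("Split me equally!", 3));
    }
}
```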

-1

public static String[] split(String input, int length) throws IllegalArgumentException {

    if(length == 0 || input == null)
        return new String[0];

    int lengthD = length * 2;

    int size = input.length();
    if(size == 0)
        return new String[0];

    int rep = (int) Math.ceil(size * 1d / length);

    ByteArrayInputStream stream = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_16LE));

    String[] out = new String[rep];
    byte[]  buf = new byte[lengthD];

    int d = 0;
    for (int i = 0; i < rep; i++) {

        try {
            d = stream.read(buf);
        } catch (IOException e) {
            e.printStackTrace();
        }

        if(d != lengthD)
        {
            out[i] = new String(buf,0,d, StandardCharsets.UTF_16LE);
            continue;
        }

        out[i] = new String(buf, StandardCharsets.UTF_16LE);
    }
    return out;
}

— User8461

-1

public static List<String> getSplittedString(String stringtoSplit,
            int length) {

        List<String> returnStringList = new ArrayList<String>(
                (stringtoSplit.length() + length - 1) / length);

        for (int start = 0; start < stringtoSplit.length(); start += length) {
            returnStringList.add(stringtoSplit.substring(start,
                    Math.min(stringtoSplit.length(), start + length)));
        }

        return returnStringList;
    }

— Raj Hirani