How can I count the number of matches for a regex?


97

Let's say I have a string which contains this:

HelloxxxHelloxxxHello

I compile a pattern to find "Hello":

Pattern pattern = Pattern.compile("Hello");
Matcher matcher = pattern.matcher("HelloxxxHelloxxxHello");

It should find three matches. How can I get a count of how many matches there were?

I've tried various loops and using matcher.groupCount(), but it didn't work.


Is there any chance that your search string may have overlapping occurrences in the input string?
aioobe, 2011

Answers:


177

matcher.find() does not find all matches, only the next match.

Solution for Java 9+

long matches = matcher.results().count();
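As a minimal, self-contained sketch of the one-liner above: the class name ResultsCountExample and the helper countMatches are made up for illustration, and the code assumes Java 9 or later, where Matcher.results() is available.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResultsCountExample {
    // Counts the non-overlapping matches of regex in input (needs Java 9+).
    static long countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // results() yields one MatchResult per successful find(),
        // so counting the stream counts the matches.
        return matcher.results().count();
    }

    public static void main(String[] args) {
        System.out.println(countMatches("Hello", "HelloxxxHelloxxxHello")); // prints 3
    }
}
```

Note that count() returns a long, not an int.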

Solution for Java 8 and older

You'll have to do the following. (Starting from Java 9, there is a nicer solution.)

int count = 0;
while (matcher.find())
    count++;

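By the way, matcher.groupCount() is something completely different: it returns the number of capturing groups in the pattern, not the number of matches.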
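To illustrate the difference, here is a small sketch. The class name GroupCountExample and the helper capturingGroups are hypothetical; the second pattern, "(H)(ello)", is just an example with two capturing groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountExample {
    // Returns the number of capturing groups declared in the regex.
    // The input string plays no role in this number.
    static int capturingGroups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.groupCount();
    }

    public static void main(String[] args) {
        String input = "HelloxxxHelloxxxHello";
        // No parentheses in the pattern, so there are no capturing groups.
        System.out.println(capturingGroups("Hello", input));     // prints 0
        // Two pairs of parentheses, so two capturing groups.
        System.out.println(capturingGroups("(H)(ello)", input)); // prints 2
    }
}
```

Both patterns match three times in the input, yet groupCount() reports 0 and 2, which is why it cannot be used to count matches.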

Complete example

import java.util.regex.*;

class Test {
    public static void main(String[] args) {
        String hello = "HelloxxxHelloxxxHello";
        Pattern pattern = Pattern.compile("Hello");
        Matcher matcher = pattern.matcher(hello);

        int count = 0;
        while (matcher.find())
            count++;

        System.out.println(count);    // prints 3
    }
}

Handling overlapping matches

When counting matches of aa in aaaa, the above snippet will give you 2:

aaaa
aa
  aa

To get 3 matches, i.e. this behavior:

aaaa
aa
 aa
  aa

You have to search for a match at index <start of last match> + 1, as follows:

String hello = "aaaa";
Pattern pattern = Pattern.compile("aa");
Matcher matcher = pattern.matcher(hello);

int count = 0;
int i = 0;
while (matcher.find(i)) {
    count++;
    i = matcher.start() + 1;
}

System.out.println(count);    // prints 3
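A different technique that gives the same count is to wrap the regex in a zero-width lookahead, so that every match is empty and find() advances one character at a time. This is only a sketch and not part of the approach above; the class name LookaheadCountExample and the helper countOverlapping are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadCountExample {
    // Counts the positions in input at which a match of regex begins.
    static int countOverlapping(String regex, String input) {
        // (?=...) matches without consuming characters, so after each
        // (empty) match find() resumes at the next position.
        Matcher matcher = Pattern.compile("(?=" + regex + ")").matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("aa", "aaaa")); // prints 3
    }
}
```

Like the loop above, this counts at most one match per start position.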

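To count the number of matches that occur in the string: the java.util.regex.Matcher.region(int start, int end) method sets the limits of this matcher's region. The region is the part of the input sequence that will be searched to find a match. Invoking this method resets the matcher and then sets the region to start at the index specified by the start parameter and end at the index specified by the end parameter. Try this: while(matcher.find()){ matcher.region(matcher.end()-1, str.length()); count++; }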
Mukesh Kumar Gupta

17

This should work for matches that might overlap:

public static void main(String[] args) {
    String input = "aaaaaaaa";
    String regex = "aa";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(input);
    int from = 0;
    int count = 0;
    while(matcher.find(from)) {
        count++;
        from = matcher.start() + 1;
    }
    System.out.println(count);
}
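One edge case worth guarding against: Matcher.find(int) throws an IndexOutOfBoundsException when the start index is greater than the input length, which can happen if the regex matches the empty string at the very end of the input. The following is a sketch of a guarded variant; the class name OverlapCounter and the method countOverlapping are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapCounter {
    // Counts matches that may overlap, one per distinct start position.
    static int countOverlapping(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        int from = 0;
        // Check the bound first: find(from) would throw if from > input.length().
        while (from <= input.length() && matcher.find(from)) {
            count++;
            from = matcher.start() + 1; // resume one past the last match's start
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping(Pattern.compile("aa"), "aaaaaaaa")); // prints 7
        System.out.println(countOverlapping(Pattern.compile("a*"), "aa"));       // prints 3
    }
}
```

The second call is the case that would fail without the bound check, because "a*" matches the empty string at index 2.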


3

If you want to use Java 8 streams and are allergic to while loops, you could try this:

public static int countPattern(String references, Pattern referencePattern) {
    Matcher matcher = referencePattern.matcher(references);
    return Stream.iterate(0, i -> i + 1)
            .filter(i -> !matcher.find())
            .findFirst()
            .get();
}

Disclaimer: this only works for disjoint matches.

Example:

public static void main(String[] args) throws ParseException {
    Pattern referencePattern = Pattern.compile("PASSENGER:\\d+");
    System.out.println(countPattern("[ \"PASSENGER:1\", \"PASSENGER:2\", \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
    System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
    System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\", \"PASSENGER:1\" ]", referencePattern));
    System.out.println(countPattern("[  ]", referencePattern));
}

This prints out:

2
0
1
0

Here is a solution for non-disjoint (overlapping) matches using streams:

public static int countPattern(String references, Pattern referencePattern) {
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
            new Iterator<Integer>() {
                Matcher matcher = referencePattern.matcher(references);
                int from = 0;

                @Override
                public boolean hasNext() {
                    return matcher.find(from);
                }

                @Override
                public Integer next() {
                    from = matcher.start() + 1;
                    return 1;
                }
            },
            Spliterator.IMMUTABLE), false).reduce(0, (a, c) -> a + c);
}

1

Use the code below to find the number of matches that the regex finds in your input:

        Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL); // "regex" is your predefined regex
        Matcher m = p.matcher(input); // "input" is the string to match the pattern against
        int count = 0;
        // find() also reports a match that spans the whole input, so a separate
        // m.matches() call is not needed and could skew the count.
        while (m.find())
            count++;

This is generalized code rather than a specific solution; tailor it to suit your needs.

Please feel free to correct me if there is any mistake.

Licensed under cc by-sa 3.0 with attribution required.