Using regular expressions to extract a value in Java

169

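I have several strings in the rough form:

[some text] [some number] [some more text]

I want to extract the text in [some number] using the Java regex classes.

I know roughly what regular expression I want to use (though all suggestions are welcome). What I'm really interested in are the Java calls that take the regex string and apply it to the source data to produce the value of [some number].

EDIT: I should add that I'm only interested in a single [some number] (basically, the first instance). The source strings are short and I'm not going to be looking for multiple occurrences of [some number].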

java regex

— Craig Walker

11

...and now I'm off to research. Let's see if SO can get an answer for me before I figure it out myself. :-P

— Craig Walker

This is an interview question for software engineering at a banking/investment/trading company, isn't it? :P

— 6

@ennth No, not even close! Many moons ago, this was for production code on a small-business website.

— Craig Walker

1

Damn, a few days ago I was asked almost exactly the same question on a JP Morgan Chase software engineering coding exam :P

— 6

316

Full example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is only a placeholder so that the example compiles
public class RegexExample {
    private static final Pattern p = Pattern.compile("^([a-zA-Z]+)([0-9]+)(.*)");

    public static void main(String[] args) {
        // create a matcher for pattern p and the given string
        Matcher m = p.matcher("Testing123Testing");

        // if an occurrence of the pattern was found in the given string...
        if (m.find()) {
            // ...then you can use the group() methods.
            System.out.println(m.group(0)); // whole matched expression
            System.out.println(m.group(1)); // first expression from round brackets (Testing)
            System.out.println(m.group(2)); // second one (123)
            System.out.println(m.group(3)); // third one (Testing)
        }
    }
}

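Since you're looking for the first number, you can use the following regular expression:

^\D+(\d+).*

and m.group(1) will return the first number. Note that signed numbers can contain a minus sign: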

^\D+(-?\d+).*
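
One caveat with the signed variant: the greedy \D+ also accepts the minus sign, because '-' is itself a non-digit, so the optional -? inside the group ends up matching nothing. The sketch below compares it with a reluctant \D*? prefix. The class name, the helper, and the reluctant pattern are illustrative assumptions and not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedNumberDemo {
    // Pattern from the answer: the greedy \D+ swallows the '-' before the digits.
    static final Pattern GREEDY = Pattern.compile("^\\D+(-?\\d+).*");
    // Assumed alternative: a reluctant \D*? leaves the '-' for the capture group.
    static final Pattern RELUCTANT = Pattern.compile("^\\D*?(-?\\d+)");

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstNumber(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber(GREEDY, "temp-42C"));    // 42
        System.out.println(firstNumber(RELUCTANT, "temp-42C")); // -42
    }
}
```

The reluctant version also accepts input that starts directly with a digit, since \D*? may match zero characters.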

— Allain Lalonde

62

Don't forget to reuse the Pattern object. Compiling a pattern takes a significant amount of time.

— Rastislav Komara

14

Agreed. Usually I'd define the pattern as private static final Pattern PATTERN = Pattern.compile("..."); but that's just me.

— Allain Lalonde

6

We can simply use Pattern p = Pattern.compile("\\d+");

— javaMan, 2011

15

Without an explanation, this is a poor answer.

— Martin Spamer

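You can also reuse the Matcher. Call the Matcher's reset() method between uses. If you share the matcher between multiple concurrent threads, you should synchronize the operation.

— Marquez, 2014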

41

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex1 {
    public static void main(String[]args) {
        Pattern p = Pattern.compile("\\d+");
        Matcher m = p.matcher("hello1234goodboy789very2345");
        while(m.find()) {
            System.out.println(m.group());
        }
    }
}

Output:

1234
789
2345

— javaMan

The question specifically asks for only the FIRST occurrence of a number.

— NoBrainer, 2015

34

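Allain basically has the Java code, so you can use that. However, his expression only matches if your number is preceded by nothing but a stream of word characters.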

"(\\d+)"

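should be able to find the first string of digits. You don't need to specify what comes before it if you're sure that it's going to be the first string of digits. Likewise, there is no point in specifying what comes after it unless you need that. If you just want the number, and you are sure that it will be the first string of one or more digits, then that's all you need.

If you expect it to be set off by whitespace, specifying that makes the match even more distinct, so

"\\s+(\\d+)\\s+"

might be better.

If you need all three parts, this will do: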

"(\\D+)(\\d+)(.*)"

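EDIT: The expressions given by Allain and Jack suggest that you need to specify some subset of non-digits in order to capture the digits. If you tell the regex engine that you're looking for \d, it is going to ignore everything before the digits. If J's or A's expression fits your pattern, then the whole match equals the input string, and there is no reason to specify it. It probably slows a clean match down, if it isn't ignored entirely.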
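
To make the point concrete, here is a small sketch that runs both styles of pattern through Matcher.find(). The class and helper names are made up for illustration; the two expressions are the ones discussed in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstDigitsDemo {
    // Digits only: find() skips whatever precedes the first digit run.
    static final Pattern DIGITS = Pattern.compile("(\\d+)");
    // Anchored form that demands at least one non-digit before the number.
    static final Pattern PREFIXED = Pattern.compile("^\\D+(\\d+).*");

    // Returns group 1 of the first match, or null when nothing matches.
    static String first(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(first(DIGITS, "Testing123Testing"));   // 123
        System.out.println(first(PREFIXED, "Testing123Testing")); // 123
        System.out.println(first(DIGITS, "123Testing"));          // 123
        System.out.println(first(PREFIXED, "123Testing"));        // null
    }
}
```

The last line shows the case where the prefixed form fails: the input starts with the digits, so \D+ has nothing to match.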

— Axeman

You can test Axeman's hypothesis by running a sample test and comparing the performance of his solution against A/J's.

— anjanb

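Don't you need to specify the beginning and the end of the string? Otherwise things like 124xxx123xxx would be matched even though they don't fit his syntax. Or are ^ and $ implicit?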

— Allain Lalonde

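Allain, yours would fail too. You and Jack both assume that non-digit characters will precede the digits. They either do or they don't, and if they don't, neither of those expressions will parse the line. I repeat: as specified, the pattern for the digits is enough.

— Axeman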
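
On the question of whether ^ and $ are implicit: that depends on which Matcher method is called, not on the pattern. The sketch below (class name is illustrative) runs the same \d+ pattern through find(), lookingAt() and matches() on the input from the comment above.

```java
import java.util.regex.Pattern;

class FindVersusMatches {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    public static void main(String[] args) {
        String input = "124xxx123xxx";

        // find() scans for the pattern anywhere in the input.
        System.out.println(DIGITS.matcher(input).find());      // true

        // lookingAt() must match at the start, but need not reach the end.
        System.out.println(DIGITS.matcher(input).lookingAt()); // true

        // matches() must consume the entire input, as if wrapped in ^...$.
        System.out.println(DIGITS.matcher(input).matches());   // false

        // String.matches() behaves like Matcher.matches().
        System.out.println(input.matches("\\d+"));             // false
    }
}
```

So with find(), nothing is anchored unless the pattern says so, and 124xxx123xxx yields 124 as its first match.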

11

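In addition to Pattern, the Java String class has several methods that work with regular expressions. In your case the code would be:

"ab123abc".replaceFirst("\\D*(\\d*).*", "$1")

where \\D is a non-digit character.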
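
A short sketch of how this one-liner behaves, including the case with no digits at all. Because every part of the expression is optional, it still matches such input and the result is an empty string rather than null. The class and method names are placeholders.

```java
class ReplaceFirstDemo {
    // Replaces the first match of the whole expression with capture group 1.
    static String firstDigits(String input) {
        return input.replaceFirst("\\D*(\\d*).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(firstDigits("ab123abc"));        // 123
        System.out.println(firstDigits("123abc"));          // 123
        System.out.println(firstDigits("abc").isEmpty());   // true
    }
}
```

Callers that need to distinguish "no number found" therefore have to check for an empty result.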

— Vitalii Fedorenko

10

In Java 1.4 and later:

String input = "...";
Matcher matcher = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+").matcher(input);
if (matcher.find()) {
    String someNumberStr = matcher.group(1);
    // if you need this to be an int:
    int someNumberInt = Integer.parseInt(someNumberStr);
}

— Jack Leow

8

This function collects all matching sequences from a string. In this example it extracts all email addresses from the string.

static final String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
        + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";

public List<String> getAllEmails(String message) {      
    List<String> result = null;
    Matcher matcher = Pattern.compile(EMAIL_PATTERN).matcher(message);

    if (matcher.find()) {
        result = new ArrayList<String>();
        result.add(matcher.group());

        while (matcher.find()) {
            result.add(matcher.group());
        }
    }

    return result;
}

For message = "adf@gmail.com, <another@osiem.osiem>>>> lalala@aaa.pl" it will create a List of 3 elements.
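
A possible variation, sketched under the assumption that an empty list is preferable to null when nothing matches: a single while loop over find() is enough, and the pattern can be compiled once. The class name is illustrative; the expression is the one from above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailFinder {
    // Same expression as above, compiled once and reused across calls.
    private static final Pattern EMAIL = Pattern.compile(
            "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
            + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})");

    // Returns every match in order; the list is empty when there is none.
    static List<String> getAllEmails(String message) {
        List<String> result = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(message);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String message = "adf@gmail.com, <another@osiem.osiem>>>> lalala@aaa.pl";
        // [adf@gmail.com, another@osiem.osiem, lalala@aaa.pl]
        System.out.println(getAllEmails(message));
    }
}
```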

— LukaszTaraszka

3

Try doing something like this:

Pattern p = Pattern.compile("^.+(\\d+).+");
Matcher m = p.matcher("Testing123Testing");

if (m.find()) {
    System.out.println(m.group(1));
}

— Tint Naing Win

3

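-1. Because .+ consumes characters greedily, \d+ only captures the "3" from "123". Also, inside string literals you need to escape the backslash (your example will not compile).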

— Bart Kiers
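
The greedy behaviour described in this comment can be reproduced with a few lines. The sketch compares the answer's pattern with a reluctant .+? prefix, which is one assumed way to fix it; the class and helper names are placeholders.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVersusReluctant {
    // Returns group 1 of the first match, or null when nothing matches.
    static String capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Greedy .+ takes as much as it can and gives back a single digit.
        System.out.println(capture("^.+(\\d+).+", "Testing123Testing"));  // 3
        // Reluctant .+? stops as early as possible, leaving all the digits.
        System.out.println(capture("^.+?(\\d+).+", "Testing123Testing")); // 123
    }
}
```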

3

Simple solution

// Regexplanation:
// ^       beginning of line
// \\D+    1+ non-digit characters
// (\\d+)  1+ digit characters in a capture group
// .*      0+ any character
String regexStr = "^\\D+(\\d+).*";

// Compile the regex String into a Pattern
Pattern p = Pattern.compile(regexStr);

// Create a matcher with the input String
Matcher m = p.matcher(inputStr);

// If we find a match
if (m.find()) {
    // Get the String from the first capture group
    String someDigits = m.group(1);
    // ...do something with someDigits
}

Solution in a utility class

public class MyUtil {
    private static final Pattern pattern = Pattern.compile("^\\D+(\\d+).*");
    // Note: a shared Matcher is not thread-safe, so use this class from one thread only
    private static final Matcher matcher = pattern.matcher("");

    // Assumptions: inputStr is a non-null String
    public static String extractFirstNumber(String inputStr){
        // Reset the matcher with a new input String
        matcher.reset(inputStr);

        // Check if there's a match
        if(matcher.find()){
            // Return the number (in the first capture group)
            return matcher.group(1);
        }else{
            // Return some default value, if there is no match
            return null;
        }
    }
}

...

// Use the util function and print out the result
String firstNum = MyUtil.extractFirstNumber("Testing4234Things");
System.out.println(firstNum);
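
The utility class above keeps a single static Matcher, and Matcher instances are not safe for use by multiple concurrent threads, whereas compiled Pattern instances are immutable and can be shared. A minimal thread-safe sketch, assuming the cost of creating a Matcher per call is acceptable, looks like this (the class name is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberExtractor {
    // Pattern is immutable, so one shared instance is safe across threads.
    private static final Pattern FIRST_NUMBER = Pattern.compile("^\\D+(\\d+).*");

    private NumberExtractor() {
        // static helpers only
    }

    // Assumption: inputStr is non-null. Returns null when there is no match.
    static String extractFirstNumber(String inputStr) {
        // A fresh Matcher per call keeps all mutable state local to the caller.
        Matcher matcher = FIRST_NUMBER.matcher(inputStr);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractFirstNumber("Testing4234Things")); // 4234
    }
}
```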

— NoBrainer

1

Look, you can do it using StringTokenizer

String str = "as:"+123+"as:"+234+"as:"+345;
StringTokenizer st = new StringTokenizer(str,"as:");

while(st.hasMoreTokens())
{
  String k = st.nextToken();    // you will get first numeric data i.e 123
  int kk = Integer.parseInt(k);
  System.out.println("k string token in integer        " + kk);

  String k1 = st.nextToken();   //  you will get second numeric data i.e 234
  int kk1 = Integer.parseInt(k1);
  System.out.println("new string k1 token in integer   :" + kk1);

  String k2 = st.nextToken();   //  you will get third numeric data i.e 345
  int kk2 = Integer.parseInt(k2);
  System.out.println("k2 string token is in integer   : " + kk2);
}

Since we store these numeric values in three separate variables, we can use the data anywhere else in the code (for further use)
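
Two things are worth keeping in mind with this approach. The second constructor argument of StringTokenizer is a set of delimiter characters, not a delimiter string, and the loop above assumes exactly three tokens. The sketch below, with illustrative names, collects any number of tokens and shows the character-set behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Splits str on any of the characters in delimiterChars.
    static List<String> tokens(String str, String delimiterChars) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(str, delimiterChars);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Works for any number of "as:"-prefixed values.
        System.out.println(tokens("as:123as:234as:345", "as:")); // [123, 234, 345]

        // 'a', 's' and ':' are each delimiters, so the "g" survives as a token.
        System.out.println(tokens("gas:12", "as:"));             // [g, 12]
    }
}
```

If stray letters like that "g" can occur, Integer.parseInt on each token would throw a NumberFormatException, which is one reason the regex-based answers are more robust here.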

— shounak

0

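How about [^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).* ? I think it takes care of numbers with a fractional part. I allowed for whitespace and included , as a possible separator. I'm trying to get the numbers out of a string that may contain floats, taking into account that the user might make a mistake and include whitespace while typing the number.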
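
A sketch of how the captured text might be turned into a number: strip the whitespace, treat a comma as the decimal separator, then parse. The normalization step and all names are assumptions for illustration, and thousands separators (as in 1,234.5) are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecimalDemo {
    // The expression from this answer, compiled once.
    private static final Pattern NUMBER =
            Pattern.compile("[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*");

    // Returns the first number as a Double, or null when no digits are found.
    static Double parseFirst(String input) {
        Matcher m = NUMBER.matcher(input);
        if (!m.find()) {
            return null;
        }
        // Remove stray whitespace and normalize ',' to '.' before parsing.
        String raw = m.group(1).replaceAll("\\s+", "").replace(',', '.');
        return Double.valueOf(raw);
    }

    public static void main(String[] args) {
        System.out.println(parseFirst("price: 12 , 5 EUR")); // 12.5
        System.out.println(parseFirst("weight 3.75kg"));     // 3.75
        System.out.println(parseFirst("no number"));         // null
    }
}
```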

— arturo

0

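Sometimes you can use the simple .split("REGEXP") method available in java.lang.String. For example:

String input = "first,second,third";

// To retrieve "first"
String first = input.split(",")[0];
// "second"
String second = input.split(",")[1];
// "third"
String third = input.split(",")[2];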
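
Since the argument to split is a regular expression, it also works with character classes, and regex metacharacters such as . must be escaped. A brief sketch with made-up sample strings and names:

```java
import java.util.Arrays;

class SplitDemo {
    // Splits input on the given regex and returns the field at index.
    static String field(String input, String regex, int index) {
        return input.split(regex)[index];
    }

    public static void main(String[] args) {
        // Split on runs of whitespace and pick the second field.
        System.out.println(field("id 42 name", "\\s+", 1));          // 42

        // An escaped dot splits on literal dots.
        System.out.println(Arrays.toString("a.b.c".split("\\.")));   // [a, b, c]

        // An unescaped dot matches every character, so nothing is left.
        System.out.println("a.b.c".split(".").length);               // 0
    }
}
```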

— username

0

Pattern p = Pattern.compile("(\\D+)(\\d+)(.*)");
Matcher m = p.matcher("this is your number:1234 thank you");
if (m.find()) {
    String someNumberStr = m.group(2);
    int someNumberInt = Integer.parseInt(someNumberStr);
}

— Mohammadreza Tavakoli

1

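Please edit with more information. Code-only and "try this" answers are discouraged, because they contain no searchable content and don't explain why someone should "try this". We make an effort here to be a resource for knowledge.

— Brian Tompsett - 汤莱恩, 2016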

1

Downvoted for merely repeating correct answers that were given long ago, without adding any value

— Forage

-1

If you are reading from a file, this may help you

try {
    InputStream inputStream = (InputStream) mnpMainBean.getUploadedBulk().getInputStream();
    BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
    String line;
    //Ref:03
    while ((line = br.readLine()) != null) {
        if (line.matches("[A-Z],\\d,(\\d*,){2}(\\s*\\d*\\|\\d*:)+")) {
            String[] splitRecord = line.split(",");
            //do something
        } else {
            br.close();
            //error
            return;
        }
    }
    br.close();
} catch (IOException ioException) {
    logger.logDebug("Exception " + ioException.getMessage());
}
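
A self-contained sketch of the same idea, under a few assumptions: the input arrives as any java.io.Reader (a StringReader stands in for the uploaded file), the pattern is compiled once, try-with-resources closes the reader on every path, and a malformed line raises an exception instead of returning silently. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RecordReader {
    // Same expression as above, compiled once instead of on every line.
    private static final Pattern RECORD =
            Pattern.compile("[A-Z],\\d,(\\d*,){2}(\\s*\\d*\\|\\d*:)+");

    // Reads all lines, validating each one before splitting it on commas.
    static List<String[]> readRecords(Reader source) {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!RECORD.matcher(line).matches()) {
                    throw new IllegalArgumentException("Malformed line: " + line);
                }
                records.add(line.split(","));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        Reader sample = new StringReader("A,1,22,33,4|5:\nB,2,,,6|7:8|9:");
        System.out.println(readRecords(sample).size()); // 2
    }
}
```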

— seeker