Swift extract regex matches


175

I want to extract substrings from a string that match a regex pattern.

So I'm looking for something like this:

func matchesForRegexInText(regex: String!, text: String!) -> [String] {
   ???
}

This is what I have so far:

func matchesForRegexInText(regex: String!, text: String!) -> [String] {

    var regex = NSRegularExpression(pattern: regex, 
        options: nil, error: nil)

    var results = regex.matchesInString(text, 
        options: nil, range: NSMakeRange(0, countElements(text))) 
            as Array<NSTextCheckingResult>

    /// ???

    return ...
}

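The problem is that matchesInString gives me an array of NSTextCheckingResult, where NSTextCheckingResult.range is of type NSRange.

NSRange is incompatible with Range<String.Index>, so it prevents me from using text.substringWithRange(...)

Any idea how to achieve this simple thing in Swift without too many lines of code?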

Answers:


313

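Even though the matchesInString() method takes a String as its first argument, it works internally with NSString, and the range parameter must be given using the NSString length, not the Swift string length. Otherwise it will fail for "extended grapheme clusters" such as "flags".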
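To see why the two lengths differ, here is a small sketch using the same sample string as the examples in this answer. The variable names are only illustrative.

```swift
import Foundation

let text = "🇩🇪€4€9"

// Swift counts extended grapheme clusters (user-perceived characters).
let characterCount = text.count

// NSString counts UTF-16 code units, which is the unit NSRegularExpression works in.
let utf16Length = NSString(string: text).length

print(characterCount, utf16Length, text.utf16.count)
// 5 8 8
```

The flag alone takes four UTF-16 code units, so a range built from text.count would stop before the digits are reached.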

As of Swift 4 (Xcode 9), the Swift standard library provides functions to convert between Range<String.Index> and NSRange.

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: text,
                                    range: NSRange(text.startIndex..., in: text))
        return results.map {
            String(text[Range($0.range, in: text)!])
        }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "🇩🇪€4€9"
let matched = matches(for: "[0-9]", in: string)
print(matched)
// ["4", "9"]

Note: The forced unwrap Range($0.range, in: text)! is safe because the NSRange refers to a substring of the given string text. However, if you want to avoid it, then use

        // compactMap requires Swift 4.1 or later; on Swift 4.0 use flatMap here
        return results.compactMap {
            Range($0.range, in: text).map { String(text[$0]) }
        }

instead.
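As a rough illustration of the two conversions, the following sketch round-trips one range and shows the failable direction returning nil. The variable names are made up for this example.

```swift
import Foundation

let text = "🇩🇪€4€9"

// Start from a Swift range: the position of the first "4".
let swiftRange = text.range(of: "4")!

// Range<String.Index> -> NSRange (offsets are in UTF-16 code units).
let nsRange = NSRange(swiftRange, in: text)
print(nsRange.location, nsRange.length)
// 5 1

// NSRange -> Range<String.Index>; this initializer is failable.
let roundTripped = Range(nsRange, in: text)
print(roundTripped == swiftRange)
// true

// An NSRange whose location is NSNotFound converts to nil rather than crashing.
let missing = Range(NSRange(location: NSNotFound, length: 0), in: text)
print(missing == nil)
// true
```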


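(Older answer for Swift 3 and earlier:)

So you should convert the given Swift string to an NSString and then extract the ranges. The result will be converted to a Swift string array automatically.

(The code for Swift 1.2 can be found in the edit history.)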

Swift 2(Xcode 7.3.1):

func matchesForRegexInText(regex: String, text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text,
                                            options: [], range: NSMakeRange(0, nsString.length))
        return results.map { nsString.substringWithRange($0.range)}
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "🇩🇪€4€9"
let matches = matchesForRegexInText("[0-9]", text: string)
print(matches)
// ["4", "9"]

Swift 3 (Xcode 8):

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let nsString = text as NSString
        let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range)}
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "🇩🇪€4€9"
let matched = matches(for: "[0-9]", in: string)
print(matched)
// ["4", "9"]

9
You saved me from going insane. Not kidding. Thank you so much!
mitchkman

1
@MathijsSegers: I have updated the code for Swift 1.2 / Xcode 6.3. Thanks for letting me know!
Martin R

1
But what if I want to search for strings between tags? I need the same result (match information), like: regex101.com/r/cU6jX8/2. Which regex pattern would you suggest?
Peter Kreinz

The update is for Swift 1.2, not Swift 2. The code doesn't compile with Swift 2.
PatrickNLT

1
Thanks! What if you only want to extract what is actually between the () in the regex? For example, in "[0-9]{3}([0-9]{6})" I'd only want to get the last 6 numbers.
p4bloch, 2015

64

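My answer builds on the given answers but makes regex matching more robust by adding further support:

  • Returns not only the matches but also all capture groups for each match (see the examples below)
  • Instead of returning an empty array, this solution supports optional matches
  • Avoids do/catch by not printing to the console, and uses the guard construct instead
  • Adds matchingStrings as an extension to String

Swift 4.2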

//: Playground - noun: a place where people can play

import Foundation

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.range(at: $0).location != NSNotFound
                    ? nsString.substring(with: result.range(at: $0))
                    : ""
            }
        }
    }
}

"prefix12 aaa3 prefix45".matchingStrings(regex: "fix([0-9])([0-9])")
// Prints: [["fix12", "1", "2"], ["fix45", "4", "5"]]

"prefix12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["prefix12", "12"]]

"12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["12", "12"]], other answers return an empty array here

// Safely accessing the capture of the first match (if any):
let number = "prefix12suffix".matchingStrings(regex: "fix([0-9]+)su").first?[1]
// Prints: Optional("12")

Swift 3

//: Playground - noun: a place where people can play

import Foundation

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.rangeAt($0).location != NSNotFound
                    ? nsString.substring(with: result.rangeAt($0))
                    : ""
            }
        }
    }
}

"prefix12 aaa3 prefix45".matchingStrings(regex: "fix([0-9])([0-9])")
// Prints: [["fix12", "1", "2"], ["fix45", "4", "5"]]

"prefix12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["prefix12", "12"]]

"12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["12", "12"]], other answers return an empty array here

// Safely accessing the capture of the first match (if any):
let number = "prefix12suffix".matchingStrings(regex: "fix([0-9]+)su").first?[1]
// Prints: Optional("12")

Swift 2

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matchesInString(self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.rangeAtIndex($0).location != NSNotFound
                    ? nsString.substringWithRange(result.rangeAtIndex($0))
                    : ""
            }
        }
    }
}

1
Good idea about the capture groups. But why is "guard" Swiftier than "do/catch"?
Martin R

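I agree with people such as nshipster.com/guard-and-defer who say that Swift 2.0 certainly encourages a style of early return rather than nested if statements. The same is true for nested do/catch statements, IMHO.
Lars Blumberg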

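try/catch is the native error handling in Swift. try? can be used if you are only interested in the outcome of the call, not in a possible error message. So yes, guard try? .. is fine, but if you want to print the error then you need a do-block. Both ways are Swifty.
Martin R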
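To make the difference between the two styles concrete, here is a minimal sketch. Both helper names are hypothetical and only serve the comparison.

```swift
import Foundation

// Hypothetical helper: try? discards the error and yields nil for an invalid pattern.
func makeRegexQuietly(_ pattern: String) -> NSRegularExpression? {
    return try? NSRegularExpression(pattern: pattern)
}

// Hypothetical helper: do/catch gives access to the error so it can be reported.
func makeRegexVerbosely(_ pattern: String) -> NSRegularExpression? {
    do {
        return try NSRegularExpression(pattern: pattern)
    } catch {
        print("invalid regex: \(error.localizedDescription)")
        return nil
    }
}

// "(" is an unbalanced group, so both helpers return nil; only the second says why.
print(makeRegexQuietly("(") == nil)
print(makeRegexVerbosely("(") == nil)
```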

3
I've added unit tests to your nice snippet: gist.github.com/neoneye/03cbb26778539ba5eb609d16200e4522
neoneye, 2016

1
I was about to write my own based on @MartinR's answer until I saw this. Thanks!
Oritm, 2017

13

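If you want to extract substrings from a String, not just their positions (that is, the actual String, including emojis), then the following may be a simpler solution.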

extension String {
  func regex (pattern: String) -> [String] {
    do {
      let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions(rawValue: 0))
      let nsstr = self as NSString
      let all = NSRange(location: 0, length: nsstr.length)
      var matches : [String] = [String]()
      regex.enumerateMatchesInString(self, options: NSMatchingOptions(rawValue: 0), range: all) {
        (result : NSTextCheckingResult?, _, _) in
        if let r = result {
          let result = nsstr.substringWithRange(r.range) as String
          matches.append(result)
        }
      }
      return matches
    } catch {
      return [String]()
    }
  }
} 

Example usage:

"someText 👿🏅👿⚽️ pig".regex("👿⚽️")

This will return the following:

["👿⚽️"]

Note that using "\w+" may produce an unexpected "" entry:

"someText 👿🏅👿⚽️ pig".regex("\\w+")

This will return the following String array:

["someText", "️", "pig"]

1
This is what I wanted.
Kyle KIM, 2016

1
Nice! It needs a little adjustment for Swift 3, but it's great.
Jelle

@Jelle What adjustments does it need? I'm using Swift 5.1.3
Peter Schorn

9

Unfortunately, I found that the accepted answer's solution does not compile on Swift 3 for Linux. Here is a modified version that does:

import Foundation

func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try RegularExpression(pattern: regex, options: [])
        let nsString = NSString(string: text)
        let results = regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

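The main differences are:

  1. Swift on Linux seems to require dropping the NS prefix on Foundation objects for which there is no Swift-native equivalent. (See Swift evolution proposal #86.)

  2. Swift on Linux also requires specifying the options argument for both the RegularExpression initialization and the matches method.

  3. For some reason, coercing a String into an NSString doesn't work in Swift on Linux, but initializing a new NSString with a String as the source does work.

This version also works with Swift 3 on macOS / Xcode, with the sole exception that you must use the name NSRegularExpression instead of RegularExpression.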


5

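@p4bloch if you want to capture results from a series of capture parentheses, then you need to use the rangeAtIndex(index) method of NSTextCheckingResult instead of range. Here is @MartinR's method for Swift 2 from above, adapted for capture parentheses. In the returned array, the first result [0] is the entire capture, and the individual capture groups begin at [1]. I commented out the map operation (so it's easier to see what I changed) and replaced it with nested loops.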

func matches(for regex: String!, in text: String!) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text, options: [], range: NSMakeRange(0, nsString.length))
        var match = [String]()
        for result in results {
            for i in 0..<result.numberOfRanges {
                match.append(nsString.substringWithRange( result.rangeAtIndex(i) ))
            }
        }
        return match
        //return results.map { nsString.substringWithRange( $0.range )} //rangeAtIndex(0)
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

An example use case: say you want to split a string of the form title year, e.g. "Finding Dory 2016". You could do this:

print ( matches(for: "^(.+)\\s(\\d{4})" , in: "Finding Dory 2016"))
// ["Finding Dory 2016", "Finding Dory", "2016"]
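One caveat with the nested loop: a capture group that did not take part in a match reports a location of NSNotFound, and taking a substring with that range raises an exception. The following is a sketch of one way to guard against that, written in current Swift syntax rather than Swift 2. The function name captureGroups(for:in:) is hypothetical, and returning nil for a missing group is just one possible design.

```swift
import Foundation

// Hypothetical variant: one inner array per match. Index 0 is the whole match,
// later indices are the capture groups, with nil where a group did not participate.
func captureGroups(for pattern: String, in text: String) -> [[String?]] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).map { result in
        (0..<result.numberOfRanges).map { index in
            // Range(_:in:) returns nil when the location is NSNotFound.
            Range(result.range(at: index), in: text).map { String(text[$0]) }
        }
    }
}

print(captureGroups(for: "^(.+)\\s(\\d{4})", in: "Finding Dory 2016"))
print(captureGroups(for: "(a)|(b)", in: "b"))
```

In the second call only the second alternative matches, so the first group comes back as nil instead of crashing.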

This answer made my day. I spent 2 hours searching for a solution that handles regular expressions and also captures groups.
Ahmad

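This works, but it will crash if any range is not found. I modified this code so that the function returns [String?], and in the for i in 0..<result.numberOfRanges block you have to add a test that appends the match only if the range != NSNotFound, and appends nil otherwise. See: stackoverflow.com/a/31892241/2805570
Stef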

4

Swift 4 without NSString.

extension String {
    func matches(regex: String) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: [.caseInsensitive]) else { return [] }
        let matches  = regex.matches(in: self, options: [], range: NSMakeRange(0, self.count))
        return matches.map { match in
            return String(self[Range(match.range, in: self)!])
        }
    }
}

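Be careful with the solution above: NSMakeRange(0, self.count) is not correct, because self is a String (= UTF8) and not an NSString (= UTF16). So self.count is not necessarily the same as nsString.length (as used in the other solutions). You can replace the range calculation with NSRange(self.startIndex..., in: self)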
pd95
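A sketch of the extension with the suggested range calculation applied. The method name allMatches(regex:) is hypothetical (chosen so it does not clash with the extension above), and the case-insensitive option is dropped for brevity.

```swift
import Foundation

extension String {
    // Hypothetical name; same idea as the answer, with the range fix applied.
    func allMatches(regex pattern: String) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return [] }
        // Full range measured in UTF-16 code units rather than in Characters.
        let fullRange = NSRange(startIndex..., in: self)
        return regex.matches(in: self, options: [], range: fullRange).compactMap { match in
            Range(match.range, in: self).map { String(self[$0]) }
        }
    }
}

print("🇩🇪€4€9".allMatches(regex: "[0-9]"))
// ["4", "9"]
```

With NSMakeRange(0, self.count) the searched range for this string would be only 5 UTF-16 units long and would end before the first digit.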

3

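Most of the solutions above only return the full match and therefore ignore the capture groups, e.g.: ^\d+\s+(\d+)

To get the capture group matches as expected, you need something like this (Swift 4):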

public extension String {
    public func capturedGroups(withRegex pattern: String) -> [String] {
        var results = [String]()

        var regex: NSRegularExpression
        do {
            regex = try NSRegularExpression(pattern: pattern, options: [])
        } catch {
            return results
        }
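        // The range must be given in UTF-16 code units (NSString length), not in Characters
        let matches = regex.matches(in: self, options: [], range: NSRange(location: 0, length: (self as NSString).length))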

        guard let match = matches.first else { return results }

        let lastRangeIndex = match.numberOfRanges - 1
        guard lastRangeIndex >= 1 else { return results }

        for i in 1...lastRangeIndex {
            let capturedGroupIndex = match.range(at: i)
            let matchedString = (self as NSString).substring(with: capturedGroupIndex)
            results.append(matchedString)
        }

        return results
    }
}

This is great if you only want the first result. To get every result, it needs for index in 0..<matches.count { around let lastRange... results.append(matchedString)}
Jeff

The for clause should look like this: for i in 1...lastRangeIndex { let capturedGroupIndex = match.range(at: i) if capturedGroupIndex.location != NSNotFound { let matchedString = (self as NSString).substring(with: capturedGroupIndex) results.append(matchedString.trimmingCharacters(in: .whitespaces)) } }
CRE8IT

2

This is how I did it. I hope it brings a new perspective on how this works in Swift.

In the example below I get any string between []

var sample = "this is an [hello] amazing [world]"

var regex = NSRegularExpression(pattern: "\\[.+?\\]"
, options: NSRegularExpressionOptions.CaseInsensitive 
, error: nil)

var matches = regex?.matchesInString(sample, options: nil
, range: NSMakeRange(0, countElements(sample))) as Array<NSTextCheckingResult>

for match in matches {
   let r = (sample as NSString).substringWithRange(match.range)//cast to NSString is required to match range format.
    println("found= \(r)")
}

2

Here is a very simple solution that returns an array of strings with the matches.

Swift 3.

// The method uses self, so it has to live inside an extension of String.
extension String {
    internal func stringsMatching(regularExpressionPattern: String, options: NSRegularExpression.Options = []) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: regularExpressionPattern, options: options) else {
            return []
        }

        let nsString = self as NSString
        let results = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))

        return results.map {
            nsString.substring(with: $0.range)
        }
    }
}

2

The fastest way to return all matches and capture groups in Swift 5

extension String {
    func match(_ regex: String) -> [[String]] {
        let nsString = self as NSString
        // Use the NSString (UTF-16) length for the range, not the Character count
        return (try? NSRegularExpression(pattern: regex, options: []))?.matches(in: self, options: [], range: NSMakeRange(0, nsString.length)).map { match in
            (0..<match.numberOfRanges).map { match.range(at: $0).location == NSNotFound ? "" : nsString.substring(with: match.range(at: $0)) }
        } ?? []
    }
}

Returns a 2-dimensional array of strings:

"prefix12suffix fix1su".match("fix([0-9]+)su")

returns...

[["fix12su", "12"], ["fix1su", "1"]]

// First element of sub-array is the match
// All subsequent elements are the capture groups

0

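Big thanks to Lars Blumberg for his answer on capturing groups and full matches with Swift 4, which helped me a lot. I also made an addition to it for those who want an error.localizedDescription response when their regex is invalid: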

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        do {
            let regex = try NSRegularExpression(pattern: regex)
            let nsString = self as NSString
            let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
            return results.map { result in
                (0..<result.numberOfRanges).map {
                    result.range(at: $0).location != NSNotFound
                        ? nsString.substring(with: result.range(at: $0))
                        : ""
                }
            }
        } catch let error {
            print("invalid regex: \(error.localizedDescription)")
            return []
        }
    }
}

For me, having the localizedDescription of the error helped me understand what went wrong with escaping, since it shows the final regex that Swift tries to use.
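If the escaping trouble comes from doubled backslashes in Swift string literals, raw string literals (available from Swift 5) may be worth a look. This is only a sketch of the idea, reusing a pattern from an earlier answer; the variable names are illustrative.

```swift
import Foundation

// The same pattern written twice: with doubled backslashes, and as a raw string literal.
let escaped = "^(.+)\\s(\\d{4})"
let raw = #"^(.+)\s(\d{4})"#

print(escaped == raw)
// true

// Printing the pattern shows exactly what the regex engine receives.
print(raw)
// ^(.+)\s(\d{4})
```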

Licensed under cc by-sa 3.0 with attribution required.