如何在Swift中解码HTML实体?


121

我正在从网站提取JSON文件,收到的字符串之一是:

The Weeknd ‘King Of The Fall’ [Video Premiere] | @TheWeeknd | #SoPhi

如何将类似的内容&#8216转换为正确的字符?

我创建了一个Xcode Playground来演示它:

import UIKit

var error: NSError?
let blogUrl: NSURL = NSURL.URLWithString("http://sophisticatedignorance.net/api/get_recent_summary/")
let jsonData = NSData(contentsOfURL: blogUrl)

let dataDictionary = NSJSONSerialization.JSONObjectWithData(jsonData, options: nil, error: &error) as NSDictionary

var a = dataDictionary["posts"] as NSArray

println(a[0]["title"])

Answers:


157

这个答案最近针对Swift 5.2和iOS 13.4 SDK进行了修订。


没有做到这一点的直接方法,但是您可以使用 NSAttributedString魔术使此过程尽可能轻松(请注意,此方法也将剥离所有HTML标记)。

记住仅从主线程初始化NSAttributedString。它使用WebKit解析下面的HTML,因此是必需的。

// This is a[0]["title"] in your case
let encodedString = "The Weeknd <em>&#8216;King Of The Fall&#8217;</em>"

guard let data = htmlEncodedString.data(using: .utf8) else {
    return
}

let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
    .documentType: NSAttributedString.DocumentType.html,
    .characterEncoding: String.Encoding.utf8.rawValue
]

guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
    return
}

// The Weeknd ‘King Of The Fall’
let decodedString = attributedString.string
extension String {

    init?(htmlEncodedString: String) {

        guard let data = htmlEncodedString.data(using: .utf8) else {
            return nil
        }

        let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
            return nil
        }

        self.init(attributedString.string)

    }

}

let encodedString = "The Weeknd <em>&#8216;King Of The Fall&#8217;</em>"
let decodedString = String(htmlEncodedString: encodedString)

54
什么?扩展旨在扩展现有类型以提供新功能。
akashivskyy 2014年

4
我了解您要说的话,但是否定扩展名不是路要走。
akashivskyy 2014年

1
@akashivskyy:要使它与非ASCII字符一起正常工作,必须添加NSCharacterEncodingDocumentAttribute,然后进行比较 stackoverflow.com/a/27898167/1187415
Martin R

13
此方法非常繁琐,建议不要在表视图或网格视图中使用
圭Lodetti

1
这很棒!尽管它阻塞了主线程,但是有什么方法可以在后台线程中运行它吗?
MMV

78

@akashivskyy的答案很好,它演示了如何利用它NSAttributedString来解码HTML实体。一个可能的缺点(如他所说)是所有 HTML标记也被删除了,因此

<strong> 4 &lt; 5 &amp; 3 &gt; 2</strong>

变成

4 < 5 & 3 > 2

在OS X上,有CFXMLCreateStringByUnescapingEntities()哪些工作:

let encoded = "<strong> 4 &lt; 5 &amp; 3 &gt; 2 .</strong> Price: 12 &#x20ac;.  &#64; "
let decoded = CFXMLCreateStringByUnescapingEntities(nil, encoded, nil) as String
println(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12.  @ 

但这在iOS上不可用。

这是一个纯粹的Swift实现。它像&lt;使用字典一样解码字符实体引用,并像&#64或那样解码所有数字字符实体&#x20ac。(请注意,我没有明确列出所有252个HTML实体。)

斯威夫特4:

// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ Substring : Character ] = [
    // XML predefined entities:
    "&quot;"    : "\"",
    "&amp;"     : "&",
    "&apos;"    : "'",
    "&lt;"      : "<",
    "&gt;"      : ">",

    // HTML character entity references:
    "&nbsp;"    : "\u{00a0}",
    // ...
    "&diams;"   : "♦",
]

extension String {

    /// Returns a new string made by replacing in the `String`
    /// all HTML character entity references with the corresponding
    /// character.
    var stringByDecodingHTMLEntities : String {

        // ===== Utility functions =====

        // Convert the number in the string to the corresponding
        // Unicode character, e.g.
        //    decodeNumeric("64", 10)   --> "@"
        //    decodeNumeric("20ac", 16) --> "€"
        func decodeNumeric(_ string : Substring, base : Int) -> Character? {
            guard let code = UInt32(string, radix: base),
                let uniScalar = UnicodeScalar(code) else { return nil }
            return Character(uniScalar)
        }

        // Decode the HTML character entity to the corresponding
        // Unicode character, return `nil` for invalid input.
        //     decode("&#64;")    --> "@"
        //     decode("&#x20ac;") --> "€"
        //     decode("&lt;")     --> "<"
        //     decode("&foo;")    --> nil
        func decode(_ entity : Substring) -> Character? {

            if entity.hasPrefix("&#x") || entity.hasPrefix("&#X") {
                return decodeNumeric(entity.dropFirst(3).dropLast(), base: 16)
            } else if entity.hasPrefix("&#") {
                return decodeNumeric(entity.dropFirst(2).dropLast(), base: 10)
            } else {
                return characterEntities[entity]
            }
        }

        // ===== Method starts here =====

        var result = ""
        var position = startIndex

        // Find the next '&' and copy the characters preceding it to `result`:
        while let ampRange = self[position...].range(of: "&") {
            result.append(contentsOf: self[position ..< ampRange.lowerBound])
            position = ampRange.lowerBound

            // Find the next ';' and copy everything from '&' to ';' into `entity`
            guard let semiRange = self[position...].range(of: ";") else {
                // No matching ';'.
                break
            }
            let entity = self[position ..< semiRange.upperBound]
            position = semiRange.upperBound

            if let decoded = decode(entity) {
                // Replace by decoded character:
                result.append(decoded)
            } else {
                // Invalid entity, copy verbatim:
                result.append(contentsOf: entity)
            }
        }
        // Copy remaining characters to `result`:
        result.append(contentsOf: self[position...])
        return result
    }
}

例:

let encoded = "<strong> 4 &lt; 5 &amp; 3 &gt; 2 .</strong> Price: 12 &#x20ac;.  &#64; "
let decoded = encoded.stringByDecodingHTMLEntities
print(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12.  @

斯威夫特3:

// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
    // XML predefined entities:
    "&quot;"    : "\"",
    "&amp;"     : "&",
    "&apos;"    : "'",
    "&lt;"      : "<",
    "&gt;"      : ">",

    // HTML character entity references:
    "&nbsp;"    : "\u{00a0}",
    // ...
    "&diams;"   : "♦",
]

extension String {

    /// Returns a new string made by replacing in the `String`
    /// all HTML character entity references with the corresponding
    /// character.
    var stringByDecodingHTMLEntities : String {

        // ===== Utility functions =====

        // Convert the number in the string to the corresponding
        // Unicode character, e.g.
        //    decodeNumeric("64", 10)   --> "@"
        //    decodeNumeric("20ac", 16) --> "€"
        func decodeNumeric(_ string : String, base : Int) -> Character? {
            guard let code = UInt32(string, radix: base),
                let uniScalar = UnicodeScalar(code) else { return nil }
            return Character(uniScalar)
        }

        // Decode the HTML character entity to the corresponding
        // Unicode character, return `nil` for invalid input.
        //     decode("&#64;")    --> "@"
        //     decode("&#x20ac;") --> "€"
        //     decode("&lt;")     --> "<"
        //     decode("&foo;")    --> nil
        func decode(_ entity : String) -> Character? {

            if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
                return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 3) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 16)
            } else if entity.hasPrefix("&#") {
                return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 2) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 10)
            } else {
                return characterEntities[entity]
            }
        }

        // ===== Method starts here =====

        var result = ""
        var position = startIndex

        // Find the next '&' and copy the characters preceding it to `result`:
        while let ampRange = self.range(of: "&", range: position ..< endIndex) {
            result.append(self[position ..< ampRange.lowerBound])
            position = ampRange.lowerBound

            // Find the next ';' and copy everything from '&' to ';' into `entity`
            if let semiRange = self.range(of: ";", range: position ..< endIndex) {
                let entity = self[position ..< semiRange.upperBound]
                position = semiRange.upperBound

                if let decoded = decode(entity) {
                    // Replace by decoded character:
                    result.append(decoded)
                } else {
                    // Invalid entity, copy verbatim:
                    result.append(entity)
                }
            } else {
                // No matching ';'.
                break
            }
        }
        // Copy remaining characters to `result`:
        result.append(self[position ..< endIndex])
        return result
    }
}

斯威夫特2:

// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
    // XML predefined entities:
    "&quot;"    : "\"",
    "&amp;"     : "&",
    "&apos;"    : "'",
    "&lt;"      : "<",
    "&gt;"      : ">",

    // HTML character entity references:
    "&nbsp;"    : "\u{00a0}",
    // ...
    "&diams;"   : "♦",
]

extension String {

    /// Returns a new string made by replacing in the `String`
    /// all HTML character entity references with the corresponding
    /// character.
    var stringByDecodingHTMLEntities : String {

        // ===== Utility functions =====

        // Convert the number in the string to the corresponding
        // Unicode character, e.g.
        //    decodeNumeric("64", 10)   --> "@"
        //    decodeNumeric("20ac", 16) --> "€"
        func decodeNumeric(string : String, base : Int32) -> Character? {
            let code = UInt32(strtoul(string, nil, base))
            return Character(UnicodeScalar(code))
        }

        // Decode the HTML character entity to the corresponding
        // Unicode character, return `nil` for invalid input.
        //     decode("&#64;")    --> "@"
        //     decode("&#x20ac;") --> "€"
        //     decode("&lt;")     --> "<"
        //     decode("&foo;")    --> nil
        func decode(entity : String) -> Character? {

            if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
                return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(3)), base: 16)
            } else if entity.hasPrefix("&#") {
                return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(2)), base: 10)
            } else {
                return characterEntities[entity]
            }
        }

        // ===== Method starts here =====

        var result = ""
        var position = startIndex

        // Find the next '&' and copy the characters preceding it to `result`:
        while let ampRange = self.rangeOfString("&", range: position ..< endIndex) {
            result.appendContentsOf(self[position ..< ampRange.startIndex])
            position = ampRange.startIndex

            // Find the next ';' and copy everything from '&' to ';' into `entity`
            if let semiRange = self.rangeOfString(";", range: position ..< endIndex) {
                let entity = self[position ..< semiRange.endIndex]
                position = semiRange.endIndex

                if let decoded = decode(entity) {
                    // Replace by decoded character:
                    result.append(decoded)
                } else {
                    // Invalid entity, copy verbatim:
                    result.appendContentsOf(entity)
                }
            } else {
                // No matching ';'.
                break
            }
        }
        // Copy remaining characters to `result`:
        result.appendContentsOf(self[position ..< endIndex])
        return result
    }
}

10
太好了,谢谢马丁!这是带有HTML实体完整列表的扩展:gist.github.com/mwaterfall/25b4a6a06dc3309d9555我也对其进行了少许修改,以提供替换所产生的距离偏移。这允许正确调整可能受这些替换影响的任何字符串属性或实体(例如,Twitter实体索引)。
迈克尔瀑布

3
@MichaelWaterfall和Martin,这真是太棒了!奇迹般有效!我更新了Swift 2的扩展程序pastebin.com/juHRJ6au谢谢!
圣地亚哥

1
我将此答案转换为与Swift 2兼容,并将其转储到名为StringExtensionHTML的CocoaPod中,以方便使用。请注意,Santiago的Swift 2版本修复了编译时错误,但是strtooul(string, nil, base)完全删除将导致代码不适用于数字字符实体,并在无法识别的实体崩溃(而不是正常失败)。
阿黛拉·张

1
@AdelaChang:实际上我已经在2015年9月将答案转换为Swift2。它仍然可以在Swift 2.2 / Xcode 7.3的警告下进行编译。还是您指的是迈克尔的版本?
Martin R

1
谢谢,有了这个答案,我解决了我的问题:使用NSAttributedString遇到严重的性能问题。
安德烈·穆尼亚尼

27

@akashivskyy扩展的Swift 3版本,

extension String {
    init(htmlEncodedString: String) {
        self.init()
        guard let encodedData = htmlEncodedString.data(using: .utf8) else {
            self = htmlEncodedString
            return
        }

        let attributedOptions: [String : Any] = [
            NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
            NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue
        ]

        do {
            let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
            self = attributedString.string
        } catch {
            print("Error: \(error)")
            self = htmlEncodedString
        }
    }
}

效果很好。原始答案导致了奇怪的崩溃。感谢更新!
Geoherna '16

对于法语字符,我必须使用utf16
塞巴斯蒂安·

23

斯威夫特4


  • 字符串扩展计算变量
  • 没有额外的保护,就可以接球等等。
  • 如果解码失败,则返回原始字符串

extension String {
    var htmlDecoded: String {
        let decoded = try? NSAttributedString(data: Data(utf8), options: [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ], documentAttributes: nil).string

        return decoded ?? self
    }
}

1
哇 !可以立即使用Swift 4!。用法//让encode =“ The Weeknd&#8216; The King Of Fall&#8217;” let finalString = encode.htmlDecoded
Naishta '18

2
我喜欢这个答案的简单性。但是,它在后台运行时会导致崩溃,因为它试图在主线程上运行。
杰里米·希克斯

14

@akashivskyy扩展的Swift 2版本

 extension String {
     init(htmlEncodedString: String) {
         if let encodedData = htmlEncodedString.dataUsingEncoding(NSUTF8StringEncoding){
             let attributedOptions : [String: AnyObject] = [
            NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
            NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
        ]

             do{
                 if let attributedString:NSAttributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil){
                     self.init(attributedString.string)
                 }else{
                     print("error")
                     self.init(htmlEncodedString)     //Returning actual string if there is an error
                 }
             }catch{
                 print("error: \(error)")
                 self.init(htmlEncodedString)     //Returning actual string if there is an error
             }

         }else{
             self.init(htmlEncodedString)     //Returning actual string if there is an error
         }
     }
 }

该代码不完整,应绝对避免。错误未得到正确处理。实际上,错误代码将崩溃。发生错误时,您应该将代码更新为至少返回nil。或者,您可以只使用原始字符串初始化。最后,您应该处理错误。事实并非如此。哇!
oyalhi

9

Swift 4版本

extension String {

    init(htmlEncodedString: String) {
        self.init()
        guard let encodedData = htmlEncodedString.data(using: .utf8) else {
            self = htmlEncodedString
            return
        }

        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        do {
            let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
            self = attributedString.string
        } 
        catch {
            print("Error: \(error)")
            self = htmlEncodedString
        }
    }
}

当我尝试使用它时,我收到“错误域= NSCocoaErrorDomain代码= 259”无法打开文件,因为它的格式不正确。” 如果我在主线程上运行完整的do catch,这将消失。我从检查NSAttributedString文档中发现了这一点:“不应从后台线程调用HTML导入器(即,选项字典包括值为html的documentType)。它将尝试与主线程同步,失败,并且超时。”
MickeDG

8
请,rawValue语法 NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue)NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue)可怕。替换为.documentTypeand.characterEncoding
vadian

@MickeDG-您能否解释一下您为解决该错误所做的工作?我零星地得到它。
罗斯·巴比什

@RossBarbish-对不起,罗斯,这太久了,不记得详细信息了。您是否尝试过我在上面的注释中建议的内容,即在主线程上运行完整的do catch?
MickeDG

7
extension String{
    func decodeEnt() -> String{
        let encodedData = self.dataUsingEncoding(NSUTF8StringEncoding)!
        let attributedOptions : [String: AnyObject] = [
            NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
            NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
        ]
        let attributedString = NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil, error: nil)!

        return attributedString.string
    }
}

let encodedString = "The Weeknd &#8216;King Of The Fall&#8217;"

let foo = encodedString.decodeEnt() /* The Weeknd ‘King Of The Fall’ */

关于“周末”:不是“周末”
Peter Mortensen

语法突出显示看起来很奇怪,尤其是最后一行的注释部分。你能修好它吗?
Peter Mortensen

“ The Weeknd”是歌手,是的,这就是他名字的拼写方式。
wLc

5

我一直在寻找一个纯粹的Swift 3.0实用程序来从HTML字符引用转义/转义(即针对macOS和Linux上的服务器端Swift应用程序),但是没有找到任何全面的解决方案,因此我编写了自己的实现:https: //github.com/IBM-Swift/swift-html-entities

软件包HTMLEntities可以与HTML4命名字符引用以及十六进制/十进制数字字符引用一起使用,并且它将根据W3 ​​HTML5规范识别特殊的数字字符引用(即,&#x80;应将其转义为欧元符号(unicode U+20AC),而不应转义为unicode )的字符U+0080,并且某些数字字符引用范围应替换为替换字符U+FFFD转义,并且在转义时)。

用法示例:

import HTMLEntities

// encode example
let html = "<script>alert(\"abc\")</script>"

print(html.htmlEscape())
// Prints ”&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"

// decode example
let htmlencoded = "&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"

print(htmlencoded.htmlUnescape())
// Prints<script>alert(\"abc\")</script>"

对于OP的示例:

print("The Weeknd &#8216;King Of The Fall&#8217; [Video Premiere] | @TheWeeknd | #SoPhi ".htmlUnescape())
// prints "The Weeknd ‘King Of The Fall’ [Video Premiere] | @TheWeeknd | #SoPhi "

编辑:HTMLEntities现在支持自2.0.0版开始的HTML5命名字符引用。还实现了符合规范的解析。


1
这是始终有效的最通用的答案,不需要在主线程上运行。即使使用最复杂的HTML转义的unicode字符串(如(&nbsp;͡&deg;&nbsp;͜ʖ&nbsp;͡&deg;&nbsp;))也可以使用,而其他答案都无法解决。
斯特凡COPIN

5

斯威夫特4:

最终的解决方案最终对我有用,包括HTML代码,换行符和单引号

extension String {
    var htmlDecoded: String {
        let decoded = try? NSAttributedString(data: Data(utf8), options: [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
            ], documentAttributes: nil).string

        return decoded ?? self
    }
}

用法:

let yourStringEncoded = yourStringWithHtmlcode.htmlDecoded

然后,我不得不应用更多过滤器以消除单引号(例如,nothasntIt's等)和换行符,例如\n

var yourNewString = String(yourStringEncoded.filter { !"\n\t\r".contains($0) })
yourNewString = yourNewString.replacingOccurrences(of: "\'", with: "", options: NSString.CompareOptions.literal, range: nil)

这本质上是其他答案的副本。您所做的就是添加一些显而易见的用法。
rmaddy

有人赞成这个答案,发现它真的很有用,这告诉你什么?
Naishta

@Naishta它告诉您每个人都有不同的意见,没关系
Josh Wolff

3

这就是我的方法。您可以从https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555 Michael Waterfall提及添加实体字典。

extension String {
    func htmlDecoded()->String {

        guard (self != "") else { return self }

        var newStr = self

        let entities = [
            "&quot;"    : "\"",
            "&amp;"     : "&",
            "&apos;"    : "'",
            "&lt;"      : "<",
            "&gt;"      : ">",
        ]

        for (name,value) in entities {
            newStr = newStr.stringByReplacingOccurrencesOfString(name, withString: value)
        }
        return newStr
    }
}

使用的例子:

let encoded = "this is so &quot;good&quot;"
let decoded = encoded.htmlDecoded() // "this is so "good""

要么

let encoded = "this is so &quot;good&quot;".htmlDecoded() // "this is so "good""

1
我不太喜欢这个,但是我还没有发现任何更好的东西,所以这是针对Swift 2.0 gist
jrmgx /

3

优雅的Swift 4解决方案

如果你想要一个字符串,

myString = String(htmlString: encodedString)

将此扩展添加到您的项目中:

extension String {

    init(htmlString: String) {
        self.init()
        guard let encodedData = htmlString.data(using: .utf8) else {
            self = htmlString
            return
        }

        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
           .documentType: NSAttributedString.DocumentType.html,
           .characterEncoding: String.Encoding.utf8.rawValue
        ]

        do {
            let attributedString = try NSAttributedString(data: encodedData,
                                                          options: attributedOptions,
                                                          documentAttributes: nil)
            self = attributedString.string
        } catch {
            print("Error: \(error.localizedDescription)")
            self = htmlString
        }
    }
}

如果您想使用带有粗体,斜体,链接等的NSAttributedString,

textField.attributedText = try? NSAttributedString(htmlString: encodedString)

将此扩展添加到您的项目中:

extension NSAttributedString {

    convenience init(htmlString html: String) throws {
        try self.init(data: Data(html.utf8), options: [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
            ], documentAttributes: nil)
    }

}

2

@yishus的答案的计算版本

public extension String {
    /// Decodes string with HTML encoding.
    var htmlDecoded: String {
        guard let encodedData = self.data(using: .utf8) else { return self }

        let attributedOptions: [String : Any] = [
            NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
            NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue]

        do {
            let attributedString = try NSAttributedString(data: encodedData,
                                                          options: attributedOptions,
                                                          documentAttributes: nil)
            return attributedString.string
        } catch {
            print("Error: \(error)")
            return self
        }
    }
}

1

斯威夫特4

func decodeHTML(string: String) -> String? {

    var decodedString: String?

    if let encodedData = string.data(using: .utf8) {
        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        do {
            decodedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil).string
        } catch {
            print("\(error.localizedDescription)")
        }
    }

    return decodedString
}

一个解释将是有条理的。例如,它与以前的Swift 4答案有何不同?
Peter Mortensen

1

斯威夫特4.1 +

var htmlDecoded: String {


    let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [

        NSAttributedString.DocumentReadingOptionKey.documentType : NSAttributedString.DocumentType.html,
        NSAttributedString.DocumentReadingOptionKey.characterEncoding : String.Encoding.utf8.rawValue
    ]


    let decoded = try? NSAttributedString(data: Data(utf8), options: attributedOptions
        , documentAttributes: nil).string

    return decoded ?? self
} 

一个解释将是有条理的。例如,它与以前的答案有何不同?使用了哪些Swift 4.1功能?它仅在Swift 4.1中有效,而在以前的版本中无效吗?还是说它可以在Swift 4.1之前运行,例如在Swift 4.0中?
Peter Mortensen

1

斯威夫特4

extension String {
    var replacingHTMLEntities: String? {
        do {
            return try NSAttributedString(data: Data(utf8), options: [
                .documentType: NSAttributedString.DocumentType.html,
                .characterEncoding: String.Encoding.utf8.rawValue
            ], documentAttributes: nil).string
        } catch {
            return nil
        }
    }
}

简单用法

let clean = "Weeknd &#8216;King Of The Fall&#8217".replacingHTMLEntities ?? "default value"

我已经可以听到有人抱怨我的部队解散了可选部队。如果您正在研究HTML字符串编码,但不知道如何处理Swift可选项,那么您就太过领先了。
quequeful

是的,是的(编辑于11月1日22:37,使“简单用法”变得更难理解)
魁北克

1

斯威夫特4

我真的很喜欢使用documentAttributes的解决方案。但是,在表视图单元格中解析文件和/或使用速度可能太慢。我简直无法相信苹果没有为此提供一个不错的解决方案。

作为一种解决方法,我在GitHub上找到了此String Extension,它可以完美运行并且解码速度很快。

因此,对于给定答案缓慢的情况,请参阅此链接中建议的解决方案: https //gist.github.com/mwaterfall/25b4a6a06dc3309d9555

注意:它不会解析HTML标记。


1

更新的答案适用于Swift 3

extension String {
    init?(htmlEncodedString: String) {
        let encodedData = htmlEncodedString.data(using: String.Encoding.utf8)!
        let attributedOptions = [ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]

        guard let attributedString = try? NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil) else {
            return nil
        }
        self.init(attributedString.string)
   }

0

物镜

+(NSString *) decodeHTMLEnocdedString:(NSString *)htmlEncodedString {
    if (!htmlEncodedString) {
        return nil;
    }

    NSData *data = [htmlEncodedString dataUsingEncoding:NSUTF8StringEncoding];
    NSDictionary *attributes = @{NSDocumentTypeDocumentAttribute:     NSHTMLTextDocumentType,
                             NSCharacterEncodingDocumentAttribute:     @(NSUTF8StringEncoding)};
    NSAttributedString *attributedString = [[NSAttributedString alloc]     initWithData:data options:attributes documentAttributes:nil error:nil];
    return [attributedString string];
}

0

带有实际字体大小转换的Swift 3.0版本

通常,如果直接将HTML内容转换为属性字符串,则字体大小会增加。您可以尝试将HTML字符串转换为属性字符串,然后再次返回以查看区别。

相反,这是实际的大小转换,通过对所有字体应用0.75的比率来确保字体大小不变:

extension String {
    func htmlAttributedString() -> NSAttributedString? {
        guard let data = self.data(using: String.Encoding.utf16, allowLossyConversion: false) else { return nil }
        guard let attriStr = try? NSMutableAttributedString(
            data: data,
            options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
            documentAttributes: nil) else { return nil }
        attriStr.beginEditing()
        attriStr.enumerateAttribute(NSFontAttributeName, in: NSMakeRange(0, attriStr.length), options: .init(rawValue: 0)) {
            (value, range, stop) in
            if let font = value as? UIFont {
                let resizedFont = font.withSize(font.pointSize * 0.75)
                attriStr.addAttribute(NSFontAttributeName,
                                         value: resizedFont,
                                         range: range)
            }
        }
        attriStr.endEditing()
        return attriStr
    }
}

0

斯威夫特4

extension String {

    mutating func toHtmlEncodedString() {
        guard let encodedData = self.data(using: .utf8) else {
            return
        }

        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
            NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue): NSAttributedString.DocumentType.html,
            NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue): String.Encoding.utf8.rawValue
        ]

        do {
            let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
            self = attributedString.string
        }
        catch {
            print("Error: \(error)")
        }
    }

请,rawValue语法 NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue)NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue)可怕。将其替换为.documentType.characterEncoding
vadian

该解决方案的性能令人震惊。对于单独的caes可能没问题,不建议解析文件。
文森特


0

Swift 5.1版本

import UIKit

extension String {

    init(htmlEncodedString: String) {
        self.init()
        guard let encodedData = htmlEncodedString.data(using: .utf8) else {
            self = htmlEncodedString
            return
        }

        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        do {
            let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
            self = attributedString.string
        } 
        catch {
            print("Error: \(error)")
            self = htmlEncodedString
        }
    }
}

另外,如果您要提取日期,图像,元数据,标题和描述,则可以使用我的名为:

] [1]

可读性套件


是什么让它无法在某些早期版本(Swift 5.0,Swift 4.1,Swift 4.0等)中运行?
Peter Mortensen

使用collectionViews解码字符串时发现错误
Tung Vu Duc

By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.