How do I convert a UTF-8 byte[] to a string?

931

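I have a byte[] array that is loaded from a file that happens to contain UTF-8.

In some debugging code, I need to convert it to a string. Is there a one-liner that will do this?

Under the covers it should be just an allocation and a memcopy, so even if it is not implemented, it should be possible.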

— BCS
source

5

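"should be just an allocation and a memcopy": that is not correct, because a .NET string is UTF-16 encoded. One Unicode character might be one UTF-8 code unit and one UTF-16 code unit, another might be two UTF-8 code units and one UTF-16 code unit, another three UTF-8 code units and one UTF-16 code unit, and another four UTF-8 code units and two UTF-16 code units. A memcopy might be able to widen, but it cannot handle the UTF-8 to UTF-16 conversion.

— Tom Blodget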
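
To make the code unit counts in the comment above concrete, here is a minimal sketch. The class and method names (CodeUnitDemo, Count) are made up for illustration; only Encoding.UTF8.GetByteCount and string.Length come from .NET.

```csharp
using System;
using System.Text;

static class CodeUnitDemo
{
    // Returns the number of UTF-8 code units (bytes) and UTF-16 code units (chars)
    // needed for the same text.
    public static (int Utf8Bytes, int Utf16Units) Count(string text)
    {
        return (Encoding.UTF8.GetByteCount(text), text.Length);
    }
}
```

Calling Count on "A", "\u00E9" (é), "\u20AC" (€) and "\U0001F600" (an emoji outside the Basic Multilingual Plane) should give 1/1, 2/1, 3/1 and 4/2 respectively, which is why a plain byte copy cannot produce the string.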

1468

string result = System.Text.Encoding.UTF8.GetString(byteArray);
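
One thing to be aware of: Encoding.UTF8 silently replaces invalid byte sequences with U+FFFD. If you would rather find out that the file is not really UTF-8, a sketch along these lines may help. TryDecode and StrictUtf8 are hypothetical names; the UTF8Encoding constructor with throwOnInvalidBytes is the real API.

```csharp
using System;
using System.Text;

static class StrictUtf8
{
    // throwOnInvalidBytes: true makes GetString throw instead of inserting U+FFFD.
    private static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Hypothetical helper: returns false when the bytes are not valid UTF-8.
    public static bool TryDecode(byte[] bytes, out string text)
    {
        try
        {
            text = Strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = null;
            return false;
        }
    }
}
```

For quick debugging output the one-liner above is usually enough; the strict variant is only worth it when silently mangled text would hide a problem.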

— Zanoni
source

13

How does it handle null-terminated strings?

— maazza

14

@maazza For some unknown reason it doesn't at all. I call it like this: System.Text.Encoding.UTF8.GetString(buf).TrimEnd('\0');.

— Hi-Angel

15

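@Hi-Angel Unknown reason? The only reason null-terminated strings ever became popular was the C language, and even that was only because of a historical oddity (CPU instructions that dealt with null-terminated strings). .NET only uses null-terminated strings when interoperating with code that uses them (which is finally disappearing). It is perfectly valid for a string to contain NUL characters. And of course, while null-terminated strings are dead simple in ASCII (just build until you hit the first zero byte), other encodings, including UTF-8, are not so simple.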

— 罗安2015年

4

One of the beautiful features of UTF-8 is that a shorter sequence is never a subsequence of a longer sequence. So a null-terminated UTF-8 string is simple.

— plugwash
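
A minimal sketch of what that looks like in practice, assuming the buffer holds one zero-terminated UTF-8 string possibly followed by padding. CStringDecoder and Decode are illustrative names; Array.IndexOf and the GetString(byte[], int, int) overload are the real calls. Since every byte of a multi-byte UTF-8 sequence is 0x80 or above, the first 0x00 byte can only be the terminator.

```csharp
using System;
using System.Text;

static class CStringDecoder
{
    // Decodes UTF-8 bytes up to (not including) the first 0x00 byte,
    // or the whole buffer if there is no terminator.
    public static string Decode(byte[] buffer)
    {
        int end = Array.IndexOf(buffer, (byte)0);
        if (end < 0) end = buffer.Length;
        return Encoding.UTF8.GetString(buffer, 0, end);
    }
}
```

Unlike TrimEnd('\0'), this also ignores whatever garbage follows the terminator.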

10

Well, good luck unpacking it if it has non-ASCII content. Just use Convert.ToBase64String.

— Erik Bergstedt, 2015

323

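There are at least four different ways to do this conversion.

Encoding's GetString
Works for text, but you won't be able to get the original bytes back if they are not valid in that encoding (arbitrary binary data decoded as UTF-8, for instance).
BitConverter.ToString
The output is a "-" delimited string, but there is no built-in .NET method to convert the string back to a byte array.
Convert.ToBase64String
You can easily convert the output string back to a byte array with Convert.FromBase64String.
Note: the output string can contain '+', '/' and '='. If you want to use the string in a URL, you need to encode it explicitly.
HttpServerUtility.UrlTokenEncode
You can easily convert the output string back to a byte array with HttpServerUtility.UrlTokenDecode. The output string is already URL friendly! The downside is that it needs the System.Web assembly if your project is not a web project.

A complete example: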

一个完整的例子：

byte[] bytes = { 130, 200, 234, 23 }; // A byte array contains non-ASCII (or non-readable) characters

string s1 = Encoding.UTF8.GetString(bytes); // ���
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1);  // decBytes1.Length == 10 !!
// decBytes1 not same as bytes
// Using UTF-8 or other Encoding object will get similar results

string s2 = BitConverter.ToString(bytes);   // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
    decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes

string s3 = Convert.ToBase64String(bytes);  // gsjqFw==
byte[] decByte3 = Convert.FromBase64String(s3);
// decByte3 same as bytes

string s4 = HttpServerUtility.UrlTokenEncode(bytes);    // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
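
The difference between the first option and the others can be checked with a small sketch like the following. RoundTripDemo and its method names are invented for this illustration; the Encoding and Convert calls are the same ones used above.

```csharp
using System;
using System.Linq;
using System.Text;

static class RoundTripDemo
{
    // True when decoding as UTF-8 and re-encoding reproduces the input exactly.
    public static bool SurvivesUtf8(byte[] bytes) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(bytes)).SequenceEqual(bytes);

    // True when a Base64 round trip reproduces the input.
    public static bool SurvivesBase64(byte[] bytes) =>
        Convert.FromBase64String(Convert.ToBase64String(bytes)).SequenceEqual(bytes);
}
```

Bytes that are valid UTF-8 (for example C3 A9, which is "é") survive the UTF-8 round trip; the { 130, 200, 234, 23 } array from the example does not, but it does survive Base64.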

— 虚幻的
source

7

LINQ it: var decBytes2 = str.Split('-').Select(ch => Convert.ToByte(ch, 16)).ToArray();

— drtf

25

A general solution for converting a byte array to a string when you don't know the encoding:

static string BytesToStringConverted(byte[] bytes)
{
    using (var stream = new MemoryStream(bytes))
    {
        using (var streamReader = new StreamReader(stream))
        {
            return streamReader.ReadToEnd();
        }
    }
}

— 尼尔
source

3

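But this assumes that there is either an encoding BOM in the byte stream or that it is UTF-8. And you can do the same with Encoding anyway. It doesn't magically solve the problem when you don't know the encoding.

— Sebastian Zander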
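
A small sketch of what that means, as far as I understand StreamReader's defaults (UTF-8 plus BOM detection). BomDetectionDemo, Read and Utf16WithBom are illustrative names only.

```csharp
using System.IO;
using System.Linq;
using System.Text;

static class BomDetectionDemo
{
    // Same approach as the answer: StreamReader with its default settings.
    public static string Read(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes)))
            return reader.ReadToEnd();
    }

    // UTF-16 LE text preceded by its BOM (FF FE).
    public static byte[] Utf16WithBom(string text) =>
        Encoding.Unicode.GetPreamble().Concat(Encoding.Unicode.GetBytes(text)).ToArray();
}
```

Read(Utf16WithBom("h\u00E9")) comes back intact because the BOM identifies the encoding. The ISO-8859-1 bytes 68 E9 6C ("hél") carry no BOM, get treated as UTF-8, and the E9 turns into U+FFFD.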

12

Definition:

public static string ConvertByteToString(this byte[] source)
{
    return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
}

Usage:

string result = input.ConvertByteToString();

— ErçinDedeoğlu
source

9

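Converting a byte[] to a string seems simple, but any kind of encoding is likely to mess up the output string. This little function just works without any unexpected results: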

private string ToString(byte[] bytes)
{
    string response = string.Empty;

    foreach (byte b in bytes)
        response += (Char)b;

    return response;
}
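
A caveat, sketched under the assumption that the input really is UTF-8 as in the question: casting each byte to a char gives the character with the same numeric value, which matches ISO-8859-1 rather than UTF-8. CharCastDemo below is a hypothetical name and simply repeats the loop with a StringBuilder so the effect can be checked.

```csharp
using System.Text;

static class CharCastDemo
{
    // Same logic as the loop above: each byte becomes the char with that numeric value.
    public static string CastEachByte(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length);
        foreach (byte b in bytes)
            sb.Append((char)b);
        return sb.ToString();
    }
}
```

For pure ASCII the result is the same as Encoding.UTF8.GetString, but the two UTF-8 bytes C3 A9 ("é") come out as the two characters "Ã©".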

— 杰德
source

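When I unpack it with Convert.FromBase64String, I get a System.FormatException using your method.

— Erik Bergstedt, 2015

@AndrewJE This will even take a long time to compute if you have a large byte array, like one that comes from a picture.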

— user3841581

7

Using (byte)b.ToString("x2"), the output is b4b5dfe475e58b67

public static class Ext {

    public static string ToHexString(this byte[] hex)
    {
        if (hex == null) return null;
        if (hex.Length == 0) return string.Empty;

        var s = new StringBuilder();
        foreach (byte b in hex) {
            s.Append(b.ToString("x2"));
        }
        return s.ToString();
    }

    public static byte[] ToHexBytes(this string hex)
    {
        if (hex == null) return null;
        if (hex.Length == 0) return new byte[0];

        int l = hex.Length / 2;
        var b = new byte[l];
        for (int i = 0; i < l; ++i) {
            b[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return b;
    }

    public static bool EqualsTo(this byte[] bytes, byte[] bytesToCompare)
    {
        if (bytes == null && bytesToCompare == null) return true; // ?
        if (bytes == null || bytesToCompare == null) return false;
        if (object.ReferenceEquals(bytes, bytesToCompare)) return true;

        if (bytes.Length != bytesToCompare.Length) return false;

        for (int i = 0; i < bytes.Length; ++i) {
            if (bytes[i] != bytesToCompare[i]) return false;
        }
        return true;
    }

}

— 隐喻
source

4

There is also the class UnicodeEncoding, which is quite simple to use:

UnicodeEncoding ByteConverter = new UnicodeEncoding();
string stringDataForEncoding = "My Secret Data!";
byte[] dataEncoded = ByteConverter.GetBytes(stringDataForEncoding);

Console.WriteLine("Data after decoding: {0}", ByteConverter.GetString(dataEncoded));

— PK
source

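But that's not UTF-8, methinks?

— david.pfx

1

UnicodeEncoding is the worst class name ever; Unicode isn't an encoding at all. That class is actually UTF-16. The little-endian version, I think.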

— Nyerguds
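
That is easy to check with a hex dump. A minimal sketch, where Utf16Demo and Dump are names made up for the example:

```csharp
using System;
using System.Text;

static class Utf16Demo
{
    // Hex dump of the bytes a given encoding produces for the text.
    public static string Dump(Encoding encoding, string text) =>
        BitConverter.ToString(encoding.GetBytes(text));
}
```

Dump(new UnicodeEncoding(), "A\u00E9") gives 41-00-E9-00 (two bytes per character, low byte first, so little-endian UTF-16), while Dump(Encoding.UTF8, "A\u00E9") gives 41-C3-A9. Bytes read from a UTF-8 file should therefore not be decoded with UnicodeEncoding.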

3

Alternatively:

 var byteStr = Convert.ToBase64String(bytes);

— 费尔
source

2

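A LINQ one-liner for converting a byte array byteArrFilename, read from a file, to a pure-ASCII C-style zero-terminated string would be this. It is handy for reading things like file index tables in old archive formats.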

String filename = new String(byteArrFilename.TakeWhile(x => x != 0)
                              .Select(x => x < 128 ? (Char)x : '?').ToArray());

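I use '?' as the default character for anything that is not pure ASCII here, but that can be changed, of course. If you want to be sure you can detect it, use '\0' instead, since the TakeWhile at the start ensures that a string built this way cannot possibly contain '\0' values from the input source.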

— Nyerguds
source

2

The BitConverter class can be used to convert a byte[] to a string.

var convertedString = BitConverter.ToString(byteArray);

Documentation for the BitConverter class can be found on MSDN.

— 萨加尔
source

1

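This converts the byte array to a hexadecimal string representing each byte, which is generally not what you want when converting bytes to a string. If it is what you want, then that is another question; see for example How do you convert a byte array to a hexadecimal string, and vice versa?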

— CodeCaster

Not what the OP asked.

— 冬季

2

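To my knowledge, none of the given answers guarantee correct behavior with null termination. Until someone shows me otherwise, I wrote my own static class for handling this, with the following methods: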

// Mimics the functionality of strlen() in C/C++
// Needed because neither StringBuilder nor Encoding.*.GetString() stops at \0
static int StringLength(byte[] buffer, int startIndex = 0)
{
    int strlen = 0;
    while
    (
        (startIndex + strlen) < buffer.Length     // Stay inside the buffer (a final byte without a terminator still counts)
        && buffer[startIndex + strlen] != 0       // The typical null termination check
    )
    {
        ++strlen;
    }
    return strlen;
}

// This is messy, but I haven't found a built-in way in C# that guarantees null termination
public static string ParseBytes(byte[] buffer, out int strlen, int startIndex = 0)
{
    strlen = StringLength(buffer, startIndex);
    byte[] c_str = new byte[strlen];
    Array.Copy(buffer, startIndex, c_str, 0, strlen);
    return Encoding.UTF8.GetString(c_str);
}

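The reason for startIndex is that, in the example I was working on, I needed to parse a byte[] as an array of null-terminated strings. It can safely be ignored in the simple case.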
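
For that "array of null-terminated strings" case, a rough sketch of an alternative that leans on Array.IndexOf instead of a hand-written loop. CStringList and Parse are hypothetical names, and the handling of empty strings and of a missing final terminator is my assumption, not something stated above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CStringList
{
    // Splits a buffer of consecutive zero-terminated UTF-8 strings.
    // Assumptions: empty strings are kept, and a final run without a
    // terminator is still returned.
    public static List<string> Parse(byte[] buffer)
    {
        var result = new List<string>();
        int start = 0;
        while (start < buffer.Length)
        {
            int end = Array.IndexOf(buffer, (byte)0, start);
            if (end < 0) end = buffer.Length;
            result.Add(Encoding.UTF8.GetString(buffer, start, end - start));
            start = end + 1; // skip the terminator
        }
        return result;
    }
}
```

Decoding straight from the buffer with the offset/count overload of GetString also avoids the intermediate Array.Copy.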

— 同化
source

Mine does, actually. byteArr.TakeWhile(x => x != 0) is a quick and easy way to solve the null termination problem.

— Nyerguds

1

Here is a solution where you don't have to bother with encoding. I used it in my network class to send binary objects as strings.

        public static byte[] String2ByteArray(string str)
        {
            char[] chars = str.ToArray();
            byte[] bytes = new byte[chars.Length * 2];

            for (int i = 0; i < chars.Length; i++)
                Array.Copy(BitConverter.GetBytes(chars[i]), 0, bytes, i * 2, 2);

            return bytes;
        }

        public static string ByteArray2String(byte[] bytes)
        {
            char[] chars = new char[bytes.Length / 2];

            for (int i = 0; i < chars.Length; i++)
                chars[i] = BitConverter.ToChar(bytes, i * 2);

            return new string(chars);
        }

— Marco Pardo
source

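Haven't got one. But this function is used for binary transmission in our company network, and so far 20 TB have been encoded and decoded correctly. So for me this function works :)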

— Marco Pardo

1

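In addition to the selected answer, if you are using .NET35 or .NET35 CE, you have to specify the index of the first byte to decode and the number of bytes to decode: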

string result = System.Text.Encoding.UTF8.GetString(byteArray,0,byteArray.Length);

— 唯一的那个
source

0

Try this console application:

static void Main(string[] args)
{
    //Encoding _UTF8 = Encoding.UTF8;
    string[] _mainString = { "Héllo World" };
    Console.WriteLine("Main String: " + _mainString[0]);

    //Convert a string to utf-8 bytes.
    byte[] _utf8Bytes = Encoding.UTF8.GetBytes(_mainString[0]);

    //Convert utf-8 bytes to a string.
    string _stringuUnicode = Encoding.UTF8.GetString(_utf8Bytes);
    Console.WriteLine("String Unicode: " + _stringuUnicode);
}

— RM Shahidul Islam Shahed
source

0

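I have seen some answers in this post that can be considered complete basic knowledge, since there are several ways in C# to solve the same problem. The one thing that still needs to be considered is the difference between pure UTF-8 and UTF-8 with a BOM.

Last week at work I needed to develop a feature that outputs some CSV files with a BOM and other CSV files in pure UTF-8 (without a BOM). Each kind of file is consumed by a different non-standardized API: one API reads UTF-8 with a BOM and the other reads it without a BOM. To build my approach I had to research this concept, reading the Stack Overflow discussion "What's the difference between UTF-8 and UTF-8 without BOM?" and the Wikipedia article "Byte order mark".

In the end, my C# code for the two kinds of UTF-8 (with a BOM and pure) had to look something like the following example: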

// for UTF-8 with a BOM, same as the answer shared by Zanoni (at top)
string result = System.Text.Encoding.UTF8.GetString(byteArray);

// for pure UTF-8 (without a BOM)
string resultNoBom = (new UTF8Encoding(false)).GetString(byteArray);
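
One caveat, as far as I can tell from how these classes behave: the constructor flag only controls what GetPreamble() returns (and therefore what writers such as StreamWriter emit). GetString decodes the same way in both cases and does not strip a leading BOM, so the string starts with U+FEFF if the bytes did. A minimal sketch for dropping it by hand; BomDemo and DecodeSkippingBom are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text;

static class BomDemo
{
    // Drops a leading EF BB BF if present, then decodes the rest as UTF-8.
    public static string DecodeSkippingBom(byte[] bytes)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
        int offset = bytes.Length >= bom.Length && bytes.Take(bom.Length).SequenceEqual(bom)
            ? bom.Length
            : 0;
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }
}
```

When writing the CSV files, the choice between Encoding.UTF8 and new UTF8Encoding(false) does matter, because that is where the preamble is emitted or left out.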

— Antonio Leonardo
source