UTF-8 byte[] to String

243

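Suppose I've just used a BufferedInputStream to read the bytes of a UTF-8 encoded text file into a byte array. I know I can use the following routine to convert the bytes to a string, but is there a more efficient or smarter way to do this than iterating over the bytes and converting each one?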

public String openFileToString(byte[] _bytes)
{
    String file_string = "";

    for(int i = 0; i < _bytes.length; i++)
    {
        file_string += (char)_bytes[i];
    }

    return file_string;    
}

java utf-8

— skeryl

17

Why can't you just do String fileString = new String(_bytes,"UTF-8"); ?

— CoolBeans 2011

1

Alternatively, you could use a BufferedReader to read into a char array.

— Andy Thomas

In Java

— Bruno

@CoolBeans I could have, if I'd known about it ;) Thanks.

— skeryl 2011

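Depending on the size of the file, I'm not sure that loading the whole thing into memory as a byte[] and converting it with new String(_bytes,"UTF-8") (or even chunk by chunk with += on a string) is the most efficient approach. Chaining InputStreams and Readers might work better, especially for large files.

— Bruno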

498

Look at the constructors of String:

String str = new String(bytes, StandardCharsets.UTF_8);
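A minimal, self-contained check of that constructor, assuming Java 7+ for StandardCharsets (the class name Utf8DecodeDemo and the sample text are just for illustration): encode text containing non-ASCII characters to UTF-8 bytes, as a file read would give you, then decode them back.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeDemo {
    // Decode UTF-8 bytes with an explicit charset, so the result does not
    // depend on the platform default.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "na\u00efve caf\u00e9";               // "naïve café"
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // stand-in for bytes read from a file
        System.out.println(decode(utf8).equals(original));       // true
    }
}
```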

And if you're feeling lazy, you can use the Apache Commons IO library to convert the InputStream to a String directly:

String str = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
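If you'd rather not add Commons IO, a JDK-only sketch, assuming Java 9 or later (where InputStream.readAllBytes() exists), reads the whole stream and decodes it once. Like the byte[] approach, it holds the entire content in memory; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamToString {
    // Read every remaining byte from the stream (Java 9+), then decode once as UTF-8.
    static String readUtf8(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "h\u00e9llo\nw\u00f6rld".getBytes(StandardCharsets.UTF_8);
        // A ByteArrayInputStream stands in for a file or network stream here.
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(readUtf8(in));
        }
    }
}
```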

— Jason Nichols

13

Or Guava's Charsets.UTF_8, if you're on a JDK older than 1.7.

— siledh

6

Also use Guava's Charsets.UTF_8 if you're targeting an Android API level below 19.

— Ben Clayton 2014

And what if Checkstyle says "Illegal instantiation: Instantiation of java.lang.String should be avoided."?

— Attila Neparáczki 2014

1

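You can see all the charsets in the java.nio.charset.Charset.availableCharsets() map, not just the ones in StandardCharsets. And if you want to use another charset but still want to keep the String constructor from throwing UnsupportedEncodingException, you can use java.nio.charset.Charset.forName().

— nyxz 2015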
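A short sketch of what that comment describes: String(byte[], Charset) declares no checked exception, unlike String(byte[], String), so there is no UnsupportedEncodingException to catch. Charset.forName instead throws the unchecked UnsupportedCharsetException for an unknown name, so this sketch checks Charset.isSupported first. Falling back to UTF-8 is an illustrative choice, not something the comment prescribes, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetLookup {
    // Look a charset up by name; for a well-formed but unknown name,
    // fall back to UTF-8 instead of letting Charset.forName throw
    // UnsupportedCharsetException.
    static Charset charsetOrUtf8(String name) {
        return Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.UTF_8;
    }

    // The Charset overload of the String constructor throws no checked exception.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, charsetOrUtf8(charsetName));
    }

    public static void main(String[] args) {
        byte[] latin1 = { (byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9 }; // "café" in ISO-8859-1
        System.out.println(decode(latin1, "ISO-8859-1"));
        // availableCharsets() lists everything this JVM supports, keyed by canonical name.
        System.out.println(Charset.availableCharsets().containsKey("UTF-8")); // true
    }
}
```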

2

IOUtils.toString(inputStream, StandardCharsets.UTF_8) is now deprecated.

— Aung Myat Hein

41

The Java String class has a built-in constructor for converting a byte array to a string.

byte[] byteArray = new byte[] {87, 79, 87, 46, 46, 46};

String value = new String(byteArray, "UTF-8");

— Kashif Khan

9

To convert UTF-8 data, you can't assume a one-to-one correspondence between bytes and characters. Try this:

String file_string = new String(bytes, "UTF-8");

(Bah. I see I was slow hitting the "Post Your Answer" button.)

To read an entire file as a string, do something like this:
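To make the "no one-to-one correspondence" point concrete, a small sketch (the class and constant names are hypothetical): the three bytes E2 82 AC are the UTF-8 encoding of a single character, the euro sign U+20AC, so decoding them yields one char, not three.

```java
import java.nio.charset.StandardCharsets;

public class MultiByteDemo {
    // UTF-8 encoding of U+20AC (EURO SIGN): three bytes, one character
    static final byte[] EURO_UTF8 = { (byte) 0xE2, (byte) 0x82, (byte) 0xAC };

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = decode(EURO_UTF8);
        System.out.println(EURO_UTF8.length);     // 3
        System.out.println(s.length());           // 1
        System.out.println(s.equals("\u20AC"));  // true
    }
}
```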

public String openFileToString(String fileName) throws IOException
{
    // requires java.io: BufferedInputStream, FileInputStream, InputStream,
    // InputStreamReader, IOException
    InputStream is = new BufferedInputStream(new FileInputStream(fileName));

    try {
        InputStreamReader rdr = new InputStreamReader(is, "UTF-8");
        StringBuilder contents = new StringBuilder();
        char[] buff = new char[4096];
        int len = rdr.read(buff);
        while (len >= 0) {
            contents.append(buff, 0, len);
            len = rdr.read(buff); // read the next chunk; -1 signals end of stream
        }
        return contents.toString(); // the accumulated text, not the char buffer
    } finally {
        try {
            is.close();
        } catch (Exception e) {
            // log error in closing the file
        }
    }
}

— Ted Hopp

4

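You can use the String(byte[] bytes) constructor for this. See this link for details. EDIT: You also have to consider your platform's default charset, as the Java documentation says: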

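Constructs a new String by decoding the specified array of bytes using the platform's default charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array. The behavior of this constructor when the given bytes are not valid in the default charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required.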

— GETah

1

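And if your bytes aren't in the platform's default charset, you can use the version that takes a second Charset argument to make sure the conversion is correct.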

— Mike Daniels

1

@MikeDaniels Indeed, I didn't want to include all the details. I've just edited my answer.

— GETah 2011
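Following the Javadoc's advice to use CharsetDecoder when you need more control, a sketch that rejects malformed input instead of silently substituting U+FFFD, which is what new String(bytes, StandardCharsets.UTF_8) does. The helper name decodeStrict is hypothetical; CodingErrorAction.REPORT is set explicitly even though a fresh decoder already reports errors by default.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // Decode UTF-8, throwing CharacterCodingException on malformed
    // or unmappable input rather than replacing it.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] bad = { (byte) 0x68, (byte) 0xC3 }; // "h" then a truncated 2-byte sequence
        // The lenient String constructor substitutes a replacement character.
        System.out.println(new String(bad, StandardCharsets.UTF_8).length());
        try {
            decodeStrict(bad);
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```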

2

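You can use the methods described in this question (especially since you're starting from an InputStream): Read/convert an InputStream to a String

In particular, if you don't want to depend on external libraries, you can try this answer, which reads the InputStream through an InputStreamReader into a char[] buffer and appends each chunk to a StringBuilder.

— Bruno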

2

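Since you know you're dealing with a UTF-8 byte array, you definitely want to use the String constructor that accepts a charset name. Otherwise you may open yourself up to some charset-encoding-based security vulnerabilities. Note that it throws UnsupportedEncodingException, which you'll have to handle. Something like this: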

// requires: import java.io.UnsupportedEncodingException;
public String openFileToString(byte[] _bytes) {
    String file_string;
    try {
        file_string = new String(_bytes, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        // this should never happen because "UTF-8" is hard-coded.
        throw new IllegalStateException(e);
    }
    return file_string;
}

— Asaph

2

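Here's a simplified function that reads the bytes and creates a string. It assumes you already know the file's encoding (otherwise it falls back to a default encoding).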

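static final int BUFF_SIZE = 2048;
static final String DEFAULT_ENCODING = "utf-8";

// requires java.io: File, FileInputStream, InputStreamReader, Reader, IOException
public static String readFileToString(String filePath, String encoding) throws IOException {

    if (encoding == null || encoding.length() == 0)
        encoding = DEFAULT_ENCODING;

    StringBuilder content = new StringBuilder();

    // Decode through a Reader: calling new String(buffer, 0, bytesRead, encoding)
    // on each raw byte chunk can split a multi-byte character across two reads.
    // try-with-resources also closes the file if reading throws.
    try (Reader reader = new InputStreamReader(new FileInputStream(new File(filePath)), encoding)) {
        char[] buffer = new char[BUFF_SIZE];
        int charsRead;
        while ((charsRead = reader.read(buffer)) != -1)
            content.append(buffer, 0, charsRead);
    }
    return content.toString();
}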

— Scott

Modified the code to default to utf-8 to match the OP's question.

— Scott 2014
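Why decoding raw byte chunks one at a time is fragile: if a read() stops in the middle of a multi-byte sequence, new String(buffer, 0, bytesRead, encoding) decodes each half on its own and produces replacement characters. A sketch that forces the worst case with a hypothetical stream returning one byte per read, then compares per-chunk decoding with an InputStreamReader, which carries incomplete sequences over to the next read. All class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ChunkBoundaryDemo {
    // Simulates a stream whose bulk reads return at most one byte,
    // so every multi-byte UTF-8 sequence is split across reads.
    static class OneByteAtATime extends ByteArrayInputStream {
        OneByteAtATime(byte[] data) { super(data); }
        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    // Fragile: decodes each chunk independently.
    static String decodePerChunk(byte[] data) {
        StringBuilder sb = new StringBuilder();
        OneByteAtATime in = new OneByteAtATime(data);
        byte[] buffer = new byte[16];
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            sb.append(new String(buffer, 0, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Robust: the Reader keeps a partial sequence until the rest arrives.
    static String decodeWithReader(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new OneByteAtATime(data), StandardCharsets.UTF_8)) {
            char[] buffer = new char[16];
            int n;
            while ((n = r.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodePerChunk(data).equals("caf\u00e9"));   // false
        System.out.println(decodeWithReader(data).equals("caf\u00e9")); // true
    }
}
```

With real files the boundary only occasionally lands inside a character, which is why this kind of bug tends to show up intermittently.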

1

String has a constructor that takes a byte[] and a charset name as arguments :)

— soulcheck

0

This also involves iteration, but it's much better than concatenating strings, which is very expensive.

public String openFileToString(byte[] _bytes)
{
    StringBuilder s = new StringBuilder(_bytes.length);

    for(int i = 0; i < _bytes.length; i++)
    {
        s.append((char)_bytes[i]);
    }

    return s.toString();    
}

— bragboy

8

Dear Lord, String str = new String(byte[]) would do just fine.

— zengr 2011

3

This is more efficient, but it doesn't decode UTF-8 data correctly.

— Ted Hopp
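To see that comment in action, a small sketch (hypothetical class name): casting each byte to char yields one char per byte, and because Java's byte is signed, bytes above 0x7F sign-extend into chars from U+FF80 upward, so the two-byte UTF-8 "é" turns into two unrelated characters.

```java
import java.nio.charset.StandardCharsets;

public class CastVsDecode {
    // The per-byte cast from the answer above: correct only for ASCII.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) b); // sign-extends: byte 0xC3 becomes U+FFC3
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8); // 0xC3 0xA9
        String wrong = castEachByte(utf8);
        String right = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(wrong.length() + " vs " + right.length()); // 2 vs 1
        System.out.println(wrong.equals("\uFFC3\uFFA9"));             // true
    }
}
```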

0

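Why not get what you want from the start and read strings from the file instead of a byte array? Something like:

BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("foo.txt"), Charset.forName("UTF-8")));

Then call readLine() on it until you're done.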

— Digitaljoel

Sometimes it's useful to keep the original line delimiters. The OP may want that.

— Bruno
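A sketch of the readLine() approach, assuming Java 7+ for try-with-resources and StandardCharsets (class and method names are hypothetical; "foo.txt" comes from the answer). Because readLine() strips line terminators, this version rejoins lines with '\n', so CRLF input comes back as LF; that is exactly the loss of the original delimiters mentioned in the comment.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadLinesDemo {
    // Read UTF-8 text line by line; each line is re-terminated with '\n'.
    static String readAll(InputStream raw) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Hypothetical file-based entry point, e.g. readFile("foo.txt").
    static String readFile(String path) throws IOException {
        return readAll(new FileInputStream(path));
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first line\r\nsecond l\u00edne".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(data)));
    }
}
```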

0

I do it this way:

String strIn = new String(_bytes, 0, numBytes);

— Anatoliy Pelepetz

1

This doesn't specify a charset, so you get the platform's default charset, which may not be UTF-8.

— greg-449
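A sketch of the fix that comment implies: keep the offset/length overload but pass the charset explicitly. Charset.defaultCharset() shows what the three-argument form would have used; since JDK 18 (JEP 400) the default is UTF-8 unless overridden (for example with -Dfile.encoding=COMPAT), while older JDKs take it from the OS locale. The class and method names, and the simulated read buffer, are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetSlice {
    // Decode only the first numBytes of the buffer, with an explicit charset
    // rather than the platform default.
    static String decodePrefix(byte[] buffer, int numBytes) {
        return new String(buffer, 0, numBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("Platform default: " + Charset.defaultCharset());
        byte[] buffer = new byte[64];                            // e.g. a read buffer
        byte[] text = "gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(text, 0, buffer, 0, text.length);       // pretend read() filled it
        int numBytes = text.length;                              // what read() would return
        System.out.println(decodePrefix(buffer, numBytes));
    }
}
```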