Converting int to bytes in Python 3

177

I was trying to build this bytes object in Python 3:

b'3\r\n'

So I tried the obvious (to me) approach, and found a weird behaviour:

>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'

Apparently:

>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

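I've been unable to find any pointers in the documentation on why the bytes conversion works this way. However, I did find some surprised messages in this Python issue about adding format to bytes (see also Python 3 bytes formatting):

http://bugs.python.org/issue3982

This interacts even more poorly with oddities like bytes(int) returning zeroes now

and:

It would be much more convenient for me if bytes(int) returned the ASCIIfication of that int; but honestly, even an error would be better than this behavior. (If I wanted this behavior - which I never have - I'd rather it be a classmethod, invoked like "bytes.zeroes(n)".)

Can someone explain to me where this behaviour comes from?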

python python-3.x

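— astrojuanlu

1

Related to the title: 3 .to_bytes

— jfs, 2015

2

It is unclear from your question whether you want the integer value 3 or the value of the ASCII character representing the digit 3 (integer value 51). The first is bytes([3]) == b'\x03'. The latter is bytes([ord('3')]) == b'3'.

— florisla

177

That's the way it was designed, and it makes sense because usually you would call bytes on an iterable instead of a single integer: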

>>> bytes([3])
b'\x03'

The docs state this, as does the docstring for bytes:

 >>> help(bytes)
 ...
 bytes(int) -> bytes object of size given by the parameter initialized with null bytes
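To make the distinction concrete, here is a small sketch contrasting three things a reader might mean by "convert 3 to bytes". The variable names are illustrative only and do not come from the answer.

```python
# Three different results that are easy to confuse (Python 3).
n = 3

zero_filled = bytes(n)                  # n null bytes: a zero-filled object of length 3
raw_value = bytes([n])                  # one byte whose numeric value is 3
ascii_digits = str(n).encode('ascii')   # the decimal digits of n as ASCII text

print(zero_filled)    # b'\x00\x00\x00'
print(raw_value)      # b'\x03'
print(ascii_digits)   # b'3'

# The object asked for in the question:
line = ascii_digits + b'\r\n'
print(line)           # b'3\r\n'
```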

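— Tim Pietzcker

25

Beware that the above works only with Python 3. In Python 2, bytes is just an alias for str, which means bytes([3]) gives you '[3]'.

— botchniaque

8

In Python 3, note that bytes([n]) only works for an int n from 0 to 255. For anything else it raises ValueError.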
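A short sketch of that limit, assuming a hypothetical helper named single_byte that simply wraps bytes([n]):

```python
def single_byte(n: int) -> bytes:
    """Return a one-byte bytes object holding n (hypothetical helper name)."""
    return bytes([n])

print(single_byte(0))     # b'\x00'
print(single_byte(255))   # b'\xff'

# Values outside 0..255 cannot be stored in a single byte.
for bad in (256, -1):
    try:
        single_byte(bad)
    except ValueError as exc:
        print(f'{bad}: ValueError: {exc}')
```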

— Acumenus

8

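@ABB: Not really surprising, since a byte can only store values between 0 and 255.

— Tim Pietzcker

7

It should also be noted that bytes([3]) is still different from what the OP wanted, namely the byte value used to encode the digit "3" in ASCII, i.e. bytes([51]), which is b'3', not b'\x03'.

— lenz

2

bytes(500) creates a bytestring with len == 500. It does not create a bytestring that encodes the integer 500. And I agree that bytes([500]) can't work, which is why that's the wrong answer too. Probably the right answer is int.to_bytes() for versions >= 3.2.

— weberc2

198

From Python 3.2 you can do: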

>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'

https://docs.python.org/3/library/stdtypes.html#int.to_bytes

def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')

def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')

Accordingly, x == int_from_bytes(int_to_bytes(x)). Note that this encoding works only for unsigned (non-negative) integers.
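The round trip and its limits can be checked with a short sketch. The two functions are repeated from the answer so the block runs on its own; the sample values are arbitrary.

```python
def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')

def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')

# Round trip for a few non-negative values, including a large one.
for x in (0, 1, 255, 256, 1024, 2**64):
    assert int_from_bytes(int_to_bytes(x)) == x

print(int_to_bytes(1024))   # b'\x04\x00'
print(int_to_bytes(0))      # b'' (zero needs no bytes under this scheme)

# Negative input is rejected because to_bytes defaults to signed=False.
try:
    int_to_bytes(-1)
except OverflowError as exc:
    print('negative input rejected:', exc)
```

Note that 0 encodes to the empty bytes object here, which still decodes back to 0 but may be surprising if a caller expects at least one byte.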

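— brunsgaard

4

While this answer is good, it works only for unsigned (non-negative) integers. I have adapted it into an answer that also works for signed integers.

— Acumenus

1

That doesn't help with getting b"3" from 3, as the question asks. (It'll give b"\x03".)

— gsnedders '19

40

You can use the struct module's pack:

In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'

The ">" is the byte order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else: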

In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'

In [13]: struct.pack("B", 1)
Out[13]: '\x01'

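This works the same on both Python 2 and Python 3, except that Python 3 displays the results as bytes literals, e.g. b'\x01'.

Note: the inverse operation (bytes to int) can be done with unpack.

— Andy Hayden

2

@AndyHayden To clarify, since a struct has a standard size irrespective of the input, I, H, and B work up to 2**k - 1, where k is 32, 16, and 8 respectively. For larger inputs they raise struct.error.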
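A minimal sketch of those limits under Python 3, using an explicit big-endian prefix so the standard sizes apply. The limits dictionary is just an illustration, not part of the struct API.

```python
import struct

# Largest value each unsigned format code accepts: 2**k - 1 for a k-bit field.
limits = {'B': 2**8 - 1, 'H': 2**16 - 1, 'I': 2**32 - 1}

for code, largest in limits.items():
    fmt = '>' + code
    packed = struct.pack(fmt, largest)
    print(code, len(packed), packed)
    # unpack returns a tuple, so take element 0 to get the int back.
    assert struct.unpack(fmt, packed)[0] == largest
    # One past the limit no longer fits in the fixed-size field.
    try:
        struct.pack(fmt, largest + 1)
    except struct.error as exc:
        print(f'  {largest + 1} does not fit: {exc}')
```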

— Acumenus

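Presumably down-voted because it doesn't answer the question: the OP wants to know how to generate b'3\r\n', i.e. a byte string containing the ASCII character "3", not the ASCII character "\x03".

— Dave Jones

1

@DaveJones What makes you think that is what the OP wants? The accepted answer returns \x03, and the solution if you just want b'3' is trivial. The reason cited by ABB is much more plausible... or at least understandable.

— Andy Hayden

@DaveJones Also, the reason I added this answer is that Google takes you here when you search for how to do exactly this. That's why it's here.

— Andy Hayden

4

Not only does this work the same in 2 and 3, it is also faster than both the bytes([x]) and (x).to_bytes() methods in Python 3.5. That was unexpected.

— Mark Ransom

25

Python 3.5+ introduces %-interpolation (printf-style formatting) for bytes: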

>>> b'%d\r\n' % 3
b'3\r\n'

See PEP 0461 -- Adding % formatting to bytes and bytearray.

On earlier versions, you can use str and .encode('ascii') the result:

>>> s = '%d\r\n' % 3
>>> s.encode('ascii')
b'3\r\n'

Note: this is different from what int.to_bytes produces:

>>> n = 3
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big') or b'\0'
b'\x03'
>>> b'3' == b'\x33' != b'\x03'
True
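As a sketch, the two formatting routes above can be wrapped in helpers and compared. The function names ascii_line and ascii_line_compat are hypothetical and chosen only for this example.

```python
def ascii_line(n: int) -> bytes:
    """Decimal digits of n in ASCII followed by CRLF (bytes %-formatting, Python 3.5+)."""
    return b'%d\r\n' % n

def ascii_line_compat(n: int) -> bytes:
    """Same result via str formatting, for Python 3 versions before 3.5."""
    return ('%d\r\n' % n).encode('ascii')

# Both routes agree, including for negative and very large values.
for n in (3, 42, -7, 2**64):
    assert ascii_line(n) == ascii_line_compat(n)

print(ascii_line(3))               # b'3\r\n'
print(ascii_line(1024))            # b'1024\r\n' (text digits)
print((1024).to_bytes(2, 'big'))   # b'\x04\x00' (binary value, a different thing)
```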

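— jfs

11

The documentation says:

bytes(int) -> bytes object of size given by the parameter
              initialized with null bytes

The sequence:

b'3\r\n'

is the character '3' (decimal 51) followed by the characters '\r' (13) and '\n' (10).

Therefore, it can be built in any of these ways, for example: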

>>> bytes([51, 13, 10])
b'3\r\n'

>>> bytes('3', 'utf8') + b'\r\n'
b'3\r\n'

>>> n = 3
>>> bytes(str(n), 'ascii') + b'\r\n'
b'3\r\n'

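Tested on IPython 1.1.0 and Python 3.2.3.

— Schcriher

1

In the end I did bytes(str(n), 'ascii') + b'\r\n' or str(n).encode('ascii') + b'\r\n'. Thanks! :)

— astrojuanlu, 2014

1

@Juanlu001, also "{}\r\n".format(n).encode(). I don't think there is any harm in using the default utf8 encoding.

— John La Rooy, 2015

6

The ASCIIfication of 3 is "\x33", not "\x03"!

That is what Python does for str(3), but it would be totally wrong for bytes, as they should be considered arrays of binary data and not be abused as strings.

The easiest way to achieve what you want is bytes((3,)), which is better than bytes([3]) because initializing a list is much more expensive, so never use lists when you can use tuples. You can convert bigger integers with int.to_bytes, for example (3).to_bytes(1, "little").

Initializing bytes with a given length makes sense and is the most useful behaviour, as they are often used to create some type of buffer for which you need memory of a given size allocated. I often use this when initializing arrays or expanding a file by writing zeros to it.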
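A brief sketch of the buffer use case described above. The sizes and contents are made up for illustration; bytearray is used where the buffer must be modified in place, since bytes objects are immutable.

```python
# A mutable, zero-filled buffer of a given size.
buf = bytearray(8)
buf[0:2] = b'\x01\x02'      # overwrite the first two bytes in place
print(buf)                  # bytearray(b'\x01\x02\x00\x00\x00\x00\x00\x00')

# An immutable run of zeros used purely as a data source, e.g. as padding.
padding = bytes(4)
record = b'abc' + padding
print(record)               # b'abc\x00\x00\x00\x00'

# A tuple and a list produce the same single-byte result.
assert bytes((3,)) == bytes([3]) == b'\x03'
```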

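— Bachsau

1

There are several problems with this answer: (a) the escape notation of b'3' is b'\x33', not b'\x32'; (b) (3) is not a tuple, you have to add a comma; (c) the scenario of initialising a sequence with zeros does not apply to bytes objects, as they are immutable (it makes sense for bytearrays, though).

— lenz

Thanks for your comment. I fixed those two obvious mistakes. In the case of bytes and bytearray, I think it's mostly a matter of consistency. But it is also useful if you want to push some zeros into a buffer or file, in which case it is only used as a data source.

— Bachsau, '17

5

An int (including Python 2's long) can be converted to bytes using the following function: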

import codecs

def int2bytes(i):
    hex_value = '{0:x}'.format(i)
    # make length of hex_value a multiple of two
    hex_value = '0' * (len(hex_value) % 2) + hex_value
    return codecs.decode(hex_value, 'hex_codec')

The reverse conversion can be done with another function:

import codecs
import six  # should be installed via 'pip install six'

long = six.integer_types[-1]

def bytes2int(b):
    return long(codecs.encode(b, 'hex_codec'), 16)

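Both functions work on Python 2 and Python 3.

— renskiy

'hex_value = '%x' % i' won't work under Python 3.4. You get a TypeError, so you'd have to use hex() instead.

— bjmc

@bjmc Replaced with str.format. This should work on Python 2.6+.

— renskiy

Thanks, @renskiy. You may want to use 'hex_codec' instead of 'hex', because it seems the 'hex' alias isn't available on all Python 3 releases; see stackoverflow.com/a/12917604/845210

— bjmc

@bjmc Fixed. Thanks.

— renskiy

This fails on negative integers on Python 3.6.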

— 狂战士

4

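I was curious about the performance of the various methods for a single int in the range [0, 255], so I decided to do some timing tests.

Based on the timings below, and on the general trend I observed from trying many different values and configurations, struct.pack seems to be the fastest, followed by int.to_bytes, then bytes, with str.encode (unsurprisingly) being the slowest. Note that the results show more variation than is represented here, and int.to_bytes and bytes sometimes switched speed ranking during testing, but struct.pack is clearly the fastest.

Results in CPython 3.7 on Windows: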

Testing with 63:
bytes_: 100000 loops, best of 5: 3.3 usec per loop
to_bytes: 100000 loops, best of 5: 2.72 usec per loop
struct_pack: 100000 loops, best of 5: 2.32 usec per loop
chr_encode: 50000 loops, best of 5: 3.66 usec per loop

Test module (named int_to_byte.py):

"""Functions for converting a single int to a bytes object with that int's value."""

import random
import shlex
import struct
import timeit

def bytes_(i):
    """From Tim Pietzcker's answer:
    https://stackoverflow.com/a/21017834/8117067
    """
    return bytes([i])

def to_bytes(i):
    """From brunsgaard's answer:
    https://stackoverflow.com/a/30375198/8117067
    """
    return i.to_bytes(1, byteorder='big')

def struct_pack(i):
    """From Andy Hayden's answer:
    https://stackoverflow.com/a/26920966/8117067
    """
    return struct.pack('B', i)

# Originally, jfs's answer was considered for testing,
# but the result is not identical to the other methods
# https://stackoverflow.com/a/31761722/8117067

def chr_encode(i):
    """Another method, from Quuxplusone's answer here:
    https://codereview.stackexchange.com/a/210789/140921

    Similar to g10guang's answer:
    https://stackoverflow.com/a/51558790/8117067
    """
    return chr(i).encode('latin1')

converters = [bytes_, to_bytes, struct_pack, chr_encode]

def one_byte_equality_test():
    """Test that results are identical for ints in the range [0, 255]."""
    for i in range(256):
        results = [c(i) for c in converters]
        # Test that all results are equal
        start = results[0]
        if any(start != b for b in results):
            raise ValueError(results)

def timing_tests(value=None):
    """Test each of the functions with a random int."""
    if value is None:
        # random.randint takes more time than int to byte conversion
        # so it can't be a part of the timeit call
        value = random.randint(0, 255)
    print(f'Testing with {value}:')
    for c in converters:
        print(f'{c.__name__}: ', end='')
        # Uses technique borrowed from https://stackoverflow.com/q/19062202/8117067
        timeit.main(args=shlex.split(
            f"-s 'from int_to_byte import {c.__name__}; value = {value}' " +
            f"'{c.__name__}(value)'"
        ))

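— Graham

1

@ABB As mentioned in my first sentence, I'm only measuring this for a single int in the range [0, 255]. I assume by "wrong indicator" you mean my measurement wasn't general enough to fit most situations? Or was my measurement methodology poor? If the latter, I'd be interested to hear what you have to say, but if the former, I never claimed my measurements were generic to all use cases. For my (perhaps niche) situation, I am only dealing with ints in the range [0, 255], and that is the audience I intended to address with this answer. Was my answer unclear? I can edit it for clarity...

— Graham

1

What about the technique of just indexing a precomputed encoding for the range? The precomputation would not be subject to timing, only the indexing would be.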
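One possible reading of that suggestion, as a sketch only: build all 256 one-byte objects once and index into the table afterwards. The names BYTE_TABLE and lookup_byte are hypothetical, and no timing claim is made here.

```python
# Precompute every single-byte bytes object once, outside any timed code.
BYTE_TABLE = tuple(bytes([i]) for i in range(256))

def lookup_byte(i: int) -> bytes:
    """Return the one-byte encoding of i by table lookup.

    Caveat: a negative index such as -1 silently returns an entry from the
    end of the table instead of raising, unlike bytes([i]).
    """
    return BYTE_TABLE[i]

# The table agrees with bytes([i]) over the whole range.
assert all(lookup_byte(i) == bytes([i]) for i in range(256))
print(lookup_byte(63))   # b'?'
```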

— Acumenus

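@ABB That's a good idea. It sounds like it will be faster than anything else. I'll do some timing and add it to this answer when I have some time.

— Graham

3

If you really want to time the bytes-from-iterable approach, you should use bytes((i,)) instead of bytes([i]), because lists are more complex, use more memory and take longer to initialize. In this case, for nothing.

— Bachsau

4

Although the prior answer by brunsgaard is an efficient encoding, it works only for unsigned integers. This one builds upon it to work for both signed and unsigned integers.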

def int_to_bytes(i: int, *, signed: bool = False) -> bytes:
    length = ((i + ((i * signed) < 0)).bit_length() + 7 + signed) // 8
    return i.to_bytes(length, byteorder='big', signed=signed)

def bytes_to_int(b: bytes, *, signed: bool = False) -> int:
    return int.from_bytes(b, byteorder='big', signed=signed)

# Test unsigned:
for i in range(1025):
    assert i == bytes_to_int(int_to_bytes(i))

# Test signed:
for i in range(-1024, 1025):
    assert i == bytes_to_int(int_to_bytes(i, signed=True), signed=True)

For the encoder, (i + ((i * signed) < 0)).bit_length() is used instead of just i.bit_length() because the latter leads to an inefficient encoding of -128, -32768, etc.
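The effect of that adjustment can be seen by comparing lengths at the boundary values. In this sketch, int_to_bytes is repeated from the answer so the block runs on its own, while naive_length is a hypothetical comparison helper that uses plain i.bit_length().

```python
def int_to_bytes(i: int, *, signed: bool = False) -> bytes:
    length = ((i + ((i * signed) < 0)).bit_length() + 7 + signed) // 8
    return i.to_bytes(length, byteorder='big', signed=signed)

def naive_length(i: int) -> int:
    """Signed length computed from plain i.bit_length(), for comparison only."""
    return (i.bit_length() + 7 + 1) // 8

for i in (-128, -32768):
    encoded = int_to_bytes(i, signed=True)
    # The adjusted formula yields the minimal length; the naive one adds a byte.
    print(i, encoded, len(encoded), naive_length(i))

# -128 fits in one signed byte, -32768 in two.
assert int_to_bytes(-128, signed=True) == b'\x80'
assert int_to_bytes(-32768, signed=True) == b'\x80\x00'
```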

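Credit: CervEd for fixing a minor inefficiency.

— Acumenus

int_to_bytes(-128, signed=True) == (-128).to_bytes(1, byteorder="big", signed=True) is False

— CervEd

You're not using length 2; you're calculating the bit length of the signed integer, adding 7, and then adding 1 if it's a signed integer. Finally you convert that into a length in bytes. This yields unexpected results for -128, -32768, etc.

— CervEd

Let us continue this discussion in chat.

— CervEd

This is how you fix it: (i+(signed*i<0)).bit_length()

— CervEd

3

That behavior comes from the fact that in Python prior to version 3, bytes was just an alias for str. In Python 3.x, bytes is an immutable version of bytearray: a completely new type, not backwards compatible.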

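— freakish

3

From the bytes docs:

Accordingly, constructor arguments are interpreted as for bytearray().

Then, from the bytearray docs:

The optional source parameter can be used to initialize the array in a few different ways:

If it is an integer, the array will have that size and will be initialized with null bytes.

Note that this differs from the 2.x (where x >= 6) behavior, where bytes is simply str: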

>>> bytes is str
True

PEP 3112:

The 2.6 str differs from 3.0's bytes type in various ways; most notably, the constructor is completely different.
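A short Python 3 sketch of how the constructor argument changes the result. It shows only a few of the accepted argument forms.

```python
# Integer argument: a zero-filled object of that size.
print(bytes(3))              # b'\x00\x00\x00'
print(bytearray(3))          # bytearray(b'\x00\x00\x00')

# Iterable of ints: one byte per element.
print(bytes([3]))            # b'\x03'

# String plus encoding: the encoded text.
print(bytes('3', 'ascii'))   # b'3'

# bytes and bytearray interpret the argument the same way.
assert bytes(3) == bytearray(3)
assert bytes([3]) == bytearray([3])
```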

— alko

0

Some answers don't work with large numbers.

Convert the integer to its hex representation, then convert that to bytes:

def int_to_bytes(number):
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)

Result:

>>> int_to_bytes(2**256 - 1)
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
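As a cross-check, the hex-based function can be compared against int.to_bytes for non-negative values. int_to_bytes is repeated from the answer; int_to_bytes_direct is a hypothetical helper written for this comparison, using max(1, ...) so that 0 also yields one byte.

```python
def int_to_bytes(number):
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)

def int_to_bytes_direct(number: int) -> bytes:
    """Equivalent for non-negative ints using int.to_bytes (at least one byte)."""
    length = max(1, (number.bit_length() + 7) // 8)
    return number.to_bytes(length, 'big')

for number in (0, 1, 255, 256, 2**256 - 1):
    assert int_to_bytes(number) == int_to_bytes_direct(number)

print(len(int_to_bytes(2**256 - 1)))   # 32

# Negative input is not handled: hex(-1) is '-0x1', which fromhex rejects.
try:
    int_to_bytes(-1)
except ValueError as exc:
    print('negative input rejected:', exc)
```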

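— Max Malysh

1

"All other methods don't work with large numbers." That's not true; int.to_bytes works with any integer.

— juanpa.arrivillaga

@juanpa.arrivillaga Yes, my bad. I've edited the answer.

— Max Malysh

-1

If the question is how to convert an integer itself (not its string equivalent) into bytes, I think the robust answer is: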

>>> i = 5
>>> i.to_bytes(2, 'big')
b'\x00\x05'
>>> int.from_bytes(i.to_bytes(2, 'big'), byteorder='big')
5

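See more information on these methods here:

— Nilashish C

1

How is this any different from brunsgaard's answer, posted 5 years ago and currently the highest-voted answer?

— Arthur Tacca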