如何将整数转换为以62为底的整数(如十六进制,但具有以下数字:“ 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ”)。
我一直在为它寻找一个好的Python库,但是它们似乎都被转换字符串所占据。Python base64模块仅接受字符串并将一个数字转换为四个字符。我正在寻找类似于URL缩短器使用的东西。
如何将整数转换为以62为底的整数(如十六进制,但具有以下数字:“ 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ”)。
我一直在为它寻找一个好的Python库,但是它们似乎都被转换字符串所占据。Python base64模块仅接受字符串并将一个数字转换为四个字符。我正在寻找类似于URL缩短器使用的东西。
Answers:
没有为此的标准模块,但是我编写了自己的函数来实现这一点。
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
def encode(num, alphabet):
"""Encode a positive number into Base X and return the string.
Arguments:
- `num`: The number to encode
- `alphabet`: The alphabet to use for encoding
"""
if num == 0:
return alphabet[0]
arr = []
arr_append = arr.append # Extract bound-method for faster access.
_divmod = divmod # Access to locals is faster.
base = len(alphabet)
while num:
num, rem = _divmod(num, base)
arr_append(alphabet[rem])
arr.reverse()
return ''.join(arr)
def decode(string, alphabet=BASE62):
"""Decode a Base X encoded string into the number
Arguments:
- `string`: The encoded string
- `alphabet`: The alphabet to use for decoding
"""
base = len(alphabet)
strlen = len(string)
num = 0
idx = 0
for char in string:
power = (strlen - (idx + 1))
num += alphabet.index(char) * (base ** power)
idx += 1
return num
请注意,您可以给它提供任何字母以用于编码和解码的事实。如果您忽略该alphabet
参数,则将获得在第一行代码中定义的62个字符的字母,从而对62个基数进行编码/解码。
希望这可以帮助。
PS-对于URL缩短器,我发现最好省略一些令人困惑的字符,例如0Ol1oI等。因此,为了满足URL缩短的需要,我使用了该字母- "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
玩得开心。
$-_.+!*'(),;/?:@&=
您可能也可以使用其他字符,例如[]~
etc
我曾经写过一个脚本也可以做到这一点,我认为它很优雅:)
import string
# Remove the `_@` below for base62, now it has 64 characters
BASE_LIST = string.digits + string.letters + '_@'
BASE_DICT = dict((c, i) for i, c in enumerate(BASE_LIST))
def base_decode(string, reverse_base=BASE_DICT):
length = len(reverse_base)
ret = 0
for i, c in enumerate(string[::-1]):
ret += (length ** i) * reverse_base[c]
return ret
def base_encode(integer, base=BASE_LIST):
if integer == 0:
return base[0]
length = len(base)
ret = ''
while integer != 0:
ret = base[integer % length] + ret
integer /= length
return ret
用法示例:
for i in range(100):
print i, base_decode(base_encode(i)), base_encode(i)
reversed(string)
比string[::-1]
base_decode函数中的切片更快。
integer /= length
为integer //=length
以获取正确的余数
以下解码器制造商可以使用任何合理的基础,并具有更整洁的循环,并在遇到无效字符时给出明确的错误消息。
def base_n_decoder(alphabet):
"""Return a decoder for a base-n encoded string
Argument:
- `alphabet`: The alphabet used for encoding
"""
base = len(alphabet)
char_value = dict(((c, v) for v, c in enumerate(alphabet)))
def f(string):
num = 0
try:
for char in string:
num = num * base + char_value[char]
except KeyError:
raise ValueError('Unexpected character %r' % char)
return num
return f
if __name__ == "__main__":
func = base_n_decoder('0123456789abcdef')
for test in ('0', 'f', '2020', 'ffff', 'abqdef'):
print test
print func(test)
**
循环中没有运算符可逆转Micky-Mouse 。
如果您正在寻找最高效率(例如django),则需要以下内容。该代码结合了Baishampayan Ghose和WoLpH和John Machin的高效方法。
# Edit this list of characters as desired.
BASE_ALPH = tuple("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
BASE_DICT = dict((c, v) for v, c in enumerate(BASE_ALPH))
BASE_LEN = len(BASE_ALPH)
def base_decode(string):
num = 0
for char in string:
num = num * BASE_LEN + BASE_DICT[char]
return num
def base_encode(num):
if not num:
return BASE_ALPH[0]
encoding = ""
while num:
num, rem = divmod(num, BASE_LEN)
encoding = BASE_ALPH[rem] + encoding
return encoding
您可能还需要提前计算字典。(注意:即使是很长的数字,使用字符串编码也比使用列表显示效率更高。)
>>> timeit.timeit("for i in xrange(1000000): base.base_decode(base.base_encode(i))", setup="import base", number=1)
2.3302059173583984
在2.5秒内对100万个数字进行编码和解码。(2.2Ghz i7-2670QM)
tuple()
周围BASE_ALPH
的环境。在Python中,每个String都是可迭代的。该功能当然是由enumerate()
。因此代码变得更精简了:)
如果使用django框架,则可以使用django.utils.baseconv模块。
>>> from django.utils import baseconv
>>> baseconv.base62.encode(1234567890)
1LY7VK
除了base62,baseconv还定义了base2 / base16 / base36 / base56 / base64。
您可能需要base64,而不是base62。它有一个与URL兼容的版本,因此多余的两个填充符应该不是问题。
这个过程非常简单。考虑到base64代表6位,常规字节代表8。为选择的64个字符中的每个字符分配一个从000000到111111的值,并将这四个值放在一起以匹配一组3个base256字节。对于每3个字节的集合重复一次,并在末尾选择填充字符(通常是0)。
如果您只需要生成一个简短的ID(因为您提到了URL缩短器)而不是对某些内容进行编码/解码,那么该模块可能会有所帮助:
我从这里其他人的帖子中受益匪浅。我最初需要用于Django项目的python代码,但是从那时起,我转向了node.js,所以这是Baishampayan Ghose提供的代码的javascript版本(编码部分)。
var ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
function base62_encode(n, alpha) {
var num = n || 0;
var alphabet = alpha || ALPHABET;
if (num == 0) return alphabet[0];
var arr = [];
var base = alphabet.length;
while(num) {
rem = num % base;
num = (num - rem)/base;
arr.push(alphabet.substring(rem,rem+1));
}
return arr.reverse().join('');
}
console.log(base62_encode(2390687438976, "123456789ABCDEFGHIJKLMNPQRSTUVWXYZ"));
我希望以下代码片段可以有所帮助。
def num2sym(num, sym, join_symbol=''):
if num == 0:
return sym[0]
if num < 0 or type(num) not in (int, long):
raise ValueError('num must be positive integer')
l = len(sym) # target number base
r = []
div = num
while div != 0: # base conversion
div, mod = divmod(div, l)
r.append(sym[mod])
return join_symbol.join([x for x in reversed(r)])
您的情况的用法:
number = 367891
alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
print num2sym(number, alphabet) # will print '1xHJ'
显然,您可以指定另一个字母,该字母由更少或更多的符号组成,然后它将您的数字转换为更少或更多的基数。例如,提供“ 01”作为字母将输出表示输入数字为二进制的字符串。
最初,您可以对字母进行混洗以使数字具有唯一性。如果您要进行URL缩短服务,这可能会有所帮助。
if num < 0 or type(num) not in (int, long):
。
long
Py 3.x中不存在-因此有人可能想使用此答案。
isinstance(x, (type(1), type(2**32)))
。
现在有一个python库。
我正在为此做一个点子包。
我建议您使用受bases.js启发的bases.py https://github.com/kamijoutouma/bases.py
from bases import Bases
bases = Bases()
bases.toBase16(200) // => 'c8'
bases.toBase(200, 16) // => 'c8'
bases.toBase62(99999) // => 'q0T'
bases.toBase(200, 62) // => 'q0T'
bases.toAlphabet(300, 'aAbBcC') // => 'Abba'
bases.fromBase16('c8') // => 200
bases.fromBase('c8', 16) // => 200
bases.fromBase62('q0T') // => 99999
bases.fromBase('q0T', 62) // => 99999
bases.fromAlphabet('Abba', 'aAbBcC') // => 300
请参阅https://github.com/kamijoutouma/bases.py#known-basesalphabets 了解可用的碱基
这是我的解决方案:
def base62(a):
baseit = (lambda a=a, b=62: (not a) and '0' or
baseit(a-a%b, b*62) + '0123456789abcdefghijklmnopqrstuvwxyz'
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'[a%b%61 or -1*bool(a%b)])
return baseit()
在任何基数中,每个数字均等于。 a1+a2*base**2+a3*base**3...
因此,目标是找到所有a
s。
对于每个N=1,2,3...
代码,aN*base**N
通过“取模” 将所有s大于s的切片隔离开来,然后b
对b=base**(N+1)
所有a
s进行N
切片,a
以使它们的序列小于每次通过当前递归调用该函数N
而减少时的序列。a
aN*base**N
Base%(base-1)==1
因此base**p%(base-1)==1
,因此q*base^p%(base-1)==q
只有一个例外,即何时q==base-1
返回0
。为了解决这种情况,它返回0
。该功能0
从头开始检查。
在此示例中,只有一个乘法(而不是除法)和一些模运算,这些运算都相对较快。
我前一段时间写了这篇文章,而且效果很好(包括负数在内)
def code(number,base):
try:
int(number),int(base)
except ValueError:
raise ValueError('code(number,base): number and base must be in base10')
else:
number,base = int(number),int(base)
if base < 2:
base = 2
if base > 62:
base = 62
numbers = [0,1,2,3,4,5,6,7,8,9,"a","b","c","d","e","f","g","h","i","j",
"k","l","m","n","o","p","q","r","s","t","u","v","w","x","y",
"z","A","B","C","D","E","F","G","H","I","J","K","L","M","N",
"O","P","Q","R","S","T","U","V","W","X","Y","Z"]
final = ""
loc = 0
if number < 0:
final = "-"
number = abs(number)
while base**loc <= number:
loc = loc + 1
for x in range(loc-1,-1,-1):
for y in range(base-1,-1,-1):
if y*(base**x) <= number:
final = "{}{}".format(final,numbers[y])
number = number - y*(base**x)
break
return final
def decode(number,base):
try:
int(base)
except ValueError:
raise ValueError('decode(value,base): base must be in base10')
else:
base = int(base)
number = str(number)
if base < 2:
base = 2
if base > 62:
base = 62
numbers = ["0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f",
"g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v",
"w","x","y","z","A","B","C","D","E","F","G","H","I","J","K","L",
"M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
final = 0
if number.startswith("-"):
neg = True
number = list(number)
del(number[0])
temp = number
number = ""
for x in temp:
number = "{}{}".format(number,x)
else:
neg = False
loc = len(number)-1
number = str(number)
for x in number:
if numbers.index(x) > base:
raise ValueError('{} is out of base{} range'.format(x,str(base)))
final = final+(numbers.index(x)*(base**loc))
loc = loc - 1
if neg:
return -final
else:
return final
对这一切的长度感到抱歉
BASE_LIST = tuple("23456789ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz")
BASE_DICT = dict((c, v) for v, c in enumerate(BASE_LIST))
BASE_LEN = len(BASE_LIST)
def nice_decode(str):
num = 0
for char in str[::-1]:
num = num * BASE_LEN + BASE_DICT[char]
return num
def nice_encode(num):
if not num:
return BASE_LIST[0]
encoding = ""
while num:
num, rem = divmod(num, BASE_LEN)
encoding += BASE_LIST[rem]
return encoding
这是一种递归和迭代的方法。迭代的速度要快一些,具体取决于执行次数。
def base62_encode_r(dec):
s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
return s[dec] if dec < 62 else base62_encode_r(dec / 62) + s[dec % 62]
print base62_encode_r(2347878234)
def base62_encode_i(dec):
s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
ret = ''
while dec > 0:
ret = s[dec % 62] + ret
dec /= 62
return ret
print base62_encode_i(2347878234)
def base62_decode_r(b62):
s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
if len(b62) == 1:
return s.index(b62)
x = base62_decode_r(b62[:-1]) * 62 + s.index(b62[-1:]) % 62
return x
print base62_decode_r("2yTsnM")
def base62_decode_i(b62):
s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
ret = 0
for i in xrange(len(b62)-1,-1,-1):
ret = ret + s.index(b62[i]) * (62**(len(b62)-i-1))
return ret
print base62_decode_i("2yTsnM")
if __name__ == '__main__':
import timeit
print(timeit.timeit(stmt="base62_encode_r(2347878234)", setup="from __main__ import base62_encode_r", number=100000))
print(timeit.timeit(stmt="base62_encode_i(2347878234)", setup="from __main__ import base62_encode_i", number=100000))
print(timeit.timeit(stmt="base62_decode_r('2yTsnM')", setup="from __main__ import base62_decode_r", number=100000))
print(timeit.timeit(stmt="base62_decode_i('2yTsnM')", setup="from __main__ import base62_decode_i", number=100000))
0.270266867033
0.260915645986
0.344734796766
0.311662500262
3.7.x
当寻找现有的base62脚本时,我找到了一些算法的博士学位的github 。目前,它不适用于当前的Python 3的max-version版本,因此我继续进行修复,并在需要的地方进行了一些重构。我通常不使用Python,并且总是临时使用它,因此YMMV。赖志华博士全归功于他。我只是解决了这个版本的Python。
base62.py
#modified from Dr. Zhihua Lai's original on GitHub
from math import floor
base = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
b = 62;
def toBase10(b62: str) -> int:
limit = len(b62)
res = 0
for i in range(limit):
res = b * res + base.find(b62[i])
return res
def toBase62(b10: int) -> str:
if b <= 0 or b > 62:
return 0
r = b10 % b
res = base[r];
q = floor(b10 / b)
while q:
r = q % b
q = floor(q / b)
res = base[int(r)] + res
return res
try_base62.py
import base62
print("Base10 ==> Base62")
for i in range(999):
print(f'{i} => {base62.toBase62(i)}')
base62_samples = ["gud", "GA", "mE", "lo", "lz", "OMFGWTFLMFAOENCODING"]
print("Base62 ==> Base10")
for i in range(len(base62_samples)):
print(f'{base62_samples[i]} => {base62.toBase10(base62_samples[i])}')
try_base62.py
Base10 ==> Base62
0 => 0
[...]
998 => g6
Base62 ==> Base10
gud => 63377
GA => 2640
mE => 1404
lo => 1326
lz => 1337
OMFGWTFLMFAOENCODING => 577002768656147353068189971419611424
由于回购中没有许可信息,因此我确实提交了PR,因此原始作者至少知道其他人正在使用和修改他们的代码。
简单的递归
"""
This module contains functions to transform a number to string and vice-versa
"""
BASE = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
LEN_BASE = len(BASE)
def encode(num):
"""
This function encodes the given number into alpha numeric string
"""
if num < LEN_BASE:
return BASE[num]
return BASE[num % LEN_BASE] + encode(num//LEN_BASE)
def decode_recursive(string, index):
"""
recursive util function for decode
"""
if not string or index >= len(string):
return 0
return (BASE.index(string[index]) * LEN_BASE ** index) + decode_recursive(string, index + 1)
def decode(string):
"""
This function decodes given string to number
"""
return decode_recursive(string, 0)
有史以来最简单的。
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
def encode_base62(num):
s = ""
while num>0:
num,r = divmod(num,62)
s = BASE62[r]+s
return s
def decode_base62(num):
x,s = 1,0
for i in range(len(num)-1,-1,-1):
s = int(BASE62.index(num[i])) *x + s
x*=62
return s
print(encode_base62(123))
print(decode_base62("1Z"))