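This can be done in O(n log(n)) using a tree.
First, build the tree, storing in each node the cumulative weight of all descendant nodes on its left and on its right.
To sample an item, descend recursively from the root, using the cumulative sums to decide whether to return the current node, a node from the left subtree, or a node from the right subtree. Every time a node is sampled, set its weight to zero and update the sums stored in its ancestors.
Here is my implementation in Python: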
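    import random


    def weigthed_shuffle(items, weights):
        """Return a generator of items in weighted random order (no replacement)."""
        if len(items) != len(weights):
            raise ValueError("Unequal lengths")

        n = len(items)
        # Implicit binary tree stored in a list (heap layout). Each entry is
        # [own weight, total weight of left subtree, total weight of right subtree].
        nodes = [None for _ in range(n)]

        def left_index(i):
            return 2 * i + 1

        def right_index(i):
            return 2 * i + 2

        def total_weight(i=0):
            # Fill nodes[i] recursively; return the total weight of the subtree at i.
            if i >= n:
                return 0
            this_weight = weights[i]
            if this_weight <= 0:
                raise ValueError("Weight can't be zero or negative")
            left_weight = total_weight(left_index(i))
            right_weight = total_weight(right_index(i))
            nodes[i] = [this_weight, left_weight, right_weight]
            return this_weight + left_weight + right_weight

        def sample(i=0):
            # Draw one index from the subtree at i with probability proportional
            # to its remaining weight, then remove that weight from the tree.
            this_w, left_w, right_w = nodes[i]
            total = this_w + left_w + right_w
            r = total * random.random()
            if r < this_w:
                nodes[i][0] = 0  # this node can no longer be drawn
                return i
            elif r < this_w + left_w:
                chosen = sample(left_index(i))
                nodes[i][1] -= weights[chosen]
                return chosen
            else:
                chosen = sample(right_index(i))
                nodes[i][2] -= weights[chosen]
                return chosen

        total_weight()  # build nodes tree
        # Stops after n - 1 draws, so the last remaining item is never yielded;
        # use range(n) to draw every item.
        return (items[sample()] for _ in range(n - 1))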
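To make the tree layout concrete, here is a minimal sketch of only the building step, run on a tiny input. The helper name `build_nodes` and the example weights are assumptions made for illustration; they are not part of the implementation above.

```python
def build_nodes(weights):
    """Return [own, left_total, right_total] triples in heap order (hypothetical helper)."""
    n = len(weights)
    nodes = [None] * n

    def total_weight(i=0):
        # Children of node i live at 2*i + 1 and 2*i + 2; indices >= n are empty.
        if i >= n:
            return 0
        left = total_weight(2 * i + 1)
        right = total_weight(2 * i + 2)
        nodes[i] = [weights[i], left, right]
        return weights[i] + left + right

    total_weight()
    return nodes


nodes = build_nodes([4, 3, 2, 1])
for i, node in enumerate(nodes):
    print(i, node)
# 0 [4, 4, 2]   <- root: own weight 4, left subtree 3 + 1, right subtree 2
# 1 [3, 1, 0]
# 2 [2, 0, 0]
# 3 [1, 0, 0]
```

The three numbers at the root always add up to the total remaining weight, which is the range the sampling step scales its random draw to.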
Usage:
    In [2]: items = list(range(10))
       ...: weights = list(range(10, 0, -1))
       ...:
    In [3]: for _ in range(10):
       ...:     print(list(weigthed_shuffle(items, weights)))
       ...:
    [5, 0, 8, 6, 7, 2, 3, 1, 4]
    [1, 2, 5, 7, 3, 6, 9, 0, 4]
    [1, 0, 2, 6, 8, 3, 7, 5, 4]
    [4, 6, 8, 1, 2, 0, 3, 9, 7]
    [3, 5, 1, 0, 4, 7, 2, 6, 8]
    [3, 7, 1, 2, 0, 5, 6, 4, 8]
    [1, 4, 8, 2, 6, 3, 0, 9, 5]
    [3, 5, 0, 4, 2, 6, 1, 8, 9]
    [6, 3, 5, 0, 1, 2, 4, 8, 7]
    [4, 1, 2, 0, 3, 8, 6, 5, 7]
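weigthed_shuffle is a generator, so you can draw just the top k items efficiently. If you want to shuffle the whole array, simply iterate over the generator until it is exhausted (for example with the list function).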
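As a rough illustration of drawing only the first k items, the sketch below pairs `itertools.islice` with a compact, self-contained variant of the tree sampler. The name `weighted_draws` and its flat `own`/`sub` arrays are assumptions of this sketch rather than the code above, and it is written with positive integer weights in mind so the arithmetic stays exact; float weights might need an extra guard against rounding.

```python
import itertools
import random


def weighted_draws(items, weights):
    """Yield every item once, in weighted random order (hypothetical sketch)."""
    n = len(items)
    own = list(weights)  # remaining weight of each node; 0 once it has been drawn
    sub = list(weights)  # remaining weight of the subtree rooted at each node
    for i in range(n - 1, 0, -1):  # children come after their parent in heap order
        sub[(i - 1) // 2] += sub[i]

    for _ in range(n):
        i, r = 0, sub[0] * random.random()
        while r >= own[i]:  # not this node: descend into one of its children
            r -= own[i]
            left = 2 * i + 1
            if r < sub[left]:
                i = left
            else:
                r -= sub[left]
                i = left + 1
        yield items[i]
        w, own[i] = own[i], 0
        while True:  # remove the drawn weight from the node and all its ancestors
            sub[i] -= w
            if i == 0:
                break
            i = (i - 1) // 2


random.seed(0)  # only to make the demo repeatable
items = list("abcdefghij")
weights = list(range(10, 0, -1))
top3 = list(itertools.islice(weighted_draws(items, weights), 3))
print(top3)  # three distinct items; only three tree descents were performed
```

Because the generator is lazy, stopping after k items costs one O(n) tree build plus k descents of O(log n) each, instead of a full shuffle.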
Update:
Weighted Random Sampling (2005; Efraimidis, Spirakis) provides a very elegant algorithm for this. The implementation is very simple and also runs in O(n log(n)):
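    import random

    def weigthed_shuffle(items, weights):
        # Give item i the key u ** (1 / w_i), with u uniform in [0, 1), and sort by descending key.
        order = sorted(range(len(items)), key=lambda i: -random.random() ** (1.0 / weights[i]))
        return [items[i] for i in order]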
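One possible variation, sketched here under stated assumptions: since the logarithm is increasing, ordering by `u ** (1 / w)` is the same as ordering by `log(u) / w`, and the log form avoids keys underflowing to 0.0 when a weight is very small. Combined with `heapq.nlargest`, it can also return only the first k items. The name `weighted_top_k` is hypothetical, and weights are assumed to be strictly positive.

```python
import heapq
import math
import random


def weighted_top_k(items, weights, k):
    """First k items of a weighted shuffle, using log-domain keys (hypothetical helper)."""
    def key(i):
        u = 1.0 - random.random()  # uniform in (0, 1], so log(u) is always defined
        return math.log(u) / weights[i]  # same ordering as u ** (1 / weights[i])

    # nlargest returns the k indices with the largest keys, largest first.
    order = heapq.nlargest(k, range(len(items)), key=key)
    return [items[i] for i in order]


items = list(range(10))
weights = list(range(10, 0, -1))
print(weighted_top_k(items, weights, 3))           # three distinct items
print(weighted_top_k(items, weights, len(items)))  # a full weighted shuffle
```

With k much smaller than n, `heapq.nlargest` needs about O(n log k) comparisons, so the full sort is avoided.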