KOTH - Loaded RPS



Contest permanently open - Updated August 10th, 2017

Even though on June 5th, 2017 I declared a winner (which will be kept as the best answer), I will keep running new bots and updating the results.

Results of June 5th

Congratulations user1502040

Since there are no ties, I only show the percentage of matches won.

Statistician2 - 95.7%
Fitter - 89.1%
Nash - 83.9%
Weigher - 79.9%
ExpectedBayes - 76.4%
AntiRepeater - 72.1%
Yggdrasil - 65.0%
AntiGreedy - 64.1%
Reactor - 59.9%
NotHungry - 57.3%
NashBot - 55.1%
Blodsocer - 48.6%
BestOfBothWorlds - 48.4%
GoodWinning - 43.9%
Rockstar - 40.5%
ArtsyChild - 40.4%
Assassin - 38.1%
WeightedRandom - 37.7%
Ensemble - 37.4%
UseOpponents - 36.4%
GreedyPsychologist - 36.3%
TheMessenger - 33.9%
Copycat - 31.4%
Greedy - 28.3%
SomewhatHungry - 27.6%
AntiAntiGreedy - 21.0%
Cycler - 20.3%
Swap - 19.8%
RandomBot - 16.2%

I created a Google Sheet with the grid of results for each pairing: https://docs.google.com/spreadsheets/d/1KrMvcvWMkK-h1Ee50w0gWLh_L6rCFOgLhTN_QlEXHyk/edit?usp=sharing


Thanks to the Petri Dilemma, I found myself able to run this King of the Hill.

The Game

The game is a simple "RPS" with a twist: the points earned with each victory increase as the match goes on (your played R, P or S gets loaded).

  • Paper beats Rock
  • Scissors beat Paper
  • Rock beats Scissors

The winner earns as many points as the load on their played move.

The loser's load on the played move increases by 1.

In the case of a tie, each player's load on the played move increases by 0.5.

After 100 games, whoever has more points wins the match.

e.g.: P1 has loads [10,11,12] (Rock, Paper, Scissors) and P2 has [7,8,9]. P1 plays R, P2 plays P. P2 wins and earns 8 points. P1's loads become [11,11,12]; P2's loads stay the same.
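As a minimal sketch of that bookkeeping (the helper below is mine, not part of the challenge, and it assumes loads start the match at 0):

BEATS = {"R": "S", "P": "R", "S": "P"}   # each key beats its value
IDX = {"R": 0, "P": 1, "S": 2}           # position in the loaded arrays

def score_round(m1, m2, loaded1, loaded2, pts1, pts2):
    if m1 == m2:                          # tie: both played moves load +0.5
        loaded1[IDX[m1]] += 0.5
        loaded2[IDX[m2]] += 0.5
    elif BEATS[m1] == m2:                 # P1 wins and earns the load on its move
        pts1 += loaded1[IDX[m1]]
        loaded2[IDX[m2]] += 1             # the loser's played move loads +1
    else:                                 # P2 wins, symmetrically
        pts2 += loaded2[IDX[m2]]
        loaded1[IDX[m1]] += 1
    return pts1, pts2

l1, l2 = [10, 11, 12], [7, 8, 9]          # the example above
print(score_round("R", "P", l1, l2, 0, 0), l1, l2)   # (0, 8) [11, 11, 12] [7, 8, 9]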

Challenge specification

Your program must be written in Python (sorry, I don't know how to handle anything else). You are to create a function that takes each of these variables as an argument on each execution:

my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history

points - Current points (yours and your opp's)

loaded - Array with the loads (in RPS order) (yours and your opp's)

history - A string with all the plays, the last character being the last play (yours and your opp's)

You must return "R", "P" or "S". Returning anything different means an automatic loss of the match.
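Purely for illustration (this bot is mine, not an entry), a minimal valid submission with that signature would look like:

import random

def examplebotfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    if not opp_history:                    # first round: nothing to react to yet
        return random.choice(["R", "P", "S"])
    return {"R": "P", "P": "S", "S": "R"}[opp_history[-1]]   # beat their last play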

Rules

You cannot modify the built-in functions.

Testing

I will keep a Git updated with the code and all the bots competing: https://github.com/Masclins/LoadedRPS

Judging

The winner will be decided by selecting the person with the most won matches out of 1000 full round-robins. Ties will be broken by the number of tied matches. 1000 matches are played rather than one because I expect a lot of randomness, and this way the randomness matters less.
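This is not the actual runner (that lives in the Git linked above), just a sketch of how I picture a single 100-round match, reusing the score_round helper sketched in the game section and again assuming loads start at 0:

def run_match(bot1, bot2, rounds=100):
    h1, h2 = "", ""                        # histories, last character = last play
    l1, l2 = [0, 0, 0], [0, 0, 0]          # loads in RPS order
    p1 = p2 = 0
    for _ in range(rounds):
        m1 = bot1(p1, p2, l1, l2, h1, h2)
        m2 = bot2(p2, p1, l2, l1, h2, h1)
        p1, p2 = score_round(m1, m2, l1, l2, p1, p2)
        h1, h2 = h1 + m1, h2 + m2
    return p1, p2                          # more points after 100 rounds wins the match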

You may submit up to 5 bots.

The contest ends on June 4th (that will be the last day on which I accept any answer), and on June 5th I will post the final standings (I might try to post an advancement before that).


Since this is my first KOTH, I'm 100% open to changing anything for the sake of improvement, like the number of matches played against each bot.

Edited to 1000 matches, since I saw there really is quite a lot of randomness involved.


With some random bots, you really do want to run multiple games of multiple rounds
Destructible Lemon

@DestructibleLemon I thought about making each bot play against each other bot three times rather than once. Seeing you had a similar idea, that's what I'll do.
Masclins

(you really do want a fair number of matches, since some probabilities genuinely play out over several of them. See my bot, which could get trounced, but probably wouldn't over a fair number of matches)
Destructible Lemon

I'm glad my question helped you be able to run this, @AlbertMasclans!
Gryphon

@AlbertMasclans Could you post the full test script (including runcode and bots)?
CalculatorFeline

Answers:



Statistician (no longer playing)

import random
import collections

R, P, S = moves = range(3)
move_idx = {"R": R, "P": P, "S": S}
name = "RPS"
beat = (P, S, R)
beaten = (S, R, P)

def react(_0, _1, _2, _3, _4, opp_history):
    if not opp_history:
        return random.randrange(0, 3)
    return beat[opp_history[-1]]

def anti_react(_0, _1, _2, _3, _4, opp_history):
    if not opp_history:
        return random.randrange(0, 3)
    return beaten[opp_history[-1]]

def random_max(scores):
    scores = [s + random.normalvariate(0, 1) for s in scores]
    return scores.index(max(scores))

def greedy_margin(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    scores = [my_loaded[move] - opp_loaded[beat[move]] for move in moves]
    return random_max(scores)

def anti_greedy(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    scores = [-my_loaded[move] for move in moves]
    return random_max(scores)

def recent_stats(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    opp_history = opp_history[-10:-1]
    counts = collections.Counter(opp_history)
    scores = [(counts[beaten[move]] + 1) * my_loaded[move] - 
              (counts[beat[move]] + 1) * opp_loaded[move] for move in moves]
    return random_max(scores)

def statistician(_0, _1, _2, _3, my_history, opp_history):
    m1 = []
    o1 = []
    my_loaded = [0] * 3
    opp_loaded = [0] * 3
    my_points = 0
    opp_points = 0
    strategies = [react, anti_react, greedy_margin, anti_greedy, recent_stats]
    strategy_scores = [0 for _ in strategies]
    for i, (mx, ox) in enumerate(zip(my_history, opp_history)):
        mx = move_idx[mx]
        ox = move_idx[ox]
        for j, strategy in enumerate(strategies):
            strategy_scores[j] *= 0.98
            move = strategy(my_points, opp_points, my_loaded, opp_loaded, m1, o1)
            if move == beat[ox]:
                strategy_scores[j] += my_loaded[move]
            elif move == beaten[ox]:
                strategy_scores[j] -= opp_loaded[ox]
        m1.append(mx)
        o1.append(ox)
        if mx == beat[ox]:
            opp_loaded[ox] += 1
            my_points += my_loaded[mx]
        elif mx == beaten[ox]:
            my_loaded[mx] += 1
            opp_points += opp_loaded[ox]
        else:
            my_loaded[mx] += 0.5
            opp_loaded[ox] += 0.5
    strategy = strategies[random_max(strategy_scores)]
    return name[strategy(my_points, opp_points, my_loaded, opp_loaded, m1, o1)]

Switches between a few simple strategies based on their expected past performance.

Statistician2

import random
import collections
import numpy as np

R, P, S = moves = range(3)
move_idx = {"R": R, "P": P, "S": S}
names = "RPS"
beat = (P, S, R)
beaten = (S, R, P)

def react(my_loaded, opp_loaded, my_history, opp_history):
    if not opp_history:
        return random.randrange(0, 3)
    counts = [0, 0, 0]
    counts[beat[opp_history[-1]]] += 1
    return counts

def random_max(scores):
    scores = [s + random.normalvariate(0, 1) for s in scores]
    return scores.index(max(scores))

def argmax(scores):
    m = max(scores)
    return [s == m for s in scores]

def greedy_margin(my_loaded, opp_loaded, my_history, opp_history):
    scores = [my_loaded[move] - opp_loaded[beat[move]] for move in moves]
    return argmax(scores)

recent_counts = None

def best_move(counts, my_loaded, opp_loaded):
    scores = [(counts[beaten[move]] + 0.5) * my_loaded[move] - 
              (counts[beat[move]] + 0.5) * opp_loaded[move] for move in moves]
    return argmax(scores)

def recent_stats(my_loaded, opp_loaded, my_history, opp_history):
    if len(opp_history) >= 10:
        recent_counts[opp_history[-10]] -= 1
    recent_counts[opp_history[-1]] += 1
    return best_move(recent_counts, my_loaded, opp_loaded)

order2_counts = None

def order2(my_loaded, opp_loaded, my_history, opp_history):
    if len(my_history) >= 2:
        base0 = 9 * my_history[-2] + 3 * opp_history[-2]
        order2_counts[base0 + opp_history[-1]] += 1
    base1 = 9 * my_history[-1] + 3 * opp_history[-1]
    counts = [order2_counts[base1 + move] for move in moves]
    return best_move(counts, my_loaded, opp_loaded)

def nash(my_loaded, opp_loaded, my_history, opp_history):
    third = 1.0 / 3
    p = np.full(3, third)
    q = np.full(3, third)
    u = np.array(my_loaded)
    v = np.array(opp_loaded)
    m0 = np.zeros(3)
    m1 = np.zeros(3)
    lr = 0.2
    for _ in range(10):
        de0 = u * np.roll(q, 1) - np.roll(v * q, 2)
        de1 = v * np.roll(p, 1) - np.roll(u * p, 2)
        m0 = 0.9 * m0 + 0.1 * de0
        m1 = 0.9 * m1 + 0.1 * de1
        p += lr * m0
        q += lr * m1
        p[p < 0] = 0
        q[q < 0] = 0
        tp, tq = np.sum(p), np.sum(q)
        if tp == 0 or tq == 0:
            return np.full(3, third)
        p /= tp
        q /= tq
        lr *= 0.9
    return p

strategies = [react, greedy_margin, recent_stats, order2, nash]

predictions = strategy_scores = mh = oh = None

def statistician2func(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    global strategy_scores, history, recent_counts, mh, oh, predictions, order2_counts
    if not opp_history:
        strategy_scores = [0 for _ in strategies]
        recent_counts = collections.Counter()
        order2_counts = collections.Counter()
        mh, oh = [], []
        predictions = None
        return random.choice(names)
    my_move = move_idx[my_history[-1]]
    opp_move = move_idx[opp_history[-1]]
    if predictions is not None:
        for j, p in enumerate(predictions):
            good = beat[opp_move]
            bad = beaten[opp_move]
            strategy_scores[j] += (my_loaded[good] * p[good] - opp_loaded[opp_move] * p[bad]) / sum(p)
    mh.append(my_move)
    oh.append(opp_move)
    predictions = [strategy(my_loaded, opp_loaded, mh, oh) for strategy in strategies]
    strategy = random_max(strategy_scores)
    p = predictions[strategy]
    r = random.random()
    for i, pi in enumerate(p):
        r -= pi
        if r <= 0:
            break
    return names[i]

Nash

import numpy as np
import random

def nashfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    third = 1.0 / 3
    p = np.full(3, third)
    q = np.full(3, third)
    u = np.array(my_loaded)
    v = np.array(opp_loaded)
    m0 = np.zeros(3)
    m1 = np.zeros(3)
    lr = 0.2
    for _ in range(10):
        de0 = u * np.roll(q, 1) - np.roll(v * q, 2)
        de1 = v * np.roll(p, 1) - np.roll(u * p, 2)
        m0 = 0.9 * m0 + 0.1 * de0
        m1 = 0.9 * m1 + 0.1 * de1
        p += lr * m0
        q += lr * m1
        p[p < 0] = 0
        q[q < 0] = 0
        tp, tq = np.sum(p), np.sum(q)
        if tp == 0 or tq == 0:
            return random.choice("RPS")
        p /= tp
        q /= tq
        lr *= 0.9
    r = random.random()
    for i, pi in enumerate(p):
        r -= pi
        if r <= 0:
            break
    return "RPS"[i]

Computes an approximate Nash equilibrium by gradient descent.


I really like this approach, and I can understand why you'd want to keep state between rounds. Given the number of submissions, though, I think changing that would be a huge problem. I'll keep it in mind for further challenges (which I expect to do once this one is finished).
Masclins


Weigher

I lost track of my reasoning while experimenting with the code, but the basic idea is estimating the opponent's move probability from their last 3 moves using some weights, and multiplying that by another weight that depends on the loads. I thought I could somehow use my_loaded too, but I couldn't decide how, so I left it out.

def weigher(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    idx = {"R": 0, "P": 1, "S": 2}
    sc = [0, 0, 0]
    for i, m in enumerate(reversed(opp_history[-3:])):
        sc[idx[m]] += (1 / (1 + i))

    for i in range(3):
        sc[i] *= (opp_loaded[i] ** 2)

    return "PSR"[sc.index(max(sc))]

Satan

Will probably get disqualified, because it's kind of cheating and makes some assumptions about the testing function (it has to have the opponent's function in a variable of its stack frame), but technically it doesn't break any current rule: it doesn't redefine or rewrite anything. It simply uses black magic to execute the opponent's function to see what turn they did / will do. It cannot deal with randomness, but deterministic bots stand no chance against Satan.

def satan(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    import inspect, types
    f = inspect.currentframe()
    s = f.f_code.co_name
    try:
        for v in f.f_back.f_locals.values():
            if isinstance(v, types.FunctionType) and v.__name__ != s:
                try:
                    return "PSR"[{"R": 0, "P": 1, "S": 2}[
                        v(opp_points, my_points, opp_loaded, my_loaded, opp_history, my_history)]]
                except:
                    continue
    finally:
        del f

Undoubtedly the best in terms of results relative to simplicity.
Masclins

By the way, to use my_loaded you could add a weight to the moves that lose against your last move. That's like assuming your opponent will do something similar to what you just did, so punishing him for expecting you to keep playing the same. Something like: for i, m in enumerate(reversed(my_history[-3:])): sc[(idx[m]+1)%3] += (K / (1 + i))
Masclins

@AlbertMasclans Added another solution
Display Name

I really like Satan. But as you say, I don't think it should qualify: even though it doesn't break any explicit rule, it clearly goes against the spirit of the game. Still, congratulations!
Masclins


Fitter

This bot improves on Pattern and merges it with Economist (Pattern and Economist will no longer participate).

The improvement over Pattern is that the bot now looks for two kinds of patterns: how the opponent reacts to his own last play, and how he reacts to my last play. It then evaluates both predictions and uses the one that fits best.

From that pattern, the bot has probabilities for R, P and S. Taking those into account along with the expected value of each play (as Economist did), the bot plays the one with the highest value.

import random
import numpy as np
def fitterfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
        t = len(opp_history)
        RPS = ["R","P","S"]
        if t <= 2:
                return RPS[t]
        elif t == 3:
                return random.choice(RPS)

        def n(c): return RPS.index(c)

        total_me = np.zeros(shape=(3,3))
        total_opp= np.zeros(shape=(3,3))
        p_me = np.array([[1/3]*3]*3)
        p_opp = np.array([[1/3]*3]*3)

        for i in range(1, t):
                total_me[n(my_history[i-1]), n(opp_history[i])] += 1
                total_opp[n(opp_history[i-1]), n(opp_history[i])] += 1
        for i in range(3):
                if np.sum(total_me[i,:]) != 0:
                        p_me[i,:] = total_me[i,:] / np.sum(total_me[i,:])
                if np.sum(total_opp[i,:]) != 0:
                        p_opp[i,:] = total_opp[i,:] / np.sum(total_opp[i,:])

        error_me = 0
        error_opp = 0

        for i in range(1, t):
                diff = 1 - p_me[n(my_history[i-1]), n(opp_history[i])]
                error_me += diff * diff
                diff = 1 - p_opp[n(opp_history[i-1]), n(opp_history[i])]
                error_opp += diff * diff

        if error_me < error_opp:
                p = p_me[n(my_history[-1]),:]
        else:
                p = p_opp[n(opp_history[-1]),:]


# From here on I weight the values; I'm not 100% sure that's the best idea, so I leave the alternative in case I feel like changing it
        value = [(p[2]*my_loaded[0] - p[1]*opp_loaded[1], "R"), (p[0]*my_loaded[1] - p[2]*opp_loaded[2], "P"), (p[1]*my_loaded[2] - p[0]*opp_loaded[0], "S")]
        value.sort()

        if value[-1][0] > value[-2][0]:
                return value[-1][1]
        elif value[-1][0] > value[-3][0]:
                return random.choice([value[-1][1], value[-2][1]])
        else:
                return random.choice(RPS)

#       idx = p.tolist().index(max(p))
#       return ["P", "S", "R"][idx]

These are the two old codes:

Pattern (no longer playing)

Pattern tries to find patterns in the opponent's play. It looks at what the opponent played after plays equal to his last one (weighting the latest plays more). That way it guesses what the opponent is going to play, and counters it.

import random
import numpy as np
def patternfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
        if len(opp_history) == 0:
                return random.choice(["R","P","S"])
        elif len(opp_history) == 1:
                if opp_history == "R":
                        return "P"
                elif opp_history == "P":
                        return "S"
                elif opp_history == "S":
                        return "R"

        p = np.array([1/3]*3)
        c = opp_history[-1]
        for i in range(1, len(opp_history)):
                c0 = opp_history[i-1]
                c1 = opp_history[i]
                if c0 == c:
                        p *= .9
                        if c1 == "R":
                                p[0] += .1
                        elif c1 == "P":
                                p[1] += .1
                        elif c1 == "S":
                                p[2] += .1

        idx = p.tolist().index(max(p))
        return ["P", "S", "R"][idx]

Economist (no longer playing)

Economist does the following: it guesses the probability of each play by the opponent by looking at what he played over his last 9 turns. From that it computes the expected payoff of each play and goes with the one that has the highest expected value.

import random
def economistfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
        if len(opp_history) == 0:
                return random.choice(["R","P","S"])
        if len(opp_history) > 9:
                opp_history = opp_history[-10:-1]
        p = [opp_history.count("R"), opp_history.count("P"), opp_history.count("S")]

        value = [(p[2]*my_loaded[0] - p[1]*opp_loaded[1], "R"), (p[0]*my_loaded[1] - p[2]*opp_loaded[2], "P"), (p[1]*my_loaded[2] - p[0]*opp_loaded[0], "S")]
        value.sort()

        if value[-1][0] > value[-2][0]:
                return value[-1][1]
        elif value[-1][0] > value[-3][0]:
                return random.choice([value[-1][1], value[-2][1]])
        else:
                return random.choice(["R","P","S"])


Yggdrasil

It's named "Yggdrasil" because it looks ahead in the game tree. This bot doesn't perform any prediction of the opponent; it just tries to maintain a statistical edge if it gets one (by balancing current and future profits). It computes an approximately ideal mixed strategy and returns a move selected randomly with those weights. If this bot were perfect (which it is not, because the state-valuation function is pretty bad and it doesn't look very far ahead), it would be impossible to beat it more than 50% of the time. I don't know how well this bot will do in practice.

def yggdrasil(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    cache = {}
    def get(turn, ml, ol):
        key = str(turn) + str(ml) + str(ol)
        if not key in cache:
            cache[key] = State(turn, ml, ol)
        return cache[key]

    def wrand(opts):
        total = sum(abs(w) for c,w in opts.items())
        while True:
            r = random.uniform(0, total)
            for c, w in opts.items():
                r -= abs(w)
                if r < 0:
                    return c
            print("error",total,r)

    class State():
        turn = 0
        ml = [1,1,1]
        ol = [1,1,1]
        val = 0
        strat = [1/3, 1/3, 1/3]
        depth = -1
        R = 0
        P = 1
        S = 2
        eps = 0.0001
        maxturn = 1000

        def __init__(self, turn, ml, ol):
            self.turn = turn
            self.ml = ml
            self.ol = ol
        def calcval(self, depth):
            if depth <= self.depth:
                return self.val
            if turn >= 1000:
                return 0
            a = 0
            b = -self.ol[P]
            c = self.ml[R]
            d = self.ml[P]
            e = 0
            f = -self.ol[S]
            g = -self.ol[R]
            h = self.ml[S]
            i = 0
            if depth > 0:
                a += get(self.turn+1,[self.ml[R]+1,self.ml[P],self.ml[S]],[self.ol[R]+1,self.ol[P],self.ol[S]]).calcval(depth-1)
                b += get(self.turn+1,[self.ml[R]+2,self.ml[P],self.ml[S]],[self.ol[R],self.ol[P],self.ol[S]]).calcval(depth-1)
                c += get(self.turn+1,[self.ml[R],self.ml[P],self.ml[S]],[self.ol[R],self.ol[P],self.ol[S]+2]).calcval(depth-1)
                d += get(self.turn+1,[self.ml[R],self.ml[P],self.ml[S]],[self.ol[R]+2,self.ol[P],self.ol[S]]).calcval(depth-1)
                e += get(self.turn+1,[self.ml[R],self.ml[P]+1,self.ml[S]],[self.ol[R],self.ol[P]+1,self.ol[S]]).calcval(depth-1)
                f += get(self.turn+1,[self.ml[R],self.ml[P]+2,self.ml[S]],[self.ol[R],self.ol[P],self.ol[S]]).calcval(depth-1)
                g += get(self.turn+1,[self.ml[R],self.ml[P],self.ml[S]+2],[self.ol[R],self.ol[P],self.ol[S]]).calcval(depth-1)
                h += get(self.turn+1,[self.ml[R],self.ml[P],self.ml[S]],[self.ol[R],self.ol[P]+2,self.ol[S]]).calcval(depth-1)
                i += get(self.turn+1,[self.ml[R],self.ml[P],self.ml[S]+1],[self.ol[R],self.ol[P],self.ol[S]+1]).calcval(depth-1)
            self.val = -9223372036854775808
            for pr in range(0,7):
                for pp in range(0,7-pr):
                    ps = 6-pr-pp
                    thisval = min([pr*a+pp*d+ps*g,pr*b+pp*e+ps*h,pr*c+pp*f+ps*i])
                    if thisval > self.val:
                        self.strat = [pr,pp,ps]
                        self.val = thisval
            self.val /= 6


            if depth == 0:
                self.val *= min(self.val, self.maxturn - self.turn)
            return self.val

    turn = len(my_history)
    teststate = get(turn, [x * 2 for x in my_loaded], [x * 2 for x in opp_loaded])
    teststate.calcval(1)
    return wrand({"R":teststate.strat[R],"P":teststate.strat[P],"S":teststate.strat[S]})

Please remove the comments that don't make the code easier to understand
Display Name

@SargeBorsch Done
PhiNotPi

@PhiNotPi I know I didn't post any time limit, but Yggdrasil takes over a minute against each opponent. Would you mind optimizing it a bit?
Masclins

Yeah, it runs unbearably slowly
Display Name

@AlbertMasclans By a minute per opponent, do you mean one minute total for all the games against one opponent? Also, I can try to speed it up, but I don't really know how to do it; as it is, it only looks 1 step ahead.
PhiNotPi


AntiRepeater

from random import choice
def Antirepeaterfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    s = opp_history.count("S")
    r = opp_history.count("R")
    p = opp_history.count("P")

    if s>p and s>r:
        return "R"
    elif p>s and p>r:
        return "S"
    else:
        return "P"

Picks paper in the first turn. After that it returns whatever beats what the opponent has played the most, picking paper in the case of ties.

Copycat

import random
def copycatfunc(I,dont,care,about,these,enmoves):
    if not enmoves:
        return random.choice(["R","P","S"])
    else:
        return enmoves[len(enmoves)-1]

Simply copies the opponent's last move.

AntiAntiGreedy

from random import choice
def antiantigreedy(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    if opp_loaded[0] > opp_loaded[1] and opp_loaded[0] > opp_loaded[2]:
        return "S"
    if opp_loaded[1] > opp_loaded[0] and opp_loaded[1] > opp_loaded[2]:
        return "R"
    if opp_loaded[2] > opp_loaded[0] and opp_loaded[2] > opp_loaded[1]:
        return "P"
    else:
        return choice(["R","P","S"])

Picks whatever loses to the opponent's most loaded choice.

SomewhatHungry

from random import choice
def somewhathungryfunc(blah, blah2, load, blah3, blah4, blah5):
    if load[0] > load[1] and load[0] < load[2] or load[0] < load[1] and load[0] > load[2]:
        return "R"
    if load[1] > load[0] and load[1] < load[2] or load[1] < load[0] and load[1] > load[2]:
        return "P"
    if load[2] > load[1] and load[2] < load[0] or load[2] < load[1] and load[2] > load[0]:
        return "S"
    else:
        return choice(["R","P","S"])


TheMessenger

def themessengerfunc(I, dont, need, these, arguments, either): return "P"

Rockstar

def rockstarfunc(I, dont, need, these, arguments, either): return "R"

Assassin

def assassinfunc(I, dont, need, these, arguments, either): return "S"

Explanation

Now, you may think these bots are utterly stupid.

That's not entirely true. They are actually based on the idea of accumulating a huge bonus while the enemy missteps and gets bludgeoned by it.

These bots work much like Greedy, except they are simpler: Greedy picks randomly until one weapon gets loaded, while these stick with their weapon of choice from the start.

One more thing to note: each of these beats Greedy about half the time, draws a third of the time and loses a sixth of the time. And when they win, they tend to win by a lot. Why is this?

Greedy picks his weapon randomly until he loses a round. This means that whenever he fails to win a round, he picks a weapon at random again, which may turn out to be a winning one again. If Greedy draws or loses, he sticks with that weapon. If Greedy wins at least one round while picking the same weapon as the bot, Greedy wins the match. If Greedy picks the losing weapon at some point, our bot wins, because the load on our weapon will have climbed above the score Greedy has collected.

Assuming Greedy isn't simply picking the winning weapon every time by sheer chance, the odds are:

1/3: {1/2 win (1/6 overall). 1/2 lose (1/6 overall).}

1/3 draw

1/3 win

So: a 1/3 chance of a draw, a 1/6 chance of a loss, and a 1/2 chance of a win.
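A quick Monte Carlo sketch (mine, not part of the answer; it assumes loads start at 0 and re-implements Greedy's pick-the-max-load behaviour) to sanity-check those fractions:

import random

def greedy_pick(loaded):
    m = max(loaded)                        # Greedy plays its most loaded move,
    return random.choice([i for i, l in enumerate(loaded) if l == m])   # ties at random

def rockstar_vs_greedy(rounds=100):
    beats = {0: 2, 1: 0, 2: 1}             # R beats S, P beats R, S beats P
    rs_pts = g_pts = 0.0
    rs_load, g_load = [0.0] * 3, [0.0] * 3
    for _ in range(rounds):
        a, b = 0, greedy_pick(g_load)      # Rockstar always throws R (index 0)
        if a == b:                         # tie
            rs_load[a] += 0.5
            g_load[b] += 0.5
        elif beats[a] == b:                # Rockstar wins the round
            rs_pts += rs_load[a]
            g_load[b] += 1
        else:                              # Greedy wins the round
            g_pts += g_load[b]
            rs_load[a] += 1
    return (rs_pts > g_pts) - (rs_pts < g_pts)

outcomes = [rockstar_vs_greedy() for _ in range(10000)]
for label, v in (("win", 1), ("draw", 0), ("loss", -1)):
    print(label, outcomes.count(v) / len(outcomes))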

This probably shows that you need to run multiple games of multiple rounds each.

These are mostly here to get the challenge rolling.



Reactor

Plays whatever would have won the previous round.

import random
def reactfunc(I, dont, need, all, these, opp_history):
    if not opp_history:
        return random.choice(["R","P","S"])
    else:
        prev=opp_history[len(opp_history)-1]
        if prev == "R":
            return "P"
        if prev == "P":
            return "S"
        else:
            return "R"

You can replace opp_history[len(opp_history)-1] with opp_history[-1]
CalculatorFeline


ArtsyChild

This bot behaves like a child playing arts and crafts: it starts with paper and uses paper or scissors at random, but won't use scissors after rock or scissors, since the scissors need to be used on paper. It will throw a rock back at anyone who throws a rock at her.

import random
def artsychildfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    if len(opp_history) == 0:
            return "P"
    elif opp_history[-1] == "R":
            return "R"
    elif my_history[-1] != "P":
            return "P"
    else:
            return random.choice(["P", "S"])


Here are the three bots I built for testing:


RandomBot

import random
def randombotfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
        return random.choice(["R","P","S"])

Greedy

Simply picks his most loaded option.

import random
def greedyfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
        if my_loaded[0] > my_loaded[1]:
                if my_loaded[0] > my_loaded[2]:
                        return "R"
                elif my_loaded[0] < my_loaded[2]:
                        return "S"
                else:
                        return random.choice(["R","S"])
        elif my_loaded[0] < my_loaded[1]:
                if my_loaded[1] > my_loaded[2]:
                        return "P"
                elif my_loaded[1] < my_loaded[2]:
                        return "S"
                else:
                        return random.choice(["P","S"])
        else:
                if my_loaded[0] > my_loaded[2]:
                        return random.choice(["R","P"])
                elif my_loaded[0] < my_loaded[2]:
                        return "S"
                else:
                        return random.choice(["R","P","S"])

AntiGreedy

Assumes the opponent will play Greedy and plays the counter to that.

import random
def antigreedyfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
        if opp_loaded[0] > opp_loaded[1]:
                if opp_loaded[0] > opp_loaded[2]:
                        return "P"
                elif opp_loaded[0] < opp_loaded[2]:
                        return "R"
                else:
                        return "R"
        elif opp_loaded[0] < opp_loaded[1]:
                if opp_loaded[1] > opp_loaded[2]:
                        return "S"
                elif opp_loaded[1] < opp_loaded[2]:
                        return "R"
                else:
                        return "S"
        else:
                if opp_loaded[0] > opp_loaded[2]:
                        return "P"
                elif opp_loaded[0] < opp_loaded[2]:
                        return "R"
                else:
                        return random.choice(["R","P","S"])


NotHungry

def nothungryfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    if my_loaded[0] < my_loaded[1]:
            if my_loaded[0] < my_loaded[2]:
                    return "R"
            elif my_loaded[0] > my_loaded[2]:
                    return "S"
            else:
                    return random.choice(["R","S"])
    elif my_loaded[0] > my_loaded[1]:
            if my_loaded[1] < my_loaded[2]:
                    return "P"
            elif my_loaded[1] > my_loaded[2]:
                    return "S"
            else:
                    return random.choice(["P","S"])
    else:
            if my_loaded[0] < my_loaded[2]:
                    return random.choice(["R","P"])
            elif my_loaded[0] > my_loaded[2]:
                    return "S"
            else:
                    return random.choice(["R","P","S"])

This is literally the opposite of Greedy: it picks the least loaded option available.



UseOpponents

from collections import Counter
import random
def useopponents(hi, my, name, iz, stephen, opp_history):  # "is" is a reserved word in Python, hence "iz"
  if opp_history:
    data = Counter(opp_history)
    return data.most_common(1)[0][0]
  else:
    return random.choice(["R","P","S"])

For the first turn, picks a random item. For every other turn, uses the opponent's most common choice; if there is a tie, it defaults to the earliest most common choice.

// I stole the code from here


GoodWinning

import random
def goodwinning(no, yes, maybe, so, my_history, opp_history):
  if opp_history:
    me = my_history[len(my_history)-1]
    you = opp_history[len(opp_history)-1]
    if you == me:
      return goodwinning(no, yes, maybe, so, my_history[:-1], opp_history[:-1])
    else:
      if me == "R":
        if you == "P":
          return "P"
        else:
          return "R"
      elif me == "P":
        if you == "S":
          return "S"
        else:
          return "P"
      else:
        if you == "R":
          return "R"
        else:
          return "S"
  else:
    return random.choice(["R","P","S"])

Returns the winner's choice from the last round. If the last round was a tie, recursively checks the round before it. If it was all ties, or it's the first round, returns a random choice.



BestOfBothWorlds

This bot basically combines Anti-Greedy and Greedy (hence the name).

def bobwfunc(a, b, my_loaded, opp_loaded, c, d):
    opp_max = max(opp_loaded)
    opp_play = "PSR"[opp_loaded.index(opp_max)]

    my_max = max(my_loaded)
    my_play = "RPS"[my_loaded.index(my_max)]

    if opp_play == my_play:
        return opp_play
    else:
        return my_play if opp_max < my_max else opp_play

This is Anti-Greedy, which is already posted as an example.
Masclins

@AlbertMasclans Changed it to another bot.
clismique

find is for strings. my_loaded and opp_loaded are both lists. index should do what you want.
Masclins

@AlbertMasclans Whoops, fixed now. Thanks for the catch! I hope this isn't another duplicate... I wouldn't want to delete this post again.
clismique

It's fine, thanks for playing
Masclins


NashBot

import random
def nashbotfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    r = opp_loaded[0] * opp_loaded[2]
    p = opp_loaded[0] * opp_loaded[1]
    s = opp_loaded[1] * opp_loaded[2]
    q = random.uniform(0, r + p + s) - r
    return "R" if q < 0 else "P" if q < p else "S"

Chooses randomly among the three options in a way that leaves the opponent with no statistical preference about how many points they can score; in other words, Greedy and NotHungry should both have the same average expected score against it.
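That claim can be checked quickly (the snippet is mine, with arbitrary example loads): under NashBot's weights, the opponent's expected gain is identical whichever move they commit to.

opp_loaded = [3.0, 5.0, 7.0]               # arbitrary example loads, in RPS order
r = opp_loaded[0] * opp_loaded[2]          # weight of my R
p = opp_loaded[0] * opp_loaded[1]          # weight of my P
s = opp_loaded[1] * opp_loaded[2]          # weight of my S
total = r + p + s
pR, pP, pS = r / total, p / total, s / total
# The opponent's R only scores against my S (earning opp_loaded[0]), and so on:
print(pS * opp_loaded[0], pR * opp_loaded[1], pP * opp_loaded[2])
# all three come out as opp_loaded[0]*opp_loaded[1]*opp_loaded[2] / total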



ExpectedBayes

Edit: Updated ranking

This is the new top of the ranking after the inclusion of ExpectedBayes:

  • statistician2func 91.89%
  • fitterfunc 85.65%
  • nashfunc 80.40%
  • weigherfunc 76.39%
  • expectedbayesfunc 73.33%
  • antirepeaterfunc 68.52%
  • ...

Explanation

(N.B.: posted May 6th, 2017)

This bot tries to maximize the expected value of its next move by:

  • Computing the probability of each of the opponent's possible next moves
  • Using those probabilities and the loads to compute the expected value of R, P and S
  • Selecting the move with the highest expected value
  • Randomly selecting a value if the prediction fails

The probabilities are updated every ten moves. The number of past moves per bot used to compute the probabilities has been set to 10 (so 20 features overall). That's probably overfitting the data, but I didn't try to check any further.

It relies on the scikit library to compute the opponent's move probabilities (I'm mentioning it in case I misread the rules and it's actually not allowed).

It easily beats bots that always make the same choice. Surprisingly, it's also quite effective against the random bot, with a 93% win rate (I believe this is because it limits the number of points the opponent can obtain while maximizing its own points each round).

I made a quick try over 100 turns with only a limited set of the bots, and this is what I got from result_standing:

  • … 35
  • … 333
  • … 172 … 175
  • … 491
  • … 298
  • Rockstarfunc 200
  • … 748
  • … 656
  • expectedbayesfunc 601

Which isn't that bad!

from sklearn.naive_bayes import MultinomialNB
import random

#Number of past moves used to compute the probability of next move
#I did not really try anything like cross-validation, so this number is purely arbitrary
n_data = 10

#Some useful data structures
choices = ['R','P','S']
choices_dic = {'R':0,'P':1,'S':2}
point_dic = {(0,0):0,(1,1):0,(2,2):0, #Same choices
             (0,1):-1,(0,2):1, #me = rock
             (1,0):1,(1,2):-1, #me = paper
             (2,0):-1,(2,1):1} #me = scissor

def compute_points(my_choice,opp_choice,my_load,opp_load):
    """
    Compute points
    @param my_choice My move as an integer
    @param opp_choice Opponent choice as an integer
    @param my_load my_load array
    @param opp_load opp_load array
    @return A signed integer (+ = points earned, - = points lost)
    """
    points = point_dic[(my_choice,opp_choice)] #Get -1, 0 or 1
    if points > 0:
        return points*my_load[my_choice] 
    else:
        return points*opp_load[opp_choice]

#This used to be a decision tree, before I changed it to something else. Nevertheless, I kept the name
class Decision_tree:
    def __init__(self):
        self.dataX = []
        self.dataY = []
        self.clf = MultinomialNB()

    def decide(self,my_load,opp_load,my_history,opp_history):
        """
        Returns the decision as an integer

        Done through a try (if a prediction could be made) except (if not possible)
        """
        try:
            #Let's try to predict the next move
            my_h = list(map(lambda x: choices_dic[x],my_history[-n_data:-1]))
            opp_h = list(map(lambda x: choices_dic[x],opp_history[-n_data:-1]))
            pred = self.clf.predict_proba([my_h+opp_h])
            #We create a points array where keys are the available choices
            pts = []
            for i in range(3):
                #We compute the expected gain/loss for each choice
                tmp = 0
                for j in range(3):
                    tmp += compute_points(i,j,my_load,opp_load)*pred[0][j]
                pts.append(tmp)
            return pts.index(max(pts)) #We return key for the highest expected value
        except:
            return random.choice(range(3))

    def append_data(self,my_history,opp_history):
        if my_history == "":
            self.clf = MultinomialNB()
        elif len(my_history) < n_data:
            pass
        else:
            my_h = list(map(lambda x: choices_dic[x],my_history[-n_data:-1]))
            opp_h = list(map(lambda x: choices_dic[x],opp_history[-n_data:-1]))
            self.dataX = self.dataX + [my_h+opp_h]
            self.dataY = self.dataY + [choices_dic[opp_history[-1:]]]

            if len(self.dataX) >= 10:
                self.clf.partial_fit(self.dataX,self.dataY,classes=[0,1,2])

                self.dataX = []
                self.dataY = []


#Once again, this is not actually a decision tree
dt = Decision_tree()

#There we go:
def expectedbayesfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    dt.append_data(my_history,opp_history)
    choice = choices[dt.decide(my_loaded,opp_loaded,my_history,opp_history)]
    return choice

Welcome to PPCG, and nice first post!
Zacharý

Thanks a lot! I've been wanting to participate in PPCG for a long time. It's fixed now!
lesibius


Cycler

def cycler(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    return "RPS"[len(my_history) % 3]


Ensemble

from random import *
def f(I):
    if I==0:return "R"
    if I==1:return "P"
    return "S"
def b(I):
    if I=="R":return 0
    if I=="P":return 1
    return 2
def Ensemble(mp,op,ml,ol,mh,oh):
    A=[0]*3
    B=[0]*3
    if(len(oh)):
        k=b(oh[-1])
        A[k-2]+=0.84
        A[k]+=0.29
        for x in range(len(oh)):
            g=b(oh[x])
            B[g-2]+=0.82
            B[g]+=0.22
        s=sum(B)
        for x in range(len(B)):
            A[x]+=(B[x]*1.04/s)
        r=A.index(max(A))
    else:
        r=randint(0,2)
    return f(r)

Several competing algorithms vote on the best solution.

Swap

from random import *
def f(I):
    if I==0:return "R"
    if I==1:return "P"
    return "S"
def b(I):
    if I=="R":return 0
    if I=="P":return 1
    return 2
def Swap(mp,op,ml,ol,mh,oh):
    A=[0]*3
    B=[0]*3
    if(len(mh)):
        r=(b(mh[-1])+randint(1,2))%3
    else:
        r=randint(0,2)
    return f(r)

Makes a random move, but never repeats its last move.



Blodsocer

Socery!

I fixed it, so I hope it works now

I messed up again, so I deleted and undeleted. I've been messing up a lot.

def blodsocerfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    import random
    # tuned up and ready to go, hopefully
    # s o c e r y
    if len(my_history) > 40 and len(set(opp_history[-30:])) == 1:
        if opp_history[-1] == "S":
            return "R"
        elif opp_history[-1] == "R":
            return "P"
        else:
            return "S"
        # against confused bots that only do one thing most of the time.
    elif len(my_history)>30 and min(opp_history.count(i) for i in "RPS")/max(opp_history.count(i) for i in "RPS") >0.8:
        return "RPS"[my_loaded.index(max(my_loaded))] # This is so if the other bot is acting errratic
                                                      # the max bonus is used for advantage
    elif len(my_history) < 10:
        if len(my_history) > 2 and all(i == "S" for i in opp_history[1:]):
            if len(my_history) > 5: return "S"
            return "P"
        return "S" # Be careful, because scissors are SHARP
    elif len(set(opp_history[1:10])) == 1 and len(my_history) < 20:
        if opp_history[1] == "S":
            return "R"
        elif opp_history[1] == "R":
            return "R"
        else:
            return "P"
    elif len(opp_history) -  max(opp_history.count(i) for i in "RPS") < 4 and len(my_history) < 30:
        if opp_history.count("R") > max(opp_history.count(i) for i in "PS"):
            return "P"
        if opp_history.count("P") > max(opp_history.count(i) for i in "RS"):
            return "S"
        if opp_history.count("S") > max(opp_history.count(i) for i in "RP"):
            return "R"
    elif len(my_history) < 15:
        if max(opp_loaded)<max(my_loaded):
            return "RPS"[len(my_history)%3]
        else:
            return "RPS"[(my_loaded.index(max(my_loaded))+len(my_history)%2)%3]
    elif len(my_history) == 15:
        if max(opp_loaded)<max(my_loaded):
            return "RPS"[(len(my_history)+1)%3]
        else:
            return "RPS"[(my_loaded.index(max(my_loaded))+ (len(my_history)%2)^1)%3]
    else:
        if max(opp_loaded)<max(my_loaded):
            return random.choice("RPS")
        else:
            return "RPS"[(my_loaded.index(max(my_loaded))+ (random.randint(0,1)))%3]

if opp_history[1] == "S": return "R" elif opp_history[1] == "R": return "R" else: return "P" What kind of logic is this?
Robert Fraser

@DestructibleLemon This divides by 0: elif min(opp_history.count(i) for i in "RPS")/max(opp_history.count(i) for i in "RPS") >0.8 and len(my_history)>30:
Masclins

@AlbertMasclans I've fixed that now.
Destructible Lemon

@RobertFraser what exactly stands out about that snippet?
Destructible Lemon

@DestructibleLemon I'm not sure what you're trying to do here: "RPS"[my_loaded.index(max(my_loaded))+len(my_history)%2], but it looks out of range (and so do other lines).
Masclins


WeightedRandom

Like RandomBot, but it only ever picks between 2 throws (chosen once when the module loads). It will sometimes beat Rockstar or Assassin, but beating one boosts the other's score (for example, if it beats Rockstar, it gives Assassin a point boost).

import random

selection_set = ["R", "P", "S"]
selection_set.pop(random.randint(0,2))
def weightedrandombotfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    return random.choice(selection_set)


GreedyPsychologist

Named this way because it defaults to Greedy, but if it can't decide, it counters whatever the opponent would do if they played the Greedy strategy. If it still can't decide, it plays randomly.

from random import choice

def greedypsychologistfunc(my_points, opp_points, my_loaded, opp_loaded, my_history, opp_history):
    greedy = get_my_move(my_loaded)
    combined = list(set(greedy) & set(get_opp_counter(opp_loaded)))

    if len(combined) == 0:
        return choice(greedy)
    return choice(combined)

def get_indexes(lst, value):
    return [i for i,x in enumerate(lst) if x == value]

def get_my_move(my_loaded):
    return ["RPS"[i] for i in get_indexes(my_loaded, max(my_loaded))]

def get_opp_counter(opp_loaded):
    return ["PSR"[i] for i in get_indexes(opp_loaded, max(opp_loaded))]