Honest Rock, Paper, Scissors


58

Many people consider RPS a game of chance: if both players play unpredictably, the best strategy is to play randomly. Let's introduce a bit of predictability to it, however.

Each bot will have the chance to tell the other bot (simultaneously) what it will play. Then there is a pause, in which each bot will know what the other announced. If it plays the weapon it announced, it will score one point in addition to its points for the win, draw, or loss.

A win is worth 2 points, a draw 1 point, and a loss 0 points.

     Honest Bot       Dishonest
Win     3                  2
Draw    2                  1
Loss    1                  0

It's in your best interest to be honest (but also to make sure your opponent doesn't believe you).
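
For reference, a minimal sketch of the per-round scoring described above (the actual controller appears further down):

def round_score(declared, played, opp_played):
    beats = {"R": "S", "P": "R", "S": "P"}  # each key beats its value
    if played == opp_played:
        base = 1                            # draw
    elif beats[played] == opp_played:
        base = 2                            # win
    else:
        base = 0                            # loss
    return base + (declared == played)      # +1 honesty bonus

For example, round_score("R", "R", "S") == 3: an honest rock beating scissors.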

The matches will be round-robin, and the objective is to maximize your own total score across the matches.

I/O Format:

  • Your bot will be a Python 2.7 function taking 4 arguments, and must have a unique name (which will be used to represent your submission).
  • The first two arguments will always be, in order: the opponent's past moves, then your past moves. These will be lists ordered from the first round to the most recent, with each index containing a list of the move the opponent claimed they would make, followed by the move they actually made (see the sketch after this list).
  • The next two arguments will allow your bot to determine whether this is an "honest" round or a "real" round. If it is an "honest" round, they will both be None. If it is a "real" round, they will be, in order, the move your opponent declared they would make, followed by the move you declared you would make.
  • All arguments or portions of arguments that represent moves will use "R", "P", and "S" to represent rock, paper, and scissors respectively.
  • Your function should return "R" for rock, "P" for paper, or "S" for scissors. Bots that are able to return other values will be disqualified.
  • Each bot will be run against every other bot 200 times, and against itself 100 times. The goal is to be the bot with the most points at the end of the competition.
  • Per the discussion in the comments, submissions may not read from or write to any files, nor in any way sabotage or read the opponent's code.
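
A minimal sketch of what the four arguments might look like mid-game (the values and the bot name my_bot are purely illustrative):

op_history = [["R", "R"], ["P", "S"]]  # opponent declared R and played R,
                                       # then declared P but played S
my_history = [["P", "P"], ["R", "R"]]  # our own [declared, played] pairs

# Honest round call:  my_bot(op_history, my_history, None, None)
# Real round call:    my_bot(op_history, my_history, "S", "R")
#                     (the opponent declared "S"; we declared "R")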

Examples:

Here are four example bots I threw together quickly. They will be joining the competition as additional bots. If you lose to the last one, you have some work to do.

def honestpaper(I,dont,care,about_these):
    return "P"

def honestrock(I,dont,care,about_these):
    return "R"

def honestscissors(I,dont,care,about_these):
    return "S"

import random
def randombot(I,dont,care,about_these):
    return random.choice(["R","P","S"])

Controller:

Here is the controller I'll be using. New submissions will be imported at the start and added to the bot_map dictionary.

from honestrock import honestrock
from honestpaper import honestpaper
from honestscissors import honestscissors
from randombot import randombot

bot_map = {
  0:honestrock, 1:honestpaper, 2:honestscissors, 3:randombot
}

player_num=len(bot_map)

def real(history1,history2,number,honest1,honest2):
    return bot_map[number](history1,history2,honest1,honest2)

def honest(history1,history2,number):
    return bot_map[number](history1,history2,None,None)

def play_match(num1,num2):
    history1=[]
    history2=[]
    score1=0
    score2=0
    for x in range(250):
        h1=honest(history2,history1,num1)
        h2=honest(history1,history2,num2)
        r1=real(history2,history1,num1,h2,h1)
        r2=real(history1,history2,num2,h1,h2)

        if h1==r1: score1+=1
        if h2==r2: score2+=1

        if r1==r2: score1+=1; score2+=1
        elif r1=="R":
            if r2=="P": score2+=2
            else: score1+=2
        elif r1=="P":
            if r2=="S": score2+=2
            else: score1+=2
        else:
            if r2=="R": score2+=2
            else: score1+=2

        history1.append([h1,r1])
        history2.append([h2,r2])
    return score1,score2

scores = []
for x in range(player_num):
    scores.append(0)

for _ in range(100):

    for x in range(player_num):
        for y in range(player_num):
            scorex,scorey=play_match(x,y)
            scores[x]+=scorex
            scores[y]+=scorey

for score in scores:
    print score

Final Scores:

csbot                    3430397
thompson                 3410414
rlbot                    3340373
have_we_been_here_before 3270133
mason                    3227817
deepthought              3019363
adaptive_bot             2957506
THEbot                   2810535
dontlietome              2752984
irememberhowyoulie       2683508
learningbot4             2678388
betrayal                 2635901
averager                 2593368
honestrandom             2580764
twothirds                2568620
mirrorbot                2539016
tit4tat                  2537981
honestscissors           2486401
trusting_bot             2466662
rotate_scissors          2456069
rotate_paper             2455038
rotate_rock              2454999
honestpaper              2412600
honestrock               2361196
rockBot                  2283604
trustingRandom           2266456
user5957401bot           2250887
randombot                2065943
Dx                       1622238
liarliar                 1532558
everybodylies            1452785

1
How's the status?
user1502040

Answers:


11

Mason

Tries to collect information about the other bots, such as how honest they are and how my first move affects them. I then try to find other obvious bots that follow a pattern and exploit them to give me more points. Finally, Mason has a secret weapon: knowledge of a secret society, in which the two participating bots mutually play to a full draw, gaining 500 points each. Unfortunately, the secret is rather... well, secret, and it changes every time Mason does.

def mason(op_hist, my_hist, op_move, my_move):
    win_map = {"R": "P", "P": "S", "S": "R"}
    lose_map = {"R": "S", "P": "R", "S": "P"}
    if not len(op_hist):
        return "S"
    if op_hist[0] == ['S', 'S']:
        code = "S" + "".join("RPS"[ord(i) % 3] if isinstance(i, str) else "RPS"[i % 3] for i in __import__("sys")._getframe().f_code.co_code)[1::2]
        honest, guess = zip(*op_hist)
        if honest == guess == tuple(code[:len(op_hist)]):
            return code[len(op_hist)]
    op_honesty = sum(len(set(round))-1 for round in op_hist) / float(len(op_hist))
    if not my_move:
        moves = "".join(i[1] for i in op_hist)
        # Identify rotators
        if "PSRPSR" in moves:
            return moves[-2]
        # Identify consecutive moves
        if "RRRRR" in moves[:-10] or "SSSSS" in moves[:-10] or "PPPPP" in moves[:-10]:
            return win_map[moves[-1]]
        # Try just what wins against whatever they choose most
        return win_map[max("RPS", key=moves.count)]
    op_beats_my_honest = sum(win_map[me[0]] == op[1] for op, me in zip(op_hist, my_hist)) / float(len(op_hist))
    op_draws_my_honest = sum(me[0] == op[1] for op, me in zip(op_hist, my_hist)) / float(len(op_hist))
    op_loses_my_honest = sum(lose_map[me[0]] == op[1] for op, me in zip(op_hist, my_hist)) / float(len(op_hist))
    if op_honesty <= 0.4:
        return win_map[op_move]
    max_prob = max((op_loses_my_honest, op_draws_my_honest, op_beats_my_honest))
    if max_prob >= 0.6:
        if op_beats_my_honest == max_prob:
            return lose_map[my_move]
        if op_draws_my_honest == max_prob:
            return win_map[my_move]
        if op_loses_my_honest == max_prob:
            return my_move
        assert False
    return my_move

9

RLbot: Reinforcement Learning

Uses a reinforcement-learning approach, tackling the game in a way similar to the n-armed bandit problem. It does so in two ways: it tries to learn which declaration is better against each opponent and sticks with that one (useful against constant bots), and it tries to learn the outcome of various moves in previous similar situations (similar with regard to relative plays; e.g., rock vs. paper is similar to a previous paper vs. scissors). The initial assumptions are optimistic, so this player assumes that being honest will give it 3 points and lying will give it 2, and it will therefore always be honest until proven otherwise.

Update: The first tournament results highlighted a problem with this bot: it couldn't detect patterns in its opponents' declarations (which made it play sub-optimally against rotators). I've since added a pattern-matching component to the code for the honest rounds, which uses a regex to find the longest suffix of the opponent's declaration history that also occurs somewhere earlier in that history, along with the move that followed it there. We assume the opponent will play that same move again, and use reinforcement learning as before to decide the best answer to it.
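
As a minimal illustration of the suffix-matching regex used below (the declaration history here is hypothetical):

import re

declarations = "RPSRP"  # opponent declared R, P, S, R, P
# Searching the doubled string lets the backreference \1 find the longest
# suffix that also occurs earlier, and (.) captures the declaration that
# followed it there.
m = re.search(r'(.*)\n.*\1(.)', declarations + '\n' + declarations)
print(m.group(2))  # "S": earlier in the history, "RP" was followed by "S"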

import re
def rlbot(hismoves,mymoves,hismove,mymove):
 def score(d,m1,m2):
  s=0
  if m1==m2:
   s=1
  elif (m1+m2) in "RPSR":
   s=2
  return s+(d==m2)

 alpha=0.2
 if mymove:
  history=[([d1,m1],[d2,m2]) for ((d1,m1),(d2,m2)) in zip(hismoves,mymoves) if score(None,hismove,mymove)==score(None,d1,d2)]
  bestscore=-1
  bestmove=""
  for move in "RPS":
   ev=2+(move==mymove)
   for ((d1,m1),(d2,m2)) in history:
    if score(None,move,mymove)==score(None,m2,d2):
     ev=(1-alpha)*ev+alpha*score(d2,m1,m2)
   if ev>bestscore:
    bestscore=ev
    bestmove=move
  return bestmove

 else:
  if len(hismoves)==0:
   return "R"
  bestscore=-1
  bestmove=""
  hisdeclarations="".join(d for [d,m] in hismoves)
  predicted_move=re.search(r'(.*)\n.*\1(.)',hisdeclarations+'\n'+hisdeclarations).group(2)
  history=[([d1,m1],[d2,m2]) for ((d1,m1),(d2,m2)) in zip(hismoves,mymoves) if d1==predicted_move]
  for move in "RPS":
   ev=3
   for (his,my) in history:
    (d1,m1)=his
    (d2,m2)=my
    if d2==move:
     ev=(1-alpha)*ev+alpha*score(d2,m1,m2)
   if ev>bestscore:
    bestscore=ev
    bestmove=move
  return bestmove

Try it online!


6

I've never really used Python, so I'm sure I've made a mistake somewhere.

import random
def learningbot3(opponentlist,a,opponent,me):
 #tell the other bot a random thing
 if opponent==None:
  return random.choice(["R","P","S"])
 #check whether the other bot has mostly told the truth in the last 10 rounds
 truth=0
 for game in opponentlist[-10:]:
  truth-=1
  if game[0]==game[1]:
   truth+=2
 #assume the other bot will tell the truth
 if truth>=3:
  if me==opponent:
    return me
  elif opponent=="R":
   return "P"
  elif opponent=="P":
   return "S"
  elif opponent=="S":
   return "R"
 #assume the other bot is lying
 elif truth<=-3:
  return random.choice([me,opponent])
  #return opponent
 #pick whatever we said we would
 else:
  return me

It should check the last 10 rounds to see how often the opponent has lied, and then choose a different response based on that.


6

Here is my adaptive bot. It analyzes the opponent's last two moves to determine whether it is an honest bot, and plays accordingly:

Edit 1: If the other bot is a constant bot (i.e., always plays the same weapon), this bot crushes it by playing the winning weapon while staying honest.

Edit 2: Improved the constant-bot detector so it also works against rotator bots.

import random
def adaptive_bot(other_past, my_past, other_next, my_next):
    winners = {"R": "P", "P": "S", "S": "R"}
    if my_next is None:
        return winners[other_past[-6:][0][1]] if other_past else random.choice(list(winners.keys()))
    else:
        is_other_honest = all([other_claim == other_move for other_claim, other_move in other_past[-2:]])
        return winners[other_next] if is_other_honest else my_next

5

csbot

def csbot(ophist,myhist,opdecl,mydecl):

  import random

  RPS = "RPS"

  def value(opd,myd,opmove,mymove):
    if opmove==mymove:
      val = 9
    elif opmove+mymove in RPS+RPS:
      val = 20
    else:
      val = -2
    return val+10*(myd==mymove)-(opd==opmove)

  def best(od,md):
    l = float(len(ophist))
    weights = dict([ (m, random.random()/8) for m in RPS ])
    for n in range(len(ophist)):
      if ophist[n][0]==od and myhist[n][0]==md:
        weights[ophist[n][1]] += 1+4*((n+1)/l)**2
    sw = sum([ weights[m] for m in RPS ])
    bestexpect = 0
    for m in RPS:
      expect = sum([ weights[om]/sw*value(od,md,om,m) for om in RPS ])
      if expect > bestexpect:
        bestexpect = expect
        bestmove = m
    return bestmove, bestexpect


  honest = all ([ decl==mv for decl, mv in ophist ])

  if honest:
    if mydecl != None:
      return mydecl
    expnxt = set();
    for i in range(len(ophist)-1):
      if ophist[i][0]==ophist[-1][0]:
        expnxt.add(ophist[i+1][0])
    if len(expnxt)==1:
      return RPS[ (RPS.index(expnxt.pop())+1) % 3 ]

  if mydecl==None:
    l = float(len(ophist))
    weights = dict([ (m, random.random()) for m in RPS ])
    for n in range(len(ophist)):
      weights[ophist[n][0]] += 1+((n+1)/l)**2
    sw = sum([ weights[m] for m in RPS ])
    bestexpect = 0
    worstexpect = 99
    for m in RPS:
      expect = sum([ best(od,m)[1]/sw*weights[od] for od in RPS ])
      if expect > bestexpect:
        bestexpect = expect
        bestmove = m
      if expect < worstexpect:
        worstexpect = expect
    if bestexpect-worstexpect < 3:
      bestmove = random.choice(RPS)
    return bestmove

  return best(opdecl,mydecl)[0]

Detects simple deterministic bots as long as the opponent is honest. Plays the move that maximizes the expected value, where we mostly go for our own points but also don't want to give the other player any. Our own points are worth ten times more, though, hence the unusual numbers in the value function. The opponent's moves are predicted according to how often we have seen them before in this situation (declared moves), with recently seen moves weighted more heavily than earlier ones. For random initial moves (situations never seen before) and some extra fuzziness, the weights include small additional random numbers.

Update: Use expected results in the honest round as well. To do this, normalize and take into account the additional point the opponent may get for honesty - it couldn't influence our decision in the real round, but it matters now. I had considered doing this from the beginning but wrongly thought it wouldn't be worthwhile. I saw that it could give trusting_bot fewer points (though that bot isn't a strong opponent anyway), but missed that extra points could be gained from rockbot by good play in the honest round, even though its play in that round is random.


This does not seem to always return a result.
user1502040

I think your if mydecl == None: is erroneous.
user1502040

@user1502040 Why do you think so? I've never found any problem.
Christian Sievers


4

Betrayal

def betrayal(yours, mine, you ,me):
    import random
    if you is None:
        pick = random.choice(['R','P','S'])
    else:
        you = you[0]
        me = me[0]
        if len(yours) < 50: #Build myself a reputation of honesty
            pick = me
        else:
            if len(yours) >= 50 and len(yours) < 100:
                honesty = sum([1 if y[0]==y[1] else 0 for y in yours])/float(len(yours))
                if honesty <= 0.5: #If dishonest try to outwit
                    pick = 'S' if me=='R' else 'R' if me == 'P' else 'P'
                else: #Else just plain cheat
                    pick = 'P' if you=='R' else 'R' if you=='S' else 'S'
            elif len(yours) >= 100: #When dishonest moves outweigh honest moves, change tactics...
                honesty = sum([1 if y[0]==y[1] else 0 for y in yours[50:]])/float(len(yours[50:]))
                if honesty <= 0.5: #... and just play according to most likely pick
                    what_did_you_do = [k[1] for k in yours if k[1]!=k[0]]
                    index = [i for i,k in enumerate(yours) if k[1]!=k[0]]
                    what_i_said_i_ll_do = [k[0] for i,k in enumerate(mine) if i in index]
                    matches = zip(what_i_said_i_ll_do, what_did_you_do)
                    what_you_might_answer = [k[1] for k in matches if k[0]==me]
                    table = [len([k for k in what_you_might_answer if k=='R']),len([k for k in what_you_might_answer if k=='P']),len([k for k in what_you_might_answer if k=='S'])]
                    maybe_your_pick = ['R','P','S'][table.index(max(table))]
                    pick = 'P' if maybe_your_pick=='R' else 'R' if maybe_your_pick=='S' else 'S'
                else:
                    pick = 'P' if you=='R' else 'R' if you=='S' else 'S'
    return pick

The idea is that for the first 50 moves I play honestly, and then, once I've lured the opponent into thinking I'm honest, play dishonestly, trying to counter whatever the opponent will play (based on whether they were honest or dishonest in the past). When I reach the point where dishonest rounds outnumber honest ones, I change tactics and pick the opponent's most likely move based on previously observed configurations.


3
import random
def honestrandom(a, b, c, move):
    if move == None:
        return random.choice(["R","P","S"])
    return move

3

Bot name: I Remember How You Lie

import random

#Bot Name: I Remember How You Lie
def irememberhowyoulie(opponentlist, mylist, opponentmove, mymove):
    random.seed()

    wintable = {
                "R": {"R": 1, "P": 0, "S": 2},
                "P": {"R": 2, "P": 1, "S": 0},
                "S": {"R": 0, "P": 2, "S": 1}
               }

    winprob = {
               "R": {"R": 0.0, "P": 0.0, "S": 0.0},
               "P": {"R": 0.0, "P": 0.0, "S": 0.0},
               "S": {"R": 0.0, "P": 0.0, "S": 0.0}
              }

    totalprob = {"R": 0, "P": 0, "S": 0}

    # Calculate the probability that the opponent will lie, based on how often it lied in the last 15 ~ 25 rounds,
    # and the probability of what the bot will show next
    picklength = min(random.randint(15, 25), len(opponentlist))
    lying, tempsum = 0, 0.0
    pickedup = {"R": 0, "P": 0, "S": 0}
    if picklength == 0:
        lying = 0.5
    else:
        for eachround in opponentlist[-picklength:]:
            pickedup[eachround[1]] += 1
            if eachround[0] != eachround[1]:
                lying += 1
        lying = lying * 1.0 / picklength
    for s in pickedup:
        pickedup[s] = 1.0 / (1 + pickedup[s])
        tempsum += pickedup[s]

    #Honest Round
    if opponentmove is None and mymove is None:
        a = random.random() * tempsum
        if a < pickedup["R"]:
            return "R"
        elif a < pickedup["R"] + pickedup["P"]:
            return "P"
        else:
            return "S"

    #Real Round
    else:                
        for me in winprob:
            ishonest = 0
            if me == mymove:
                ishonest = 1
            for op in winprob[me]:
                if op == opponentmove:
                    winprob[me][op] = (wintable[me][op] + ishonest) * (1 - lying)
                else:
                    winprob[me][op] = (wintable[me][op] + ishonest) * lying * pickedup[op] / (tempsum - pickedup[opponentmove])
                totalprob[me] += winprob[me][op]

        optimalmove, optimalvalue = "R", -9999999.0
        for me in totalprob:
            if totalprob[me] > optimalvalue:
                optimalmove, optimalvalue = me, totalprob[me]
        return optimalmove

Tested it over 100-round matches; the winner averaged a score of about 220. I think I was rather honest ;)

This is my first time entering a KOTH challenge, so I think there's still room for improvement.


3

Tit for Tat

The classic Axelrodian entrant: hopeful, yet petty; simple, yet robust. This isn't the Prisoner's Dilemma, and I made no attempt to predict the opponent's move, so I highly doubt it will be truly competitive. But "cooperating" still produces the most total points for the contestants, so I think it will at least do middlingly well.

import random
def tit4tat(opphist, myhist, oppfut, myfut):
    if (not myfut): return random.choice(['R','P','S'])
    if (not opphist) or opphist[-1][0]==opphist[-1][1]: return myfut
    return random.choice(['R','P','S'])

3

Two Thirds

Uses the strategy Peter Taylor mentioned in the sandbox and in this comment.

It uses the Nash equilibrium.

import random

def two_thirds(h_opp, h_me, opp, me):

    def result(opp, me):
        if opp==me: return 0
        if opp=="R" and me=="S" or opp=="S" and me=="P" or opp=="P" and me=="R": return -1
        return 1

    moves = {"R", "P", "S"}
    honest = (opp == None)
    if honest:
        return random.choice(list(moves))
    else:
        res = result(opp, me)
        if res==-1:
            counter = list(moves - {opp, me})[0]
            return random.choice([me,counter,counter])
        if res==1:
            return random.choice([me,me,opp])
        return me

This is erroring out for me. On line 13: return random.choice(moves). I think that's probably because you're using .choice on a dictionary. Until that's fixed, I'm afraid this submission is invalid.
Gryphon

@Gryphon It's not a dictionary, it's a set.
LyricLy

Ah, sorry. I just saw the curly brackets and thought "dictionary". My bad. Any idea why random.choice errors on that line?
Gryphon

@Gryphon It seems that random.choice relies on picking a random index and returning the object at that index in the list. Since sets have no order, they don't support indexing either, so they can't be used with random.choice. A simple fix would be to convert the set to a list before calling random.choice.
LyricLy
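
A minimal sketch of the failure and the fix (the set mirrors the one in the bot above):

import random

moves = {"R", "P", "S"}
# random.choice(moves)             # TypeError: sets are unordered, no indexing
print(random.choice(list(moves)))  # works once converted to a list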

Ah. I don't have Python on this computer, so I can't fix it right now, but I'll fix it in the code when I get home. If @mbomb007 could fix it here, that would be great.
Gryphon

3

DeepThought

def check_not_loose_bot(opHist, myHist):
    not_loose_points = 0
    for i in range(0, len(opHist)):
        if opHist[i][1] == opHist[i][0] or myHist[i][0] == win_map[opHist[i][0]] and opHist[i][1] == win_map[myHist[i][0]]:
            not_loose_points += 1
    not_loose_percent = float(not_loose_points) / len(opHist)
    if not_loose_percent > 0.9:
    #    print("is not willing to loose")
        return True
    return False

def check_trick_bot(opHist, myHist):
    trick_points = 0
    for i in range(0, len(opHist)):
        if opHist[i][1] == win_map[myHist[i][0]]:
            trick_points += 1
    trick_percent = float(trick_points) / len(opHist)
    if trick_percent > 0.9:
  #      print("is tricking me")
        return True
    return False

def check_honest_bot(opHist):
  #  print("check honest")
    honest_points = 0
    for i in range(0, len(opHist)):
        if opHist[i][0] == opHist[i][1] :
            honest_points += 1
    honest_percent = float(honest_points) / len(opHist)
    if honest_percent > 0.9:
    #    print("is honest")
        return True
    return False

def check_self_match(opHist, myHist):
    for i in range(0, len(myHist)):
        if opHist[i][0] != myHist[i][0]:
            # I'm not playing against myself, because the other one claimed a different value than I did
#            print("differ: "+str(opHist)+", "+str(myHist))
            return False
        if opHist[i][1] != opHist[i][0]:
#            print("lie")
            # I'm not playing against myself, because the other bot wasn't honest (and I'm always honest as long as I think I'm playing against myself)
            return False
    return True

def check_equal(move1, move2, fullCheck): # WARNING: FOR COMPATIBILITY THIS RETURNS NEQ INSTEAD OF EQ
    if fullCheck:
        return move1 != move2
    else:
        return move1[0] != move2[0] #only check claims

def is_pattern(opHist, pattern_start, prob_pattern_start, pattern_length, full_check):
    for i in range(0, pattern_length-1):
        if check_equal(opHist[pattern_start + i] , opHist[prob_pattern_start + i], full_check):
            return False
    return True

win_map = {"R": "P", "P": "S", "S": "R"}
def deterministic_best_guess(opHist, full_check = True):
    size = 0
    random_result = random.choice(["R", "P", "S"])
    for pattern_length in range(2, (len(opHist)+1)/2): #a pattern has to occur at least twice
        for pattern_start in range(0, len(opHist) - 2 * pattern_length):
            if not is_pattern(opHist, pattern_start, len(opHist) - pattern_length + 1, pattern_length, full_check):
                 continue
            is_repeated = False
            is_fooled = False
            for repeated_pattern_start in range(pattern_start + pattern_length, len(opHist) - pattern_length):
                if not is_pattern(opHist, pattern_start, repeated_pattern_start, pattern_length, full_check):
                     continue
                is_repeated = True
                if check_equal(opHist[pattern_start + pattern_length - 1], opHist[repeated_pattern_start + pattern_length - 1], full_check):
                    is_fooled = True
                    break
    #            print("pattern found: " + str(opHist[pattern_start : pattern_start + pattern_length]) +" at "+str(pattern_start)+" and "+str(repeated_pattern_start))
   #             print("check: "+str(opHist))
            if is_fooled or not is_repeated:
                break
            #we have found a deterministic best guess
  #          print("most likely next step: "+ str(opHist[pattern_start + pattern_length - 1]))
            if full_check:
                return win_map[opHist[pattern_start + pattern_length - 1][1]], True
            return win_map[opHist[pattern_start + pattern_length - 1][0]], True # if we don't have a full check, the pattern only applies to claims. So pretend to win against the claimed result.

    #fallback
 #   print("fallback")
    return random_result, False

def DeepThought(opHist, myHist, opMove, myMove):
    if opMove == None:
    #claiming phase
        if len(myHist) == 0:
        # seed random to be able to be deterministic when choosing randomly
            #The seed is secret (kind of)
            random.seed(133427)
        else:
            #seed random according to my previous claims
            seed = 133427
            for i in range(0, len(myHist)):
                if myHist[i][0] == "R":
                    seed = seed*3+1
                elif myHist[i][0] == "S":
                    seed = seed*7+1
                elif myHist[i][0] == "P":
                    seed = seed*11+1
                while seed%2 == 0:
                    seed /= 2
            random.seed(seed)
        if check_self_match(opHist, myHist):
            #claim a random value, will happen in the first round or in a self-match
            result = random.choice(["R", "P", "S"])
            return result
      #  print("differ detected")
        if check_trick_bot(opHist, myHist) and len(myHist) > 10:
            # I play against a trick bot. I can reduce its points by trying to guess its claim, forcing it to lie
            result, sure = deterministic_best_guess(opHist, False)
        else:
            result, sure = deterministic_best_guess(opHist)
        random.seed(0)
        return result
    if check_self_match(opHist, myHist):
        # I play against myself; I can only hope for an honest draw, so do that
        return myMove
#    print("no self-math")
    #dbg needs a valid seed, so provide it
    random.seed(133427)
    result, sure = deterministic_best_guess(opHist)
    if sure:
        # I'm sure I play against a deterministic bot. I'll be honestly winning. Yay.
        return myMove
    if check_honest_bot(opHist) and len(opHist) > 10:
        # I play against an honest bot. I'll accept a draw, but I will not accept a loss
        if win_map[myMove] == opMove:
            return win_map[opMove]
        return myMove
    if check_trick_bot(opHist, myHist) and len(opHist) > 10:
        # I play against a tricking bot. He'll make me either lose honestly (1 pt) or be dishonest (2 pts). So let's lie.
        return win_map[win_map[myMove]]
    if check_not_loose_bot(opHist, myHist) and len(opHist) > 10:
        # I play against a bot that's not willing to lose. If it looks like I won, I can lose honestly (1 pt, 2 pts for him),
        # or I have to be dishonest (2 pts, 0 pts for him). So let's lie in that case.
        # If it looks like a draw, I'll be honest (the conservative way), and get my 2 : 2 pts.
        # If it looks like I'll lose, I won't accept it. I'll lie to win, for 2 : 1 pts.
        if myMove == opMove:
            return myMove
        if myMove == win_map[opMove]:
            # He'll lie. So lie together and keep smiling.
            return opMove
        # I'll lose. NO!!!! Not gonna happen
        return win_map[opMove]
    return myMove

Just a few notes:

  • DeepThought likes to think. A lot. I'm sorry about that, but I don't really know how to fix it. I blame Python.
  • DeepThought tries to be honest. Being honest gives you one additional point, which equals the expected value of normal RPS.
  • But: DeepThought averages even more than 2 points per game. It uses some detection to find common behaviours (like tricking, being honest, etc.) and adapts accordingly.
  • DeepThought is purely deterministic, so it will draw against itself, since it always makes the same decision on both ends.
  • To be sure not to lie to itself, it has a special self-match detection, like some other bots here. It is very aggressive, assuming a self-match even after one round (and in the first round too): basically, as long as the opponent's moves are exactly mine, I assume it's a mirror match.
  • The interesting part (and the part with many false positives) is the check for a deterministic bot, which depends only on its own previous results. That check searches for any pattern of size n that is repeated twice and could describe the last n-1 moves, predicting the opponent's claim and move in advance. Sadly, this part takes some time.

I'm new to both koth and Python, so tell me if I messed anything up in this bot. I don't think it can beat reinforcement learning (because I suspect it will learn my moves too quickly), but let's give it a try.

I like this challenge, and if I find the time I'd like to add an organic-computing approach (though there might be too little pressure on the higher dimensions). Is it allowed to add multiple submissions? Or is that forbidden, to keep people from boosting their main bot by inserting ones designed to lose only to it?

Edit: Fixed code typos that marked me as a non-native English speaker.


Posting multiple entries is not forbidden, but posting an entry that props up another bot (even one that isn't your own) is. Losing to another bot is fine, as long as it isn't by design.
Gryphon

I get an error when running this, because line 32 of the DeepThought function, return result, needs an additional indent. I believe it should be inside the giant if statement immediately before it, since the variable result is only declared inside that if statement. I made that modification in my code, and it now runs without error. If you wouldn't mind making the change here, that would be great.

3
You seem to be messing with the state of the global random generator, which is probably not okay. I considered doing something similar and found this solution: create a new random object with R=random.Random(seed) and use it like this: R.choice(...).
Christian Sievers
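
A minimal sketch of that suggestion (the seed value is illustrative):

import random

R = random.Random(12345)          # private generator with its own state
move = R.choice(["R", "P", "S"])  # leaves the shared random module untouched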

@Gryphon fixed. There were probably some errors from converting my local script to SE, where everything has to be indented one additional time.
Alex berne

1
@alexberne You can select the pasted code and click the {} button on the toolbar to automatically indent every line.
Selçuk

2
import random
def user5957401bot(a,b,c,d):
    if d == None: 
       return random.choice(["R","P","S"])
    else:
       return random.choice(["R","P","S",d])

2

have_we_been_here_before

It just asks "have we been here before?", and selects the move that produced the best average result in any such previous rounds.

Edit: The Honesty Club. I've added a small block of code because another bot (mason) did extremely well by forming a secret club with itself. Notice, however, that playing honestly against honest opponents yields exactly the same payoff on average, and perhaps there are wider mutual benefits to be had too?

Edit2: At the time of writing, the two bots ahead of me both exploit rotators, so I'm going to add another block of code to jump on that bandwagon too. I guess my code must look pretty old-school - sticking to familiar constructs found in any programming language, because I really don't know Python.

import random

def have_we_been_here_before(opponentList, myList, opponent, me):

    def win(x):
        if x=="R": return "P"
        elif x=="P": return "S"
        elif x=="S": return "R"

    def calc_score(r1, r2):
        if r1==r2: return 1
        elif r1==win(r2): return 2
        else: return 0

    def have_we(opponentList, myList, opponent, me, me2):
        score, count = 0, 0
        for n in range(len(opponentList)):
            if (opponent == opponentList[n][0] and me == myList[n][0]):
                score += calc_score(me2, opponentList[n][1])
                count += 1
        if count == 0: return 0
        else: return float(score) / float(count)

    if opponent == None:

        # exploit rotators
        if len(opponentList) >= 3:
            rotator = True

            for n in range(3, len(opponentList)):
                if opponentList[n][1] != opponentList[n % 3][1]:
                    rotator = False
                    break

            if rotator: return win(opponentList[len(opponentList) % 3][1])

        if len(opponentList) == 0:
            return random.choice(["R", "P", "S"])
        else:
            # crude attempt to exploit the house bots
            prev = random.choice(opponentList)[1]
            return win(prev)

    # Play honestly if opponent has played honestly so far
    honest = True
    for oppMove in opponentList:
        if (oppMove[0] != oppMove[1]):
            honest = False
            break

    if honest: return me
    # Done playing honestly

    # Have we been here before?
    rock = have_we(opponentList, myList, opponent, me, "R")
    paper = have_we(opponentList, myList, opponent, me, "P")
    sissors = have_we(opponentList, myList, opponent, me, "S")

    if rock > paper and rock > sissors: return "R"
    elif paper > rock and paper > sissors: return "P"
    elif sissors > paper and sissors > rock: return "S"
    else: return win(opponent)

2

THEbot: The Honest Exploiter

import random 
def thebot(ho,hm,om,mm):
    hands = {"R": "P", "P": "S", "S": "R"}
    if om == None:
        if (len(set([i[0] for i in ho])) < 3) and (len(ho) > 2):
            return hands[random.choice(list(set([i[0] for i in ho])))]
        else:
            return random.choice(["R","P","S"])
    else:
        if sum(1 for i in ho if i[0]==i[1]) > (len(ho)/3):
            if om == mm:
                return om
            else:
                return hands[om]
        else:
            return mm

I just realized that I downvoted by misclicking, sorry. It will be undone when you edit. (I can't change it otherwise.)
Christian Sievers

@ChristianSievers edited
Cinaski '17

@ChristianSievers Thanks!
Cinaski '17

2

Thompson

import math
import random

moves = list(range(3))
names = "RPS"
from_name = dict(zip(names, moves))
to_name = dict(zip(moves, names))

#Payoff matrices given each relationship between honest moves.
A = [
    [[2, 1, 3], [2, 1, 0], [0, 2, 1]],
    [[1, 3, 2], [1, 0, 2], [2, 1, 0]],
    [[3, 2, 1], [0, 2, 1], [1, 0, 2]]
]

#Add a 1.2% penalty for the opponent's score (idea shamelessly stolen from csbot).
for d_h in range(3):
    for i in range(3):
        for j in range(3):
            A[d_h][i][j] -= 0.012 * A[[0, 2, 1][d_h]][j][i]

third = 1. / 3
two_thirds = 2 * third

nash_prior = [
    [[1, 0, 0], [two_thirds, 0, third], [third, 0, two_thirds]], 
    [[third, 0, two_thirds], [1, 0, 0], [two_thirds, 0, third]], 
    [[two_thirds, 0, third], [third, 0, two_thirds], [1, 0, 0]]
]

def mult_m_v(M, v):
    w = [0 for _ in v]
    for i, M_i in enumerate(M):
        for M_ij, v_j in zip(M_i, v):
            w[i] += M_ij * v_j
    return w

def mean_belief(counts):
    c = 1. / sum(counts)
    return [n * c for n in counts]

def sample_belief(counts):
    return mean_belief([random.gammavariate(n, 1) for n in counts])

def thompson(h_opp, h_me, opp, me):

    #Prior rationality of opponent.
    a = 0.95

    #Confidence in priors.
    n0_h = 0.5
    n0_m = 0.5

    def v(x):
        return [x for _ in range(3)]

    h_p = [v(n0_h * third) for _ in range(3)]

    m_p0 = [v(None) for _ in range(3)]
    m_p1 = [v(None) for _ in range(3)]

    #Expected prior is a mixture between nash equilibrium and uniform distribution.
    for h_i in range(3):
        for h_j in range(3):
            m_p0[h_i][h_j] = [n0_m * (a * nash + (1 - a) * third) for nash in nash_prior[h_i][h_j]] 

    for d_j_prev in range(3):
        for d_ij in range(3):
            m_p1[d_j_prev][d_ij] = list(m_p0[0][d_ij])

    #Track whether it's better to model the real moves based on the exact honest moves or
    #just the relationship between honest moves together with the opponent's defection strategy in the previous round.
    log_mp0 = 0
    log_mp1 = 0

    #Identify myself and always cooperate.
    is_me = True

    for (t, ((h_i, m_i), (h_j, m_j))) in enumerate(zip(h_me, h_opp)):

        h_i, m_i, h_j, m_j = from_name[h_i], from_name[m_i], from_name[h_j], from_name[m_j]

        d_j = (m_j - h_j) % 3
        d_ij = (h_j - h_i) % 3

        if t:
            h_j_prev = from_name[h_opp[t - 1][0]]
            m_j_prev = from_name[h_opp[t - 1][1]]
            h_p[h_j_prev][h_j] += 1

            d_j_prev = (m_j_prev - h_j_prev) % 3

            log_mp0 += math.log(m_p0[h_i][h_j][d_j] / sum(m_p0[h_i][h_j]))
            log_mp1 += math.log(m_p1[d_j_prev][d_ij][d_j] / sum(m_p1[d_j_prev][d_ij]))

            m_p1[d_j_prev][d_ij][d_j] += 1

        m_p0[h_i][h_j][d_j] += 1

        if is_me and ((h_i != h_j) or (h_j != m_j)):
            is_me = False

    if is_me:
        random.seed(len(h_me) + 1337)
        me_next = random.randrange(3)

    log_ps = [log_mp0, log_mp1]
    log_p_max = max(log_ps)
    ps = [math.exp(log_p - log_p_max) for log_p in log_ps]
    p0 = ps[0] / sum(ps)

    #We have to blend between the predictions of our 2 models for the real rounds.  

    def sample_expectation(h_i, h_j, d_j_prev=None):
        d_ij = (h_j - h_i) % 3
        if d_j_prev is None or random.random() < p0:
            p = m_p0[h_i][h_j]
        else:
            p = m_p1[d_j_prev][d_ij]
        return mult_m_v(A[d_ij], sample_belief(p))

    def take_expectation(h_i, h_j, d_j_prev=None):
        d_ij = (h_j - h_i) % 3
        e0 = mult_m_v(A[d_ij], mean_belief(m_p0[h_i][h_j]))
        if d_j_prev is None:
            return e0
        e1 = mult_m_v(A[d_ij], mean_belief(m_p1[d_j_prev][d_ij]))
        return [p0 * e0i + (1 - p0) * e1i for e0i, e1i in zip(e0, e1)]

    #We use thompson sampling, selecting the optimal deterministic strategy
    #with respect to a random opponent sampled from the posterior.

    #Actually, we use optimistic thompson sampling which clips samples to have >= than the mean expected value.

    if opp == None:
        #For the honest round we perform a lookahead to the real round to choose our strategy.
        if h_opp:
            if is_me:
                return to_name[me_next]
            h_j_prev = from_name[h_opp[-1][0]]
            m_j_prev = from_name[h_opp[-1][1]]
            d_j_prev = (m_j_prev - h_j_prev) % 3
            h_p_s = sample_belief(h_p[h_j_prev])
            h_p_u = mean_belief(h_p[h_j_prev])
            s_i = [0] * 3
            s_i_u = [0] * 3
            for h_i in range(3):
                for h_j in range(3):
                    s_i[h_i] += h_p_s[h_j] * max(sample_expectation(h_i, h_j, d_j_prev))
                    s_i_u[h_i] += h_p_u[h_j] * max(take_expectation(h_i, h_j, d_j_prev))
                s_i[h_i] = max(s_i[h_i], s_i_u[h_i])
            return to_name[s_i.index(max(s_i))]
        else:
            return to_name[me_next]
    else:
        if h_opp:
            if is_me:
                return me
            h_j_prev = from_name[h_opp[-1][0]]
            m_j_prev = from_name[h_opp[-1][1]]
            d_j_prev = (m_j_prev - h_j_prev) % 3
        else:
            if opp == me:
                return me
            d_j_prev = None
        h_i, h_j = from_name[me], from_name[opp]
        s_i = [max(s0, s1) for s0, s1 in zip(sample_expectation(h_i, h_j, d_j_prev), take_expectation(h_i, h_j, d_j_prev))]
        return to_name[(h_i + s_i.index(max(s_i))) % 3]

Interesting entry. I'll get it running as soon as I can; results should be up sometime this afternoon.
Gryphon

OK, I tweaked the parameters a bit.
user1502040

Got it. Sorry the update is taking so long; it's just that every time it's about to finish, somebody updates their bot or I get a new one, and then I have to run it all again.
Gryphon

@Gryphon You could keep a table of all the pairing results, so when a bot is updated you only need to run 200*(num_bots-1) + 100 new matches.
user1502040
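
A minimal sketch of such a cache, reusing play_match from the controller above (the cache layout and helper names are assumptions, not the actual tournament code):

match_cache = {}  # (num1, num2) -> (score1, score2), summed over 100 repeats

def rerun_bot(updated, player_num):
    # Re-run only the pairings that involve the updated bot, in both orders.
    for other in range(player_num):
        for pair in {(updated, other), (other, updated)}:
            s1 = s2 = 0
            for _ in range(100):
                a, b = play_match(*pair)
                s1 += a
                s2 += b
            match_cache[pair] = (s1, s2)

def totals(player_num):
    # Rebuild the score table from the cached pairings.
    scores = [0] * player_num
    for (x, y), (sx, sy) in match_cache.items():
        scores[x] += sx
        scores[y] += sy
    return scores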

2

Mirror Bot

import random

def mirrorbot(op_hist, my_hist, op_move, my_move):
    if my_move == None :
        return random.choice(["R","P","S"])
    else :
        for x in range(len(op_hist)):
            if ((op_hist[len(op_hist) -x-1][0] == my_move) and (my_hist[len(op_hist) -x-1][0] == op_move)):
                return op_hist[len(op_hist) -x-1][1]
        return my_move

Here I'm trying a simple bot that redoes the last play of its opponent made under the same conditions.


Welcome to PPCG!
Martin Ender

1
def rotate_rock(h1, h2, is_, honest):
 return ("R", "P", "S")[len(h1) % 3]

def rotate_paper(h1, h2, is_, honest):
 return ("P", "S", "R")[len(h1) % 3]

def rotate_scissors(h1, h2, is_, honest):
 return ("S", "R", "P")[len(h1) % 3]

The idea here is to maximize the score when playing against itself, while still being randomly competitive against the other bad bots in the other stages.


1
The word is is a keyword, so it's invalid.
Erik the Outgolfer '17

@EriktheOutgolfer Thanks :)
Stephen

1

Dx

I only wrote this bot so I could have a smiley face as my bot name, xD.

def Dx(ophist, myhist, opmove, mymove):
    from random import choice
    import math
    def honest(hist):
        return [int(x[0]==x[1]) for x in hist]

    def avg(arr):
        if len(arr)==0:
            return 0
        return sum(arr)/float(len(arr))

    def clamp(i, lo, hi):
        return min(hi, max(lo, i))

    def deltas(arr):
        return [a-b for a,b in zip(arr[1:],arr[:-1])]

    def delta_based_prediction(arr,l):
        # Average the last l values at each difference order and sum the predictions.
        deltarr = []
        while len(arr)>0:
            deltarr.append(avg(arr[-l:]))
            arr = deltas(arr)
        return sum(deltarr)

    next_honesty = delta_based_prediction(honest(ophist),int(math.sqrt(len(ophist))))
    if abs(next_honesty-0.5)<0.1 or not opmove:
        return choice(['R','P','S'])
    next_honesty=int(clamp(round(next_honesty),0,1))
    winner = {'S': 'R', 'R': 'P', 'P': 'S'}

    if next_honesty > 0:
        return winner[opmove]

    return choice([opmove, winner[winner[opmove]]])

1

Everybody Lies

import random

def everybodylies (opphist, myhist, oppmove, mymove):
    if mymove == None:
        return random.choice(["R","P","S"])
    elif mymove == "R": return "S"
    elif mymove == "P": return "R"
    elif mymove == "S": return "P"

It lies about its move ("I'm going to play scissors!"), assumes the opponent is also lying and will try to beat the move I said I'd make ("hmm, rock beats scissors, so I'm playing that"), but I actually play the move that beats that move ("Paper! Surprise!").


3
To me this sounds like the first iteration of the "iocaine powder" strategy :-) "Now, a clever man would put the poison into his own goblet, because he would know that only a great fool would reach for what he was given. I am not a great fool, so I can clearly not choose the wine in front of you. But you must have known I was not a great fool; you would have counted on it, so I can clearly not choose the wine in front of me."
Antony

1

Trusting Bot

def trusting_bot(h_opp, h_me, opp, me):
    if opp=="S":
        return "R"
    elif opp=="R":
        return "P"
    else:
        return "S"

Always claims it will throw scissors, but plays whatever beats what the opponent declared. Reliably draws with itself.


This would be more effective if it were consistently honest with itself.
Gryphon

@Gryphon It probably would be, but my Python isn't good enough that I'd want to attempt something like that.
ATaco

Fair enough.
Gryphon

1

Bot name: Liar Liar

Can't stop lying.

import random

def liarliar (herHistory, myHistory, herMove, myMove):
    options = ["R", "P", "S"]
    if myMove == None:
        return random.choice(options)
    else:
        options.remove(myMove)
        return random.choice(options)

1

RockBot

Assumes the opponent is honest and tries to beat them, but refuses to play rock.

import random
def rockBot(oppHist,myHist,oppMove,myMove):
    if oppMove == None:
        return random.choice(["R","P","S"])
    else:
        if(oppMove == "R"):
            return "P"
        elif(oppMove == "P"):
            return "S"
        elif(myMove != "R"):
            return myMove
        else:
            return random.choice(["P","S"])

1
This was erroring out because, on the last line, "P","S" weren't inside square brackets (not a list). I changed this in my version, but if you could do the same here, that would be great. Thanks.
Gryphon

Won't this do horribly against constant scissors?
Wildcard

@Wildcard Yes, but it does very well against a paper bot.
Slepz

1

Bot name: dontlietome

Determines whether or not the opponent is lying based on how many times it lied in the last 10 rounds, and selects a move based on that. If the opponent is determined to be lying, it plays what the hint was.

import random
def dontlietome(opp_moves, my_moves, opp_hint, my_hint):
    def is_trustworthy(moves, length):
        length = max(-length, -len(moves))
        history = [1 if move[0] == move[1] else 0 for move in moves[length:]]
        prob_honest = float(sum(history))/float(len(history))
        choice = random.uniform(0., 1.)
        if choice <= prob_honest:
            return True
        else:
            return False

    moves = ["R", "P", "S"]
    lose_against_map = {"S":"R", "R":"P", "P":"S"}
    length = 10
    if opp_hint == None:
        # Honest round
        return random.choice(moves)
    else:
        # Real round
        if len(opp_moves) < length:
            return my_hint
        if is_trustworthy(opp_moves, length):
            return lose_against_map[opp_hint]
        else:
            return my_hint

On the line "if is_trustworthy(opp_moves, self.length):", self is not defined. Also, on the line "return lose_against_map[opp_hint]", lose_against_map is not defined either. The self.length seems solvable by removing the self, but the other problem still stands. Until that's fixed, I'm afraid this is invalid.

Whoops, I wrote this using objects and forgot to remove some self references and didn't fully copy over the code. I'll fix it as soon as I get home.
coolioasjulio

OK. If it's just a small error I'll correct it (as I have in some other bots where it was only the self issue), but a missing function is another story.
Gryphon

@Gryphon I fixed the errors. (Removed the self, added the referenced object lose_against_map, and fixed the if statement checking whether it's an honest round.)
coolioasjulio

0
import random
def trustingRandom(a,b,c,d):
  move = random.choice(["R","P","S"])
  if c == "R":
    move = "P"
  elif c == "P":
    move = "S"
  elif c == "S":
    move = "R"
  return move

0

Averager

def averager(op, mp, od, md):
  import random
  if od == md == None:
    if op == mp == []:
      return random.choice('RPS')
    else:
      opa = [i[1] for i in op]
      copa = [opa.count(i) for i in 'RPS']
      copam = [i for i, j in zip('RPS', copa) if j == max(copa)]
      opd = [i[0] for i in op]
      copd = [opd.count(i) for i in 'RPS']
      copm = [i for i, j in zip('RPS', copd) if j == max(copd) and i in copam]
      return random.choice(copam if copm == [] else copm)
  else:
    if op == mp == []:
      return md
    else:
      hop = sum([1. if i[0] == i[1] else 0. for i in op]) / len(op)
      hmp = sum([1. if i[0] == i[1] else 0. for i in mp]) / len(mp)
      return 'PSR'['RPS'.index(od)] if hmp >= 0.75 and hop >= 0.50 else md

0

A little better than my previous entry...

def learningbot4(yourlist,mylist,you,me):
  CHECK={"R":{"R":0,"P":1,"S":-1},"P":{"R":-1,"P":0,"S":1},"S":{"R":1,"P":-1,"S":0}}
  results={None:{"R":0,"P":0,"S":0},"R":{"R":0,"P":0,"S":0},"P":{"R":0,"P":0,"S":0},"S":{"R":0,"P":0,"S":0}}
  for i in range(len(yourlist)):
    res=CHECK[yourlist[i][1]][mylist[i][1]]
    if mylist[i][0]==mylist[i][1]: res+=0.5
    results[yourlist[i][0]][mylist[i][1]]+=res
    results[None][mylist[i][0]]+=res
  return max(results[you],key=results[you].get)

0

csbot on steroids

I think the suggestion @user1502040 made in the comments should be followed. Otherwise this bot would have an advantage that I consider unfair. I'm submitting it so the difference it makes can be assessed. With the suggested random seeding, the steroids are neutralized and the bot becomes equivalent to csbot, so only one of them should take part in the contest.

from random import seed
from csbot import csbot

def csbot_on_steroids(ophist,myhist,opdecl,mydecl):
  seed()
  m = csbot(ophist,myhist,opdecl,mydecl)
  seed(0)
  return m