Iterated Prisoner's Dilemma


19

Challenge status: Open

Comment, open a PR, or otherwise yell at me if I miss your bot.


The Prisoner's Dilemma... with three choices. Crazy, right?

Here's our payoff matrix. Player A on the left, B on top:

A,B| C | N | D
---|---|---|---
 C |3,3|4,1|0,5
 N |1,4|2,2|3,2
 D |5,0|2,3|1,1

The payoff matrix is engineered so that it's always best for both players taken together to cooperate, but you can (usually) gain individually by choosing Neutral or Defection.
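
To make the incentives concrete, here's a quick sketch (not part of the challenge spec, just an illustration) that encodes the matrix above as a Python dict and prints the best reply to each move plus the joint totals:

# Row player's payoff, indexed as PAYOFF[mine][theirs], copied from the matrix above.
PAYOFF = {
    "C": {"C": 3, "N": 4, "D": 0},
    "N": {"C": 1, "N": 2, "D": 3},
    "D": {"C": 5, "N": 2, "D": 1},
}
MOVES = ["C", "N", "D"]

for theirs in MOVES:
    best = max(MOVES, key=lambda mine: PAYOFF[mine][theirs])
    print("best reply to", theirs, "is", best, "worth", PAYOFF[best][theirs])

for mine in MOVES:
    for theirs in MOVES:
        print(mine, "vs", theirs, "joint total:", PAYOFF[mine][theirs] + PAYOFF[theirs][mine])
# Only C vs C reaches the maximum joint total of 6, but the best reply to C is D (5 points).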

Here are some (competing) example bots.

import random

# turns out you don't actually have to implement __init__(). TIL!

class AllC:
    def round(self, _): return "C"
class AllN:
    def round(self, _): return "N"
class AllD:
    def round(self, _): return "D"
class RandomBot:
    def round(self, _): return random.choice(["C", "N", "D"])

# Actually using an identically-behaving "FastGrudger".
class Grudger:
    def __init__(self):
        self.history = []
    def round(self, last):
        if(last):
            self.history.append(last)
            if(self.history.count("D") > 0):
                return "D"
        return "C"

class TitForTat:
    def round(self, last):
        if(last == "D"):
            return "D"
        return "C"

Your bot is a Python 3 class. A new instance is created for every game, and round() is called every turn with your opponent's choice from the last turn (or None if it's the first turn).
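
To illustrate the interface, here's a rough sketch of how a single match might be scored (this is only an illustration, not the actual controller linked below; the round count here is arbitrary):

# Sketch of a match loop. PAYOFF[mine][theirs] is taken from the matrix above;
# the real controller decides the (redacted) number of rounds.
PAYOFF = {
    "C": {"C": 3, "N": 4, "D": 0},
    "N": {"C": 1, "N": 2, "D": 3},
    "D": {"C": 5, "N": 2, "D": 1},
}

def play_match(bot_a_cls, bot_b_cls, rounds=100):
    a, b = bot_a_cls(), bot_b_cls()      # fresh instances for every game
    last_a = last_b = None               # each bot sees None on the first turn
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a.round(last_b)         # a bot only ever sees the opponent's previous move
        move_b = b.round(last_a)
        if move_a not in ("C", "N", "D"): move_a = "N"   # silently treated as N
        if move_b not in ("C", "N", "D"): move_b = "N"
        score_a += PAYOFF[move_a][move_b]
        score_b += PAYOFF[move_b][move_a]
        last_a, last_b = move_a, move_b
    return score_a, score_b

# e.g. play_match(TitForTat, AllD) using the example bots above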

There's a 50-rep bounty for the winner in about a month.

Specifics

  • Every bot plays every other bot (1v1) over [REDACTED] rounds, including itself.
  • Standard loopholes are disallowed.
  • No messing with anything outside your class, or other underhanded shenanigans.
  • You may submit up to five bots.
  • Yes, you may implement handshakes.
  • Any response other than C, N, or D is silently taken as N.
  • Each bot's points from every game it plays are totaled up and compared.

Controller

Check it out!

Other languages

I'll throw together an API if anyone needs one.

Scores: 2018-11-27

27 bots, 729 games.

name            | avg. score/round
----------------|-------------------
PatternFinder   | 3.152
DirichletDice2  | 3.019
EvaluaterBot    | 2.971
Ensemble        | 2.800
DirichletDice   | 2.763
Shifting        | 2.737
FastGrudger     | 2.632
Nash2           | 2.574
HistoricAverage | 2.552
LastOptimalBot  | 2.532
Number6         | 2.531
HandshakeBot    | 2.458
OldTitForTat    | 2.411
WeightedAverage | 2.403
TitForTat       | 2.328
AllD            | 2.272
Tetragram       | 2.256
Nash            | 2.193
Jade            | 2.186
Useless         | 2.140
RandomBot       | 2.018
CopyCat         | 1.902
TatForTit       | 1.891
NeverCOOP       | 1.710
AllC            | 1.565
AllN            | 1.446
Kevin           | 1.322

1
How do the bots play against each other? I gather from Grudger that there are always two bots playing each other, and the opponent's last choice is passed to round(). How many rounds are played? And for a game: does only the result count (i.e. who won), or also the points?
Black Owl Kai

1
You'd get more entries if this were language-agnostic, or at least broader. You could have a wrapper Python class that spawns a process and sends it text commands to get text responses back.
Sparr
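
For what it's worth, such a wrapper could stay pretty small. A rough, untested sketch (the ./mybot command and the one-move-per-line text protocol are placeholders, not anything the challenge defines):

import subprocess

class ExternalBot:
    # Wraps a non-Python bot that reads the opponent's last move ("C"/"N"/"D",
    # or "-" for None) from stdin and prints its own move, one line per round.
    def __init__(self, command="./mybot"):               # placeholder executable
        self.proc = subprocess.Popen(command, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE, text=True, bufsize=1)

    def round(self, last):
        self.proc.stdin.write((last or "-") + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()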

1
Done. And this sat in the sandbox for like a month!
SIGSTACKFAULT

2
If you wrap most of main.py in while len(botlist) > 1: with botlist.remove(lowest_scoring_bot) at the bottom of the loop, you get an elimination tournament with interesting results.
Sparr

1
Some future version might pass the whole interaction history rather than just the last move. It doesn't change much, although it would slightly simplify user code. But it would allow extensions, such as noisy communication channels that get clarified over time: "Really, a D, even though I've said C four times in a row? No, I didn't say D; what do you take me for? Oh, sorry, can we just forget that round?"
Scott Sauyet

Answers:


10

EvaluaterBot

class EvaluaterBot:
    def __init__(self):
        self.c2i = {"C":0, "N":1, "D":2}
        self.i2c = {0:"C", 1:"N", 2:"D"}
        self.history = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
        self.last = [None, None]

    def round(self, last):
        if self.last[0] == None:
            ret = 2
        else:
            # Input the latest enemy action (the reaction to my action 2 rounds ago)
            # into the history
            self.history[self.last[0]][self.c2i[last]] += 1
            # The enemy will react to the last action I did
            prediction,_ = max(enumerate(self.history[self.last[1]]), key=lambda l:l[1])
            ret = (prediction - 1) % 3
        self.last = [self.last[1], ret]
        return self.i2c[ret]

Wins against all previously submitted bots except, possibly, the random bot (though it may have an edge there, since it picks D on a tie and D should be optimal), and plays a constant draw against itself.


Yep, beats everything.
SIGSTACKFAULT

Scratch that, PatternFinder beats it.
SIGSTACKFAULT

7

Nash Equilibrium

This bot took a game theory class in college, but was lazy and skipped the lectures covering iterated games. So it only plays the mixed Nash equilibrium of the single game. Turns out 1/5 2/5 2/5 is the mixed NE for these payoffs.

import random

class NashEquilibrium:
    def round(self, _):
        a = random.random()
        if a <= 0.2:
            return "C"
        elif a <= 0.6:
            return "N"
        else:
            return "D" 

Constant-Abusing Nash Equilibrium

This bot has picked up a lesson or two from his lazy brother. His lazy brother's problem was that he didn't exploit fixed strategies. This version checks whether the opponent is a constant player or a tit-for-tat player and plays accordingly; otherwise it plays the regular Nash equilibrium.

The only downside is that it only averages 2.2 points per round.

class NashEquilibrium2:

    def __init__(self):
        self.opphistory = [None, None, None]
        self.titfortatcounter = 0
        self.titfortatflag = 0
        self.mylast = "C"
        self.constantflag = 0
        self.myret = "C"

    def round(self, last):
        self.opphistory.pop(0)
        self.opphistory.append(last)

        # check if its a constant bot, if so exploit
        if self.opphistory.count(self.opphistory[0]) == 3:
            self.constantflag = 1
            if last == "C":
                 self.myret = "D"
            elif last == "N":
                 self.myret = "C"
            elif last == "D":
                 self.myret = "N"

        # check if its a titfortat bot, if so exploit
        # give it 2 chances to see if its titfortat as it might happen randomly
        if self.mylast == "D" and last == "D":
            self.titfortatcounter = self.titfortatcounter + 1

        if self.mylast == "D" and last!= "D":
            self.titfortatcounter = 0

        if self.titfortatcounter >= 3:
            self.titfortatflag = 1

        if self.titfortatflag == 1:
            if last == "C":
                 self.myret = "D"
            elif last == "D":
                 self.myret = "N"    
            elif last == "N":
                # tit for tat doesn't return N, we made a mistake somewhere
                 self.titfortatflag = 0
                 self.titfortatcounter = 0

        # else play the single game nash equilibrium
        if self.constantflag == 0 and self.titfortatflag == 0:
            a = random.random()
            if a <= 0.2:
                self.myret = "C"
            elif a <= 0.6:
                self.myret = "N"
            else:
                self.myret = "D"


        self.mylast = self.myret
        return self.myret

1
NashEquilibrium.round needs to accept an argument, even if it doesn't use it, to fit the expected function prototype.

Thank you, fixed it.
Ofya

A bit shorter: class NashEquilibrium: def round(self, _): a = random.random() for k, v in [(0.2, "C"), (0.6, "N"), (1, "D")]: if a <= k: return v
Robert Grant

7

TatForTit

class TatForTit:
    def round(self, last):
        if(last == "C"):
            return "N"
        return "D"

If I'm reading the payoff matrix right, this bot will alternate picking D N D N while TitForTat alternates C D C D, for an average net gain of 3 points per round in this bot's favor. I think this may be optimal against TitForTat. Obviously it could be improved to detect a non-TFT opponent and adopt other strategies, but I was just aiming at the original bounty.
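
A quick simulation backs up that arithmetic (my addition, reusing the payoff matrix and the TitForTat example bot's rule from the challenge spec):

PAYOFF = {
    "C": {"C": 3, "N": 4, "D": 0},
    "N": {"C": 1, "N": 2, "D": 3},
    "D": {"C": 5, "N": 2, "D": 1},
}

tft_last = tat_last = None
tat_score = tft_score = 0
rounds = 10
for _ in range(rounds):
    tat = "N" if tft_last == "C" else "D"    # TatForTit's rule
    tft = "D" if tat_last == "D" else "C"    # TitForTat's rule
    tat_score += PAYOFF[tat][tft]
    tft_score += PAYOFF[tft][tat]
    tat_last, tft_last = tat, tft

print(tat_score / rounds, tft_score / rounds)   # 4.0 vs 1.0, a net +3 per round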


6

PatternFinder

class PatternFinder:
    def __init__(self):
        import collections
        self.size = 10
        self.moves = [None]
        self.other = []
        self.patterns = collections.defaultdict(list)
        self.counter_moves = {"C":"D", "N":"C", "D":"N"}
        self.initial_move = "D"
        self.pattern_length_exponent = 1
        self.pattern_age_exponent = 1
        self.debug = False
    def round(self, last):
        self.other.append(last)
        best_pattern_match = None
        best_pattern_score = None
        best_pattern_response = None
        self.debug and print("match so far:",tuple(zip(self.moves,self.other)))
        for turn in range(max(0,len(self.moves)-self.size),len(self.moves)):
            # record patterns ending with the move that just happened
            pattern_full = tuple(zip(self.moves[turn:],self.other[turn:]))
            if len(pattern_full) > 1:
                pattern_trunc = pattern_full[:-1]
                pattern_trunc_result = pattern_full[-1][1]
                self.patterns[pattern_trunc].append([pattern_trunc_result,len(self.moves)-1])
            if pattern_full in self.patterns:
                # we've seen this pattern at least once before
                self.debug and print("I've seen",pattern_full,"before:",self.patterns[pattern_full])
                for [response,turn_num] in self.patterns[pattern_full]:
                    score = len(pattern_full) ** self.pattern_length_exponent / (len(self.moves) - turn_num) ** self.pattern_age_exponent
                    if best_pattern_score == None or score > best_pattern_score:
                        best_pattern_match = pattern_full
                        best_pattern_score = score
                        best_pattern_response = response
                    # this could be much smarter about aggregating previous responses
        if best_pattern_response:
            move = self.counter_moves[best_pattern_response]
        else:
            # fall back to playing nice
            move = "C"
        self.moves.append(move)
        self.debug and print("I choose",move)
        return move

This bot looks for previous occurrences of the recent game state to see how the opponent responded to those occurrences, preferring longer pattern matches and more recent matches, then plays the move that will "beat" the opponent's predicted move. There's a lot of room for it to be smarter with all the data it keeps track of, but I ran out of time to work on it.


When you get a chance, mind giving it an optimization pass? It's quite the time sink.
SIGSTACKFAULT

2
@Blacksilver I just reduced the maximum pattern length from 100 to 10. It should run almost instantly now if you're running <200 rounds.
Sparr

1
Maybe using a highly composite number (i.e. 12) would do better?
SIGSTACKFAULT

5

import random

class Jade:
    def __init__(self):
        self.dRate = 0.001
        self.nRate = 0.003

    def round(self, last):
        if last == 'D':
            self.dRate *= 1.1
            self.nRate *= 1.2
        elif last == 'N':
            self.dRate *= 1.03
            self.nRate *= 1.05
        self.dRate = min(self.dRate, 1)
        self.nRate = min(self.nRate, 1)

        x = random.random()
        if x > (1 - self.dRate):
            return 'D'
        elif x > (1 - self.nRate):
            return 'N'
        else:
            return 'C'

Starts out optimistic, but becomes increasingly bitter as the opponent refuses to cooperate. There are lots of magic constants that could probably be tweaked, but it probably isn't going to do well enough to justify the time.


5

Ensemble

This runs an ensemble of related models. The individual models consider different amounts of history, and either always choose the move that optimizes the expected payout differential, or randomly select a move in proportion to the expected payout differential.

Each member of the ensemble then votes on its preferred move. Each gets a number of votes equal to how much more it has won than the opponent (which means terrible models get negative votes). Whichever move wins the vote is then selected.

(They should probably split their votes among moves in proportion to how much they favor each, but I don't care enough to do that right now.)

It beats everything posted so far except EvaluaterBot and PatternFinder. (One-on-one, it beats EvaluaterBot and loses to PatternFinder.)

from collections import defaultdict
import random
class Number6:
    class Choices:
        def __init__(self, C = 0, N = 0, D = 0):
            self.C = C
            self.N = N
            self.D = D

    def __init__(self, strategy = "maxExpected", markov_order = 3):
        self.MARKOV_ORDER = markov_order;
        self.my_choices = "" 
        self.opponent = defaultdict(lambda: self.Choices())
        self.choice = None # previous choice
        self.payoff = {
            "C": { "C": 3-3, "N": 4-1, "D": 0-5 },
            "N": { "C": 1-4, "N": 2-2, "D": 3-2 },
            "D": { "C": 5-0, "N": 2-3, "D": 1-1 },
        }
        self.total_payoff = 0

        # if random, will choose in proportion to payoff.
        # otherwise, will always choose argmax
        self.strategy = strategy
        # maxExpected: maximize expected relative payoff
        # random: like maxExpected, but it chooses in proportion to E[payoff]
        # argmax: always choose the option that is optimal for expected opponent choice

    def update_opponent_model(self, last):
        for i in range(0, self.MARKOV_ORDER):
            hist = self.my_choices[i:]
            self.opponent[hist].C += ("C" == last)
            self.opponent[hist].N += ("N" == last)
            self.opponent[hist].D += ("D" == last)

    def normalize(self, counts):
        sum = float(counts.C + counts.N + counts.D)
        if 0 == sum:
            return self.Choices(1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)
        return self.Choices(
            counts.C / sum, counts.N / sum, counts.D / sum)

    def get_distribution(self):
        for i in range(0, self.MARKOV_ORDER):
            hist = self.my_choices[i:]
            #print "check hist = " + hist
            if hist in self.opponent:
                return self.normalize(self.opponent[hist])

        return self.Choices(1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)

    def choose(self, dist):
        payoff = self.Choices()
        # We're interested in *beating the opponent*, not
        # maximizing our score, so we optimize the difference
        payoff.C = (3-3) * dist.C + (4-1) * dist.N + (0-5) * dist.D
        payoff.N = (1-4) * dist.C + (2-2) * dist.N + (3-2) * dist.D
        payoff.D = (5-0) * dist.C + (2-3) * dist.N + (1-1) * dist.D

        # D has slightly better payoff on uniform opponent,
        # so we select it on ties
        if self.strategy == "maxExpected":
            if payoff.C > payoff.N:
                return "C" if payoff.C > payoff.D else "D"
            return "N" if payoff.N > payoff.D else "D"
        elif self.strategy == "randomize":
            payoff = self.normalize(payoff)
            r = random.uniform(0.0, 1.0)
            if (r < payoff.C): return "C"
            return "N" if (r < payoff.N) else "D"
        elif self.strategy == "argMax":
            if dist.C > dist.N:
                return "D" if dist.C > dist.D else "N"
            return "C" if dist.N > dist.D else "N"

        assert(0) #, "I am not a number! I am a free man!")

    def update_history(self):
        self.my_choices += self.choice
        if len(self.my_choices) > self.MARKOV_ORDER:
            assert(len(self.my_choices) == self.MARKOV_ORDER + 1)
            self.my_choices = self.my_choices[1:]

    def round(self, last):
        if last: self.update_opponent_model(last)

        dist = self.get_distribution()
        self.choice = self.choose(dist)
        self.update_history()
        return self.choice

class Ensemble:
    def __init__(self):
        self.models = []
        self.votes = []
        self.prev_choice = []
        for order in range(0, 6):
            self.models.append(Number6("maxExpected", order))
            self.models.append(Number6("randomize", order))
            #self.models.append(Number6("argMax", order))
        for i in range(0, len(self.models)):
            self.votes.append(0)
            self.prev_choice.append("D")

        self.payoff = {
            "C": { "C": 3-3, "N": 4-1, "D": 0-5 },
            "N": { "C": 1-4, "N": 2-2, "D": 3-2 },
            "D": { "C": 5-0, "N": 2-3, "D": 1-1 },
        }

    def round(self, last):
        if last:
            for i in range(0, len(self.models)):
                self.votes[i] += self.payoff[self.prev_choice[i]][last]

        # vote. Sufficiently terrible models get negative votes
        C = 0
        N = 0
        D = 0
        for i in range(0, len(self.models)):
            choice = self.models[i].round(last)
            if "C" == choice: C += self.votes[i]
            if "N" == choice: N += self.votes[i]
            if "D" == choice: D += self.votes[i]
            self.prev_choice[i] = choice

        if C > D and C > N: return "C"
        elif N > D: return "N"
        else: return "D"

Test framework

In case anyone else finds it useful, here's a test framework for looking at individual matchups. Python 2. Just put all the opponents you're interested in into opponents.py, and change the references to Ensemble to your own bot.

import sys, inspect
import opponents
from ensemble import Ensemble

def count_payoff(label, them):
    if None == them: return
    me = choices[label]
    payoff = {
        "C": { "C": 3-3, "N": 4-1, "D": 0-5 },
        "N": { "C": 1-4, "N": 2-2, "D": 3-2 },
        "D": { "C": 5-0, "N": 2-3, "D": 1-1 },
    }
    if label not in total_payoff: total_payoff[label] = 0
    total_payoff[label] += payoff[me][them]

def update_hist(label, choice):
    choices[label] = choice

opponents = [ x[1] for x 
    in inspect.getmembers(sys.modules['opponents'], inspect.isclass)]

for k in opponents:
    total_payoff = {}

    for j in range(0, 100):
        A = Ensemble()
        B = k()
        choices = {}

        aChoice = None
        bChoice = None
        for i in range(0, 100):
            count_payoff(A.__class__.__name__, bChoice)
            a = A.round(bChoice)
            update_hist(A.__class__.__name__, a)

            count_payoff(B.__class__.__name__, aChoice)
            b = B.round(aChoice)
            update_hist(B.__class__.__name__, b)

            aChoice = a
            bChoice = b
    print total_payoff

The controller's ready, you didn't need to do all that...
SIGSTACKFAULT

1
@Blacksilver I realized that just as I was about to submit. But this one works on versions before 3.6 and gives information about individual matchups that can help identify weak spots, so it wasn't a waste of time.

Fair enough; it runs now. I'll probably add options to the controller to do similar things.
SIGSTACKFAULT

I'm flattered: "It beats everything posted so far except Ensemble and PatternFinder" :)
Sparr

@Sparr Oops. That should have said EvaluaterBot and PatternFinder. But that's when comparing total scores against the whole field. PatternFinder remains the only one that beats it in a head-to-head match.

4

OldTitForTat

Old-school player is too lazy to update for the new rules.

class OldTitForTat:
    def round(self, last):
        if(last == None):
            return "C"
        if(last == "C"):
            return "C"
        return "D"

3

NeverCOOP

class NeverCOOP:
    def round(self, last):
        try:
            if last in "ND":
                return "D"
            else:
                return "N"
        except:
            return "N"

If the opposing bot defects or plays neutral, defect. Otherwise, if it's the first round or the opposing bot cooperates, play neutral. I'm not sure how much better this will do...


What's the try/except for?
SIGSTACKFAULT

1
@Blacksilver I'd assume it serves the same purpose as the if(last): in your Grudger bot, detecting whether there was a previous round.
ETHproductions

Ah, I see. None in "ND" throws an error.
SIGSTACKFAULT

Because if last and last in "ND": is too complicated?
user253751

3

LastOptimalBot

class LastOptimalBot:
    def round(self, last):
        return "N" if last == "D" else ("D" if last == "C" else "C")

Assumes the opposing bot will always play the same move again, and picks the move with the best payoff against it.

Averages:

Me   Opp
2.6  2    vs TitForTat
5    0    vs AllC
4    1    vs AllN
3    2    vs AllD
3.5  3.5  vs Random
3    2    vs Grudger
2    2    vs LastOptimalBot
1    3.5  vs TatForTit
4    1    vs NeverCOOP
1    4    vs EvaluaterBot
2.28 2.24 vs NashEquilibrium

2.91 average overall

Maybe T4T would do better as return last.
SIGSTACKFAULT

I wish! If TitForTat were return last, LOB would go 18-9 over 6 rounds rather than the 13-10 over 5 it's currently getting. I think it's fine as is - don't worry about optimizing the example bots.
Spitemaster

return last would be a better T4T for this challenge, I think.
Sparr

Just tried it - if(last): return last; else: return "C" does worse.
SIGSTACKFAULT

Yeah, but as @Sparr said, it might be more appropriate. Up to you, I suppose.
Spitemaster

3

CopyCat

class CopyCat:
    def round(self, last):
        if last:
            return last
        return "C"

Copies the opponent's last move.
I don't expect this to do well, but no one had implemented this classic yet.


2

Improved Dirichlet Dice

import random

class DirichletDice2:
    def __init__(self):

        self.alpha = dict(
                C = {'C' : 1, 'N' : 1, 'D' : 1},
                N = {'C' : 1, 'N' : 1, 'D' : 1},
                D = {'C' : 1, 'N' : 1, 'D' : 1}
        )
        self.myLast = [None, None]
        self.payoff = dict(
                C = { "C": 0, "N": 3, "D": -5 },
                N = { "C": -3, "N": 0, "D": 1 },
                D = { "C": 5, "N": -1, "D": 0 }
        )

    def DirichletDraw(self, key):
        alpha = self.alpha[key].values()
        mu = [random.gammavariate(a,1) for a in alpha]
        mu = [m / sum(mu) for m in mu]
        return mu

    def ExpectedPayoff(self, probs):
        expectedPayoff = {}
        for val in ['C','N','D']:
            payoff = sum([p * v for p,v in zip(probs, self.payoff[val].values())])
            expectedPayoff[val] = payoff
        return expectedPayoff

    def round(self, last):
        if last is None:
            self.myLast[0] = 'D'
            return 'D'

        #update dice corresponding to opponent's last response to my
        #outcome two turns ago
        if self.myLast[1] is not None:
            self.alpha[self.myLast[1]][last] += 1

        #draw probs for my opponent's roll from Dirichlet distribution and then return the optimal response
        mu = self.DirichletDraw(self.myLast[0])
        expectedPayoff = self.ExpectedPayoff(mu)
        res = max(expectedPayoff, key=expectedPayoff.get)

        #update myLast
        self.myLast[1] = self.myLast[0]
        self.myLast[0] = res

        return res    

This is the improved version of Dirichlet Dice. Instead of taking the expected multinomial distribution from the Dirichlet distribution, it draws a multinomial distribution at random from the Dirichlet distribution. Then, instead of drawing from the multinomial and giving the optimal response to that draw, it gives the optimal expected response to the given multinomial using the points. So the randomness has been shifted from the multinomial draw to the Dirichlet draw. Also, the priors are flatter now, to encourage exploration.

It's "improved" because it now accounts for the points system by taking the optimal expected value against the probabilities, while keeping its randomness by drawing the probabilities themselves. Previously I tried simply taking the optimal expected payoff from the expected probabilities, but that did badly because it just got stuck and didn't explore enough to update its dice. It was also more predictable and exploitable.


Original submission:

Dirichlet Dice

import random

class DirichletDice:
    def __init__(self):

        self.alpha = dict(
                C = {'C' : 2, 'N' : 3, 'D' : 1},
                N = {'C' : 1, 'N' : 2, 'D' : 3},
                D = {'C' : 3, 'N' : 1, 'D' : 2}
        )

        self.Response = {'C' : 'D', 'N' : 'C', 'D' : 'N'}
        self.myLast = [None, None]

    #expected value of the dirichlet distribution given by Alpha
    def MultinomialDraw(self, key):
        alpha = list(self.alpha[key].values())
        probs = [x / sum(alpha) for x in alpha]
        outcome = random.choices(['C','N','D'], weights=probs)[0]
        return outcome

    def round(self, last):
        if last is None:
            self.myLast[0] = 'D'
            return 'D'

        #update dice corresponding to opponent's last response to my
        #outcome two turns ago
        if self.myLast[1] is not None:
            self.alpha[self.myLast[1]][last] += 1

        #predict opponent's move based on my last move
        predict = self.MultinomialDraw(self.myLast[0])
        res = self.Response[predict]

        #update myLast
        self.myLast[1] = self.myLast[0]
        self.myLast[0] = res

        return res

Basically, I assume that my opponent's response to my last output is a multinomial variable (a weighted die), one for each of my outputs, so there's a die for "C", one for "N", and one for "D". For example, if my last roll was "N", I roll the "N-die" to guess their response to my "N". I begin with a Dirichlet prior that assumes my opponent is somewhat "smart" (more likely to play the move with the best payoff against my last roll, and least likely to play the one with the worst payoff). I generate the "expected" multinomial distribution from the appropriate Dirichlet prior (this is the expected value of the probability distribution over their die weights). I roll the weighted die corresponding to my last output, and respond with the counter to whatever it comes up with.

Starting on the third round, I do a Bayesian update of the appropriate Dirichlet, given my opponent's last response to what I played two rounds ago. I'm trying to iteratively learn their die weights.

I could also have simply picked the response with the best "expected" outcome once I generate the dice, instead of rolling them and responding to the result. However, I wanted to keep the randomness so my bot is less vulnerable to bots that try to predict patterns.


2

Kevin

class Kevin:
    def round(self, last):      
        return {"C":"N","N":"D","D":"C",None:"N"} [last]

Picks the worst choice. The worst bot made.

Useless

import random

class Useless:
    def __init__(self):
        self.lastLast = None

    def round(self, last):
        tempLastLast = self.lastLast
        self.lastLast = last

        if(last == "D" and tempLastLast == "N"):
            return "C"
        if(last == "D" and tempLastLast == "C"):
            return "N"

        if(last == "N" and tempLastLast == "D"):
            return "C"
        if(last == "N" and tempLastLast == "C"):
            return "D"

        if(last == "C" and tempLastLast == "D"):
            return "N"
        if(last == "C" and tempLastLast == "N"):
            return "D"

        return random.choice("CND")

It looks at the opponent's last two moves and picks the one they haven't played; otherwise it picks randomly. There's probably a better way of doing this.


2

HistoricAverage

class HistoricAverage:
    PAYOFFS = {
        "C":{"C":3,"N":1,"D":5},
        "N":{"C":4,"N":2,"D":2},
        "D":{"C":0,"N":3,"D":1}}
    def __init__(self):
        self.payoffsum = {"C":0, "N":0, "D":0}
    def round(this, last):
        if(last != None):
            for x in this.payoffsum:
               this.payoffsum[x] += HistoricAverage.PAYOFFS[last][x]
        return max(this.payoffsum, key=this.payoffsum.get)

Looks at the history and finds the action that would have been best on average. Starts out cooperative.


This could run faster if it didn't re-compute the averages every round.
Sparr

@Sparr True. I edited it so it does that now.
MegaTom

1

WeightedAverage

class WeightedAverageBot:
  def __init__(self):
    self.C_bias = 1/4
    self.N = self.C_bias
    self.D = self.C_bias
    self.prev_weight = 1/2
  def round(self, last):
    if last:
      if last == "C" or last == "N":
        self.D *= self.prev_weight
      if last == "C" or last == "D":
        self.N *= self.prev_weight
      if last == "N":
        self.N = 1 - ((1 - self.N) * self.prev_weight)
      if last == "D":
        self.D = 1 - ((1 - self.D) * self.prev_weight)
    if self.N <= self.C_bias and self.D <= self.C_bias:
      return "D"
    if self.N > self.D:
      return "C"
    return "N"

The opponent's behavior is modeled as a right triangle with corners for C, N, and D at 0,0, 0,1, and 1,0 respectively. Each opponent move shifts the point within that triangle toward that corner, and we play to beat the move indicated by the point (with C being given an adjustably small slice of the triangle). In theory I wanted this to have a longer memory with more weight on earlier moves, but in practice the current meta favors bots that change quickly, so this devolves into an approximation of LastOptimalBot against most opponents. Posting for posterity; maybe someone will be inspired.


1

Tetragram

import itertools
import random

class Tetragram:
    def __init__(self):
        self.history = {x: ['C'] for x in itertools.product('CND', repeat=4)}
        self.theirs = []
        self.previous = None

    def round(self, last):
        if self.previous is not None and len(self.previous) == 4:
            self.history[self.previous].append(last)
        if last is not None:
            self.theirs = (self.theirs + [last])[-3:]

        if self.previous is not None and len(self.previous) == 4:
            expected = random.choice(self.history[self.previous])
            if expected == 'C':
                choice = 'C'
            elif expected == 'N':
                choice = 'C'
            else:
                choice = 'N'
        else:
            choice = 'C'

        self.previous = tuple(self.theirs + [choice])
        return choice

Tries to find a pattern in the opponent's moves, assuming they're also watching our last move.


1

HandshakeBot

class HandshakeBot:
  def __init__(self):
    self.handshake_length = 4
    self.handshake = ["N","N","C","D"]
    while len(self.handshake) < self.handshake_length:
      self.handshake *= 2
    self.handshake = self.handshake[:self.handshake_length]
    self.opp_hand = []
    self.friendly = None
  def round(self, last):
    if last:
      if self.friendly == None:
        # still trying to handshake
        self.opp_hand.append(last)
        if self.opp_hand[-1] != self.handshake[len(self.opp_hand)-1]:
          self.friendly = False
          return "D"
        if len(self.opp_hand) == len(self.handshake):
          self.friendly = True
          return "C"
        return self.handshake[len(self.opp_hand)]
      elif self.friendly == True:
        # successful handshake and continued cooperation
        if last == "C":
          return "C"
        self.friendly = False
        return "D"
      else:
        # failed handshake or abandoned cooperation
        return "N" if last == "D" else ("D" if last == "C" else "C")
    return self.handshake[0]

Recognizes when it's playing against itself, then cooperates. Otherwise it mimics LastOptimalBot, which seems like the best one-line strategy. Performs worse than LastOptimalBot by an amount inversely proportional to the number of rounds. Obviously it would do better if there were more copies of it in the field *cough* *wink*.


Just submit a few clones with different non-handshake behavior.
SIGSTACKFAULT

That seems exploit-y. I could submit one such clone for every simple behavior represented here.
Sparr

I've added an extra clause saying you can only submit a maximum of five bots.
SIGSTACKFAULT

1

ShiftingOptimalBot

class ShiftingOptimalBot:
    def __init__(self):
        # wins, draws, losses
        self.history = [0,0,0]
        self.lastMove = None
        self.state = 0
    def round(self, last):
        if last == None:
            self.lastMove = "C"
            return self.lastMove
        if last == self.lastMove:
            self.history[1] += 1
        elif (last == "C" and self.lastMove == "D") or (last == "D" and self.lastMove == "N") or (last == "N" and self.lastMove == "C"):
            self.history[0] += 1
        else:
            self.history[2] += 1

        if self.history[0] + 1 < self.history[2] or self.history[2] > 5:
            self.state = (self.state + 1) % 3
            self.history = [0,0,0]
        if self.history[1] > self.history[0] + self.history[2] + 2:
            self.state = (self.state + 2) % 3
            self.history = [0,0,0]

        if self.state == 0:
            self.lastMove = "N" if last == "D" else ("D" if last == "C" else "C")
        elif self.state == 1:
            self.lastMove = last
        else:
            self.lastMove = "C" if last == "D" else ("N" if last == "C" else "D")
        return self.lastMove

This bot uses LastOptimalBot's algorithm as long as it's winning. If the other bot starts predicting it, though, it switches to playing whichever move its opponent played last (which beats the move that would beat LastOptimalBot). It cycles through simple transpositions of those algorithms as long as it keeps losing (or when it gets bored from drawing a lot).

Honestly, I'm surprised that LastOptimalBot is sitting in 5th place as I post this. I'm fairly certain this will do better, assuming I wrote this Python correctly.


0

HandshakePatternMatch

from .patternfinder import PatternFinder
import collections

class HandshakePatternMatch:
    def __init__(self):
        self.moves = [None]
        self.other = []
        self.handshake = [None,"N","C","C","D","N"]
        self.friendly = None
        self.pattern = PatternFinder()
    def round(self, last):
        self.other.append(last)
        if last:
            if len(self.other) < len(self.handshake):
                # still trying to handshake
                if self.friendly == False or self.other[-1] != self.handshake[-1]:
                    self.friendly = False
                else:
                    self.friendly = True
                move = self.handshake[len(self.other)]
                self.pattern.round(last)
            elif self.friendly == True:
                # successful handshake and continued cooperation
                move = self.pattern.round(last)
                if last == "C":
                    move = "C"
                elif last == self.handshake[-1] and self.moves[-1] == self.handshake[-1]:
                    move = "C"
                else:
                    self.friendly = False
            else:
                # failed handshake or abandoned cooperation
                move = self.pattern.round(last)
        else:
            move = self.handshake[1]
            self.pattern.round(last)
        self.moves.append(move)
        return move

Why pattern match against yourself? Handshake and cooperate away.


import PatternFinder is cheating in my book.
SIGSTACKFAULT

@Blacksilver It's done all the time in KOTHs. It's no different than copying the code into an existing answer and using it. In Robot Roulette: High stakes robot gambling it happened all over the place, to the point where bots would detect whether their code was being called by an opponent and sabotage the return value.
Draco18s

Well, OK then. TIL.
SIGSTACKFAULT

I'll do the crunching tomorrow.
SIGSTACKFAULT

It's a perfect example of using other bots' code. It usually comes down to "that guy worked out some tricky math, and I want his results under these conditions." (My own entry did that to good effect; UpYours was more scattershot in its approach.)
Draco18s

0

Hardcoded

class Hardcoded:
    sequence = "DNCNNDDCNDDDCCDNNNNDDCNNDDCDCNNNDNDDCNNDDNDDCDNCCNNDNNDDCNNDDCDCNNNDNCDNDNDDNCNDDCDNNDCNNDDCDCNNDNNDDCDNDDCCNNNDNNDDCNNDDNDCDNCNDDCDNNDDCCNDNNDDCNNNDCDNDDCNNNNDNDDCDNCDCNNDNNDDCDNDDCCNNNDNDDCNNNDNDCDCDNNDCNNDNDDCDNCNNDDCNDNNDDCDNNDCDNDNCDDCNNNDNDNCNDDCDNDDCCNNNNDNDDCNNDDCNNDDCDCNNDNNDDCDNDDCCNDNNDDCNNNDCDNNDNDDCCNNNDNDDNCDCDNNDCNNDNDDCNNDDCDNCNNDDCDNNDCDNDNCDDCNDNNDDCNNNDDCDNCNNDNNDDCNNDDNNDCDNCNDDCNNDCDNNDDCNNDDNCDCNNDNDNDDCDNCDCNNNDNDDCDCNNDNNDDCDNDDCCNNNDNNDDCNDNDNCDDCDCNNNNDNDDCDNCNDDCDNNDDCNNNDNDDCDNCNNDCNNDNDDNCDCDNNNDDCNNDDCNNDDNNDCDNCNDDCNNDDNDCDNNDNDDCCNCDNNDCNNDDNDDCNCDNNDCDNNNDDCNNDDCDCDNNDDCNDNCNNDNNDNDNDDCDNCDCNNNDNDDCDNCNNDDCDNNDCNNDDCNNDDCDCDNNDDCNDNCNNNDDCDNNDCDNDNCNNDNDDNNDNDCDDCCNNNDDCNDNDNCDDCDCNNNDNNDDCNDCDNDDCNNNNDNDDCCNDNNDDCDCNNNDNDDNDDCDNCCNNDNNDDCNNDDCDCNNDNNDDCNNDDNCNDDNNDCDNCNDDCNNDDNDCDNNDNDDCCNCDNNDCNNDNDDCNNDDNCDCDNNDCNNDNDDCDCDNNNNDDCNNDDNDCCNNDDNDDCNCDNNDCNNDDNDDCDNCNDDCNNNNDCDNNDDCNDNDDCDNCNNDCDNNDCNNDNDDNCDCNNDNDDCDNDDCCNNNNDNDDCNNDDCDCNNDNNDDCDCDNNDDC"
    def __init__(self):
        self.round_num = -1
    def round(self,_):
        self.round_num += 1
        return Hardcoded.sequence[self.round_num % 1000]

Just plays a hardcoded sequence of moves optimized to beat some of the top deterministic bots.
