Fastest Python code to find a set of winning words in this game


14

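This is a word game from a set of activity cards for children. Below the rules is code to find the best triplet of words using /usr/share/dict/words. I thought it was an interesting optimization problem, and I am wondering whether people can find improvements.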

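Rules

  1. Choose one letter from each of the sets below.
  2. Choose a word using the letters chosen (and any others).
  3. Score the word.
    • Each letter from the chosen set scores the number shown with the set (repeats included).
    • AEIOU count 0.
    • All other letters count -2.
  4. Repeat steps 1-3 above twice more (no reusing letters in step 1).
  5. The final score is the sum of the three word scores.

Sets

(set 1 scores 1 point, set 2 scores 2 points, etc.)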

  1. LTN
  2. RDS
  3. GBM
  4. CHP
  5. FWV
  6. YKJ
  7. QXZ
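
As a sanity check on the rules, here is a minimal sketch of scoring a single word once one letter has been picked from each set. The names `SETS` and `score_word`, and the representation of a pick as a seven-letter string, are assumptions made for this illustration and are not part of the original code. It should run under Python 2.7 or 3.

```python
# The seven letter sets; set number i+1 is worth i+1 points per letter.
SETS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']


def score_word(word, chosen):
    """Score `word` given `chosen`, one letter per set (chosen[i] in SETS[i])."""
    for letter, letter_set in zip(chosen, SETS):
        if letter not in letter_set:
            raise ValueError('%r is not in set %r' % (letter, letter_set))
    # Each chosen letter earns the number of its set.
    points = {letter: i + 1 for i, letter in enumerate(chosen)}
    total = 0
    for ch in word.upper():
        if ch in points:
            total += points[ch]      # chosen letter, repeats included
        elif ch not in 'AEIOU':
            total -= 2               # any other consonant costs 2
        # vowels count 0
    return total


print(score_word('RAZZMATAZZ', 'TRMCFYZ'))
```

With T, R, M and Z picked, RAZZMATAZZ scores 2 + 4*7 + 3 + 1 = 34, since every consonant in the word is a chosen letter.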

Code:

from itertools import permutations
import numpy as np

points = {'LTN' : 1,
          'RDS' : 2,
          'GBM' : 3,
          'CHP' : 4,
          'FWV' : 5,
          'YKJ' : 6,
          'QXZ' : 7}

def tonum(word):
    word_array = np.zeros(26, dtype=np.int)
    for l in word:
        word_array[ord(l) - ord('A')] += 1
    return word_array.reshape((26, 1))

def to_score_array(letters):
    score_array = np.zeros(26, dtype=np.int) - 2
    for v in 'AEIOU':
        score_array[ord(v) - ord('A')] = 0
    for idx, l in enumerate(letters):
        score_array[ord(l) - ord('A')] = idx + 1
    return np.matrix(score_array.reshape(1, 26))

def find_best_words():
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    wlist = [l for l in wlist if len(l) > 4]
    orig = [l for l in wlist]
    for rep in 'AEIOU':
        wlist = [l.replace(rep, '') for l in wlist]
    wlist = np.hstack([tonum(w) for w in wlist])

    best = 0
    ct = 0
    bestwords = ()
    for c1 in ['LTN']:
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                vals = [to_score_array(''.join(s)) for s in zip(c1, c2, c3, c4, c5, c6, c7)]
                                ct += 1
                                print ct, 6**6
                                scores1 = (vals[0] * wlist).A.flatten()
                                scores2 = (vals[1] * wlist).A.flatten()
                                scores3 = (vals[2] * wlist).A.flatten()
                                m1 = max(scores1)
                                m2 = max(scores2)
                                m3 = max(scores3)
                                if m1 + m2 + m3 > best:
                                    print orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()], m1 + m2 + m3
                                    best = m1 + m2 + m3
                                    bestwords = (orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()])
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

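The matrix version is what I came up with after writing one in pure Python (using dictionaries and scoring each word independently), and another in numpy that used indexing rather than matrix multiplication.

The next optimization would be to remove the vowels from the scoring entirely (and use a modified ord() function), but I wonder whether there are even faster approaches.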
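
To make the matrix formulation concrete: each word becomes a vector of 26 letter counts, each choice of letters becomes a vector of 26 per-letter scores, and a word's score is their dot product. The sketch below shows this for a single word in pure Python, so it does not depend on numpy. The helper names `letter_counts`, `score_vector` and `dot` are made up for this illustration.

```python
def letter_counts(word):
    """Return a 26-element list with the count of each letter A-Z in word."""
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord('A')] += 1
    return counts


def score_vector(chosen):
    """Return per-letter scores for `chosen`, one letter per set in set order."""
    scores = [-2] * 26                       # default: unchosen consonant
    for vowel in 'AEIOU':
        scores[ord(vowel) - ord('A')] = 0    # vowels are free
    for idx, ch in enumerate(chosen):
        scores[ord(ch) - ord('A')] = idx + 1 # chosen letter of set idx+1
    return scores


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


print(dot(score_vector('TRMCFYZ'), letter_counts('RAZZMATAZZ')))
```

Stacking the count vectors of all words as columns of one matrix turns the scoring of the whole dictionary for one choice of letters into a single matrix product, which is what the numpy code above does.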

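EDIT: added timeit.timeit code

EDIT: I'm adding a bounty, which I'll give to whichever improvement I like best (or possibly to multiple answers, but I'd have to accrue some more reputation in that case).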


3
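BTW, I wrote the code to give my eight-year-old three words to memorize for when he played the game against his mother. Now I know what xylopyrography means.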

2
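This is an interesting problem. I think you might be more likely to get responses if you provide the following: (1) A link to an online word list, so everyone is working with the same data set. (2) Put your solution in a single function. (3) Run that function using the timeit module to show timings. (4) Make sure the loading of dictionary data happens outside the function, so that we aren't testing disk speed. People can then use the existing code as a framework for comparing their solutions.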

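I'll rewrite it to use timeit, but for fair comparisons I'd have to use my own machine (which I'm happy to do for people posting solutions). A word list should be available on most systems, but if not, there are several here: wordlist.sourceforge.net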

1
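Fair comparisons can be had if each user times your solution and any other posted solutions against their own on their own machine. There will be some differences across platforms, but in general this method works.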

1
Hm, in that case I wonder whether this is the right site. I think SO would have been the best fit.
Joey

Answers:


3

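Using Keith's idea of precomputing the best possible score for each word, I managed to reduce the execution time to about 0.7 seconds on my computer (using a list of 75,288 words).

The trick is to go through the combinations of words to be played instead of all combinations of letters picked. We can ignore all but a few combinations of words (203 with my word list), because the rest cannot get a higher score than we've already found. Nearly all of the execution time is spent precomputing word scores.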

Python 2.7:

import collections
import itertools


WORDS_SOURCE = '../word lists/wordsinf.txt'

WORDS_PER_ROUND = 3
LETTER_GROUP_STRS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']
LETTER_GROUPS = [list(group) for group in LETTER_GROUP_STRS]
GROUP_POINTS = [(group, i+1) for i, group in enumerate(LETTER_GROUPS)]
POINTS_IF_NOT_CHOSEN = -2


def best_word_score(word):
    """Return the best possible score for a given word."""

    word_score = 0

    # Score the letters that are in groups, choosing the best letter for each
    # group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts_sum = 0
        max_letter_count = 0
        for letter in group:
            if letter in word:
                count = word.count(letter)
                letter_counts_sum += count
                if count > max_letter_count:
                    max_letter_count = count
        if letter_counts_sum:
            word_score += points_if_chosen * max_letter_count
            total_not_chosen += letter_counts_sum - max_letter_count
    word_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return word_score

def best_total_score(words):
    """Return the best score possible for a given list of words.

    It is fine if the number of words provided is not WORDS_PER_ROUND. Only the
    words provided are scored."""

    num_words = len(words)
    total_score = 0

    # Score the letters that are in groups, choosing the best permutation of
    # letters for each group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts = []
        # Structure:  letter_counts[word_index][letter] = count
        letter_counts_sum = 0
        for word in words:
            this_word_letter_counts = {}
            for letter in group:
                count = word.count(letter)
                this_word_letter_counts[letter] = count
                letter_counts_sum += count
            letter_counts.append(this_word_letter_counts)

        max_chosen = None
        for letters in itertools.permutations(group, num_words):
            num_chosen = 0
            for word_index, letter in enumerate(letters):
                num_chosen += letter_counts[word_index][letter]
            if num_chosen > max_chosen:
                max_chosen = num_chosen

        total_score += points_if_chosen * max_chosen
        total_not_chosen += letter_counts_sum - max_chosen
    total_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return total_score


def get_words():
    """Return the list of valid words."""
    with open(WORDS_SOURCE, 'r') as source:
        return [line.rstrip().upper() for line in source]

def get_words_by_score():
    """Return a dictionary mapping each score to a list of words.

    The key is the best possible score for each word in the corresponding
    list."""

    words = get_words()
    words_by_score = collections.defaultdict(list)
    for word in words:
        words_by_score[best_word_score(word)].append(word)
    return words_by_score


def get_winning_words():
    """Return a list of words for an optimal play."""

    # A word's position is a tuple of its score's index and the index of the
    # word within the list of words with this score.
    # 
    # word played: A word in the context of a combination of words to be played
    # word chosen: A word in the context of the list it was picked from

    words_by_score = get_words_by_score()
    num_word_scores = len(words_by_score)
    word_scores = sorted(words_by_score, reverse=True)
    words_by_position = []
    # Structure:  words_by_position[score_index][word_index] = word
    num_words_for_scores = []
    for score in word_scores:
        words = words_by_score[score]
        words_by_position.append(words)
        num_words_for_scores.append(len(words))

    # Go through the combinations of words in lexicographic order by word
    # position to find the best combination.
    best_score = None
    positions = [(0, 0)] * WORDS_PER_ROUND
    words = [words_by_position[0][0]] * WORDS_PER_ROUND
    scores_before_words = []
    for i in xrange(WORDS_PER_ROUND):
        scores_before_words.append(best_total_score(words[:i]))
    while True:
        # Keep track of the best possible combination of words so far.
        score = best_total_score(words)
        if score > best_score:
            best_score = score
            best_words = words[:]

        # Go to the next combination of words that could get a new best score.
        for word_played_index in reversed(xrange(WORDS_PER_ROUND)):
            # Go to the next valid word position.
            score_index, word_chosen_index = positions[word_played_index]
            word_chosen_index += 1
            if word_chosen_index == num_words_for_scores[score_index]:
                score_index += 1
                if score_index == num_word_scores:
                    continue
                word_chosen_index = 0

            # Check whether the new combination of words could possibly get a
            # new best score.
            num_words_changed = WORDS_PER_ROUND - word_played_index
            score_before_this_word = scores_before_words[word_played_index]
            further_points_limit = word_scores[score_index] * num_words_changed
            score_limit = score_before_this_word + further_points_limit
            if score_limit <= best_score:
                continue

            # Update to the new combination of words.
            position = score_index, word_chosen_index
            positions[word_played_index:] = [position] * num_words_changed
            word = words_by_position[score_index][word_chosen_index]
            words[word_played_index:] = [word] * num_words_changed
            for i in xrange(word_played_index+1, WORDS_PER_ROUND):
                scores_before_words[i] = best_total_score(words[:i])
            break
        else:
            # None of the remaining combinations of words can get a new best
            # score.
            break

    return best_words


def main():
    winning_words = get_winning_words()
    print winning_words
    print best_total_score(winning_words)

if __name__ == '__main__':
    main()

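This returns the solution ['KNICKKNACK', 'RAZZMATAZZ', 'POLYSYLLABLES'] with a score of 95. With the words from Keith's solution added to the word list, I get the same result as him. With thouis's "xylopyrography" added, I get ['XYLOPYROGRAPHY', 'KNICKKNACKS', 'RAZZMATAZZ'] for a score of 105.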
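
Both totals can be checked by hand. The sketch below scores each play under one explicit assignment of letters that reaches the stated total. The particular picks, and the helper names `score_word` and `score_play`, are chosen for this illustration and were not produced by the program above; each pick is a seven-letter string with one letter per set, in set order.

```python
def score_word(word, chosen):
    """Score one word: chosen[i] earns i+1, vowels 0, other letters -2."""
    points = {letter: i + 1 for i, letter in enumerate(chosen)}
    return sum(points.get(ch, 0 if ch in 'AEIOU' else -2) for ch in word)


def score_play(words, picks):
    """Score a full play, refusing picks that reuse a letter within a set."""
    for position in range(7):
        column = [pick[position] for pick in picks]
        if len(set(column)) != len(column):
            raise ValueError('letter reused in set %d' % (position + 1))
    return sum(score_word(word, pick) for word, pick in zip(words, picks))


play_95 = score_play(['KNICKKNACK', 'RAZZMATAZZ', 'POLYSYLLABLES'],
                     ['NDGCFKQ', 'TRMHWJZ', 'LSBPVYX'])
play_105 = score_play(['XYLOPYROGRAPHY', 'KNICKKNACKS', 'RAZZMATAZZ'],
                      ['LRGPFYX', 'NSBCWKQ', 'TDMHVJZ'])
print(play_95)
print(play_105)
```

The first play breaks down as 34 + 34 + 27 = 95 and the second as 39 + 36 + 30 = 105. In the second play RAZZMATAZZ drops to 30, because R is already taken by XYLOPYROGRAPHY and therefore costs 2 instead of earning 2.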


5

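Here's an idea: you can avoid checking lots of words by noticing that most words have awful scores. Say you have found a pretty good play that gets you 50 points. Then any play with more than 50 points must contain a word worth at least ceil(51/3) = 17 points. So any word that can't possibly score 17 points can be ignored.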
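
The threshold arithmetic can be sketched as a small helper. The function name `min_word_score_needed` and its parameters are hypothetical, introduced only for this illustration: given the best total found so far, it returns the score that at least one of the remaining words must reach for the play to beat that total.

```python
def min_word_score_needed(best_so_far, words_left=3, points_banked=0):
    """Return the score at least one remaining word must reach.

    To beat best_so_far the play needs best_so_far + 1 points in total.
    After subtracting points already banked by earlier words, the rest is
    spread over words_left words, so the highest of them scores at least
    the ceiling of the average.
    """
    needed = best_so_far + 1 - points_banked
    return -(-needed // words_left)   # integer ceiling division


print(min_word_score_needed(50))                                    # 17
print(min_word_score_needed(50, words_left=2, points_banked=20))    # 16
```

The first call reproduces the ceil(51/3) = 17 example. The second shows how the bar moves once one word worth 20 points has been fixed: the remaining two words must add up to 31, so one of them needs at least 16.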

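Here's some code that implements this idea. We compute the best possible score for each word in the dictionary, and store the words in an array indexed by score. Then we use that array to check only words that reach the required minimum score.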

from itertools import permutations
import time

S={'A':0,'E':0,'I':0,'O':0,'U':0,
   'L':1,'T':1,'N':1,
   'R':2,'D':2,'S':2,
   'G':3,'B':3,'M':3,
   'C':4,'H':4,'P':4,
   'F':5,'W':5,'V':5,
   'Y':6,'K':6,'J':6,
   'Q':7,'X':7,'Z':7,
   }

def best_word(min, s):
    global score_to_words
    best_score = 0
    best_word = ''
    for i in xrange(min, 100):
        for w in score_to_words[i]:
            score = (-2*len(w)+2*(w.count('A')+w.count('E')+w.count('I')+w.count('O')+w.count('U')) +
                      3*w.count(s[0])+4*w.count(s[1])+5*w.count(s[2])+6*w.count(s[3])+7*w.count(s[4])+
                      8*w.count(s[5])+9*w.count(s[6]))
            if score > best_score:
                best_score = score
                best_word = w
    return (best_score, best_word)

def load_words():
    global score_to_words
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    score_to_words = [[] for i in xrange(100)]
    for w in wlist: score_to_words[sum(S[c] for c in w)].append(w)
    for i in xrange(100):
        if score_to_words[i]: print i, len(score_to_words[i])

def find_best_words():
    load_words()
    best = 0
    bestwords = ()
    for c1 in permutations('LTN'):
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
                print time.ctime(),c1,c2,c3
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                sets = zip(c1, c2, c3, c4, c5, c6, c7)
                                (s1, w1) = best_word((best + 3) / 3, sets[0])
                                (s2, w2) = best_word((best - s1 + 2) / 2, sets[1])
                                (s3, w3) = best_word(best - s1 - s2 + 1, sets[2])
                                score = s1 + s2 + s3
                                if score > best:
                                    best = score
                                    bestwords = (w1, w2, w3)
                                    print score, w1, w2, w3
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

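The minimum score rapidly climbs to 100, which means we only need to consider words worth 33+ points, a very small fraction of the total (my /usr/share/dict/words has 208662 valid words, of which only 1723 are worth 33+ points = 0.8%). It runs in about half an hour on my machine and generates: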

(('MAXILLOPREMAXILLARY', 'KNICKKNACKED', 'ZIGZAGWISE'), 101)
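
The score used to bucket words above, `sum(S[c] for c in w)`, is an upper bound: it assumes every letter that appears in a set earns that set's points, even when two letters from the same set occur in one word. The sketch below compares that loose bound with a tighter one that lets only the most frequent letter of each set score and charges 2 for the set's other letters, which is the same calculation `best_word_score` in the other answer performs. The names `GROUPS`, `loose_bound` and `tight_bound` are assumptions made for this illustration.

```python
GROUPS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']


def loose_bound(word):
    """Every letter found in a group earns that group's points."""
    total = 0
    for points, group in enumerate(GROUPS, start=1):
        total += points * sum(word.count(ch) for ch in group)
    return total


def tight_bound(word):
    """Only the most frequent letter per group scores; the rest cost 2 each."""
    total = 0
    for points, group in enumerate(GROUPS, start=1):
        counts = [word.count(ch) for ch in group]
        best = max(counts)
        total += points * best - 2 * (sum(counts) - best)
    return total


for word in ('KNICKKNACKED', 'XYLOPYROGRAPHY'):
    print('%s %d %d' % (word, loose_bound(word), tight_bound(word)))
```

For KNICKKNACKED the two bounds agree at 36, because no set contributes more than one distinct letter. For XYLOPYROGRAPHY the loose bound is 45 but the tight one is 39, since P and H come from the same set and only one of them can be chosen. A tighter bound moves fewer words into the high-score buckets, so fewer words need to be examined.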

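Nice. I'll add that to the matrix solution (removing words as their score falls too low), but this is significantly better than any of the pure Python solutions I'd come up with.
thouis 2011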

1
I'm not sure I've ever seen that many nested for loops before.
Peter Olson

1
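Combining your idea with matrix scoring (and a tighter upper bound on the best possible score) cuts the time on my machine to about 80 seconds (from about an hour). Code here.
thouis 2011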

A good portion of that time is spent precomputing the best possible scores, which could be made much faster.
thouis 2011
Licensed under cc by-sa 3.0 with attribution required.