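It is very hard to draw a rack that contains no valid word in Scrabble and its variants. Below is an R program I wrote to estimate the probability that the initial 7-tile rack contains no valid word. It uses a Monte Carlo approach and the Words With Friends lexicon (I couldn't find the official Scrabble lexicon in a simple format). Each trial consists of drawing a 7-tile rack and then checking whether the rack contains a valid word.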
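A minimal lexicon consisting of minimal words is enough. A word is minimal if it contains no other word as a subset. For example, "em" is a minimal word; "empty" is not. The point is that if a rack contains a word x, then it must also contain every subset of x. In other words: if a rack contains no minimal word, it contains no word at all. Luckily, most words in the lexicon are not minimal, so they can be eliminated. You can also merge permutation-equivalent words (anagrams). I was able to reduce the Words With Friends lexicon from 172,820 words to 201 minimal words.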
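As a small illustration of the reduction to minimal words, here is a sketch on a toy lexicon. The word list and the helper names (`toy.lexicon`, `sort.letters`, `letter.counts`, `contains.letters`) are made up for this example and are not part of the script below, which uses regular expressions on sorted words instead of letter counts.

```r
# Toy lexicon (hypothetical; not the Words With Friends list)
toy.lexicon <- c("em", "me", "empty", "pet", "temp", "at", "tea")

# Sort the letters of each word so that anagrams collapse to one key
sort.letters <- function(x) {
  sapply(lapply(strsplit(x, NULL), sort), paste, collapse = "")
}

# Letter counts of a single word over a-z (base R `letters`)
letter.counts <- function(w) {
  table(factor(strsplit(w, NULL)[[1]], levels = letters))
}

# TRUE if `big` contains every letter of `small`, counting multiplicity
contains.letters <- function(big, small) {
  all(letter.counts(big) >= letter.counts(small))
}

keys <- unique(sort.letters(toy.lexicon))

# A key is minimal if it contains no other key
is.minimal <- vapply(keys, function(w) {
  others <- setdiff(keys, w)
  !any(vapply(others, function(v) contains.letters(w, v), logical(1)))
}, logical(1))

minimal <- unname(keys[is.minimal])
print(minimal)
```

On this toy list, "empty" and "temp" drop out because they contain "em", and "tea" drops out because it contains "at".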
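I ran the simulation with N = 100,000 trials and obtained an estimated probability of 0.004 that the initial rack contains no valid word. The estimated standard error of this estimate is 0.0002. It took only a few minutes to run on my Mac Pro, including downloading the lexicon.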
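The reported standard error is consistent with the usual binomial formula for a sample proportion. The sketch below recomputes it from the two reported numbers, and also computes it the way the script does (`sd(x) / sqrt(N)`) on simulated 0/1 outcomes. The simulated vector `x` is a stand-in generated with `rbinom`, not the actual racks.

```r
# Reported values from the text
p.hat <- 0.004   # estimated probability of a wordless rack
N <- 1e5         # number of Monte Carlo trials

# Binomial standard error of a sample proportion
se.formula <- sqrt(p.hat * (1 - p.hat) / N)

# Same quantity computed as in the script, on simulated 0/1 outcomes
# (stand-ins for the real trials; 1 means the rack contains a word)
set.seed(1)
x <- rbinom(N, size = 1, prob = 1 - p.hat)
se.sample <- sd(x) / sqrt(N)

message("formula: ", signif(se.formula, 3), "  sample: ", signif(se.sample, 3))
```

Both values round to 0.0002, matching the figure quoted above.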
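The exact probability follows from the complement:

$$P(\text{$k$-tile rack does not contain a word}) = 1 - P(\text{$k$-tile rack contains a word}).$$

A rack contains a word exactly when it contains a minimal word, so

$$P(\text{$k$-tile rack contains a word}) = P\Big(\bigcup_{x \in M} \{\text{$k$-tile rack contains } x\}\Big),$$

where $M$ is the set of minimal words. Writing $\mathcal{P}(M)$ for the power set of $M$, inclusion-exclusion gives

$$P\Big(\bigcup_{x \in M} \{\text{$k$-tile rack contains } x\}\Big) = \sum_{j=1}^{|M|} (-1)^{j-1} \sum_{S \in \mathcal{P}(M):\, |S| = j} P\Big(\bigcap_{x \in S} \{\text{$k$-tile rack contains } x\}\Big).$$

The probability of each event $\bigcap_{x \in S} \{\text{$k$-tile rack contains } x\}$ can be found by conditioning on the number of wildcards in the rack:

$$P\Big(\bigcap_{x \in S} \{\text{$k$-tile rack contains } x\}\Big) = \sum_{w=0}^{n^*} P\Big(\bigcap_{x \in S} \{\text{$k$-tile rack contains } x\} \,\Big|\, \text{$k$-tile rack contains $w$ wildcards}\Big) \times P(\text{$k$-tile rack contains $w$ wildcards}),$$

with $n^*$ the maximum number of wildcards the rack can contain.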
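To see the inclusion-exclusion identity at work, here is a sketch on a deliberately tiny, made-up example: a five-tile bag with no wildcards (so the conditioning on wildcards is trivial), two hypothetical minimal words, and 3-tile racks. Every name in it (`bag`, `M`, `has.word`) is an assumption for illustration. Each intersection probability is computed by brute-force enumeration, which is only feasible because the example is so small.

```r
bag <- c("a", "a", "e", "m", "t")  # hypothetical toy bag, no wildcards
M <- c("em", "at")                 # hypothetical set of minimal words
k <- 3

# All equally likely k-tile racks, as sets of physical tiles
racks <- combn(length(bag), k, function(idx) bag[idx], simplify = FALSE)

# TRUE if the rack holds every letter of the word, counting multiplicity
has.word <- function(word, rack) {
  need <- table(strsplit(word, NULL)[[1]])
  have <- table(factor(rack, levels = names(need)))
  all(have >= need)
}

# Direct computation: fraction of racks containing at least one word of M
p.direct <- mean(vapply(racks, function(r) {
  any(vapply(M, has.word, logical(1), rack = r))
}, logical(1)))

# Inclusion-exclusion over the non-empty subsets S of M
p.incl.excl <- 0
for (j in seq_along(M)) {
  for (S in combn(length(M), j, simplify = FALSE)) {
    # P(rack contains every word in S)
    p.S <- mean(vapply(racks, function(r) {
      all(vapply(M[S], has.word, logical(1), rack = r))
    }, logical(1)))
    p.incl.excl <- p.incl.excl + (-1)^(j - 1) * p.S
  }
}

message("direct: ", p.direct, "  inclusion-exclusion: ", p.incl.excl)
```

With the real lexicon the outer sum runs over subsets of 201 minimal words, which is why this route is hard to carry out exactly.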
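An alternative that I think is computationally easier is to work with the racks directly, because there are fewer possible racks than possible subsets of minimal words. We successively shrink the set of possible k-tile racks until we are left with the racks that contain no word. For Scrabble (or Words With Friends), the number of possible 7-tile racks is in the tens of billions. Counting how many of them contain no possible word should be doable in a few dozen lines of R code. But I think you should be able to do better than enumerating every possible rack. For example, "aa" is a minimal word. That immediately eliminates every rack containing more than one "a". You can repeat this with other words. Memory should not be a problem on a modern computer: a 7-tile Scrabble rack needs fewer than 7 bytes of storage. In the worst case we would use a few gigabytes to store all possible racks, but I don't think that is a good idea either. Someone may want to think about this some more.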
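The size of the rack space can be checked with a short sketch. It counts racks in two ways: as combinations of physical tiles (the "tens of billions" figure), and as distinct letter multisets, which is the far smaller set one would actually need to store. The helper `count.racks` is a hypothetical name introduced here. It reads off the coefficient of $x^k$ in $\prod_i (1 + x + \dots + x^{t_i})$, where $t_i$ is the number of tiles of letter $i$.

```r
# Words With Friends tile counts (a-z, then the wildcard), as in the script
tiles <- c(9, 2, 2, 5, 13, 2, 3, 4, 8, 1, 1, 4, 2, 5, 8, 2, 1, 6, 5, 7, 4,
           2, 2, 1, 2, 1, 2)
k <- 7

# Number of ways to pick k physical tiles from the bag
n.draws <- choose(sum(tiles), k)

# Number of distinct racks as letter multisets: the coefficient of x^k in
# the product over letters of (1 + x + ... + x^count), truncated at degree k
count.racks <- function(tiles, k) {
  poly <- c(1, rep(0, k))              # coefficients of x^0 .. x^k
  for (cnt in tiles) {
    nxt <- rep(0, k + 1)
    for (j in 0:min(cnt, k)) {         # use j copies of this letter
      nxt[(j + 1):(k + 1)] <- nxt[(j + 1):(k + 1)] + poly[1:(k + 1 - j)]
    }
    poly <- nxt
  }
  poly[k + 1]
}

n.racks <- count.racks(tiles, k)
message("tile combinations: ", format(n.draws, big.mark = ","),
        "  distinct racks: ", format(n.racks, big.mark = ","))
```

The distinct-rack count is bounded above by `choose(33, 7)`, the number of unrestricted size-7 multisets over 27 symbols, so storing every distinct rack is much cheaper than the tile-combination count suggests.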
# scrabble.R
# Created by Vincent Vu on 2011-01-07.
# Copyright 2011 Vincent Vu. All rights reserved.
# The Words With Friends lexicon
# http://code.google.com/p/dotnetperls-controls/downloads/detail?name=enable1.txt&can=2&q=
url <- 'http://dotnetperls-controls.googlecode.com/files/enable1.txt'
lexicon <- scan(url, what=character())
# Words With Friends
letters <- c(unlist(strsplit('abcdefghijklmnopqrstuvwxyz', NULL)), '?')
tiles <- c(9, 2, 2, 5, 13, 2, 3, 4, 8, 1, 1, 4, 2, 5, 8, 2, 1, 6, 5, 7, 4,
           2, 2, 1, 2, 1, 2)
names(tiles) <- letters
# Scrabble
# tiles <- c(9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2, 6, 8, 2, 1, 6, 4, 6, 4,
#            2, 2, 1, 2, 1, 2)
# Reduce to permutation equivalent words: sort the letters within each word
# so that anagrams map to the same string
sort.letters.in.words <- function(x) {
  sapply(lapply(strsplit(x, NULL), sort), paste, collapse='')
}

min.dict <- unique(sort.letters.in.words(lexicon))
min.dict.length <- nchar(min.dict)
# Find all minimal words by elimination, one word length k at a time.
# Invariant maintained across iterations: after pass k, no word left in
# min.dict contains another word of length k or smaller.

# Build a regexp matching any letter-sorted word that contains the letters
# of x in order, e.g. 'em' becomes '.*e.*m.*'
makepattern <- function(x) {
  paste('.*', paste(unlist(strsplit(x, NULL)), '.*', sep='', collapse=''),
        sep='')
}

k <- 1
while(k < max(min.dict.length))
{
  # List all k-letter words in min.dict
  k.letter.words <- min.dict[min.dict.length == k]

  # Find words in min.dict of length > k that contain a k-letter word
  for(w in k.letter.words)
  {
    # Create a regexp pattern
    p <- makepattern(w)

    # Eliminate words of length > k that are not minimal
    eliminate <- grepl(p, min.dict) & min.dict.length > k
    min.dict <- min.dict[!eliminate]
    min.dict.length <- min.dict.length[!eliminate]
  }
  k <- k + 1
}
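# Converts each word in w into a letter distribution: a table of counts
# over the symbols in l. Returns a list with one table per word.
letter.dist <- function(w, l=letters) {
  d <- lapply(strsplit(w, NULL), factor, levels=l)
  names(d) <- w
  d <- lapply(d, table)
  return(d)
}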
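# Sample N racks of k tiles
N <- 1e5
k <- 7
# Draw k tiles without replacement from the full bag. rep() expands each
# letter to its tile count, so a rack can hold repeated letters.
bag <- rep(names(tiles), tiles)
rack <- replicate(N, paste(sample(bag, size=k), collapse=''))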
contains.word <- function(rack.dist, lex.dist)
{
  # For each word in the lexicon, subtract the rack distribution from the
  # letter distribution of the word. Positive results correspond to the
  # number of each letter that the rack is missing.
  y <- sweep(lex.dist, 1, rack.dist)

  # If the total number of missing letters is no larger than the number of
  # wildcards in the rack, then the rack contains that word
  any(colSums(pmax(y,0)) <= rack.dist[names(rack.dist) == '?'])
}
# Convert rack and min.dict into letter distributions
min.dict.dist <- letter.dist(min.dict)
min.dict.dist <- do.call(cbind, min.dict.dist)
rack.dist <- letter.dist(rack, l=letters)
# Determine if each rack contains a valid word
x <- sapply(rack.dist, contains.word, lex.dist=min.dict.dist)
message("Estimate (and SE) of probability of no words based on ",
N, " trials:")
message(signif(1-mean(x)), " (", signif(sd(x) / sqrt(N)), ")")
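The wildcard handling in `contains.word` can be tried in isolation. The sketch below rebuilds the same subtraction on a two-word toy lexicon. The names `alphabet` and `letter.counts` and the example racks are made up for illustration, and the wildcard count is extracted with `as.integer(rack.dist["?"])`, a slight variation on the script's indexing.

```r
alphabet <- c(letters, "?")   # base R a-z plus the wildcard symbol

# Letter counts of a single word or rack over the 27 symbols
letter.counts <- function(w) {
  table(factor(strsplit(w, NULL)[[1]], levels = alphabet))
}

# Toy lexicon as a 27 x 2 matrix, one column per word
lex.dist <- cbind(em = letter.counts("em"), at = letter.counts("at"))

contains.word <- function(rack.dist, lex.dist) {
  # Word counts minus rack counts; positive entries are missing letters
  y <- sweep(lex.dist, 1, rack.dist)
  n.wild <- as.integer(rack.dist["?"])
  # A word fits if the wildcards can cover all of its missing letters
  any(colSums(pmax(y, 0)) <= n.wild)
}

one.wild <- contains.word(letter.counts("xyzt?"), lex.dist)  # '?' plays 'a'
no.fit   <- contains.word(letter.counts("xyzq?"), lex.dist)  # one '?' is not enough
two.wild <- contains.word(letter.counts("xyq??"), lex.dist)  # two '?' cover any 2-letter word
print(c(one.wild, no.fit, two.wild))
```

The wildcard row of a word's distribution is always zero, so after subtraction it is never positive and does not count as a missing letter.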