最少的单词搜索


18

上周,我们使用英语中的前10,000个单词来创建最短的一维字符串。现在,让我们尝试2D挑战!

您需要做的是将上述所有单词都放入一个尽可能小的矩形中,以允许重叠。例如,如果您的单词是["ape","pen","ab","be","pa"],则可能的矩形将是:

.b..
apen

上面的矩形得分为5。

规则:

  • 一个单词中可以重叠多个字母
  • 单词可以在8个方向中的任何一个方向
  • 言语不能环绕
  • 您可以在空白位置使用任何字符

您需要创建一个单词搜索,其中包含英语中排名前10,000个单词(根据Google)。 您的分数等于单词搜索中的字符数(不包括未使用的字符)。如果存在平局,或者提交被证明是最佳选择,则首先发布的提交将获胜。


1
我想指出的是,我已经知道上一个单词搜索的挑战,但是鉴于所有答案都不会在合理的时间内解决该挑战,因此我认为它不会重复。
内森·美林


我担心最佳解决方案最终将是nx 1网格,从而使该问题最终与上一个相同(原因:切线相交很少会保存很多字符,但经常会引入“孔”,浪费空间)。也许您应该在宽度+高度上而不是在宽度*高度上对其打分,以便它强烈支持平方解(更有趣)。
戴夫

嗯...我担心解决方案只是将字串彼此堆叠在一起。我认为不给空位置打分可能是个好主意
Nathan Merrill

这样做的风险是,无需保持网格尺寸较小。如果一个1000x1000的网格具有水平和垂直的列表,则其得分与紧缩的螺旋图形相同/相似。也许尝试使用width + height,然后尝试使用字母(不含空格)作为平局决胜局?可能需要更多的思考。编辑:或者也许先用字母-排除-空白,然后再用宽度+高度作为决胜局,效果会更好。
戴夫

Answers:


7

Rust,31430 30081使用的字符

这是一种贪婪的算法:我们先从一个空的网格开始,然后反复添加可以添加的单词,该单词可以用最少的新字母添加,而通过优先选择较长的单词来打破纽带。为了快速进行此操作,我们维护了候选单词放置的优先级队列(实现为双端队列向量的向量,每个新字母的向量都有一个向量,每个单词长度包含一个双端队列)。对于每个新添加的字母,我们将排队通过该字母的所有候选展示位置。

编译并运行rustc -O wordsearch.rs; ./wordsearch < google-10000-english.txt。在我的笔记本电脑上,使用531 MiB RAM可以在70秒内运行。

输出适合一个具有248列和253行的矩形。

在此处输入图片说明

use std::collections::{HashMap, HashSet, VecDeque};
use std::io::prelude::*;
use std::iter::once;
use std::vec::Vec;

type Coord = i16;
type Pos = (Coord, Coord);
type Dir = u8;
type Word = u16;

struct Placement { word: Word, dir: Dir, pos: Pos }

static DIRS: [Pos; 8] =
    [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)];

fn fit(grid: &HashMap<Pos, u8>, (x, y): Pos, d: Dir, word: &String) -> Option<usize> {
    let (dx, dy) = DIRS[d as usize];
    let mut n = 0;
    for (i, c) in word.bytes().enumerate() {
        if let Some(c1) = grid.get(&(x + (i as Coord)*dx, y + (i as Coord)*dy)) {
            if c != *c1 {
                return None;
            }
        } else {
            n += 1;
        }
    }
    return Some(n)
}

struct PlacementQueue { queue: Vec<Vec<VecDeque<Placement>>>, extra: usize }

impl PlacementQueue {
    fn new() -> PlacementQueue {
        return PlacementQueue { queue: Vec::new(), extra: std::usize::MAX }
    }

    fn enqueue(self: &mut PlacementQueue, extra: usize, total: usize, placement: Placement) {
        while self.queue.len() <= extra {
            self.queue.push(Vec::new());
        }
        while self.queue[extra].len() <= total {
            self.queue[extra].push(VecDeque::new());
        }
        self.queue[extra][total].push_back(placement);
        if self.extra > extra {
            self.extra = extra;
        }
    }

    fn dequeue(self: &mut PlacementQueue) -> Option<Placement> {
        while self.extra < self.queue.len() {
            let mut subqueue = &mut self.queue[self.extra];
            while !subqueue.is_empty() {
                let total = subqueue.len() - 1;
                if let Some(placement) = subqueue[total].pop_front() {
                    return Some(placement);
                }
                subqueue.pop();
            }
            self.extra += 1;
        }
        return None
    }
}

fn main() {
    let stdin = std::io::stdin();
    let all_words: Vec<String> =
        stdin.lock().lines().map(|l| l.unwrap()).collect();
    let words: Vec<&String> = {
        let subwords: HashSet<&str> =
            all_words.iter().flat_map(|word| {
                (0..word.len() - 1).flat_map(move |i| {
                    (i + 1..word.len() - (i == 0) as usize).map(move |j| {
                        &word[i..j]
                    })
                })
            }).collect();
        all_words.iter().filter(|word| !subwords.contains(&word[..])).collect()
    };
    let letters: Vec<Vec<(usize, usize)>> =
        (0..128).map(|c| {
            words.iter().enumerate().flat_map(|(w, word)| {
                word.bytes().enumerate().filter(|&(_, c1)| c == c1).map(move |(i, _)| (w, i))
            }).collect()
        }).collect();

    let mut used = vec![false; words.len()];
    let mut remaining = words.len();
    let mut grids: Vec<HashMap<Pos, u8>> = Vec::new();

    while remaining != 0 {
        let mut grid: HashMap<Pos, u8> = HashMap::new();
        let mut queue = PlacementQueue::new();
        for (w, word) in words.iter().enumerate() {
            if used[w] {
                continue;
            }
            queue.enqueue(0, word.len(), Placement {
                pos: (0, 0),
                dir: 0,
                word: w as Word
            });
        }

        while let Some(placement) = queue.dequeue() {
            if used[placement.word as usize] {
                continue;
            }
            let word = words[placement.word as usize];
            if let None = fit(&grid, placement.pos, placement.dir, word) {
                continue;
            }
            let (x, y) = placement.pos;
            let (dx, dy) = DIRS[placement.dir as usize];
            let new_letters: Vec<(usize, u8)> = word.bytes().enumerate().filter(|&(i, _)| {
                !grid.contains_key(&(x + (i as Coord)*dx, y + (i as Coord)*dy))
            }).collect();
            for (i, c) in word.bytes().enumerate() {
                grid.insert((x + (i as Coord)*dx, y + (i as Coord)*dy), c);
            }
            used[placement.word as usize] = true;
            remaining -= 1;

            for (i, c) in new_letters {
                for &(w1, j) in &letters[c as usize] {
                    if used[w1] {
                        continue;
                    }
                    let word1 = words[w1];
                    for (d1, &(dx1, dy1)) in DIRS.iter().enumerate() {
                        let pos1 = (
                            x + (i as Coord)*dx - (j as Coord)*dx1,
                            y + (i as Coord) - (j as Coord)*dy1);
                        if let Some(extra1) = fit(&grid, pos1, d1 as Dir, word1) {
                            queue.enqueue(extra1, word1.len(), Placement {
                                pos: pos1,
                                dir: d1 as Dir,
                                word: w1 as Word
                            });
                        }
                    }
                }
            }
        }
        grids.push(grid);
    }

    let width = grids.iter().map(|grid| {
        grid.iter().map(|(&(x, _), _)| x).max().unwrap() -
            grid.iter().map(|(&(x, _), _)| x).min().unwrap() + 1
    }).max().unwrap();
    print!(
        "{}",
        grids.iter().flat_map(|grid| {
            let x0 = grid.iter().map(|(&(x, _), _)| x).min().unwrap();
            let y0 = grid.iter().map(|(&(_, y), _)| y).min().unwrap();
            let y1 = grid.iter().map(|(&(_, y), _)| y).max().unwrap();
            (y0..y1 + 1).flat_map(move |y| {
                (x0..x0 + width).map(move |x| {
                    *grid.get(&(x, y)).unwrap_or(&('.' as u8)) as char
                }).chain(once('\n').take(1))
            })
        }).collect::<String>()
    );
}

我还没有阅读代码,但是您是否采取任何措施鼓励非线性展示位置?我本来希望像这样的算法最终会遇到一些交叉的超字符串,但是看起来您正在获得一些相当不错的空间填充。
戴夫

@Dave没什么特别的,它就是这样解决的。超级字符串永远不会太长,以至于永远找不到更好的非线性位置,这可能是因为有更多的非线性位置可供选择。
安德斯·卡塞格

以“祝贺”开始,以“非凡”结束
YOU

我没想到你也可以走对角线。谢谢你的照片。我不知道我是否希望对代码块发表评论。:)
Titus

4

C ++,27243个字符网格(248x219,填充50.2%)

(将此作为新答案发布,因为我想保持最初发布的1D绑定作为参考)

这种公然偷窃在很大程度上受到启发@ AndersKaseorg的回答在其主体结构,但有几个调整的。首先,我使用原始程序合并字符串,直到最佳的重叠部分只有3个字符。然后,我使用AndersKaseorg描述的方法使用这些生成的字符串逐步填充2D网格。约束也略有不同:每次仍然尝试添加最少的字符,但是通过先偏爱方形网格,然后偏小的网格并最后增加最长的单词来打破联系。

它显示的行为是在填充空间和快速扩展网格之间交替(不幸的是,在快速扩展阶段之后它用尽了字,因此边缘周围有很多空白空间)。我怀疑,通过对成本函数进行一些调整,可以使其优于50%的空间填充。

这里有2个可执行文件(以避免在迭代改进算法时重新运行整个过程)。一个的输出可以直接通过管道传递到另一个:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cstdlib>

std::size_t calcOverlap(const std::string &a, const std::string &b, std::size_t limit, std::size_t minimal) {
    std::size_t la = a.size();
    for(std::size_t p = std::min(std::min(la, b.size()), limit + 1); -- p > minimal; ) {
        if(a.compare(la - p, p, b, 0, p) == 0) {
            return p;
        }
    }
    return 0;
}

bool isSameReversed(const std::string &a, const std::string &b) {
    std::size_t l = a.size();
    if(b.size() != l) {
        return false;
    }
    for(std::size_t i = 0; i < l; ++ i) {
        if(a[i] != b[l-i-1]) {
            return false;
        }
    }
    return true;
}

int main(int argc, const char *const *argv) {
    // Usage: prog [<stop_threshold>]

    std::size_t stopThreshold = 3;

    if(argc >= 2) {
        char *check;
        long v = std::strtol(argv[1], &check, 10);
        if(check == argv[1] || v < 0) {
            std::cerr
                << "Invalid stop threshold. Should be an integer >= 0"
                << std::endl;
            return 1;
        }
        stopThreshold = v;
    }

    std::vector<std::string> words;

    // Load all words from input and their reverses (words can be backwards now)
    while(true) {
        std::string word;
        std::getline(std::cin, word);
        if(word.empty()) {
            break;
        }
        words.push_back(word);
        std::reverse(word.begin(), word.end());
        words.push_back(std::move(word));
    }

    std::cerr
        << "Input word count: " << words.size() << std::endl;

    // Remove all fully subsumed words

    for(auto p = words.begin(); p != words.end(); ) {
        bool subsumed = false;
        for(auto i = words.begin(); i != words.end(); ++ i) {
            if(i == p) {
                continue;
            }
            if(i->find(*p) != std::string::npos) {
                subsumed = true;
                break;
            }
        }
        if(subsumed) {
            p = words.erase(p);
        } else {
            ++ p;
        }
    }

    std::cerr
        << "After subsuming checks: " << words.size()
        << std::endl;

    // Sort words longest-to-shortest (not necessary but doesn't hurt. Makes finding maxlen a tiny bit easier)
    std::sort(words.begin(), words.end(), [](const std::string &a, const std::string &b) {
        return a.size() > b.size();
    });

    std::size_t maxlen = words.front().size();

    // Repeatedly combine most-compatible words until we reach the threshold
    std::size_t bestPossible = maxlen - 1;
    while(words.size() > 2) {
        auto bestA = words.begin();
        auto bestB = -- words.end();
        std::size_t bestOverlap = 0;
        for(auto p = ++ words.begin(), e = words.end(); p != e; ++ p) {
            if(p->size() - 1 <= bestOverlap) {
                continue;
            }
            for(auto q = words.begin(); q != p; ++ q) {
                std::size_t overlap = calcOverlap(*p, *q, bestPossible, bestOverlap);
                if(overlap > bestOverlap && !isSameReversed(*p, *q)) {
                    bestA = p;
                    bestB = q;
                    bestOverlap = overlap;
                }
                overlap = calcOverlap(*q, *p, bestPossible, bestOverlap);
                if(overlap > bestOverlap && !isSameReversed(*p, *q)) {
                    bestA = q;
                    bestB = p;
                    bestOverlap = overlap;
                }
            }
            if(bestOverlap == bestPossible) {
                break;
            }
        }
        if(bestOverlap <= stopThreshold) {
            break;
        }
        std::string newStr = std::move(*bestA);
        newStr.append(*bestB, bestOverlap, std::string::npos);

        if(bestA == -- words.end()) {
            words.pop_back();
            *bestB = std::move(words.back());
            words.pop_back();
        } else {
            *bestB = std::move(words.back());
            words.pop_back();
            *bestA = std::move(words.back());
            words.pop_back();
        }

        // Remove any words which are now in the result (forward or reverse)
        // (would not be necessary if we didn't have the reversed forms too)
        std::string newRev = newStr;
        std::reverse(newRev.begin(), newRev.end());
        for(auto p = words.begin(); p != words.end(); ) {
            if(newStr.find(*p) != std::string::npos || newRev.find(*p) != std::string::npos) {
                std::cerr << "Now subsumes: " << *p << std::endl;
                p = words.erase(p);
            } else {
                ++ p;
            }
        }

        std::cerr
            << "Words remaining: " << (words.size() + 1)
            << " Latest combination: (" << bestOverlap << ") " << newStr
            << std::endl;

        words.push_back(std::move(newStr));
        words.push_back(std::move(newRev));
        bestPossible = bestOverlap; // Merging existing words will never make longer merges possible
    }

    std::cerr
        << "After merging: " << words.size()
        << std::endl;

    // Remove all fully subsumed words (i.e. reversed words)

    for(auto p = words.begin(); p != words.end(); ) {
        bool subsumed = false;
        std::string rev = *p;
        std::reverse(rev.begin(), rev.end());
        for(auto i = words.begin(); i != words.end(); ++ i) {
            if(i == p) {
                continue;
            }
            if(i->find(*p) != std::string::npos || i->find(rev) != std::string::npos) {
                subsumed = true;
                break;
            }
        }
        if(subsumed) {
            p = words.erase(p);
        } else {
            ++ p;
        }
    }

    std::cerr
        << "After subsuming: " << words.size()
        << std::endl;

    // Sort words longest-to-shortest for display
    std::sort(words.begin(), words.end(), [](const std::string &a, const std::string &b) {
        return a.size() > b.size();
    });

    std::size_t len = 0;
    for(const auto &word : words) {
        std::cout
            << word
            << std::endl;
        len += word.size();
    }
    std::cerr
        << "Total size: " << len
        << std::endl;
    return 0;
}
#include <iostream>
#include <string>
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include <limits>

class vec2 {
public:
    int x;
    int y;

    vec2(void) : x(0), y(0) {};
    vec2(int x, int y) : x(x), y(y) {}

    bool operator ==(const vec2 &b) const {
        return x == b.x && y == b.y;
    }

    vec2 &operator +=(const vec2 &b) {
        x += b.x;
        y += b.y;
        return *this;
    }

    vec2 &operator -=(const vec2 &b) {
        x -= b.x;
        y -= b.y;
        return *this;
    }

    vec2 operator +(const vec2 b) const {
        return vec2(x + b.x, y + b.y);
    }

    vec2 operator *(const int b) const {
        return vec2(x * b, y * b);
    }
};

class box2 {
public:
    vec2 tl;
    vec2 br;

    box2(void) : tl(), br() {};
    box2(vec2 a, vec2 b)
        : tl(std::min(a.x, b.x), std::min(a.y, b.y))
        , br(std::max(a.x, b.x) + 1, std::max(a.y, b.y) + 1)
    {}

    void grow(const box2 &b) {
        if(b.tl.x < tl.x) {
            tl.x = b.tl.x;
        }
        if(b.br.x > br.x) {
            br.x = b.br.x;
        }
        if(b.tl.y < tl.y) {
            tl.y = b.tl.y;
        }
        if(b.br.y > br.y) {
            br.y = b.br.y;
        }
    }

    bool intersects(const box2 &b) const {
        return (
            ((tl.x >= b.br.x) != (br.x > b.tl.x)) &&
            ((tl.y >= b.br.y) != (br.y > b.tl.y))
        );
    }

    box2 &operator +=(const vec2 b) {
        tl += b;
        br += b;
        return *this;
    }

    int width(void) const {
        return br.x - tl.x;
    }

    int height(void) const {
        return br.y - tl.y;
    }

    int maxdim(void) const {
        return std::max(width(), height());
    }
};

template <> struct std::hash<vec2> {
    std::size_t operator ()(const vec2 &o) const {
        return std::hash<int>()(o.x) + std::hash<int>()(o.y) * 997;
    }
};

template <class A,class B> struct std::hash<std::pair<A,B>> {
    std::size_t operator ()(const std::pair<A,B> &o) const {
        return std::hash<A>()(o.first) + std::hash<B>()(o.second) * 31;
    }
};

class word_placement {
public:
    vec2 start;
    vec2 dir;
    box2 bounds;
    const std::string *word;

    word_placement(vec2 start, vec2 dir, const std::string *word)
        : start(start)
        , dir(dir)
        , bounds(start, start + dir * (word->size() - 1))
        , word(word)
    {}

    word_placement(vec2 start, const word_placement &copy)
        : start(copy.start + start)
        , dir(copy.dir)
        , bounds(copy.bounds)
        , word(copy.word)
    {
        bounds += start;
    }

    word_placement(const word_placement &copy)
        : start(copy.start)
        , dir(copy.dir)
        , bounds(copy.bounds)
        , word(copy.word)
    {}
};

class word_placement_links {
public:
    std::unordered_set<word_placement*> placements;
    std::unordered_set<std::pair<char,word_placement*>> relativePlacements;
};

class grid {
public:
    std::vector<std::string> wordCache; // Just a block of memory for our pointers to reference
    std::unordered_map<vec2,char> state;
    std::unordered_set<word_placement*> placements;
    std::unordered_map<const std::string*,word_placement_links> wordPlacements;
    std::unordered_map<char,std::unordered_set<word_placement*>> relativeWordPlacements;
    box2 bound;

    grid(const std::vector<std::string> &words) {
        wordCache = words;
        std::vector<vec2> directions;
        directions.emplace_back(+1,  0);
        directions.emplace_back(+1, +1);
        directions.emplace_back( 0, +1);
        directions.emplace_back(-1, +1);
        directions.emplace_back(-1,  0);
        directions.emplace_back(-1, -1);
        directions.emplace_back( 0, -1);
        directions.emplace_back(+1, -1);

        wordPlacements.reserve(wordCache.size());
        placements.reserve(wordCache.size());
        relativeWordPlacements.reserve(64);

        std::size_t total = 0;
        for(const std::string &word : wordCache) {
            word_placement_links &p = wordPlacements[&word];
            p.placements.reserve(8);
            auto &rp = p.relativePlacements;
            std::size_t l = word.size();
            rp.reserve(l * directions.size());
            for(int i = 0; i < l; ++ i) {
                for(const vec2 &d : directions) {
                    word_placement *rwp = new word_placement(d * -i, d, &word);
                    rp.emplace(word[i], rwp);
                    relativeWordPlacements[word[i]].insert(rwp);
                }
            }
            total += l;
        }
        state.reserve(total);
    }

    const std::string *find_word(const std::string &word) const {
        for(const std::string &w : wordCache) {
            if(w == word) {
                return &w;
            }
        }
        throw std::string("Failed to find word in cache");
    }

    void remove_word(const std::string *word) {
        const word_placement_links &links = wordPlacements[word];
        for(word_placement *p : links.placements) {
            placements.erase(p);
            delete p;
        }
        for(auto &p : links.relativePlacements) {
            relativeWordPlacements[p.first].erase(p.second);
            delete p.second;
        }
        wordPlacements.erase(word);
    }

    void remove_placement(word_placement *placement) {
        wordPlacements[placement->word].placements.erase(placement);
        placements.erase(placement);
        delete placement;
    }

    bool check_placement(const word_placement &placement) const {
        vec2 p = placement.start;
        for(const char c : *placement.word) {
            auto i = state.find(p);
            if(i != state.end() && i->second != c) {
                return false;
            }
            p += placement.dir;
        }
        return true;
    }

    int check_new(const word_placement &placement) const {
        int n = 0;
        vec2 p = placement.start;
        for(const char c : *placement.word) {
            n += !state.count(p);
            p += placement.dir;
        }
        return n;
    }

    void check_placements(const box2 &b) {
        for(auto i = placements.begin(); i != placements.end(); ) {
            if(!b.intersects((*i)->bounds) || check_placement(**i)) {
                ++ i;
            } else {
                i = placements.erase(i);
            }
        }
    }

    void add_placement(const vec2 p, const word_placement &relative) {
        word_placement check(p, relative);
        if(check_placement(check)) {
            word_placement *wp = new word_placement(check);
            placements.insert(wp);
            wordPlacements[relative.word].placements.insert(wp);
        }
    }

    void place(word_placement placement) {
        remove_word(placement.word);
        int overlap = 0;
        for(const char c : *placement.word) {
            char &g = state[placement.start];
            if(g == '\0') {
                g = c;
                for(const word_placement *rp : relativeWordPlacements[c]) {
                    add_placement(placement.start, *rp);
                }
            } else if(g != c) {
                throw std::string("New word changes an existing character!");
            } else {
                ++ overlap;
            }
            placement.start += placement.dir;
        }
        bound.grow(placement.bounds);
        check_placements(placement.bounds);

        std::cerr
            << draw('.', "\n")
            << "Added " << *placement.word << " (overlap: " << overlap << ")"
            << ", Grid: " << bound.width() << "x" << bound.height() << " of " << state.size() << " chars"
            << ", Words remaining: " << wordPlacements.size()
            << std::endl;
    }

    int check_cost(box2 b) const {
        b.grow(bound);
        return (
            ((b.maxdim() - bound.maxdim()) << 16) |
            (b.width() + b.height() - bound.width() - bound.height())
        );
    }

    void add_next(void) {
        int bestNew = std::numeric_limits<int>::max();
        int bestCost = std::numeric_limits<int>::max();
        int bestLen = 0;
        word_placement *best = nullptr;
        for(word_placement *p : placements) {
            int n = check_new(*p);
            if(n <= bestNew) {
                int l = p->word->size();
                int cost = check_cost(box2(p->start, p->start + p->dir * l));
                if(n < bestNew || cost < bestCost || (cost == bestCost && l < bestLen)) {
                    bestNew = n;
                    bestCost = cost;
                    bestLen = l;
                    best = p;
                }
            }
        }
        if(best == nullptr) {
            throw std::string("Failed to find join to existing blob");
        }
        place(*best);
    }

    void fill(void) {
        while(!placements.empty()) {
            add_next();
        }
    }

    std::string draw(char blank, const std::string &linesep) const {
        std::string result;
        result.reserve((bound.width() + linesep.size()) * bound.height());
        for(int y = bound.tl.y; y < bound.br.y; ++ y) {
            for(int x = bound.tl.x; x < bound.br.x; ++ x) {
                auto c = state.find(vec2(x, y));
                result.push_back((c == state.end()) ? blank : c->second);
            }
            result.append(linesep);
        }
        return result;
    }

    box2 bounds(void) const {
        return bound;
    }

    int chars(void) const {
        return state.size();
    }
};

int main(int argc, const char *const *argv) {
    std::vector<std::string> words;

    // Load all words from input
    while(true) {
        std::string word;
        std::getline(std::cin, word);
        if(word.empty()) {
            break;
        }
        words.push_back(std::move(word));
    }

    std::cerr
        << "Input word count: " << words.size() << std::endl;

    // initialise grid
    grid g(words);

    // add first word (order of input file means this is longest word)
    g.place(word_placement(vec2(0, 0), vec2(1, 0), g.find_word(words.front())));

    // add all other words
    g.fill();

    std::cout << g.draw('.', "\n");

    int w = g.bounds().width();
    int h = g.bounds().height();
    int n = g.chars();
    std::cerr
        << "Final grid: " << w << "x" << h
        << " with " << n << " characters"
        << " (" << (n * 100.0 / (w * h)) << "% filled)"
        << std::endl;
    return 0;
}

最后,结果是:

最终格


替代结果(修复了程序中一些会偏向某些方向并调整成本函数的错误之后,我得到了一个更紧凑但更不理想的解决方案):29275个字符,198x195(填充75.8%):

方格

再一次,我没有做太多优化这些程序的事情,所以要花一些时间。但是您可以看到它充满网格,这很催眠。


2

C ++,34191个字符“网格”(通过最少的人工干预,可以轻松保存6或7)

这应该更多地作为2D情况的边界,因为答案仍然是1D字符串。这只是我上一个挑战的代码,但是具有反转任何字符串的新功能。这给了我们更多的组合单词的空间(特别是因为它把不重叠的超字符串的最坏情况限制为26;每个字母对应一个)。

对于一些轻微的2D视觉吸引力,如果可以免费(例如,在0个重叠的字之间),可以在结果中添加换行符。

相当慢(仍然没有缓存)。这是代码:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

std::size_t calcOverlap(const std::string &a, const std::string &b, std::size_t limit, std::size_t minimal) {
    std::size_t la = a.size();
    for(std::size_t p = std::min(std::min(la, b.size()), limit + 1); -- p > minimal; ) {
        if(a.compare(la - p, p, b, 0, p) == 0) {
            return p;
        }
    }
    return 0;
}

bool isSameReversed(const std::string &a, const std::string &b) {
    std::size_t l = a.size();
    if(b.size() != l) {
        return false;
    }
    for(std::size_t i = 0; i < l; ++ i) {
        if(a[i] != b[l-i-1]) {
            return false;
        }
    }
    return true;
}

int main() {
    std::vector<std::string> words;

    // Load all words from input and their reverses (words can be backwards now)
    while(true) {
        std::string word;
        std::getline(std::cin, word);
        if(word.empty()) {
            break;
        }
        words.push_back(word);
        std::reverse(word.begin(), word.end());
        words.push_back(std::move(word));
    }

    std::cerr
        << "Input word count: " << words.size() << std::endl;

    // Remove all fully subsumed words

    for(auto p = words.begin(); p != words.end(); ) {
        bool subsumed = false;
        for(auto i = words.begin(); i != words.end(); ++ i) {
            if(i == p) {
                continue;
            }
            if(i->find(*p) != std::string::npos) {
                subsumed = true;
                break;
            }
        }
        if(subsumed) {
            p = words.erase(p);
        } else {
            ++ p;
        }
    }

    std::cerr
        << "After subsuming checks: " << words.size()
        << std::endl;

    // Sort words longest-to-shortest (not necessary but doesn't hurt. Makes finding maxlen a tiny bit easier)
    std::sort(words.begin(), words.end(), [](const std::string &a, const std::string &b) {
        return a.size() > b.size();
    });

    std::size_t maxlen = words.front().size();

    // Repeatedly combine most-compatible words until we have only 1 word left (+ its reverse)
    std::size_t bestPossible = maxlen - 1;
    while(words.size() > 2) {
        auto bestA = words.begin();
        auto bestB = -- words.end();
        std::size_t bestOverlap = 0;
        for(auto p = ++ words.begin(), e = words.end(); p != e; ++ p) {
            if(p->size() - 1 <= bestOverlap) {
                continue;
            }
            for(auto q = words.begin(); q != p; ++ q) {
                std::size_t overlap = calcOverlap(*p, *q, bestPossible, bestOverlap);
                if(overlap > bestOverlap && !isSameReversed(*p, *q)) {
                    bestA = p;
                    bestB = q;
                    bestOverlap = overlap;
                }
                overlap = calcOverlap(*q, *p, bestPossible, bestOverlap);
                if(overlap > bestOverlap && !isSameReversed(*p, *q)) {
                    bestA = q;
                    bestB = p;
                    bestOverlap = overlap;
                }
            }
            if(bestOverlap == bestPossible) {
                break;
            }
        }
        std::string newStr = std::move(*bestA);
        if(bestOverlap == 0) {
            newStr.push_back('\n');
        }
        newStr.append(*bestB, bestOverlap, std::string::npos);

        if(bestA == -- words.end()) {
            words.pop_back();
            *bestB = std::move(words.back());
            words.pop_back();
        } else {
            *bestB = std::move(words.back());
            words.pop_back();
            *bestA = std::move(words.back());
            words.pop_back();
        }

        // Remove any words which are now in the result (forward or reverse)
        // (would not be necessary if we didn't have the reversed forms too)
        std::string newRev = newStr;
        std::reverse(newRev.begin(), newRev.end());
        for(auto p = words.begin(); p != words.end(); ) {
            if(newStr.find(*p) != std::string::npos || newRev.find(*p) != std::string::npos) {
                std::cerr << "Now subsumes: " << *p << std::endl;
                p = words.erase(p);
            } else {
                ++ p;
            }
        }

        std::cerr
            << "Words remaining: " << (words.size() + 1)
            << " Latest combination: (" << bestOverlap << ") " << newStr
            << std::endl;

        words.push_back(std::move(newStr));
        words.push_back(std::move(newRev));
        bestPossible = bestOverlap; // Merging existing words will never make longer merges possible
    }

    std::cerr
        << "After non-trivial merging: " << words.size()
        << std::endl;

    if(words.size() == 2 && !isSameReversed(words.front(), words.back())) {
        // must be 2 palindromes, so just join them
        words.front().append(words.back());
    }

    std::string result = words.front();

    std::cout
        << result
        << std::endl;
    std::cerr
        << "Word size: " << result.size() // Note this number includes newlines, so to get the grid size according to the rules, subtract newlines manually
        << std::endl;
    return 0;
}

结果:http : //pastebin.com/UTe2WMcz(比上一个挑战少4081个字符)

很明显,将xdwv线垂直放置,与怪物线相交可以节省一些琐碎的费用。然后hhidetautisbneudui可以与相交d,并lxwwwowaxocnnaesddaw。这样可以节省4个字符。nbcllilhn可以替换为现有的s重叠(如果可以找到)以保存另外2个(如果不存在这样的重叠,则仅保存1个,而必须垂直添加)。最后,mjjrajaytq可以将其垂直添加到某个位置以保存1。这意味着在最少的人工干预下,可以从结果中保存6-7个字符。

我想通过以下方法将其转换为2D,但我一直在努力寻找一种无需使算法O(n ^ 4)即可实现的方法,这在计算上是不切实际的!

  1. 按上述方法运行算法,但是当重叠达到1个字符时停止
  2. 反复:
    1. 查找一组可以排列成矩形的4个单词
    2. 在此矩形的顶部添加尽可能多的单词,其中每个单词与当前形状的至少2个字符重叠(检查所有8个方向)-这是我们实际上可以获得比当前代码更多优势的唯一阶段
  3. 合并生成的网格和单字,每次查找单字母重叠

0

的PHP

这个人理论上做这项工作;但是10000可能是太多的递归单词。该脚本正在运行。(仍在24小时后运行)
在小型目录上工作正常,但下周我可能会制作一个迭代版本。

$f=array("pen","op","po","ne","pro","aaa","abcd","dcba"); will output ABCD 近似熵 AROP AO .. although this is not an optimal result (scoring was changed ... I´m working on a generator). One optimal result is this: 打开 .RA .oa dcba`

它也不是很快。仅删除子字符串并按长度
对其余部分进行排序,其余部分则是蛮力的:尝试将单词适合到矩形中,如果失败则尝试使用更大的矩形。

顺便说一句:在我的机器上,子字符串部分需要4.5分钟才能进入大型目录,
并将其缩减为6,190个字;对它们的排序需要11秒。

$f=file('https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english.txt');
// A: remove substrings - forward or reversed
$s=join(' ',$f);
$haystack="$s ".strrev($s);
foreach($f as$w)
{
    $r=strrev($w=trim($w)); // remove trailing line break and create reverse word
    if(!preg_match("%$w\w|\w$w%",$haystack)
        // no substr match ... now: is the reverse word in the list?
        // if so, keep only the lower one (ascii values)
        &!($w>$r&&strstr($s,$r))
        // strstr does NOT render the reverse substr regex obsolete:
        // this is only executed for $w=abc, not for $w=bca!
    )
        $g[]=$w
    ;
}

// B: sort the words by length
usort($g,function($a,$b){return strlen($a)-strlen($b);});

// C1: function to fit $words into $map
function gomap($words,$map)
{
    $h=count($map);$w=strlen($map[0]);
    $len=strlen($word=array_pop($words));
    // $x,$y=position; $d=0:horizontal, $d=1:vertical; $r=0: word, $r=1: reverse word
    for($x=$w-$len;$x>=0;$x--)for($y=$h-$len;$y>=0;$y--)for($d=0;$d<2;$d++)for($r=0;$r<2;$r++)
    {
        // does the word fit there?
        $drow=$r?strrev($word):$word;
        for($ok=1,$i=0;$ok&$i<$len;$i++)
            $ok=in_array($map[$y+$d*$i][$x+$i-$d*$i], [' ',$drow[$i]])
        ;
        // it does, paint it
        if($ok)
        {
            for($i=0;$i<$len;$i++)
                $map[$y+$d*$i][$x+$i-$d*$i]=$drow[$i];
            if(!count($words))      // this was the last word: return map
                return $map;
            else                    // there are more words: recurse
                if ($ok=gomap($words,$map))
                    return $ok;
            // no fit, try next position
        }
    }
    return 0;
}

// C2: rectangle loop
for($h=0;++$h;)for($w=0;$w++<$h;)   // define a rectangle
{
    // and try to fit the words in there
    if($map=gomap($g,
        array_fill(0,$h,str_repeat(' ',$w))
    ))
    {
        // words fit; output and break loops
        echo '<pre>',implode("\n",$map),'</pre>';
        break 2;
    }
}

当程序在较小的词典上运行时,您能否提供一个示例?
罗夫霍

我实际上已经更改了得分(对不起!)。分数中不包括未使用的字符数。
内森·美林

2
这里的循环意味着这是〜O((w * h)^ n)。我们知道该解决方案将具有约35k个字母(来自上一个挑战),因此最终将调用gomap约35000 ^ 6000次。我的计算器告诉我这是“无穷大”。一个更好的计算器会告诉我实际的数字(wolframalpha.com/input/?i=35000%5E6000)。现在,如果我们假设宇宙中的每个原子都是一个专用于运行该程序的3 terrahertz处理器,那么宇宙存在的时间必须比迄今为止完成的时间长10 ^ 27154倍。我的意思是:不要等待它完成!
戴夫
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.