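Fastest Longest Common Subsequence Finder

Your task is to solve the Longest Common Subsequence (LCS) problem for n strings of length 1000.

A valid solution to the LCS problem for two or more strings S₁, …, Sₙ is any string T of maximal length such that the characters of T appear in all Sᵢ, in the same order as in T.

Note that T does not have to be a substring of Sᵢ.

We've already solved this problem in the shortest amount of code. This time, size doesn't matter.

Example

The strings axbycz and xaybzc have 8 common subsequences of length 3: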

abc abz ayc ayz xbc xbz xyc xyz

Any of these would be a valid solution to the LCS problem.
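The definition can be checked mechanically. The following is a minimal sketch of a validity check for the "common subsequence" part of the definition (it does not check maximality); the function names `is_subsequence` and `is_common_subsequence` are illustrative and not part of the challenge.

```c
#include <stddef.h>

/* Returns 1 if t is a subsequence of s, i.e. the characters of t
   appear in s in the same order (not necessarily adjacent), else 0. */
int is_subsequence(const char *t, const char *s) {
    while (*t != '\0') {
        /* Advance through s until the current character of t is found. */
        while (*s != '\0' && *s != *t) {
            ++s;
        }
        if (*s == '\0') {
            return 0;   /* ran out of s before matching all of t */
        }
        ++s;
        ++t;
    }
    return 1;
}

/* Returns 1 if t is a common subsequence of all n strings in strs. */
int is_common_subsequence(const char *t, const char *const *strs, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (!is_subsequence(t, strs[i])) {
            return 0;
        }
    }
    return 1;
}
```

With the two example strings, `abc` and `xyz` pass this check, while `abcz` fails because no `z` follows the final `c` of xaybzc.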

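Details

Write a full program that solves the LCS problem, as explained above, abiding by the following rules:

The input will consist of two or more strings of length 1000, consisting of ASCII characters with code points between 0x30 and 0x3F.
You have to read the input from STDIN.

You have two choices for the input format:
- Each string (including the last) is followed by a linefeed.
- The strings are chained together with no separator and no trailing linefeed.
The number of strings will be passed as a command-line parameter to your program.
You have to write the output, i.e., any one of the valid solutions to the LCS, to STDOUT, followed by one linefeed.
Your language of choice has to have a free (as in beer) compiler/interpreter for my operating system (Fedora 21).
If you require any compiler flags or a specific interpreter, please mention it in your post.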
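One way to satisfy the input rules is sketched below. It assumes the string count has already been parsed from the command line (for example with `strtol` on `argv[1]`) and simply skips linefeeds, so either input format is accepted. The names `read_strings` and `STR_LEN` are hypothetical, chosen for this illustration.

```c
#include <stdio.h>

#define STR_LEN 1000   /* string length fixed by the challenge */

/* Reads n strings of STR_LEN characters each from `in` into strs.
   Linefeeds are skipped, so both input formats are accepted.
   Returns 0 on success, -1 if the input ends early or contains a
   character outside the range 0x30..0x3F. */
int read_strings(FILE *in, int n, char (*strs)[STR_LEN + 1]) {
    for (int i = 0; i < n; ++i) {
        int len = 0;
        while (len < STR_LEN) {
            int c = fgetc(in);
            if (c == EOF) {
                return -1;
            }
            if (c == '\n' || c == '\r') {
                continue;           /* separator, not part of a string */
            }
            if (c < 0x30 || c > 0x3F) {
                return -1;
            }
            strs[i][len++] = (char)c;
        }
        strs[i][STR_LEN] = '\0';
    }
    return 0;
}
```

A caller would pass `stdin` as `in` and a buffer such as `static char strs[N][STR_LEN + 1]`, where N is the largest string count it intends to support.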

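Scoring

I will run your code with 2, 3, etc. strings until it takes longer than 120 seconds to print a valid solution. This means that you have 120 seconds for each value of n.

The highest number of strings for which your code finished in time is your score.

In the event of a tied score of n, the submission that solved the problem for n strings in the shortest time will be declared the winner.

All submissions will be timed on my machine (Intel Core i7-3770, 16 GiB RAM, no swap).

The n strings of the (n-1)th test will be generated by calling rand n (and stripping the linefeeds, if required), where rand is defined as follows: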

rand()
{
    head -c$[500*$1] /dev/zero |
    openssl enc -aes-128-ctr -K 0 -iv $1 |
    xxd -c500 -ps |
    tr 'a-f' ':-?'
}

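The key is 0 in the above code, but I reserve the right to change it to an undisclosed value if I suspect anybody of (partially) hardcoding the output.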
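As I read the generator, it encrypts 500·n zero bytes, prints them as hex with 500 bytes (1000 hex digits) per line, and then uses `tr 'a-f' ':-?'` to move the hex letters onto the six code points that follow '9'. That is how each line ends up as 1000 characters in the range 0x30 to 0x3F. The last step can be sketched in C as follows; `hex_to_alphabet` is a hypothetical helper name, not something the challenge defines.

```c
/* Maps one lowercase hex digit, as printed by xxd, to the challenge
   alphabet: '0'-'9' stay unchanged, 'a'-'f' become ':' through '?'
   (code points 0x3A-0x3F), mirroring tr 'a-f' ':-?'. */
char hex_to_alphabet(char c) {
    if (c >= 'a' && c <= 'f') {
        return (char)(':' + (c - 'a'));
    }
    return c;
}
```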

string fastest-code subsequence

— Dennis

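Can we throw exceptions?

— HyperNeutrino, 2015

@JamesSmith Sure, as long as the output is correct.

— Dennis

Since I'm reading with a BufferedReader, can I just declare throws IOException on public static void main(...)?

— HyperNeutrino, 2015

@JamesSmith I don't really know Java, so I don't know what that is, but don't worry about exceptions.

— Dennis

@JamesSmith Since code length doesn't matter for this challenge, can't you simply catch the exceptions?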

— Reto Koradi

Answers:

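C, n = 3 in ~7 seconds

Algorithm

The algorithm is a direct generalization of the standard dynamic programming solution to n sequences. For 2 strings A and B, the standard recurrence looks like this: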

L(p, q) = 1 + L(p - 1, q - 1)           if A[p] == B[q]
        = max(L(p - 1, q), L(p, q - 1)) otherwise

For 3 strings A, B, C I use:

L(p, q, r) = 1 + L(p - 1, q - 1, r - 1)                          if A[p] == B[q] == C[r]
           = max(L(p - 1, q, r), L(p, q - 1, r), L(p, q, r - 1)) otherwise

The code implements this logic for arbitrary values of n.
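For reference, the 2-string recurrence can be written out directly. This is a minimal sketch of the textbook version with traceback, not the n-dimensional code from this answer; it assumes the LCS length fits in an `unsigned short` (true for strings of length 1000), and `lcs2` is a name made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated LCS of a and b (caller frees), or NULL on
   allocation failure. Keeps the full (la+1) x (lb+1) table so that the
   string can be recovered by tracing back from the last cell. */
char *lcs2(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    size_t w = lb + 1;                       /* row width of the table */
    unsigned short *L = calloc((la + 1) * w, sizeof *L);
    if (!L) {
        return NULL;
    }

    /* Row 0 and column 0 stay zero: an empty prefix has an empty LCS. */
    for (size_t p = 1; p <= la; ++p) {
        for (size_t q = 1; q <= lb; ++q) {
            if (a[p - 1] == b[q - 1]) {
                L[p * w + q] = L[(p - 1) * w + (q - 1)] + 1;
            } else {
                unsigned short up = L[(p - 1) * w + q];
                unsigned short left = L[p * w + (q - 1)];
                L[p * w + q] = up > left ? up : left;
            }
        }
    }

    size_t len = L[la * w + lb];
    char *res = malloc(len + 1);
    if (res) {
        res[len] = '\0';
        /* Walk back from the last cell, filling res from the end. */
        size_t p = la, q = lb;
        while (p > 0 && q > 0) {
            if (a[p - 1] == b[q - 1]) {
                res[--len] = a[p - 1];
                --p;
                --q;
            } else if (L[(p - 1) * w + q] >= L[p * w + (q - 1)]) {
                --p;
            } else {
                --q;
            }
        }
    }
    free(L);
    return res;
}
```

The table here is indexed by prefix lengths, so cell (p, q) corresponds to L(p - 1, q - 1) in the 0-based notation of the recurrence above.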

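Efficiency

The complexity of my code is O(s^n), with s the length of the strings. Based on what I found, it looks like the problem is NP-complete for an arbitrary number of strings. So while the posted algorithm is very inefficient for larger values of n, it might actually not be possible to do massively better. The only thing I saw are some approaches that improve efficiency for small alphabets. Since the alphabet is moderately small here (16), that could lead to an improvement. I still predict that nobody will find a legitimate solution that goes higher than n = 4 in 2 minutes, and n = 4 looks ambitious already.

I reduced the memory usage in the initial implementation, so that it could handle n = 4 given enough time. But it only produced the length of the sequence, not the sequence itself. Check the revision history of this post to see that code.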
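To put rough numbers on the memory side, here is a small sketch that computes the size of a full table, assuming one 16-bit cell for each of the (s + 1)^n combinations of prefix lengths. `table_bytes` is an illustrative helper, not part of the posted code.

```c
#include <stdint.h>

/* Bytes needed for a full DP table over n strings of length s,
   assuming (s + 1)^n cells of 16 bits each. */
uint64_t table_bytes(uint64_t s, int n) {
    uint64_t cells = 1;
    for (int i = 0; i < n; ++i) {
        cells *= s + 1;
    }
    return cells * sizeof(uint16_t);
}
```

For s = 1000 this gives 2,004,002 bytes for n = 2, 2,006,006,002 bytes (just under 2 GiB) for n = 3, and 2,008,012,008,002 bytes (about 1.8 TiB) for n = 4, far beyond the 16 GiB of the test machine.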

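Code

Since loops over n-dimensional matrices require more logic than fixed loops, I'm using a fixed loop for the lowest dimension, and only use the generic looping logic for the remaining dimensions.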
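The generic looping logic amounts to an odometer over the higher dimensions. The following is a simplified sketch of that idea in isolation, assuming every dimension has at least one element; `next_position` is a hypothetical name, and the posted code additionally updates a flat array index alongside the positions.

```c
#include <stddef.h>

/* Advances pos[1..n-1] to the next combination, odometer style:
   dimension 1 is incremented first and carries into dimension 2, etc.
   pos[0] is left alone because the lowest dimension is handled by a
   plain inner for loop. Returns 0 once every combination has been
   visited (all positions have wrapped back to 0), otherwise 1. */
int next_position(size_t *pos, const size_t *len, int n) {
    int d = 1;
    while (d < n && pos[d] + 1 == len[d]) {
        pos[d] = 0;          /* wrap this digit and carry */
        ++d;
    }
    if (d == n) {
        return 0;
    }
    ++pos[d];
    return 1;
}
```

A driver would run the fixed inner loop over dimension 0 and then call `next_position` in a `do { ... } while (...)` loop, so the all-zero position is processed before the first increment.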

#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

#define N_MAX 4

int main(int argc, char* argv[]) {
    int nSeq = argc - 1;
    if (nSeq > N_MAX) {
        nSeq = N_MAX;
    }

    char** seqA = argv + 1;

    uint64_t totLen = 1;
    uint64_t lenA[N_MAX] = {0};
    uint64_t offsA[N_MAX] = {1};
    uint64_t offsSum = 0;
    uint64_t posA[N_MAX] = {0};

    for (int iSeq = 0; iSeq < nSeq; ++iSeq) {
        lenA[iSeq] = strlen(seqA[iSeq]);
        totLen *= lenA[iSeq] + 1;

        if (iSeq + 1 < nSeq) {
            offsA[iSeq + 1] = totLen;
        }

        offsSum += offsA[iSeq];
    }

    uint16_t* mat = calloc(totLen, 2);
    uint64_t idx = offsSum;

    for (;;) {
        for (uint64_t pos0 = 0; pos0 < lenA[0]; ++pos0) {
            char firstCh = seqA[0][pos0];
            int isSame = 1;
            uint16_t maxVal = mat[idx - 1];

            for (int iSeq = 1; iSeq < nSeq; ++iSeq) {
                char ch = seqA[iSeq][posA[iSeq]];
                isSame &= (ch == firstCh);

                uint16_t val = mat[idx - offsA[iSeq]];
                if (val > maxVal) {
                    maxVal = val;
                }
            }

            if (isSame) {
                mat[idx] = mat[idx - offsSum] + 1;
            } else {
                mat[idx] = maxVal;
            }

            ++idx;
        }

        idx -= lenA[0];

        int iSeq = 1;
        while (iSeq < nSeq && posA[iSeq] == lenA[iSeq] - 1) {
            posA[iSeq] = 0;
            idx -= (lenA[iSeq] - 1) * offsA[iSeq];
            ++iSeq;
        }
        if (iSeq == nSeq) {
            break;
        }

        ++posA[iSeq];
        idx += offsA[iSeq];
    }

    int score = mat[totLen - 1];

    char* resStr = malloc(score + 1);
    resStr[score] = '\0';

    for (int iSeq = 0; iSeq < nSeq; ++iSeq) {
        posA[iSeq] = lenA[iSeq] - 1;
    }

    idx = totLen - 1;
    int resIdx = score - 1;

    while (resIdx >= 0) {
        char firstCh = seqA[0][posA[0]];
        int isSame = 1;
        uint16_t maxVal = mat[idx - 1];
        int maxIdx = 0;

        for (int iSeq = 1; iSeq < nSeq; ++iSeq) {
            char ch = seqA[iSeq][posA[iSeq]];
            isSame &= (ch == firstCh);

            uint16_t val = mat[idx - offsA[iSeq]];
            if (val > maxVal) {
                maxVal = val;
                maxIdx = iSeq;
            }
        }

        if (isSame) {
            resStr[resIdx--] = firstCh;
            for (int iSeq = 0; iSeq < nSeq; ++iSeq) {
                --posA[iSeq];
            }
            idx -= offsSum;
        } else {
            --posA[maxIdx];
            idx -= offsA[maxIdx];
        }
    }

    free(mat);

    printf("score: %d\n", score);
    printf("%s\n", resStr);

    return 0;
}

Instructions for Running

To run:

Save the code in a file, e.g. lcs.c.
Compile with high optimization options. I used:
```
clang -O3 lcs.c
```
On Linux, I would try:
```
gcc -Ofast lcs.c
```
Run with 2 to 4 sequences given as command line arguments:
```
./a.out axbycz xaybzc
```
Single-quote the command line arguments if necessary, since the alphabet used for the test strings contains shell special characters.

Results

test2.sh and test3.sh contain the test sequences from Dennis. I don't know the correct results, but the output looks at least plausible.

$ ./a.out axbycz xaybzc
score: 3
abc

$ time ./test2.sh 
score: 391
16638018802020>3??3232270178;47240;24331395?<=;99=?;178675;866002==23?87?>978891>=9<6<9381992>>7030829?255>6804:=3>:;60<9384=081;0:?66=51?0;5090724<85?>>:2>7>3175?::<9199;5=0:494<5<:7100?=95<91>1887>33>67710==;48<<327::>?78>77<6:2:02:<7=5?:;>97<993;57=<<=:2=9:8=118563808=962473<::8<816<464<1==925<:5:22?>3;65=>=;27?7:5>==3=4>>5>:107:20575347>=48;<7971<<245<??219=3991=<96<??735698;62?<98928

real  0m0.012s
user  0m0.008s
sys   0m0.003s

$ time ./test3.sh 
score: 269
662:2=2::=6;738395=7:=179=96662649<<;?82?=668;2?603<74:6;2=04=>6;=6>=121>1>;3=22=3=3;=3344>0;5=7>>7:75238;559133;;392<69=<778>3?593?=>9799<1>79827??6145?7<>?389?8<;;133=505>2703>02?323>;693995:=0;;;064>62<0=<97536342603>;?92034<?7:=;2?054310176?=98;5437=;13898748==<<?4

real  0m7.218s
user  0m6.701s
sys   0m0.513s

— Reto Koradi

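Sorry if that wasn't clear, but you have to print the LCS, not just its length.

— Dennis

@Dennis I see. Some of my optimizations were in vain then. I'll have to go back to a version that stores the full matrix so that I can reconstruct the string. That won't be able to run for n = 4, but it should still finish in under 10 seconds for n = 3. It took around 6-7 seconds when I still had the full matrix.

— Reto Koradi

Sorry again. The question wasn't very clear about that... Once you post your output, I'll be able to compare it to BrainSteel's. The length your program reports exceeds the length of his output by 5 for n = 2. By the way, I had to define N_MAX as a macro and add the compiler flag -std=c99 to compile your code with GCC.

— Dennis

@Dennis No problem. It said that the solution "is a string", so that should have been clear enough. I almost exclusively use C++, so I'm never sure what's allowed in C. This code started out as C++, but once I realized that I wasn't really using any C++ features, I switched it over to C. Clang on my Mac was happy with it, but it probably uses a different C version by default, or is just more lenient.

— Reto Koradi

@Dennis OK, I added the traceback logic so that I can produce the string. It now takes about 7 seconds for n = 3.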

— Reto Koradi

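This answer is currently broken due to a bug. A fix is coming soon...

C, 2 strings in ~35 seconds

This is very much still a work in progress (as shown by the horrible messiness), but hopefully it sparks some good answers!

Code: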

#include "stdlib.h"
#include "string.h"
#include "stdio.h"
#include "time.h"

/* For the versatility */
#define MIN_CODEPOINT 0x30
#define MAX_CODEPOINT 0x3F
#define NUM_CODEPOINT (MAX_CODEPOINT - MIN_CODEPOINT + 1)
#define CTOI(c) (c - MIN_CODEPOINT)

#define SIZE_ARRAY(x) (sizeof(x) / sizeof(*x))

int LCS(char** str, int num);
int getshared(char** str, int num);
int strcount(char* str, char c);

int main(int argc, char** argv) {
    char** str = NULL;
    int num = 0;
    int need_free = 0;
    if (argc > 1) {
        str = &argv[1];
        num = argc - 1;
    }
    else {
        scanf(" %d", &num);
        str = malloc(sizeof(char*) * num);
        if (!str) {
            printf("Allocation error!\n");
            return 1;
        }

        int i;
        for (i = 0; i < num; i++) {
            /* No string will exceed 1000 characters */
            str[i] = malloc(1001);
            if (!str[i]) {
                printf("Allocation error!\n");
                return 1;
            }

            scanf(" %1000s", str[i]);

            str[i][1000] = '\0';
        }

        need_free = 1;
    }

    clock_t start = clock();

    /* The call to LCS */
    int size = LCS(str, num);

    double dt = ((double)(clock() - start)) / CLOCKS_PER_SEC;
    printf("Size: %d\n", size);
    printf("Elapsed time on LCS call: %lf s\n", dt);

    if (need_free) {
        int i;
        for (i = 0; i < num; i++) {
            free(str[i]);
        }
        free(str);
    }

    return 0;
}

/* Some terribly ugly global variables... */
int depth, maximum, mapsize, lenmap[999999][2];
char* (strmap[999999][20]);
char outputstr[1000];

/* Solves the LCS problem on num strings str of lengths len */
int LCS(char** str, int num) {
    /* Counting variables */
    int i, j;

    if (depth + getshared(str, num) <= maximum) {
        return 0;
    }

    char* replace[num];
    char match;
    char best_match = 0;
    int earliest_set = 0;
    int earliest[num];
    int all_late;
    int all_early;
    int future;
    int best_future = 0;
    int need_future = 1;

    for (j = 0; j < mapsize; j++) {
        for (i = 0; i < num; i++)
            if (str[i] != strmap[j][i])
                break;
        if (i == num) {
            best_match = lenmap[j][0];
            best_future = lenmap[j][1];
            need_future = 0;
            if (best_future + depth < maximum || !best_match)
                goto J;
            else {
                match = best_match;
                goto K;
            }
        }
    }

    for (match = MIN_CODEPOINT; need_future && match <= MAX_CODEPOINT; match++) {

    K:
        future = 1;
        all_late = earliest_set;
        all_early = 1;
        for (i = 0; i < num; replace[i++]++) {
            replace[i] = strchr(str[i], match);
            if (!replace[i]) {
                future = 0;
                break;
            }

            if (all_early && earliest_set && replace[i] - str[i] > earliest[i])
                all_early = 0;
            if (all_late && replace[i] - str[i] < earliest[i])
                all_late = 0;
        }
        if (all_late) {
            future = 0;
        }

    I:
        if (future) {
            if (all_early || !earliest_set) {
                for (i = 0; i < num; i++) {
                    earliest[i] = (int)(replace[i] - str[i]);
                }
            }

            /* The recursive bit */
            depth++;
            future += LCS(replace, num);
            depth--;

            best_future = future > best_future ? (best_match = match), future : best_future;
        }
    }

    if (need_future) {
        for (i = 0; i < num; i++)
            strmap[mapsize][i] = str[i];
        lenmap[mapsize][0] = best_match;
        lenmap[mapsize++][1] = best_future;
    }

J:
    if (depth + best_future >= maximum) {
        maximum = depth + best_future;
        outputstr[depth] = best_match;
    }

    if (!depth) {
        mapsize = 0;
        maximum = 0;
        puts(outputstr);
    }

    return best_future;
}

/* Return the number of characters total (not necessarily in order) that the strings all share */
int getshared(char** str, int num) {
    int c, i, tot = 0, min;
    for (c = MIN_CODEPOINT; c <= MAX_CODEPOINT; c++) {
        min = strcount(str[0], c);
        for (i = 1; i < num; i++) {
            int count = strcount(str[i], c);
            if (count < min) {
                min = count;
            }
        }
        tot += min;
    }

    return tot;
}

/* Count the number of occurrences of c in str */
int strcount(char* str, char c) {
    int tot = 0;
    while ((str = strchr(str, c))) {
        str++, tot++;
    }
    return tot;
}

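The relevant function that performs all of the LCS computation is the function LCS. The code above times its own call to this function.

Save as main.c and compile with: gcc -Ofast main.c -o FLCS

The code can be run either with command line arguments or through stdin. When using stdin, it expects the number of strings, followed by the strings themselves.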

~ Me$ ./FLCS "12345" "23456"
2345
Size: 4
Elapsed time on LCS call: 0.000056 s

Or:

~ Me$ ./FLCS
6 
2341582491248123139182371298371239813
2348273123412983476192387461293472793
1234123948719873491234120348723412349
1234129388234888234812834881423412373
1111111112341234128469128377293477377
1234691237419274737912387476371777273
1241231212323
Size: 13
Elapsed time on LCS call: 0.001594 s

On a Mac OS X box with a 1.7 GHz Intel Core i7, given the test case Dennis provided, we get the following output for 2 strings:

16638018800200>3??32322701784=4240;24331395?<;=99=?;178675;866002==23?87?>978891>=9<66=381992>>7030829?25265804:=3>:;60<9384=081;08?66=51?0;509072488>2>924>>>3175?::<9199;330:494<51:>748571?153994<45<??20>=3991=<962508?7<2382?489
Size: 386
Elapsed time on LCS call: 33.245087 s

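The approach is very similar to the one I used for the earlier challenge. In addition to the previous optimization, we now check, at every recursion, the total number of characters shared between the strings, and exit early if there is no way to get a longer subsequence than the best one already found.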

For now, it handles 2 strings, but it tends to crash on more. More improvements and a better explanation to come!

— BrainSteel

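I think I must be missing something. With 2 strings, isn't this a classic dynamic programming problem that should take about 1000^2 steps to solve? In other words, a fraction of a second.

@Lembik Yes, it should. This method was built to handle more than 2 strings, but it ended up scaling too poorly with string length to give good results. I have more tricks up my sleeve, and if any of them actually work, things should improve immensely.

— BrainSteel, 2015

There seems to be a problem somewhere. @RetoKoradi's code finds a valid common subsequence of length 391 for n = 2, while your code reports a length of 386 and prints a string of length 229.

— Dennis

@Dennis Umm... Yeah, yeah it does... Oh dear. Well, this is embarrassing. I'm working on it :) I'll edit the answer to reflect the bug.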

— BrainSteel