修补段落


32

本着Patch the Image的精神,这是一个类似的挑战,但带有文本。

挑战

腐烂折磨了您宝贵的文字!给定一个由ASCII字符组成的段落,该段落中有一个矩形孔,您的程序应尝试用适当的文本填充该孔,以使该段落尽可能地融合。

进一步的定义

  • 该孔将始终为矩形,并且可能会跨越多条线。
  • 只会有一个洞。
  • 请注意,孔不一定落在单词边界上(实际上,通常不会)。
  • 孔最多为输入段落的25%,但可能会重叠或延伸到“普通”文本的“末端”(请参见下面的Euclid或Badger示例)。
  • 由于发现孔不是挑战的重点,因此它将仅由井号组成#,以便于识别。
  • 输入段落中的其他任何位置都不会带有井号。
  • 您的代码无法在下面的示例中使用“常规”文本-它只会接收和处理带有漏洞的文本。
  • 输入可以是单个多行字符串,也可以是字符串数组(每行一个元素),也可以是文件等。-您可以选择最适合您的语言的语言。
  • 如果需要的话,可以采用可选的附加输入来详述孔的坐标(例如,坐标元组等)。
  • 请在您的提交中描述您的算法。

表决

要求选民根据算法填补文本空缺的程度来判断条目。一些建议包括以下内容:

  • 填充的区域是否与该段落的其余部分相匹配的空格和标点的近似分布?
  • 填充区域是否引入错误的语法?(例如,连续两个空格,一个句点和一个问号,一个错误的序列,例如, ,,等等)
  • 如果您起眼睛(因此您实际上并没有在阅读文字),您能看到孔过去的位置吗?
  • 如果孔外没有CamelCase单词,该孔是否包含任何单词?如果孔外没有大写字母,该孔中是否包含大写字母?如果孔外有很多大写字母,该孔是否包含一定比例的量?

有效性标准

为了使提交的内容被认为是有效的,不得在孔外(包括尾随空格)更改段落的任何文本。末尾的单个换行符是可选的。

测试用例

格式是代码块中的原始段落,后跟带孔的同一段落。带孔的段落将用于输入。

1(修补图像)

In a popular image editing software there is a feature, that patches (The term
used in image processing is inpainting as @minxomat pointed out.) a selected
area of an image, based on the information outside of that patch. And it does a
quite good job, considering it is just a program. As a human, you can sometimes
see that something is wrong, but if you squeeze your eyes or just take a short
glance, the patch seems to fill in the gap quite well.

In a popular image editing software there is a feature, that patches (The term
used in image processing is inpainting as @minxomat pointed out.) a selected
area of an image, #############information outside of that patch. And it does a
quite good job, co#############is just a program. As a human, you can sometimes
see that something#############t if you squeeze your eyes or just take a short
glance, the patch seems to fill in the gap quite well.

2(葛底斯堡地址)

But, in a larger sense, we can not dedicate, we can not consecrate, we can not
hallow this ground. The brave men, living and dead, who struggled here, have
consecrated it, far above our poor power to add or detract. The world will
little note, nor long remember what we say here, but it can never forget what
they did here. It is for us the living, rather, to be dedicated here to the
unfinished work which they who fought here have thus far so nobly advanced. It
is rather for us to be here dedicated to the great task remaining before us-
that from these honored dead we take increased devotion to that cause for which
they gave the last full measure of devotion-that we here highly resolve that
these dead shall not have died in vain-that this nation, under God, shall have
a new birth of freedom-and that government of the people, by the people, for
the people, shall not perish from the earth.

But, in a larger sense, we can not dedicate, we can not consecrate, we can not
hallow this ground. The brave men, living and dead, who struggled here, have
consecrated it, far above our poor power to add or detract. The world will
little note, nor long remember what we say here, but it can never forget what
they did here. It is for us the living, rather, to be dedicated here to the
unfinished work which they who fought here h######################advanced. It
is rather for us to be here dedicated to the######################before us-
that from these honored dead we take increas######################use for which
they gave the last full measure of devotion-######################solve that
these dead shall not have died in vain-that ######################, shall have
a new birth of freedom-and that government of the people, by the people, for
the people, shall not perish from the earth.

3(蜂胶)

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
mollit anim id est laborum.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo conse################irure dolor in reprehenderit
in voluptate velit esse cil################giat nulla pariatur. Excepteur
sint occaecat cupidatat non################in culpa qui officia deserunt
mollit anim id est laborum.

4(Jabberwocky)

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.

'Twas brillig, and the slithy toves
Did gyre a######### in the wabe;
All mimsy #########borogoves,
And the mome raths outgrabe.

5(欧几里德毕达哥拉斯定理的证明)

1.Let ACB be a right-angled triangle with right angle CAB.
2.On each of the sides BC, AB, and CA, squares are drawn,
CBDE, BAGF, and ACIH, in that order. The construction of
squares requires the immediately preceding theorems in Euclid,
and depends upon the parallel postulate. [footnote 14]
3.From A, draw a line parallel to BD and CE. It will
perpendicularly intersect BC and DE at K and L, respectively.
4.Join CF and AD, to form the triangles BCF and BDA.
5.Angles CAB and BAG are both right angles; therefore C, A,
and G are collinear. Similarly for B, A, and H.
6.Angles CBD and FBA are both right angles; therefore angle ABD
equals angle FBC, since both are the sum of a right angle and angle ABC.
7.Since AB is equal to FB and BD is equal to BC, triangle ABD
must be congruent to triangle FBC.
8.Since A-K-L is a straight line, parallel to BD, then rectangle
BDLK has twice the area of triangle ABD because they share the base
BD and have the same altitude BK, i.e., a line normal to their common
base, connecting the parallel lines BD and AL. (lemma 2)
9.Since C is collinear with A and G, square BAGF must be twice in area
to triangle FBC.
10.Therefore, rectangle BDLK must have the same area as square BAGF = AB^2.
11.Similarly, it can be shown that rectangle CKLE must have the same
area as square ACIH = AC^2.
12.Adding these two results, AB^2 + AC^2 = BD × BK + KL × KC
13.Since BD = KL, BD × BK + KL × KC = BD(BK + KC) = BD × BC
14.Therefore, AB^2 + AC^2 = BC^2, since CBDE is a square.

1.Let ACB be a right-angled triangle with right angle CAB.
2.On each of the sides BC, AB, and CA, squares are drawn,
CBDE, BAGF, and ACIH, in that order. The construction of
squares requires the immediately preceding theorems in Euclid,
and depends upon the parallel postulate. [footnote 14]
3.From A, draw a line parallel to BD and CE. It will
perpendicularly intersect BC and DE at K and L, respectively.
4.Join CF and AD, to form the triangles BCF and BDA.
5.Angles CAB and BAG are both right angles; therefore C, A,
and G are #############milarly for B, A, and H.
6.Angles C#############e both right angles; therefore angle ABD
equals ang############# both are the sum of a right angle and angle ABC.
7.Since AB#############FB and BD is equal to BC, triangle ABD
must be co#############iangle FBC.
8.Since A-#############ight line, parallel to BD, then rectangle
BDLK has t############# of triangle ABD because they share the base
BD and hav#############titude BK, i.e., a line normal to their common
base, conn#############rallel lines BD and AL. (lemma 2)
9.Since C #############with A and G, square BAGF must be twice in area
to triangl#############
10.Therefo############# BDLK must have the same area as square BAGF = AB^2.
11.Similar############# shown that rectangle CKLE must have the same
area as square ACIH = AC^2.
12.Adding these two results, AB^2 + AC^2 = BD × BK + KL × KC
13.Since BD = KL, BD × BK + KL × KC = BD(BK + KC) = BD × BC
14.Therefore, AB^2 + AC^2 = BC^2, since CBDE is a square.

6(Badger,Badger,weebl的Badger)

Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mushroom, mushroom, a-
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mushroom, mushroom, a-
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mush-mushroom, a
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Argh! Snake, a snake!
Snaaake! A snaaaake, oooh its a snake!

Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mushroom, mushroom, a-
Badger##################badger, badger,
badger##################badger, badger
Mushro##################
Badger##################badger, badger,
badger##################badger, badger
Mush-mushroom, a
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Argh! Snake, a snake!
Snaaake! A snaaaake, oooh its a snake!

我可以假设孔的宽度至少为三个字符
Rohan Jhunjhunwala

@RohanJhunjhunwala好的。给定文本的大小,这是一个相当安全的假设。
AdmBorkBork,

葛底斯堡的例子显然包含破折号,而不是纯ASCII。只是指出这一点是因为您在评论中说了您将使用普通ascii测试用例的答案之一。
SuperJedi224

@ SuperJedi224谢谢-固定。
AdmBorkBork

Answers:


15

Python 2

我知道@atlasologist已经在Python 2中发布了一个解决方案,但是我的工作方式有些不同。这是通过从上到下,从左到右遍历所有孔,向后看5个字符并在上面的字符并找到匹配的字符来实现的。如果找到多个字符,它将选择最常见的一个。如果没有找到字符,它将删除上述字符限制。如果仍然找不到字符,它将减少回溯并重复的字符数量。

def fix(paragraph, holeChar = "#"):
    lines = paragraph.split("\n")
    maxLineWidth = max(map(len, lines))
    lines = [list(line + " " * (maxLineWidth - len(line))) for line in lines]
    holes = filter(lambda pos: lines[pos[0]][pos[1]] == holeChar, [[y, x] for x in range(maxLineWidth) for y in range(len(lines))])

    n = 0
    for hole in holes:
        for i in range(min(hole[1], 5), 0, -1):
            currCh = lines[hole[0]][hole[1]]
            over = lines[hole[0] - 1][hole[1]]
            left = lines[hole[0]][hole[1] - i : hole[1]]

            same = []
            almost = []
            for y, line in enumerate(lines):
                for x, ch in enumerate(line):
                    if ch == holeChar:
                        continue
                    if ch == left[-1] == " ":
                        continue
                    chOver = lines[y - 1][x]
                    chLeft = lines[y][x - i : x]
                    if chOver == over and chLeft == left:
                        same.append(ch)
                    if chLeft == left:
                        almost.append(ch)
            sortFunc = lambda x, lst: lst.count(x) / (paragraph.count(x) + 10) + lst.count(x)
            if same:
                newCh = sorted(same, key=lambda x: sortFunc(x, same))[-1]
            elif almost:
                newCh = sorted(almost, key=lambda x: sortFunc(x, almost))[-1]
            else:
                continue
            lines[hole[0]][hole[1]] = newCh
            break


    return "\n".join(map("".join, lines))

这是Badger,Badger,Badger的结果:

Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger 
Mushroom, mushroom, a-                 
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger 
Mushroom, mushroom, a- b               
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger 
Mush-mushroom, a                       
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger 
Argh! Snake, a snake!                  
Snaaake! A snaaaake, oooh its a snake! 

这是证明的结果:

1.Let ACB be a right-angled triangle with right angle CAB.                 
2.On each of the sides BC, AB, and CA, squares are drawn,                  
CBDE, BAGF, and ACIH, in that order. The construction of                   
squares requires the immediately preceding theorems in Euclid,             
and depends upon the parallel postulate. [footnote 14]                     
3.From A, draw a line parallel to BD and CE. It will                       
perpendicularly intersect BC and DE at K and L, respectively.              
4.Join CF and AD, to form the triangles BCF and BDA.                       
5.Angles CAB and BAG are both right angles; therefore C, A,                
and G are the same areamilarly for B, A, and H.                            
6.Angles CAB and CA, sqe both right angles; therefore angle ABD            
equals angle ABD becaus both are the sum of a right angle and angle ABC.   
7.Since ABD because theFB and BD is equal to BC, triangle ABD              
must be construction ofiangle FBC.                                         
8.Since A-angle ABD becight line, parallel to BD, then rectangle           
BDLK has the same area  of triangle ABD because they share the base        
BD and have the base thtitude BK, i.e., a line normal to their common      
base, conngle and G, sqrallel lines BD and AL. (lemma 2)                   
9.Since C = BD × BK + with A and G, square BAGF must be twice in area     
to triangle FBC. (lemma                                                    
10.Therefore angle and  BDLK must have the same area as square BAGF = AB^2.
11.Similarly for B, A,  shown that rectangle CKLE must have the same       
area as square ACIH = AC^2.                                                
12.Adding these two results, AB^2 + AC^2 = BD × BK + KL × KC             
13.Since BD = KL, BD × BK + KL × KC = BD(BK + KC) = BD × BC             
14.Therefore, AB^2 + AC^2 = BC^2, since CBDE is a square.

Jabberwocky的结果是:

'Twas brillig, and the slithy toves
Did gyre and the mo in the wabe;   
All mimsy toves, anborogoves,      
And the mome raths outgrabe.       

5
那首《 ger》令人印象深刻,而贾伯沃基(Jabberwocky)似乎可能是一首合法的诗。辛苦了
AdmBorkBork

6

Python 2

这是一个非常简单的解决方案。它创建一个示例字符串,该字符串由平均单词长度A-(A/ 2)和A+(A / 2),然后将前导和尾随空间修整后的大块从示例应用到补丁区域。它不能处理大写字母,而且我敢肯定有一个曲线球测试用例可以打破大写字母,但是在示例中还可以。请参阅下面的链接以运行所有测试。

我还在代码中添加了补丁,以很好地解决问题。

def patch(paragraph):
    sample = [x.split() for x in paragraph if x.count('#') < 1]
    length = max([x.count('#') for x in paragraph if x.find('#')])
    s = sum(####################
    sample,[####################
    ])      ####################
    avg=sum(####################
    len(w)  ####################
    for w in####################
    s)//len(s)
    avg_range = range(avg-(avg//2),avg+(avg//2))
    sample = filter(lambda x:len(x) in avg_range, s)
    height=0
    for line in paragraph:
        if line.find('#'):height+=1
        print line.replace('#'*length,' '.join(sample)[(height-1)*length:height*length].strip())
    print '\n'

Lorem Ipsum,原始然后修补:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
mollit anim id est laborum.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo conseore dolore magnairure dolor in reprehenderit
in voluptate velit esse cilenim minim quisgiat nulla pariatur. Excepteur
sint occaecat cupidatat nonnisi mollit aniin culpa qui officia deserunt
mollit anim id est laborum.

试试吧


3
呵呵mushroger...
AdmBorkBork '16

好吧,它不会以有趣的方式修补您的代码。
mbomb007 '16

@ mbomb007是因为#代码中的其他字符。
Atlasologist '16

@atlasologist即使您将它们更改为其他类似的内容@,也没什么有趣的。
mbomb007'8

4

爪哇莎士比亚

谁需要掌握标准的英语约定?自己动手!就像吟游诗人被允许编造自己的话一样。这个bot不用担心更正截断单词,他实际上只是插入随机单词。结果是一些美丽的诗歌。作为一项额外功能,吟游诗人口径更高,并且只要它们的大小相同,就可以处理多个孔!


样本输入

 我们希望从最美的生物中增加,
  这样美丽的玫瑰可能永远不会消失,
  但是随着时间的推移,成熟的时间会越来越长,
  他的嫩人#############承载着他的记忆:
  但是你自己却拥有明亮的眼睛,
  给第一个#############同时添加自燃燃料,
  饥荒在哪里,
  你自己的仇敌,对你甜美的自我太残酷:
  您认为这是当今世界上最新鲜的装饰品,
  只有预示着艳丽的春天,
  在你自己的芽中,你的内心最快乐,
  嫩的churl mak'st是############# ding:
    怜悯世界,否则t ############ be
    要吃世界应得的东西,b #############然后你。


                     2
  当四十个冬天围困你的额头时,
  在你美丽的领域挖深沟,
  你的青年骄傲的制服现在注视着,
  将是一堆破烂不堪的小杂草:  
  然后被问到,你所有的美丽都在哪里,
  你光辉岁月的所有宝藏;
  要说在你自己深沉的眼睛里,
  简直是全食的耻辱,而且是不懈的赞美。
  你的美丽应该得到更多的赞美,
  如果您能回答“我这个美丽的孩子
  应该加总我的数量,并成为我的老借口。
  通过继承证明他的美丽。
    当你老的时候,这是新的,
    当你感到冷的时候,看看你的血液温暖。


                     3
  看着你的玻璃杯,告诉你你的脸,
  现在是时候换一张脸了,
  如果您现在不续约,可以进行新的维修,
  你迷惑世界,保佑一些母亲。
  她在哪儿这么公平呢?
  鄙视耕种耕种?
  否则他将是谁那么喜欢这个坟墓,
  要停止后代的自爱?  
  你是你母亲的玻璃杯,她在你里面
  回顾她黄金时期的美好四月,
  所以你要透过你的年龄的窗户看到,
  尽管############# s是您的黄金时间。
    但是如果#############
    死了一个############# image与您死亡。

漂亮的输出

 我们希望从最美的生物中增加,
  这样美丽的玫瑰可能永远不会消失,
  但是随着时间的推移,成熟的时间会越来越长,
  他的温柔应该引起他的记忆:
  但是,你们所有的人都拥有自己明亮的眼睛,
  用自给自足的燃料饲料
  饥荒在哪里,
  你自己的仇敌,对你甜美的自我太残酷:
  您认为这是当今世界上最新鲜的装饰品,
  只有预示着艳丽的春天,
  在你自己的芽中,你的内心最快乐,
  而温柔的笑声是他你我的叮当:
    怜悯世界,否则就这样,
    为了吃世界应得的东西,你就这样。


                     2
  当四十个冬天围困你的额头时,
  在你美丽的领域挖深沟,
  你的青年骄傲的制服现在注视着,
  将是一堆破烂不堪的小杂草:  
  然后被问到,你所有的美丽都在哪里,
  你光辉岁月的所有宝藏;
  要说在你自己深沉的眼睛里,
  简直是全食的耻辱,而且是不懈的赞美。
  你的美丽应该得到更多的赞美,
  如果您能回答“我这个美丽的孩子
  应该加总我的数量,并成为我的老借口。
  通过继承证明他的美丽。
    当你老的时候,这是新的,
    当你感到冷的时候,看看你的血液温暖。


                     3
  看着你的玻璃杯,告诉你你的脸,
  现在是时候换一张脸了,
  如果您现在不续约,可以进行新的维修,
  你迷惑世界,保佑一些母亲。
  她在哪儿这么公平呢?
  鄙视耕种耕种?
  否则他将是谁那么喜欢这个坟墓,
  要停止后代的自爱?  
  你是你母亲的玻璃杯,她在你里面
  回顾她黄金时期的美好四月,
  所以你要透过你的年龄的窗户看到,
  尽管Look凝视着您的黄金时间。
    但是如果不存在,
    模具与您修复图像模具。

如果我自己这么说的话,最后几句话很诗意。它在葛底斯堡演说中的表现也令人惊讶。

But, in a larger sense, we can not dedicate, we can not consecrate, we can not
hallow this ground. The brave men, living and dead, who struggled here, have
consecrated it, far above our poor power to add or detract. The world will
little note, nor long remember what we say here, but it can never forget what
they did here. It is for us the living, rather, to be dedicated here to the
unfinished work which they who fought here h to of rather us of advanced. It
is rather for us to be here dedicated to the who be it, vain who before us 
that from these honored dead we take increas be dead the the what use for which
they gave the last full measure of devotion  dead government The solve that
these dead shall not have died in vain that  the take nor world , shall have
a new birth of freedom and that government of the people, by the people, for
the people, shall not perish from the earth.


让我们来看看是什么使莎士比亚滴答作响。这是代码。本质上,他努力根据输入内容建立词汇库。然后,他使用这些单词并将它们随机放置在孔中(以确保它非常合适)。他具有确定性,因为他使用固定种子进行随机性。

package stuff;

import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Random;
import java.util.Scanner;
import java.util.Stack;

/**
 *
 * @author rohan
 */
public class PatchTheParagraph {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.println("File Name :");
        String[] text = getWordsFromFile(in.nextLine());
System.out.println("==ORIGINAL==");
        for(String s:text){
    System.out.println(s);
}
                    int lengthOfHole= 0;
        int rows = 0;
            for(String s: text){
                s = s.replaceAll("[^#]", "");

//      System.out.println(s);
                if(s.length()>0){
                    lengthOfHole = s.length();
                rows++;
                }
            }
            ArrayList<String> words = new ArrayList<>();
            words.add("I");
            for(String s:text){
                String[] w = s.replaceAll("#", " ").split(" ");
for(String a :w){
    words.add(a);
            }

            }
                        Iterator<String> j = words.iterator();
            while(j.hasNext()){
                String o;
                if((o = j.next()).equals("")){
                    j.remove();
                }
            }
            System.out.println(words);
            Stack<String> out = new Stack<>();
            String hashRow = "";
            for(int i = 0;i<lengthOfHole;i++){
                hashRow+="#";
            }

        for(int i = 0;i<rows;i++){
            int length = lengthOfHole-1; 
            String outPut = " ";
            while(length>2){
String wordAttempt = words.get(getRandom(words.size()-1));
while(wordAttempt.length()>length-1){
 wordAttempt = words.get(getRandom(words.size()-1));
}           
length -= wordAttempt.length()+1;
            outPut+=wordAttempt;
                outPut+=" ";
            }
        out.push(outPut);
    }
System.out.println("==PATCHED==");
        for(String s : text){
            if(s.contains(hashRow)){
                System.out.println(s.replaceAll(hashRow,out.pop()));
            }else{
                System.out.println(s);
            }
        }
                                    }
public static final Random r = new Random(42);
    public static int getRandom(int max){
    return (int) (max*r.nextDouble());
}
    /**
     *
     * @param fileName is the path to the file or just the name if it is local
     * @return the number of lines in fileName
     */
    public static int getLengthOfFile(String fileName) {
        int length = 0;
        try {
            File textFile = new File(fileName);
            Scanner sc = new Scanner(textFile);
            while (sc.hasNextLine()) {
                sc.nextLine();
                length++;
            }
        } catch (Exception e) {
System.err.println(e);
        }
        return length;
    }

    /**
     *
     * @param fileName is the path to the file or just the name if it is local
     * @return an array of Strings where each string is one line from the file
     * fileName.
     */
    public static String[] getWordsFromFile(String fileName) {
        int lengthOfFile = getLengthOfFile(fileName);
        String[] wordBank = new String[lengthOfFile];
        int i = 0;
        try {
            File textFile = new File(fileName);
            Scanner sc = new Scanner(textFile);
            for (i = 0; i < lengthOfFile; i++) {
                wordBank[i] = sc.nextLine();
            }
            return wordBank;
        } catch (Exception e) {
            System.err.println(e);
            System.exit(55);
        }
        return null;
    }
}


莎士比亚的大部分诗歌都是公共领域的。


评论不作进一步讨论;此对话已转移至聊天
丹尼斯

3

Python 2.7

另一种采用不同方法的Python解决方案。我的程序将文本视为马尔可夫链,其中每个字母后跟另一个具有一定概率的字母。因此,第一步是建立概率表。下一步是将该概率应用于补丁。

完整代码,包括一个示例文本如下。因为一个示例使用了unicode字符,所以我加入了一个明确的代码页(utf-8)以与该示例兼容。

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from collections import defaultdict
import numpy

texts = [
"""'Twas brillig, and the slithy toves
Did gyre a######### in the wabe;
All mimsy #########borogoves,
And the mome raths outgrabe."""
]

class Patcher:
    def __init__(self):
        self.mapper = defaultdict(lambda: defaultdict(int))

    def add_mapping(self, from_value, to_value):
        self.mapper[from_value][to_value] += 1

    def get_patch(self, from_value):
        if from_value in self.mapper:
            sum_freq = sum(self.mapper[from_value].values())
            return numpy.random.choice(
                self.mapper[from_value].keys(),
                p = numpy.array(
                    self.mapper[from_value].values(),dtype=numpy.float64) / sum_freq)
        else:
            return None

def add_text_mappings(text_string, patcher = Patcher(), ignore_characters = ''):
    previous_letter = text_string[0]
    for letter in text_string[1:]:
        if not letter in ignore_characters:
            patcher.add_mapping(previous_letter, letter)
            previous_letter = letter
    patcher.add_mapping(text_string[-1], '\n')

def patch_text(text_string, patcher, patch_characters = '#'):
    result = previous_letter = text_string[0]
    for letter in text_string[1:]:
        if letter in patch_characters:
            result += patcher.get_patch(previous_letter)
        else:
            result += letter
        previous_letter = result[-1]
    return result

def main():
    for text in texts:
        patcher = Patcher()
        add_text_mappings(text, patcher, '#')
        print patch_text(text, patcher, '#')
        print "\n"

if __name__ == '__main__':
    main()

Lorem Ipsum的样本输出:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo conse Exe eut ccadamairure dolor in reprehenderit
in voluptate velit esse cilore indipserexepgiat nulla pariatur. Excepteur
sint occaecat cupidatat non upir alostat adin culpa qui officia deserunt
mollit anim id est laborum.

贾伯沃基(Jabberwocky)中另一条诗意的诗句:

'Twas brillig, and the slithy toves
Did gyre and me the in the wabe;
All mimsy was
An inborogoves,
And the mome raths outgrabe.

哪个示例文本具有Unicode?它们都应为纯ASCII。请让我知道,我会予以纠正。
AdmBorkBork '16

Python在第一个文本中抱怨了mınxomaτ,参考PEP 263
13:59

啊-甚至都没有意识到。我已经将其编辑为纯ASCII。谢谢你让我知道!
AdmBorkBork '16

2

C#5 一如既往的庞大

我把它们放在一起,有点混乱,但有时会产生一些好的结果。它是一种主要是确定性的算法,但添加了一些(固定种子)随机性,以避免针对相似的间隙产生相同的字符串。尽力避免在间隙的两边仅留有几列空格。

它通过将输入标记为单词和标点符号来工作(标点符号来自手动输入的列表,因为如果Unicode可以帮我解决这个问题,我就不烦了),以便可以在单词之前而不是之前放置空格标点符号,因为这是相当典型的。它在典型的空白处分割。按照马尔科夫链的观点(我认为),它计算每个标记彼此跟随的频率,然后为此计算概率(我认为由于文档太小,我们最好偏向事物我们可以在很多地方看到)。然后,我们执行广度优先搜索,填充哈希和“偏”字两边剩下的空间,其成本计算为-fabness(last, cur) * len(cur_with_space),其中fabness返回次数cur也跟着last对于生成的字符串中的每个附加令牌。当然,我们会尽量降低成本。由于我们不能总是用文档中找到的单词和标点符号来填补空白,因此它还会考虑来自某些状态的许多“特殊”标记,包括两边的部分字符串,我们对这些特殊标记的使用会随心所欲地增加成本。

如果BFS找不到解决方案,那么我们会天真地尝试选择一个随机副词,或者只是插入空格来填充空格。

结果

所有6个都可以在这里找到:https : //gist.github.com/anonymous/5277db726d3f9bdd950b173b19fec82a

Euclid测试用例进行得并不顺利。

修补图像

In a popular image editing software there is a feature, that patches (The term
used in image processing is inpainting as @minxomat pointed out.) a selected
area of an image, that patches information outside of that patch. And it does a
quite good job, co the patch a is just a program. As a human, you can sometimes
see that something In a short it if you squeeze your eyes or just take a short
glance, the patch seems to fill in the gap quite well.

贾伯沃基

'Twas brillig, and the slithy toves
Did gyre and the in in the wabe;
All mimsy the mome borogoves,
And the mome raths outgrabe.

Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mushroom, mushroom, a-
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mushroom, badger, badger
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mush-mushroom, a
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Argh! Snake, a snake!
Snaaake! A snaaaake, oooh its a snake!

_我对这一结果如何感到高兴...很幸运,“ bad,r”很合适,否则这件事情做得不好

运行它

csc ParaPatch.cs
ParaPatch.exe infile outfile

有很多。唯一的远程有趣的位是Fill方法。我包括堆实现,因为.NET没有一个(为什么MS为什么?!)。

using System;
using System.Collections.Generic;
using System.Linq;

namespace ParaPatch
{
    class Program
    {
        private static string[] Filler = new string[] { "may", "will", "maybe", "rather", "perhaps", "reliably", "nineword?", "definitely", "elevenword?", "inexplicably" }; // adverbs
        private static char[] Breaking = new char[] { ' ', '\n', '\r', '\t' };
        private static char[] Punctuation = new char[] { ',', '.', '{', '}', '(', ')', '/', '?', ':', ';', '\'', '\\', '"', ',', '!', '-', '+', '[', ']', '£', '$', '%', '^', '—' };

        private static IEnumerable<string> TokenizeStream(System.IO.StreamReader reader)
        {
            System.Text.StringBuilder sb = new System.Text.StringBuilder();

            HashSet<char> breaking = new HashSet<char>(Breaking);
            HashSet<char> punctuation = new HashSet<char>(Punctuation);

            while (!reader.EndOfStream)
            {
                int ci = reader.Read();
                if (ci == -1) // sanity
                    break;

                char c = (char)ci;

                if (breaking.Contains(c))
                {
                    if (sb.Length > 0)
                        yield return sb.ToString();
                    sb.Clear();
                }
                else if (punctuation.Contains(c))
                {
                    if (sb.Length > 0)
                        yield return sb.ToString();
                    yield return ""+c;
                    sb.Clear();
                }
                else
                {

                    sb.Append(c);
                }
            }

            if (sb.Length > 0)
                yield return sb.ToString();
        }

        private enum DocTokenTypes
        {
            Known,
            LeftPartial,
            RightPartial,
            Unknown,
        }

        private class DocToken
        {
            public DocTokenTypes TokenType { get; private set; }
            public string StringPart { get; private set; }
            public int Length { get; private set; }

            public DocToken(DocTokenTypes tokenType, string stringPart, int length)
            {
                TokenType = tokenType;
                StringPart = stringPart;
                Length = length;
            }
        }

        private static IEnumerable<DocToken> DocumentTokens(IEnumerable<string> tokens)
        {
            foreach (string token in tokens)
            {
                if (token.Contains("#"))
                {
                    int l = token.IndexOf("#");
                    int r = token.LastIndexOf("#");

                    if (l > 0)
                        yield return new DocToken(DocTokenTypes.LeftPartial, token.Substring(0, l), l);

                    yield return new DocToken(DocTokenTypes.Unknown, null, r - l + 1);

                    if (r < token.Length - 1)
                        yield return new DocToken(DocTokenTypes.RightPartial, token.Substring(r + 1), token.Length - r - 1);
                }
                else
                    yield return new DocToken(DocTokenTypes.Known, token, token.Length);
            }
        }

        private class State : IComparable<State>
        {
            // missing readonly params already... maybe C#6 isn't so bad
            public int Remaining { get; private set; }
            public int Position { get; private set; }
            public State Prev { get; private set; }
            public string Token { get; private set; }
            public double H { get; private set; }
            public double Fabness { get; private set; }
            public string FullFilling { get; private set; }

            public State(int remaining, int position, Program.State prev, double fabness, double h, string token, string toAdd)
            {
                Remaining = remaining;
                Position = position;
                Prev = prev;
                H = h;
                Fabness = fabness;
                Token = token;

                FullFilling = prev != null ? prev.FullFilling + toAdd : toAdd;
            }

            public int CompareTo(State other)
            {
                return H.CompareTo(other.H);
            }
        }

        public static void Main(string[] args)
        {
            if (args.Length < 2)
                args = new string[] { "test.txt", "testout.txt" };

            List<DocToken> document;
            using (System.IO.StreamReader reader = new System.IO.StreamReader(args[0], System.Text.Encoding.UTF8))
            {
                document = DocumentTokens(TokenizeStream(reader)).ToList();
            }

            foreach (DocToken cur in document)
            {
                Console.WriteLine(cur.StringPart + " " + cur.TokenType);
            }

            // these are small docs, don't bother with more than 1 ply
            Dictionary<string, Dictionary<string, int>> FollowCounts = new Dictionary<string, Dictionary<string, int>>();
            Dictionary<string, Dictionary<string, int>> PreceedCounts = new Dictionary<string, Dictionary<string, int>>(); // mirror (might be useful)

            HashSet<string> knowns = new HashSet<string>(); // useful to have lying around

            // build counts
            DocToken last = null;
            foreach (DocToken cur in document)
            {
                if (cur.TokenType == DocTokenTypes.Known)
                {
                    knowns.Add(cur.StringPart);
                }

                if (last != null && last.TokenType == DocTokenTypes.Known && cur.TokenType == DocTokenTypes.Known)
                {
                    {
                        Dictionary<string, int> ltable;
                        if (!FollowCounts.TryGetValue(last.StringPart, out ltable))
                        {
                            FollowCounts.Add(last.StringPart, ltable = new Dictionary<string, int>());
                        }

                        int count;
                        if (!ltable.TryGetValue(cur.StringPart, out count))
                        {
                            count = 0;
                        }
                        ltable[cur.StringPart] = count + 1;
                    }


                    {
                        Dictionary<string, int> ctable;
                        if (!PreceedCounts.TryGetValue(cur.StringPart, out ctable))
                        {
                            PreceedCounts.Add(cur.StringPart, ctable = new Dictionary<string, int>());
                        }

                        int count;
                        if (!ctable.TryGetValue(last.StringPart, out count))
                        {
                            count = 0;
                        }
                        ctable[last.StringPart] = count + 1;
                    }
                }

                last = cur;
            }

            // build probability grid (none of this efficient table filling dynamic programming nonsense, A* all the way!)
            // hmm... can't be bothered
            Dictionary<string, Dictionary<string, double>> fabTable = new Dictionary<string, Dictionary<string, double>>();
            foreach (var k in FollowCounts)
            {
                Dictionary<string, double> t = new Dictionary<string, double>();

                // very naive
                foreach (var k2 in k.Value)
                {
                    t.Add(k2.Key, (double)k2.Value);
                }

                fabTable.Add(k.Key, t);
            }

            string[] knarr = knowns.ToArray();
            Random rnd = new Random("ParaPatch".GetHashCode());

            List<string> fillings = new List<string>();
            for (int i = 0; i < document.Count; i++)
            {
                if (document[i].TokenType == DocTokenTypes.Unknown)
                {
                    // shuffle knarr
                    for (int j = 0; j < knarr.Length; j++)
                    {
                        string t = knarr[j];
                        int o = rnd.Next(knarr.Length);
                        knarr[j] = knarr[o];
                        knarr[o] = t;
                    }

                    fillings.Add(Fill(document, fabTable, knarr, i));
                    Console.WriteLine(fillings.Last());
                }
            }

            string filling = string.Join("", fillings);

            int fi = 0;

            using (System.IO.StreamWriter writer = new System.IO.StreamWriter(args[1]))
            using (System.IO.StreamReader reader = new System.IO.StreamReader(args[0]))
            {
                while (!reader.EndOfStream)
                {
                    int ci = reader.Read();
                    if (ci == -1)
                        break;

                    char c = (char)ci;
                    c = c == '#' ? filling[fi++] : c;

                    writer.Write(c);
                    Console.Write(c);
                }
            }

//            using (System.IO.StreamWriter writer = new System.IO.StreamWriter(args[1], false, System.Text.Encoding.UTF8))
//            using (System.IO.StreamReader reader = new System.IO.StreamReader(args[0]))
//            {
//                foreach (char cc in reader.ReadToEnd())
//                {
//                    char c = cc;
//                    c = c == '#' ? filling[fi++] : c;
//                    
//                    writer.Write(c);
//                    Console.Write(c);
//                }
//            }

            if (args[0] == "test.txt")
                Console.ReadKey(true);
        }

        private static string Fill(List<DocToken> document, Dictionary<string, Dictionary<string, double>> fabTable, string[] knowns, int unknownIndex)
        {
            HashSet<char> breaking = new HashSet<char>(Breaking);
            HashSet<char> punctuation = new HashSet<char>(Punctuation);

            Heap<State> due = new Heap<Program.State>(knowns.Length);

            Func<string, string, double> fabness = (prev, next) =>
            {
                Dictionary<string, double> table;
                if (!fabTable.TryGetValue(prev, out table))
                    return 0; // not fab
                double fab;
                if (!table.TryGetValue(next, out fab))
                    return 0; // not fab
                return fab; // yes fab
            };

            DocToken mostLeft = unknownIndex > 2 ? document[unknownIndex - 2] : null;
            DocToken left = unknownIndex > 1 ? document[unknownIndex - 1] : null;
            DocToken unknown = document[unknownIndex];
            DocToken right = unknownIndex < document.Count - 2 ? document[unknownIndex + 1] : null;
            DocToken mostRight = unknownIndex < document.Count - 3 ? document[unknownIndex + 2] : null;

            // sum of empty space and partials' lengths
            int spaceSize = document[unknownIndex].Length
                + (left != null && left.TokenType == DocTokenTypes.LeftPartial ? left.Length : 0)
                + (right != null && right.TokenType == DocTokenTypes.RightPartial ? right.Length : 0);

            int l = left != null && left.TokenType == DocTokenTypes.LeftPartial ? left.Length : 0;
            int r = l + unknown.Length;

            string defaultPrev =
                left != null && left.TokenType == DocTokenTypes.Known ? left.StringPart :
                mostLeft != null && mostLeft.TokenType == DocTokenTypes.Known ? mostLeft.StringPart :
                "";

            string defaultLast =
                right != null && right.TokenType == DocTokenTypes.Known ? right.StringPart :
                mostRight != null && mostRight.TokenType == DocTokenTypes.Known ? mostRight.StringPart :
                "";

            Func<string, string> topAndTail = str =>
            {
                return str.Substring(l, r - l);
            };

            Func<State, string, double, bool> tryMove = (State prev, string token, double specialFabness) => 
            {
                bool isPunctionuation = token.Length == 1 && punctuation.Contains(token[0]);
                string addStr = isPunctionuation || prev == null ? token : " " + token;
                int addLen = addStr.Length;

                int newRemaining = prev != null ? prev.Remaining - addLen : spaceSize - addLen;
                int oldPosition = prev != null ? prev.Position : 0;
                int newPosition = oldPosition + addLen;

                // check length
                if (newRemaining < 0)
                    return false;

                // check start
                if (oldPosition < l) // implies left is LeftPartial
                {
                    int s = oldPosition;
                    int e = newPosition > l ? l : newPosition;
                    int len = e - s;
                    if (addStr.Substring(0, len) != left.StringPart.Substring(s, len))
                        return false; // doesn't match LeftPartial
                }

                // check end
                if (newPosition > r) // implies right is RightPartial
                {
                    int s = oldPosition > r ? oldPosition : r;
                    int e = newPosition;
                    int len = e - s;
                    if (addStr.Substring(s - oldPosition, len) != right.StringPart.Substring(s - r, len))
                        return false; // doesn't match RightPartial
                }

                if (newRemaining == 0)
                {
                    // could try to do something here (need to change H)
                }

                string prevToken = prev != null ? prev.Token : defaultPrev;
                bool isLastunctionuation = prevToken.Length == 1 && punctuation.Contains(prevToken[0]);

                if (isLastunctionuation && isPunctionuation) // I hate this check, it's too aggresive to be realistic
                    specialFabness -= 50;

                double fab = fabness(prevToken, token);

                if (fab < 1 && (token == prevToken))
                    fab = -1; // bias against unrecognised repeats

                double newFabness = (prev != null ? prev.Fabness : 0.0)
                    - specialFabness // ... whatever this is
                    - fab * addLen; // how probabilistic

                double h = newFabness; // no h for now

                State newState = new Program.State(newRemaining, newPosition, prev, newFabness, h, token, addStr);

//                Console.WriteLine((prev != null ? prev.Fabness : 0) + "\t" + specialFabness);
//                Console.WriteLine(newFabness + "\t" + h + "\t" + due.Count + "\t" + fab + "*" + addLen + "\t" + newState.FullFilling);

                due.Add(newState);
                return true;
            };

            // just try everything everything
            foreach (string t in knowns)
                tryMove(null, t, 0);

            if (left != null && left.TokenType == DocTokenTypes.LeftPartial)
                tryMove(null, left.StringPart, -1);

            while (!due.Empty)
            {
                State next = due.RemoveMin();

                if (next.Remaining == 0)
                {
                    // we have a winner!!
                    return topAndTail(next.FullFilling);
                }

                // just try everything
                foreach (string t in knowns)
                    tryMove(next, t, 0);
                if (right != null && right.TokenType == DocTokenTypes.RightPartial)
                    tryMove(next, right.StringPart, -5); // big bias
            }

            // make this a tad less stupid, non?
            return Filler.FirstOrDefault(f => f.Length == unknown.Length) ?? new String(' ', unknown.Length); // oh dear...
        }
    }

    //
    // Ultilities
    //

    public class Heap<T> : System.Collections.IEnumerable where T : IComparable<T>
    {
        // arr is treated as offset by 1, all idxes stored need to be -1'd to get index in arr
        private T[] arr;
        private int end = 0;

        private void s(int idx, T val)
        {
            arr[idx - 1] = val;
        }

        private T g(int idx)
        {
            return arr[idx - 1];
        }

        public Heap(int isize)
        {
            if (isize < 1)
                throw new ArgumentException("Cannot be less than 1", "isize");

            arr = new T[isize];
        }

        private int up(int idx)
        {
            return idx / 2;
        }

        private int downLeft(int idx)
        {
            return idx * 2;
        }

        private int downRight(int idx)
        {
            return idx * 2 + 1;
        }

        private void swap(int a, int b)
        {
            T t = g(a);
            s(a, g(b));
            s(b, t);
        }

        private void moveUp(int idx, T t)
        {
        again:
            if (idx == 1)
            {
                s(1, t);
                return; // at end
            }

            int nextUp = up(idx);
            T n = g(nextUp);
            if (n.CompareTo(t) > 0)
            {
                s(idx, n);
                idx = nextUp;
                goto again;
            }
            else
            {
                s(idx, t);
            }
        }

        private void moveDown(int idx, T t)
        {
        again:
            int nextLeft = downLeft(idx);
            int nextRight = downRight(idx);

            if (nextLeft > end)
            {
                s(idx, t);
                return; // at end
            }
            else if (nextLeft == end)
            { // only need to check left
                T l = g(nextLeft);

                if (l.CompareTo(t) < 0)
                {
                    s(idx, l);
                    idx = nextLeft;
                    goto again;
                }
                else
                {
                    s(idx, t);
                }
            }
            else
            { // check both
                T l = g(nextLeft);
                T r = g(nextRight);

                if (l.CompareTo(r) < 0)
                { // left smaller (favour going right if we can)
                    if (l.CompareTo(t) < 0)
                    {
                        s(idx, l);
                        idx = nextLeft;
                        goto again;
                    }
                    else
                    {
                        s(idx, t);
                    }
                }
                else
                { // right smaller or same
                    if (r.CompareTo(t) < 0)
                    {
                        s(idx, r);
                        idx = nextRight;
                        goto again;
                    }
                    else
                    {
                        s(idx, t);
                    }
                }
            }
        }

        public void Clear()
        {
            end = 0;
        }

        public void Trim()
        {
            if (end == 0)
                arr = new T[1]; // don't /ever/ make arr len 0
            else
            {
                T[] narr = new T[end];
                for (int i = 0; i < end; i++)
                    narr[i] = arr[i];
                arr = narr;
            }
        }

        private void doubleSize()
        {
            T[] narr = new T[arr.Length * 2];
            for (int i = 0; i < end; i++)
                narr[i] = arr[i];
            arr = narr;
        }

        public void Add(T item)
        {
            if (end == arr.Length)
            {
                // resize
                doubleSize();
            }

            end++;
            moveUp(end, item);
        }

        public T RemoveMin()
        {
            if (end < 1)
                throw new Exception("No items, mate.");

            T min = g(1);

            end--;
            if (end > 0)
                moveDown(1, g(end + 1));

            return min;
        }

        public bool Empty
        {
            get
            {
                return end == 0;
            }
        }

        public int Count
        {
            get
            {
                return end;
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }

        public IEnumerator<T> GetEnumerator()
        {
            return (IEnumerator<T>)arr.GetEnumerator();
        }
    }
}
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.