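A Book Full of Nonsense: Identifying Limericks

As we all know, limericks are short, five-line, occasionally lewd poems with an AABBA rhyme scheme and an anapestic meter (whatever that is):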

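Writing a limerick's absurd
Line one and line five rhyme in word
And just as you've reckoned
They rhyme with the second
The fourth line must rhyme with the third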

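Your task is to write the shortest program that, when fed an input text, prints whether it thinks the input is a valid limerick. Input can be given either on the command line or through standard input, at your option, and output can be either a simple "Y"/"N" or a confidence score, again at your option.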

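Here is another example of a correct limerick:

There was a Young Lady whose eyes
Were unique as to colour and size
When she opened them wide
People all turned aside
And started away in surprise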

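But the poem below is clearly not a limerick, since it doesn't rhyme:

There was an old man of St. Bees
Who was stung in the arm by a wasp.
When asked, "Does it hurt?"
He replied, "No, it doesn't,
I'm so glad that it wasn't a hornet."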

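Nor is this one, as the meter is all wrong:

I heard of a man from Berlin
Who hated the room he was in.
When I asked as to why
He would say with a sigh:
"Well, you see, last night there were a couple of hoodlums around who were celebrating the Bears winning the World Cup, and they were really loud, so I couldn't sleep because of the din."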

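Clues

Here are some clues you could use to decide whether or not your input is a limerick:

Limericks are always five lines long.
Lines 1, 2 and 5 should rhyme.
Lines 3 and 4 should rhyme.
Lines 1, 2 and 5 have around 3x3 = 9 syllables, while lines 3 and 4 have around 2x3 = 6 syllables.

Note that none of these except the first is hard-and-fast: a 100% correctness rating is impossible.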
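
The line-count and syllable clues can be sketched as follows. This is only an illustration, not part of the challenge: it assumes that vowel phonemes in the pronunciation dictionary end in a stress digit (0, 1 or 2), so counting those gives a syllable count. The names `SAMPLE_PRON`, `syllables`, `plausible_shape` and the `slack` tolerance are made up for this sketch, and the sample pronunciations are illustrative data rather than entries copied from the real file.

```python
import re

# Illustrative pronunciations (assumed, not copied from the real dictionary).
SAMPLE_PRON = {
    "THERE": "DH EH1 R",
    "WAS": "W AA1 Z",
    "A": "AH0",
    "YOUNG": "Y AH1 NG",
    "LADY": "L EY1 D IY0",
    "WHOSE": "HH UW1 Z",
    "EYES": "AY1 Z",
}


def syllables(line, pron):
    """Count vowel phonemes (those ending in a stress digit) in one line.

    Words missing from `pron` contribute zero syllables.
    """
    count = 0
    for word in re.findall(r"[A-Za-z']+", line):
        phonemes = pron.get(word.upper(), "").split()
        count += sum(p[-1].isdigit() for p in phonemes)
    return count


def plausible_shape(poem, pron, slack=2):
    """Check the five-line rule and the rough 9/9/6/6/9 syllable pattern."""
    lines = [l for l in poem.splitlines() if l.strip()]
    if len(lines) != 5:
        return False
    targets = [9, 9, 6, 6, 9]
    return all(abs(syllables(l, pron) - t) <= slack
               for l, t in zip(lines, targets))
```

Because only the five-line rule is strict, the `slack` parameter keeps the syllable test loose; a real entry would have to pick its own tolerance.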

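Rules

Your entry should at the very least correctly categorize examples 1 through 3 in a deterministic fashion.
You are allowed to use any programming language you like, except of course programming languages specifically designed for this contest (see here).
You are not allowed to use any library except your programming language's standard offerings.
You are allowed to assume that this file, the CMU Sphinx pronunciation dictionary, is in a file called 'c' in the current directory.
You are not allowed to hard-code for the test inputs: your program should be a general limerick classifier.
You are allowed to assume that the input is ASCII, without any special formatting (as in the examples), but your program should not be confused by punctuation.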
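
A minimal sketch of the two input-handling rules above (reading the dictionary and ignoring punctuation) might look like this. It assumes each dictionary entry is a word, two spaces, then space-separated phonemes, which matches the regular expressions used in the answers below. Treating lines starting with `;;;` as comments and suffixes such as `(1)` as alternate pronunciations is a further assumption about the file. The function names `parse_dictionary` and `words` are hypothetical.

```python
import re


def parse_dictionary(lines):
    """Map each word to the phoneme list of its first pronunciation."""
    pron = {}
    for raw in lines:
        line = raw.strip()
        # Assumed convention: blank lines and ';;;' comment lines carry no entry.
        if not line or line.startswith(";;;"):
            continue
        word, _, phones = line.partition("  ")
        if not phones:
            continue
        # Assumed convention: 'WORD(1)' marks an alternate pronunciation.
        word = re.sub(r"\(\d+\)$", "", word)
        pron.setdefault(word, phones.split())
    return pron


def words(text):
    """Return the upper-cased words of a verse, dropping punctuation."""
    return [w.upper() for w in re.findall(r"[A-Za-z']+", text)]
```

With the file in place, something like `parse_dictionary(open('c'))` would build the lookup table once, and `words(verse)[-1]` would give the word whose ending matters for the rhyme.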

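Bonuses

The following bonuses are available:

Your program outputs its result as a limerick? Subtract a 150-character length bonus!
Your program also correctly identifies sonnets? Subtract a 150-character extra length bonus!
Your program outputs its result as a sonnet when used on a sonnet? Subtract a 100-character additional extra length bonus!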

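Finally...

Remember to mention which bonuses you think you deserve, if any, and subtract them from your character count to arrive at your score. This is a code golf contest: the shortest entry (i.e. the entry with the lowest score) wins.

If you need more (positive) test data, check out the OEDILF or the Book of Nonsense. Negative test data should be easy to construct.

Good luck!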

code-golf natural-language

— Wander Nauta

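This should be code-challenge because of the bonuses. Please read the tag descriptions.

— 2014

@user80551 The consensus on meta seems to be the opposite.

— Doorknob

I have clarified the nature of the bonuses, which hopefully removes the confusion.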

— Wander Nauta

Bears!

— alvonellos 2014

I don't understand the bonus. How am I supposed to output "Y" in the form of a limerick?

— r3mainer 2014

Answers:

Python: 400 - 150 - 150 = 100

The shortest script I could come up with is...

import re,sys;f,e,c=re.findall,lambda l,w:f('^'+w.upper()+'  (.+)',l),lambda*v:all([a[i]==a[v[0]]for i in v]);a=[sum([[e(l,w)[0].split()for l in open('c')if e(l,w)][0]for w in f(r'\w+',v)],[])[-2:]for v in sys.stdin];n=len(a);print n==14and c(0,3,4,7)*c(1,2,5,6)*c(8,11)*c(9,12)*c(10,13)*"Sonnet"or"For a critic\nOf limerick\nWell-equipped\nIs this script.\n%s limerick!"%(n==5and c(0,1,4)and c(2,3))

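...but don't even try it. It parses the provided dictionary again for every word it encounters, so it is extremely slow. It also raises an error whenever a word is not in the dictionary.

The code still meets the requirements, though: it recognizes whether the text passed via stdin is a limerick, a sonnet, or neither.

With just 20 more characters, here is an optimized version: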

import re,sys;f,e,c=re.findall,lambda l:f(r'^(\w+)  (.+)',l),lambda*v:all([a[i]==a[v[0]]for i in v]);d={e(l)[0][0]:e(l)[0][1].split()for l in open('c')if e(l)};a=[sum([d.get(w.upper(),[])for w in f(r'\w+',v)],[])[-2:]for v in sys.stdin];n=len(a);print n==14and c(0,3,4,7)*c(1,2,5,6)*c(8,11)*c(9,12)*c(10,13)*"Sonnet"or"For a critic\nOf limerick\nWell-equipped\nIs this script.\n%s limerick!"%(n==5and c(0,1,4)and c(2,3))

Features

Recognizes sonnets (-150)
Answers limericks with a limerick (-150)
Relatively fast: the dictionary file is parsed only once per execution
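
The two rhyme patterns this answer checks can also be written as scheme strings: `AABBA` for a limerick and, reading the index groups off the code above, `ABBAABBACDECDE` for a sonnet. The sketch below is a readable illustration of that idea rather than the golfed code itself; `matches_scheme` is a made-up name, and `endings` stands for whatever per-verse rhyme key is being compared (the answer uses the last two phonemes).

```python
def matches_scheme(endings, scheme):
    """Return True if verses sharing a scheme letter share the same ending.

    Like the answer's `c` helper, this only checks that verses with the same
    letter agree; it does not require different letters to differ.
    """
    if len(endings) != len(scheme):
        return False
    first_seen = {}
    for ending, letter in zip(endings, scheme):
        # Remember the first ending seen for each letter, then compare to it.
        if first_seen.setdefault(letter, ending) != ending:
            return False
    return True


LIMERICK = "AABBA"
SONNET = "ABBAABBACDECDE"
```

For example, `matches_scheme(a, LIMERICK)` would play the role of `n==5 and c(0,1,4) and c(2,3)` if `a` holds one ending per verse.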

Usage

cat poem.txt | python poem-check.py

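Three different outputs are possible:

A limerick saying that the input is one, if that is the case
A limerick saying that the input is not one, if that is the case
"Sonnet", if the input is recognized as such

Expanded code with explanations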

import re, sys

# just a shortened version of the 're.findall' function...
f = re.findall
# function used to parse a line of the dictionary
e = lambda l:f(r'^(\w+)  (.+)', l)

# create a cache of the dictionary, where each word is associated with the list of phonemes it contains
d = {e(l)[0][0]:e(l)[0][1].split(' ') for l in open('c') if e(l)}

# for each verse (line) 'v' found in the input 'sys.stdin', create a list of the phoneme it contains;
# the result array 'a' contains a list, each item of it corresponding to the last two phonemes of a verse
a = [sum([d.get(w.upper(), []) for w in f(r'\w+',v)],[])[-2:] for v in sys.stdin]

# let's store the length of 'a' in 'n'; it is actually the number of verses in the input
n = len(a)
# function used to compare the rhymes of the lines which indexes are passed as arguments
c = lambda*v:all([a[i] == a[v[0]] for i in v])

# test if the input is a sonnet, aka: it has 14 verses, verses 0, 3, 4 and 7 rhyme together, verses 1, 2, 5 and 6 rhyme together, verses 8 and 11 rhyme together, verses 9 and 12 rhyme together, verses 10 and 13 rhyme together
if n==14 and c(0,3,4,7) and c(1,2,5,6) and c(8,11) and c(9,12) and c(10,13):
    print("Sonnet")
else:
    # test if the input is a limerick, aka: it has 5 verses, verses 0, 1 and 4 rhyme together, verses 2 and 3 rhyme together
    is_limerick = n==5 and c(0,1,4) and c(2,3)
    # use %-formatting so that True/False is substituted into the last line
    print("For critics\nOf limericks,\nWell-equipped\nIs this script.\n%s limerick!" % is_limerick)
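
The expanded code above treats two verses as rhyming when their last two phonemes are identical. A common alternative, which is not what this answer does, is to compare everything from the last stressed vowel onwards. The sketch below shows that variant under the assumption that a stressed vowel phoneme ends in `1` or `2`; the names `rhyme_part` and `rhymes` are hypothetical, and the phoneme lists in the usage note are illustrative.

```python
def rhyme_part(phonemes):
    """Return the phonemes from the last stressed vowel on, stress digits removed."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i][-1] in "12":
            return [p.rstrip("012") for p in phonemes[i:]]
    # No stressed vowel found: fall back to the last two phonemes.
    return [p.rstrip("012") for p in phonemes[-2:]]


def rhymes(first, second):
    """Compare two phoneme lists by their rhyme parts; empty input never rhymes."""
    part_a, part_b = rhyme_part(first), rhyme_part(second)
    return bool(part_a) and part_a == part_b
```

With illustrative pronunciations, `rhymes("R EH1 K AH0 N D".split(), "S EH1 K AH0 N D".split())` compares `EH K AH N D` on both sides, while a last-two-phonemes test would only look at `N D`.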

— Mathieu Rodic

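Looks cool! I haven't tested it yet, but are you sure this takes input "on the command line or through standard input" (see the question)? If not, you should add that (probably a sys.stdin.read() or open(sys.argv[1]).read() somewhere) and recount.

— Wander Nauta

OK! Corrected :)

— Mathieu Rodic 2014

How does the algorithm check for rhymes?

— DavidC 2014

With the help of the file provided by Wander Nauta! It really helps.

— Mathieu Rodic

Neat! Too bad I can't upvote you twice.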

— Wander Nauta

ECMAScript 6 (138 points; try it in Firefox):

288 - 150 points bonus for outputting the result as a limerick (pinched from @MathieuRodic).

a=i.split(d=/\r?\n/).map(x=>x.split(' '));b=/^\W?(\w+) .*? (\w+\d( [A-Z]+)*)$/;c.split('\r\n').map(x=>b.test(x)&&eval(x.replace(b,'d["$1"]="$2"')));e=f=>d[a[f][a[f].length-1]];alert('For critics\nOf limericks,\nWell-equipped\nIs this script.\n'+(a[4]&&e(0)==e(1)&e(0)==e(4))+' limerick!')

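Notes:

Expects the variable c to contain the contents of the dictionary file, since you cannot read files in plain ECMAScript.

ECMAScript has no standard input, although prompt is usually considered to be its "standard input". However, since prompt converts newlines to spaces in most (if not all) browsers, I am taking input from the variable i.

Ungolfed code: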

// If you paste a string with multiple lines into a `prompt`, the browser replaces each line break with a space, for some reason.
//input = prompt();
// So the poem is taken from the variable `i` instead, as in the golfed version.
input = i;

// Split into lines, with each line split into words
lines = input.split('\n').map(x => x.split(' '));

dictionaryEntryRegEx = /^\W?(\w+) .*? (\w+\d( [A-Z]+)*)$/;
dictionary = {};
// Split the dictionary text into lines; for each matching entry, store the phonemes from the last vowel onwards
c.split(/\r?\n/).map(x => dictionaryEntryRegEx.test(x) && eval(x.replace(dictionaryEntryRegEx, 'dictionary["$1"] = "$2"')));

// Look up the rhyming part of the last word in the line
getLastWordOfLine = (lineNumber) => dictionary[lines[lineNumber][lines[lineNumber].length - 1]];

alert('For critics\nOf limericks,\nWell-equipped\nIs this script.\n' + (lines[4] && getLastWordOfLine(0) === getLastWordOfLine(1) && getLastWordOfLine(0) === getLastWordOfLine(4)) + ' limerick!');

— Toothbrush

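Neat! However, this does not take input "on the command line or through standard input", which the question requires. Perhaps you could rewrite it to use something like Node.js.

— Wander Nauta

@WanderNauta Thanks. Please see the latest edit, in which I explain why I am not using standard input.

— Toothbrush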