K&R C - 188个196 199 229字符
通过更改规范以指定功能,我可以节省大量的c开销。还更改为使用Strigoides的音节计数技巧,这比我的公式调整要好,并扩展到可以处理单词的过度计数。
当我发现一种更短的方法来进行元音检测时(可悲的是基于该方法)stdchr
,我有动力从我一直在使用的令人讨厌的可憎事物中挤出更多的信息,这样我就不必感到无聊了。
d,a,v,s,t,w;float R(char*c){for(;*c;++c){s+=*c=='.';if(isalpha(*c)){
w+=!a++;d=(*c&30)>>1;if(*c&1&(d==7|((!(d&1))&(d<6|d>8)))){t+=!v++;}
else v=0;}else v=a=0;}return 206.835-1.*w/s-82.*t/w;}
这里的逻辑是一个简单的状态机。它仅按句点计数句子,按字母字符串对单词进行计数,并将音节作为元音字符串(包括y)进行计数。
为了使它具有正确的数字,我不得不稍微增加一些常数,但是我借用了Strigoides的技巧,即只将音节低了一个固定的分数。
Ungolfed,带有注释和一些调试工具:
#include <stdlib.h>
#include <stdio.h>
d,a,/*last character was alphabetic */
v,/*lastcharacter was a vowel */
s, /* sentences counted by periods */
t, /* syllables counted by non-consequtive vowels */
w; /* words counted by non-letters after letters */
float R/*eadability*/(char*c){
for(;*c;++c){
s+=*c=='.';
if(isalpha(*c)){ /* a letter might mark the start of a word or a
vowel string */
w+=!a++; /* It is only the start of a word if the last character
wasn't a letter */
/* Extract the four bits of the character that matter in determining
* vowelness because a vowel might mark a syllable */
d=(*c&30)>>1;
if( *c&1 & ( d==7 | ( (!(d&1)) & (d<6|d>8) ) )
) { /* These bits 7 or even and not 6, 8 make for a
vowel */
printf("Vowel: '%c' (mangled as %d [0x%x]) counts:%d\n",*c,d,d,!v);
t+=!v++;
} else v=0; /* Not a vowel so set the vowel flag to zero */
}else v=a=0; /* this input not alphabetic, so set both the
alphabet and vowel flags to zero... */
}
printf("Syllables: %3i\n",t);
printf("Words: %3i (t/w) = %f\n",w,(1.0*t/w));
printf("Sentences: %3i (w/s) = %f\n",s,(1.0*w/s));
/* Constants tweaked here due to bad counting behavior ...
* were: 1.015 84.6 */
return 206.835-1. *w/s-82. *t/w;
}
main(c){
int i=0,n=100;
char*buf=malloc(n);
/* Suck in the whole input at once, using a dynamic array for staorage */
while((c=getc(stdin))!=-1){
if(i==n-1){ /* Leave room for the termination */
n*=1.4;
buf=realloc(buf,n);
printf("Reallocated to %d\n",n);
}
buf[i++]=c;
printf("%c %c\n",c,buf[i-1]);
}
/* Be sure the string is terminated */
buf[i]=0;
printf("'%s'\n",buf);
printf("%f\n",R/*eadability*/(buf));
}
输出:(使用长版的脚手架,但使用打高尔夫球的功能。)
$ gcc readability_golf.c
readability_golf.c:1: warning: data definition has no type or storage class
$ ./a.out < readability1.txt
'I would not, could not, in the rain.
Not in the dark, not on a train.
Not in a car, not in a tree.
I do not like them, Sam, you see.
Not in a house, not in a box.
Not with a mouse, not with a fox.
I will not eat them here or there.
I do not like them anywhere!
'
104.074631
$ ./a.out < readability2.txt
'It was a bright cold day in April, and the clocks were striking thirteen.
Winston Smith, his chin nuzzled into his breast in an effort to escape
the vile wind, slipped quickly through the glass doors of Victory Mansions,
though not quickly enough to prevent a swirl of gritty dust from entering
along with him.
'
63.044090
$ ./a.out < readability3.txt
'When in the Course of human events, it becomes necessary for one people to
dissolve the political bands which have connected them with another, and to
assume among the powers of the earth, the separate and equal station to
which the Laws of Nature and of Nature's God entitle them, a decent respect
to the opinions of mankind requires that they should declare the causes
which impel them to the separation.
'
-1.831667
缺陷:
- 句子计数逻辑是错误的,但是我避免了,因为只有一个输入具有a
!
或a ?
。
- 单词计数逻辑会将紧缩视为两个单词。
- 音节计数逻辑会将那些相同的收缩视为一个音节。但平均而言可能会高估(例如,
there
被算作两个,许多结尾的单词e
将被算作一个太多),因此我应用了96.9%的校正常数。
- 假定一个ASCII字符集。
- 我相信元音检测将承认
[
和{
,这显然是不正确的。
- 很大程度上依赖于K&R语义,这很难看,但是,嘿,这是代码高尔夫。
要看的东西: