强调高尔夫的语法!


14

高尔夫球手。

我们共同努力,产生的代码简洁,功能漂亮且比原始小说中的《歌剧魅影》丑陋。

现在是时候让我们将美丽带回到编程世界了。带颜色。以简洁的方式,功能上的美观比原始小说中的歌剧魅影更丑陋。

我们将要编写一个色彩丰富的语法荧光笔。以最短的代码量。

您将通过输入文件或Stdin收到有效的C文件。C文件将使用您选择的行约定,并且仅包含ASCII字符32-126。您必须将其转换为至少在Chrome中正确显示的HTML文件,该文件以语法高亮显示源代码。输出可能在文件中或到Stdout。

您必须突出显示:

  • 绿色(#00FF00)中的所有字符串和字符(包括引号字符)。字符串可能包含转义字符。

  • 蓝色的所有C保留字(#0000FF)。

  • 黄色所有评论(#FFFF00)。

  • Pink中的所有C预处理程序指令(#FF00FF)。

在Chrome中显示时,输出必须:

  • 采用固定宽度的字体

  • 在原始来源中出现的任何地方显示新行

  • 准确地复制空白。制表符应视为4个空格。

奖金

  • x 0.9(如果包含行号)。行号必须至少能够达到99999。所有源都必须仍然对齐-因此,行号较小的源代码应与行号较高的源代码在同一位置开始

  • 如果每行的背景在浅灰色(#C0C0C0)和白色(#FFFFFF)之间交替,则为x 0.8

  • x 0.9(如果您的源代码是用C编写的,并且可以正确格式化自己)。

计分

这是代码高尔夫。分数是源代码的字节数乘以任何奖励。获胜者是得分最低的高尔夫球手。


1
@Charles是较大的IDE的标准行为,即不会进一步突出显示注释,这意味着注释保留为注释。并没有太多地使用预处理器指令,但是Wikipedia在这里也没有进一步突出显示代码...字符串也是字符串,它们不需要进一步的评估...
Vogel612 2014年

1
确认Vogel612所说的是正确的行为
lochok 2014年

2
“至少在Chrome中有效”或“至少在Chrome中正确显示”?
John Dvorak 2014年

3
@Trimsty等一下,我是个白痴。xD没关系。
cjfaure 2014年

4
有史以来最差的颜色!
格雷格2014年

Answers:


8

Perl 769个字符* 0.9 * 0.8 = 554

可能仍需要对某些正则表达式进行一些改进,但进展缓慢!

$_=join"",<>;$s="<tt class";$c="</tt>";$d=counter;$e=color;s/\t/    /g;s!<!&lt;!g;s!>!&gt;!g;s!^#.+(?=$|
)!$s=d>$&$c!gm;s!//.+!$s=c>$&$c!g;s|(['"]).*?(?<!\\)(\\\\)*\1|($h=$&)=~s!/!&#47;!g;"$s=s>$h$c"|smeg;s!/\*.*?\*/!$s=c>$&$c!smg;s!\b(_Packed|(au|go)to|break|c(ase|har|onst|ontinue)|d(efault|o|ouble)|e(lse|num|xtern)|f(loat|or)|if|int|long|re(gister|turn)|short|(un)?signed|s(izeof|tatic|truct|witch)|typedef|union|vo(id|latile)|while)\b!$s=r>$&$c!g;s!=(\w)>.+?$c!join"$c
$s=$1>",split$/,$&!smeg;s!
!<tr><td>!g;print"<style>body{font:10px monospace;$d-reset:n}td{white-space:pre}tr:nth-child(even){background:#c0c0c0}tr:before{$d-increment:n;content:$d(n)}.d{$e:#f0f}.r{$e:#00f}.c{$e:#ff0}.s{$e:#0f0}tt tt{$e:inherit!important}</style><table cellspacing=0><tr><td>$_"

带有注释的混淆不明显的版本:

$_=join"",<>; # slurp file
$s="<tt class"; # used later - use <tt/> instead of <span/>, fewer chars!
$c="</tt>";
$d=counter;
$e=color;
s/\t/    /g; # convert tabs to spaces
s!<!&lt;!g; # htmlentity < and >
s!>!&gt;!g;
s!^#.+(?=$|\n)!$s=d>$&$c!gm; # directives
s!//.+!$s=c>$&$c!g; # inline comments
s|(['"]).*?(?<!\\)(\\\\)*\1|($h=$&)=~s!/!&#47;!g;"$s=s>$h$c"|smeg; # strings, might have 0 length - thanks @Einacio; work-around string that contain /* by converting them to HTML entities
s!/\*.*?\*/!$s=c>$&$c!smg; # multi-line comments
s!\b(_Packed|(au|go)to|break|c(ase|har|onst|ontinue)|d(efault|o|ouble)|e(lse|num|xtern)|f(loat|or)|if|int|long|re(gister|turn)|short|(un)?signed|s(izeof|tatic|truct|witch)|typedef|union|vo(id|latile)|while)\b!$s=r>$&$c!g; # reserved words, don't optimise this too much! - thanks @bwoebi
s!=(\w)>.+?$c!join"$c
$s=$1>",split$/,$&!smeg; # any multi-line string/comment, ensure <span/>s are repeated
s!\n!<tr><td>!g; # strip newlines, replace with <tr><td>, don't need </td> _or_ </tr> - thanks @xfix!
print"<style>
body{font:10px monospace;$d-reset:n} /* init counter */
td{white-space:pre} /* preserve whitespace */
tr:nth-child(odd){background:#c0c0c0} /* alternating rows */
tr:before{$d-increment:n;content:$d(n)} /* place counter */
.d{$e:#f0f} /* highlights */
.r{$e:#00f}
.c{$e:#ff0}
.s{$e:#0f0}
tt tt{$e:inherit!important} /* ignore reserved words/comments in strings */
</style><table cellspacing=0><tr><td>$_"

现在成功突出显示@xfix的条目。

借用了这个想法 </tr>从@xfix条目中,谢谢!

@xfix解决方案的输出示例


1
您的解决方案无法正确突出显示我的解决方案,因为它无法逃脱HTML标记,并且不能正确显示空格。
Konrad Borowski14年

1
在HTML5中,</tr></td>是完全可选的,因此我只是忽略了它们。
Konrad Borowski

1
实际上,您有时可以在正则表达式中使用完整的关键字名称来保存一些字符。例如if|int比少一个字符i(f|nt)。还是d(efault|o|ouble)另一个字符少于d(efault|o(uble)?)
bwoebi 2014年

1
在小提琴上,第61行的“ \\”未标记为字符串。是示例错误还是边缘案例失败?
Einacio

1
如果将<style>块移到末尾并省略end标签,该代码仍然可以使用。然后,您也可以省略}样式的最后一个。当然,这完全无效,但可以在Chrome中使用!
user2428118 2014年

7

C - 1605 1200个字符* 0.9 * 0.8 * 0.9 = 777个字符

绝对太长,但是随便。关键字列表本身使用的264。长款内胆版。不使用内存分配,因此内存使用率非常低(而且所有操作都是全局的,因此未真正使用堆栈)。JSFiddle上的示例HTML。我认为,注释支持是代码中最复杂的事情。

char*k[]={"auto","break","case","char","const","continue","default","do","double","else","enum","extern","float","for","goto","if","int","long","register","return","short","signed","sizeof","static","struct","switch","typedef","union","unsigned","void","volatile","while"},b[9];c;p;e;p;l=1;q;s;i;main(){printf("<style>tr:nth-child(2n){background:#C0C0C0}</style><table style=font-family:monospace;white-space:pre-wrap><tr><td>1<td>");while(~(c=getchar())){if(!e&&q){if(q==c){printf("%c</span>",c);q=0;continue;}}else if(isalpha(c)&&p<8){b[p++]=c;continue;}else if(b[0]){for(i=0;i<32;i++){s!=2&&!strcmp(k[i],b)&&(printf("<span style=color:#00F>%s</span>",b),b[0]=0);}printf("%s",b);memset(b,0,9);}p=0;switch(c){case'<':printf("&lt;");goto e;case'&':printf("&amp;");goto e;case 92:putchar(c);e^=1;goto e;case'/':q||1==s?putchar('/'):3==s?(printf("*/"),s=0):(s=1);break;case'*':if(s==1){s=2;printf("<span style=color:#FF0>/*");}else if(s==2){s=3;}else{goto d;}break;case 39:case'"':e||q||(q=c,printf("<span style=color:#0F0>"));e=0;goto d;case 10:l+=1;printf("<tr><td>%d<td>",l);if(p&&e){case'#':e=0;q||(p=1,printf("<span style=color:#F0F>"));}else{p=0;}default:d:10!=c&&putchar(c);e:s=s/2*2;}}puts(b);}

还有更长的版本(与真实程序一样可读),除了一些代码高尔夫技巧外,我认为我在高尔夫程序时不能轻易应用。

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#define ARRAY_SIZE(array) (sizeof(array) / sizeof *(array))
/* Sample comment. */
int main(void) {
    const char *keywords[] = {
        "auto",     "break",    "case",     "char",     "const",
        "continue", "default",  "do",       "double",   "else",
        "enum",     "extern",   "float",    "for",      "goto",
        "if",       "int",      "long",     "register", "return",
        "short",    "signed",   "sizeof",   "static",   "struct",
        "switch",   "typedef",  "union",    "unsigned", "void",
        "volatile", "while",
    };
    int character;
    int preprocessor = 0;
    int escape = 0;
    char buffer[9] = {0};
    int pos = 0;
    int line = 1;
    int quote = 0;
    int comment_state = 0;
    printf("<style>tr:nth-child(2n){background:#C0C0C0}</style><table style=font-family:monospace;white-space:pre-wrap><tr><td>1<td>");
    while ((character = getchar()) != EOF) {
        if (!escape && quote) {
            if (quote == character) {
                printf("%c</span>", character);
                quote = 0;
                continue;
            }
        }
        else if (isalpha(character) && pos < 8) {
            buffer[pos] = character;
            pos += 1;
            continue;
        }
        else if (buffer[0]) {
            int i;
            for (i = 0; i < ARRAY_SIZE(keywords); i++) {
                if (comment_state != 2 && strcmp(keywords[i], buffer) == 0) {
                    printf("<span style=color:#00F>%s</span>", buffer);
                    buffer[0] = 0;
                }
            }
            printf("%s", buffer);
            memset(buffer, 0, 9);
        }
        pos = 0;
        switch (character) {
        case '<':
            printf("&lt;");
            goto e;
        case '&':
            printf("&amp;");
            goto e;
        case '\\':
            putchar(character);
            escape ^= 1;
            goto e;
        case '/':
            if (quote || comment_state == 1) {
                putchar('/');
            }
            else if (comment_state == 3) {
                printf("*/");
                comment_state = 0;
            }
            else {
                comment_state = 1;
            }
            break;
        case '*':
            if (comment_state == 1) {
                comment_state = 2;
                printf("<span style=color:#FF0>/*");
            }
            else if (comment_state == 2) {
                comment_state = 3;
            }
            else {
                goto d;
            }
            break;
        case '\'':
        case '"':
            if (!escape && !quote) {
                quote = character;
                printf("<span style=color:#0F0>");
            }
            escape = 0;
            goto d;
        case '\n':
            line += 1;
            printf("<tr><td>%d<td>", line);

            /* Execute next only if conditions match. */
            if (preprocessor && escape) {
        case '#':
                escape = 0;
                if (!quote) {
                    preprocessor = 1;
                    printf("<span style=color:#F0F>");
                }
            }
            else {
                preprocessor = 0;
            }
            /* fallthru */
        default:
        d:
            if (character != '\n') putchar(character);
        e:
            comment_state = comment_state / 2 * 2;
        }
    }
    printf("%s</table>", buffer);
}

输出看起来棒极了!
lochok 2014年

@lochok:现在缩短到1200个字节。这仍然比Perl解决方案更长,但是现在差距更小了。
Konrad Borowski14年

1
您的代码似乎不适用于多行注释。
nickguletskii 2014年

3

PHP 606字节×0.9×0.8 = 436

<style>li:nth-child(odd){background:#f5f5f5}pre{tab-size:4;-moz-tab-size:4}</style><pre><ol><li><?php $p='preg_match';preg_match_all('_\w+|("|\')(\\\\?.)*?\1|#(.(?!/[/*]))*|//.*|(?s)/\*.*?\*/|.+?_',stream_get_contents(STDIN),$m);foreach($m[0]as$t)echo'</font><font color=#',$t[0]=='#'?'d0d':($p('_^/[/*]_',$t)?'bb0':($p('/"|\'/',$t)?'0d0':($p('/^(auto|break|(cas|continu|doubl|els|volatil|whil)e|char|(cons|defaul|floa|in|shor|struc)t|do|enum|extern|for|goto|if|long|register|return|sizeof|static|switch|typedef|union|(un|)signed|void)$/',$t)?'00f':0))),'>'.preg_replace("/\r?\n/",'<li>',htmlentities($t));

格式:

<style>li:nth-child(odd){background:#f5f5f5}pre{tab-size:4;-moz-tab-size:4}</style>
<pre><ol><li><?php
$p='preg_match';
preg_match_all('_\w+|("|\')(\\\\?.)*?\1|#(.(?!/[/*]))*|//.*|(?s)/\*.*?\*/|.+?_',
    stream_get_contents(STDIN),$m);
foreach($m[0]as$t)
    echo'</font><font color=#',
        $t[0]=='#'?'d0d':(
        $p('_^/[/*]_',$t)?'bb0':(
        $p('/"|\'/',$t)?'0d0':(
        $p('/^(auto|break|(cas|continu|doubl|els|volatil|whil)e|char|'.
        '(cons|defaul|floa|in|shor|struc)t|do|enum|extern|for|goto|if|long|register|'.
        'return|sizeof|static|switch|typedef|union|(un|)signed|void)$/',$t)?'00f':
        0))),
        '>'.preg_replace("/\r?\n/",'<li>',htmlentities($t));
  • 从stdin读取并写入stdout。

  • 可接受的行尾为\ n和\ r \ n。

  • 行号和行颜色是否交替。

  • 我用了一点不同的颜色,因此我可以忍受一下,尽管不会以影响字节数的方式出现。

  • 尽管没有Firefox的功能,但我没有Chrome可以对其进行测试。


2

C ++- 5067字节 4612 * 0.9 * 0.8 = 3320(* 0.9 = 2988如果能够格式化自身就算-用C ++编写)

我意识到这比这里已经介绍的解决方案要大,但是我还是决定发布此内容,因为我在发布xfix的C解决方案之前就开始处理我的版本。

  • 它适用于多行注释
  • 输出的HTML包含错误(但在Chrome中正确显示)
  • 它从input.c中读取并产生output.html

其中一半是大量的C和C ++关键字。

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <locale>
#define B break
#define Z(a,b)if(s[i]==a){z=b;o+=OP+p(s[i]);i++;B;}
#define V(a,b)if(e(s,i,a)){z=b;o+=OC+p(s.substr(i,2));i+=2;B;}
#define N(a) if(s[i]=='\n'){a;o+=nl();i++;B;}
#define Q(a)if(e(s,i,"\\")){o+=p(s.substr(i,2));i+=2;B;}if(s[i]==a){z=0;o+=p(s[i])+CL;i++;B;}
#define C case
using namespace std;string k[]={"__abstract","__alignof","_Alignas","_alignof","and","and_eq","__asm","_asm","asm","__assume","_assume","auto","__based","_based","bitand","bitor","bool","_Bool","__box","break","__builtin_alignof","_builtin_alignof","__builtin_isfloat","case","catch","__cdecl","_cdecl","_Complex","cdecl","char","class","__compileBreak","_compileBreak","compl","const","const_cast","continue","__declspec","_declspec","default","__delegate","delete","do","double","dynamic_cast","else","enum","__event","__except","_except","explicit","__export","_export","extern","false","__far","_far","far","__far16","_far16","__fastcall","_fastcall","__feacpBreak","_feacpBreak","__finally","_finally","float","for","__forceinline","_forceinline","__fortran","_fortran","fortran","friend","_Generic","__gc","goto","__hook","__huge","_huge","huge","_Imaginary","__identifier","if","__if_exists","__if_not_exists","__inline","_inline","inline","int","__int128","__int16","_int16","__int32","_int32","__int64","_int64","__int8","_int8","__interface","__leave","_leave","long","__multiple_inheritance","_multiple_inheritance","mutable","namespace","__near","_near","near","new","_Noreturn","__nodefault","__nogc","__nontemporal","not","not_eq","__nounwind","__novtordisp","_novtordisp","operator","or","or_eq","__pascal","_pascal","pascal","__pin","__pragma","_pragma","private","__probability","__property","protected","__ptr32","_ptr32","__ptr64","_ptr64","public","__raise","register","reinterpret_cast","restrict","__restrict","__resume","return","__sealed","__serializable","_serializable","short","signed","__single_inheritance","_single_inheritance","sizeof","static","static_cast","_Static_assert","__stdcall","_stdcall","struct","__super","switch","__sysapi","__syscall","_syscall","template","this","__thiscall","_thiscall","throw","_Thread_local","__transient","_transient","true","__try","_try","try","__try_cast","typedef","typeid","typename","__typeof","__unaligned","__unhook","union","unsigned","using","__uuidof","_uuidof","__value","virtual","__virtual_inheritance","_virtual_inheritance","void","volatile","__w64","_w64","__wchar_t","wchar_t","while","xor","xor_eq"};string OS="<font color=\"#00FF00\">";string OC="<font color=\"#FFFF00\">";string OK="<font color=\"#0000FF\">";string OP="<font color=\"#FF00FF\">";string CL="</font>";string NL[]={ "<li class=\"li l1\">","<li class=\"li l2\">" };bool lo=1;string nl() {lo=!lo;return "</li>"+NL[lo];}bool r(char c,string s) {for (size_t i=0; i<s.size(); i++)if (c==s[i])return 0;return 0;}bool is(string s,int i) {return !(i<0||i>=s.size())&&((s[i]=='_')||isalpha(s[i]));}bool ic(string s,int i){return !(i<0||i>=s.size())&&(is(s,i)||('0'<=s[i]&&s[i]<='9'));}bool e(string a,int s,string b) {return !(a.size()-s<b.size())&&a.substr(s,b.size())==b;}string p(char c) {switch (c) {C  '&':return "&amp;";C  '\"':return "&quot;";C  '\'':return "&apos;";C  '<':return "&lt;";C  '>':return "&gt;";}stringstream s;s<<c;return s.str();}string p(string s) {string ans="";for (size_t i=0; i<s.size(); i++) {ans+=p(s[i]);}return ans;}string h(string s) {int z=0;size_t i=0;string o ="<html><body><style type=\"text/css\">.l{list-style-type: decimal;margin-top:0;margin-bottom:0;} .li{display:list-item;word-wrap:B-word;} .l1{background-color:#FFFFFF;} .l2{background-color:#EEEEEE;} .cd{white-space:pre;}</style><code class=\"cd\"><ul class=\"l\">";o+=NL[1];for (; i<s.size();) {switch (z) {C 0:{Z('#',6)Z('"',3)Z('\'',2)V("//",4)V("/*",5)N()for (size_t j=0; j<201; j++)if (e(s,i,k[j])&&!ic(s,i+k[j].size())) {o+=OK+p(k[j])+CL;i+=k[j].size();B;}if (is(s,i)) {z=7;o+=p(s[i]);i++;if (i+1==s.size()||!ic(s,i+1)) {z=0;}B;}o+=p(s[i]);i++;B;}o+=p(s[i]);i++;B;C  2:Q('\'')C  3:Q('"')C 4:{N(z=0)o+=p(s[i]); i++;B;}C 5:{if (e(s,i,"*/")) {z=0;o+=p(s.substr(i,2))+CL;i+=2;B;}N()o+=p(s[i]);i++;B;}C 6:{if (s[i]=='\n') {int j=i-1;for (; j>=0&&r(s[j],"\n\t "); j--);if (j<0||s[j] != '\\') {z=0;o+=CL+nl();i++;B;}o+=nl();i++;B;}o+=p(s[i]);i++;B;}C 7:{if (i+1==s.size()||!ic(s,i+1)) {z=0;}o+=p(s[i]);i++;B;}}}o+="</ul></code>";return o;}int main() {ifstream i("input.c");ofstream o("output.html");string cCode((istreambuf_iterator<char>(i)),istreambuf_iterator<char>());o<<h(cCode)<<endl;}

可读版本:

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
using namespace std;

//The 201 keywords from C and C++. Not sure if all of them are listed here!
const size_t NUMBER_OF_KEYWORDS = 201;
string keywords[] = { "__abstract", "__alignof", "_Alignas", "_alignof", "and",
        "and_eq", "__asm", "_asm", "asm", "__assume", "_assume", "auto",
        "__based", "_based", "bitand", "bitor", "bool", "_Bool", "__box",
        "break", "__builtin_alignof", "_builtin_alignof", "__builtin_isfloat",
        "case", "catch", "__cdecl", "_cdecl", "_Complex", "cdecl", "char",
        "class", "__compileBreak", "_compileBreak", "compl", "const",
        "const_cast", "continue", "__declspec", "_declspec", "default",
        "__delegate", "delete", "do", "double", "dynamic_cast", "else", "enum",
        "__event", "__except", "_except", "explicit", "__export", "_export",
        "extern", "false", "__far", "_far", "far", "__far16", "_far16",
        "__fastcall", "_fastcall", "__feacpBreak", "_feacpBreak", "__finally",
        "_finally", "float", "for", "__forceinline", "_forceinline",
        "__fortran", "_fortran", "fortran", "friend", "_Generic", "__gc",
        "goto", "__hook", "__huge", "_huge", "huge", "_Imaginary",
        "__identifier", "if", "__if_exists", "__if_not_exists", "__inline",
        "_inline", "inline", "int", "__int128", "__int16", "_int16", "__int32",
        "_int32", "__int64", "_int64", "__int8", "_int8", "__interface",
        "__leave", "_leave", "long", "__multiple_inheritance",
        "_multiple_inheritance", "mutable", "namespace", "__near", "_near",
        "near", "new", "_Noreturn", "__nodefault", "__nogc", "__nontemporal",
        "not", "not_eq", "__nounwind", "__novtordisp", "_novtordisp",
        "operator", "or", "or_eq", "__pascal", "_pascal", "pascal", "__pin",
        "__pragma", "_pragma", "private", "__probability", "__property",
        "protected", "__ptr32", "_ptr32", "__ptr64", "_ptr64", "public",
        "__raise", "register", "reinterpret_cast", "restrict", "__restrict",
        "__resume", "return", "__sealed", "__serializable", "_serializable",
        "short", "signed", "__single_inheritance", "_single_inheritance",
        "sizeof", "static", "static_cast", "_Static_assert", "__stdcall",
        "_stdcall", "struct", "__super", "switch", "__sysapi", "__syscall",
        "_syscall", "template", "this", "__thiscall", "_thiscall", "throw",
        "_Thread_local", "__transient", "_transient", "true", "__try", "_try",
        "try", "__try_cast", "typedef", "typeid", "typename", "__typeof",
        "__unaligned", "__unhook", "union", "unsigned", "using", "__uuidof",
        "_uuidof", "__value", "virtual", "__virtual_inheritance",
        "_virtual_inheritance", "void", "volatile", "__w64", "_w64",
        "__wchar_t", "wchar_t", "while", "xor", "xor_eq" };

// Different states
const int NONE = 0;
const int WHITESPACE = 1;
const int CHAR_UNCLOSED = 2;
const int STRING_UNCLOSED = 3;
const int LINE_COMMENT_UNCLOSED = 4;
const int MULTILINE_COMMENT_UNCLOSED = 5;
const int PREPROCESSOR_UNCLOSED = 6;
const int IDENTIFIER = 7;

//Different elements
const string OPEN_STRING = "<font color=\"#00FF00\">";
const string CLOSE_STRING = "</font>";
const string OPEN_COMMENT = "<font color=\"#FFFF00\">";
const string CLOSE_COMMENT = "</font>";
const string OPEN_KEYWORD = "<font color=\"#0000FF\">";
const string CLOSE_KEYWORD = "</font>";
const string OPEN_PREPROCESSOR = "<font color=\"#FF00FF\">";
const string CLOSE_PREPROCESSOR = "</font>";

//Alternating background
const string NEW_LINE[] = { "<li class=\"li l1\">", "<li class=\"li l2\">" };
bool lineOdd = true;
string getNewLineHTML() {
    lineOdd = !lineOdd;
    return "</li>" + NEW_LINE[lineOdd];
}

//Check if the character is in the string chars
bool inRange(char c, string chars) {
    for (size_t i = 0; i < chars.size(); i++)
        if (c == chars[i])
            return true;
    return false;
}

//Check if the character is the start of an identifier
bool isIdentifierStart(string input, int i) {
    if (i < 0 || i >= input.size())
        return false;
    return (input[i] == '_') || ('a' <= input[i] && input[i] <= 'z')
            || ('A' <= input[i] && input[i] <= 'Z');
}
//Check if the character is the continuation of an identifier
bool isIdentifierCont(string input, int i) {
    if (i < 0 || i >= input.size())
        return false;
    return ('0' <= input[i] && input[i] <= '9') || isIdentifierStart(input, i);
}
//Check if a[start + i] == b[i], i<b.size()
bool eqRange(string a, int start, string b) {
    if (a.size() - start < b.size())
        return false;
    return a.substr(start, b.size()) == b;
}
//Escape the sourcecode for HTML
string escape(char c) {
    switch (c) {
    case '&':
        return "&amp;";
    case '\"':
        return "&quot;";
    case '\'':
        return "&apos;";
    case '<':
        return "&lt;";
    case '>':
        return "&gt;";
    }
    //Is there a better way to do this?
    stringstream strm;
    strm << c;
    return strm.str();
}
string escape(string str) {
    string ans = "";
    for (size_t i = 0; i < str.size(); i++) {
        ans += escape(str[i]);
    }
    return ans;
}
string highlight(string input) {
    //The current state
    int state = NONE;
    //The current position
    size_t i = 0;
    //Styles
    string output =
            "<html><body><style type=\"text/css\">.l{list-style-type: decimal; margin-top: 0; margin-bottom: 0;} .li{ display: list-item; word-wrap: break-word;} .l1{background-color: #FFFFFF;} .l2{background-color: #EEEEEE;} .cd{white-space: pre;}</style><code class=\"cd\"><ul class=\"l\">";
    output += NEW_LINE[1];
    for (; i < input.size();) {
        switch (state) {
        case NONE: {
            if (input[i] == '#') { //Start a preprocessor statement
                state = PREPROCESSOR_UNCLOSED;
                output += OPEN_PREPROCESSOR + escape(input[i]);
                i++;
                break;
            }
            if (input[i] == '"') { //Start a string
                state = STRING_UNCLOSED;
                output += OPEN_STRING + escape(input[i]);
                i++;
                break;
            }
            if (input[i] == '\'') { //Start a character
                state = CHAR_UNCLOSED;
                output += OPEN_STRING + escape(input[i]);
                i++;
                break;
            }
            if (eqRange(input, i, "//")) { //Start a single line comment
                state = LINE_COMMENT_UNCLOSED;
                output += OPEN_COMMENT + escape(input.substr(i, 2));
                i += 2;
                break;
            }
            if (eqRange(input, i, "/*")) { //Start a multi-line comment
                state = MULTILINE_COMMENT_UNCLOSED;
                output += OPEN_COMMENT + escape(input.substr(i, 2));
                i += 2;
                break;
            }
            if (input[i] == '\n') { //New lines are special!
                output += getNewLineHTML();
                i++;
                break;
            }
            for (size_t j = 0; j < NUMBER_OF_KEYWORDS; j++) //Iterate through keywords
                if (eqRange(input, i, keywords[j])
                        && !isIdentifierCont(input, i + keywords[j].size())) { // The keyword can't be a prefix of an identifier, so test for that
                    output += OPEN_KEYWORD + escape(keywords[j])
                            + CLOSE_KEYWORD;
                    i += keywords[j].size();
                    break;
                }
            //Treat identifiers separately because we need to separate identifiers from keywords.
            if (isIdentifierStart(input, i)) {
                state = IDENTIFIER;
                output += escape(input[i]);
                i++;
                //If the next character is not a part of the identifier, go to the NONE state
                if (i + 1 == input.size() || !isIdentifierCont(input, i + 1)) {
                    state = NONE;
                }
                break;
            }
            //Other characters
            output += escape(input[i]);
            i++;
            break;
        }
        case CHAR_UNCLOSED: {
            if (eqRange(input, i, "\\")) { //Treat escape sequences inside quotes
                output += escape(input.substr(i, 2));
                i += 2;
                break;
            }
            if (input[i] == '\'') { //Close quote
                state = NONE;
                output += escape(input[i]) + CLOSE_STRING;
                i++;
                break;
            }
            output += escape(input[i]); //Other characters go into the literal
            i++;
            break;
        }
        case STRING_UNCLOSED: {
            if (eqRange(input, i, "\\")) { //Treat escape sequences inside quotes
                output += escape(input.substr(i, 2));
                i += 2;
                break;
            }
            if (input[i] == '"') { //Close quote
                state = NONE;
                output += escape(input[i]) + CLOSE_STRING;
                i++;
                break;
            }
            output += escape(input[i]); //Other characters go into the literal
            i++;
            break;
        }
        case LINE_COMMENT_UNCLOSED: {
            if (input[i] == '\n') { //Close comment with new line
                state = NONE;
                output += CLOSE_COMMENT + getNewLineHTML();
                i++;
                break;
            }
            output += escape(input[i]); //Comment body
            i++;
            break;
        }
        case MULTILINE_COMMENT_UNCLOSED: {
            if (eqRange(input, i, "*/")) { //Close multiline comment
                state = NONE;
                output += escape(input.substr(i, 2)) + CLOSE_COMMENT;
                i += 2;
                break;
            }
            if (input[i] == '\n') { //New lines are special!
                output += getNewLineHTML();
                i++;
                break;
            }
            output += escape(input[i]); //Comment body
            i++;
            break;
        }
        case PREPROCESSOR_UNCLOSED: {
            if (input[i] == '\n') { //Close preprocessor statement or go to next line
                int j = i - 1;
                for (; j >= 0 && inRange(input[j], "\n\t "); j--)
                    //Seek las non-whitespace character
                    ;
                if (j < 0 || input[j] != '\\') { //Check if the last non-whitespace character is a backslash
                    state = NONE; //... If it isn't, close the preprocessor statement
                    output += CLOSE_PREPROCESSOR + getNewLineHTML();
                    i++;
                    break;
                }
                output += getNewLineHTML(); //... If it is, we need to extend the preprocessor statement to the next line
                i++;
                break;
            }
            output += escape(input[i]);
            i++;
            break;
        }
        case IDENTIFIER: {
            //If the next character is not a part of the identifier, go to the NONE state
            if (i + 1 == input.size() || !isIdentifierCont(input, i + 1)) {
                state = NONE;
            }
            output += escape(input[i]);
            i++;
            break;
        }
        }
    }
    output += "</ul></code>";
    return output;
}
int main() {
    ifstream input("input.c");
    ofstream output("output.html");
    std::string cCode((std::istreambuf_iterator<char>(input)),
            std::istreambuf_iterator<char>());
    output << highlight(cCode) << endl;
}

突出显示编译器关键字是否真的有必要。它们不是标准的。但是,如果您确实想要,请假设以__(两个下划线开头)的任何内容作为关键字,因为规范说该保留用于实现目的。
Konrad Borowski
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.