26

As many geeks probably know, most pages on Wikipedia (95% of them, I think) eventually lead to Philosophy like this:

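Click the first link that is neither italic nor in parentheses and that points to another normal article (i.e. not File: or Special:, though something like Wikipedia: is OK), then repeat until you reach Philosophy.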
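A minimal sketch of that link rule, assuming the paragraph HTML has already been fetched. The function name `first_link` and the sample markup are made up for illustration, and the parenthesis check (comparing the counts of "(" and ")" in the text before the link) is the heuristic some of the answers below use, not an exact parser:

```python
import re

# Matches an anchor whose href starts with /wiki/; any #fragment is left out of the group.
LINK = re.compile(r'<a\s[^>]*href="(/wiki/[^"#]*)[^"]*"[^>]*>')

def first_link(paragraph_html):
    """Return the first qualifying article path in one paragraph, or None."""
    # Drop italic spans so links inside <i>...</i> are never considered.
    html = re.sub(r'<i\b.*?</i>', '', paragraph_html, flags=re.S)
    for m in LINK.finditer(html):
        # Visible text before the link, with tags stripped.
        before = re.sub(r'<[^>]*>', '', html[:m.start()])
        if before.count('(') != before.count(')'):
            continue  # the link sits inside an unclosed parenthesis
        title = m.group(1)[len('/wiki/'):]
        namespace = title.split(':', 1)[0] if ':' in title else None
        if namespace in ('File', 'Special'):
            continue  # not a normal article
        return m.group(1)
    return None

sample = ('<p><b>Latin</b> (<i><a href="/wiki/Lingua">lingua</a></i>, '
          '<a href="/wiki/IPA">IPA</a>) is a <a href="/wiki/File:X.png">pic</a> '
          '<a href="/wiki/Classical_language">classical language</a>.</p>')
print(first_link(sample))
```

Nested or unbalanced parentheses in real articles can fool the count, so treat this as a starting point only.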

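The script must:

Take a first page as input
Print the name of each article it reaches
And print how many articles it took to get to Philosophy, or say so if it never got there.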

You start with 1000 points and lose one for each character in the code. Bonus points are awarded for:

Detecting looping articles and stopping: +50

Detecting looping articles and asking the user whether to go to the next link in the article: +170

Allowing a default for the previous check to be given as a command-line arg or similar: +140
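The three bonuses can be sketched together as below. This is only an illustration: `links_of` is a hypothetical helper that returns the ordered candidate links of an article (here it reads from a toy dictionary instead of Wikipedia), and `on_loop` stands in for the command-line default.

```python
def crawl(start, links_of, on_loop=None, ask=input, target="Philosophy"):
    """Follow first links from `start`; return the path, or None if we give up.

    links_of(title) -> ordered list of candidate titles (hypothetical helper).
    on_loop: "y" takes the next link automatically, "n" stops, None asks the user.
    """
    path = [start]
    print(start)
    while path[-1] != target:
        candidates = links_of(path[-1])
        if not candidates:
            return None
        nxt = candidates[0]
        if nxt in path:  # loop detected
            answer = on_loop or ask(f"Loop at {nxt}, try next link? [y/n] ")
            unvisited = [c for c in candidates if c not in path]
            if answer != "y" or not unvisited:
                return None
            nxt = unvisited[0]
        path.append(nxt)
        print(nxt)
    print(len(path) - 1)  # number of hops taken
    return path

# Toy graph, not real Wikipedia data.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["Philosophy"]}
crawl("A", lambda t: graph.get(t, []), on_loop="y")
```

Passing `on_loop="n"` gives the plain +50 behaviour (detect and stop), while leaving it as `None` prompts on every loop.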

Highest score wins.
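For reference, the scoring rule as a one-line function (the name `score` is just for illustration). The three calls reproduce totals that answers in this thread report for themselves:

```python
def score(chars, bonuses=()):
    """1000 points, minus one per character, plus any bonuses earned."""
    return 1000 - chars + sum(bonuses)

print(score(379, (170, 140)))      # the Ruby answer's final count
print(score(813, (50, 170, 140)))  # the C# answer
print(score(294, (140,)))          # the Scala answer
```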

code-challenge

— AlphaModder

7

+1, great challenge! Parenthesis detection is hard :P

— Doorknob

1

I feel like this could use a better definition, but I'm not sure exactly how.

— Iszi, 2014

3

Lose 1 point for each character typed. Hmm, great, I get it: I'll just copy-paste the characters! No points lost!

— Justin

5

Please don't change the rules after answers have been posted; it's very impolite and generally frowned upon in this community...

— Doorknob

2

xefer.com/wikipedia

— Charlie

8

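Ruby, 1000 - (379 - 170 - 140) chars = 931
Earlier revisions: 303, 299, 337 - 50, 373 - 170 and 382 - 170 - 140 chars, scoring 697, 701, 713, 797 and 928.

I'm sure there are a lot of improvements still to be made.

(This requires Nokogiri)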

require'open-uri'
require'nokogiri'
x="/wiki/"+gets.chomp
r=[n=i=0]
until x=~/\/Philosophy/
d=Nokogiri.HTML open"http://en.wikipedia.org#{x}"
x=d.css('p a').select{|a|t=a.xpath('preceding::text()').map(&:text)*'';t.count('(')==t.count(')')&&a.attr('href')=~/^.wiki[^:]+$/}[i].attr'href'
i=0
puts r.index(x)?"#{$><<'i=';i=($*[0]||gets).to_i;''}": r.push(x)[-1][6..-1]
n+=1
end
p n

Sample run:

c:\a\ruby>wikipedia_crawl_philosophy
Latin (note: this is my input)
Classical_antiquity
History
Umbrella_term
Terminology
Word
Linguistics
Science
Knowledge
Fact
Proof_(truth)
Argument
Logic
Reasoning
Consciousness
Quality_(philosophy)
Property_(philosophy)
Modern_philosophy
Philosophy
18

An example where I had to go to another link:

c:\a\ruby>wikipedia_crawl_philosophy
Snow
Precipitation_(meteorology)
Meteorology
Atmospheric_physics
Synoptic_scale_meteorology
i=2 // I put the 0-indexed number of the link I wanted to go to (so, the third link)

Weather
Atmosphere
Gas
State_of_matter#The_four_fundamental_states
Physics
Natural_science
Sciences
Knowledge
Fact
Proof_(truth)
Argument
Logic
Reasoning
Consciousness
Quality_(philosophy)
Property_(philosophy)
Modern_philosophy
Philosophy
25

Tricks I used:

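I used the selector p a to get only the non-italic links, because in the actual article on Wikipedia the non-italic links are always inside paragraph elements.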
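A rough standard-library analogue of the `p a` selector, for readers without Nokogiri. The class and function names are invented for this sketch, and it assumes every paragraph has an explicit `</p>`. Like the CSS selector itself, it would still return an italic link that happens to sit inside a paragraph; the trick relies on Wikipedia putting its italic hatnotes outside `<p>`.

```python
from html.parser import HTMLParser

class ParagraphLinks(HTMLParser):
    """Collects the href of every <a> that appears somewhere inside a <p>."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <p> elements we are currently inside
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1
        elif tag == "a" and self.depth:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1

def paragraph_links(html):
    parser = ParagraphLinks()
    parser.feed(html)
    return parser.hrefs

page = ('<div><i>For other uses, see <a href="/wiki/Snow_(disambiguation)">here</a>.</i></div>'
        '<p>Snow is <a href="/wiki/Precipitation_(meteorology)">precipitation</a>.</p>')
print(paragraph_links(page))
```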

— Doorknob

Hmm... maybe I should ban any libraries other than the ones that come with the language.

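@user1825860 It's not a library that comes with the language; it's a gem. I've edited my answer. But really, you want to take this already difficult challenge and force us not to use HTML parsing libraries as well? :P

— Doorknob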

I'm not disallowing it, but you would lose points :P

— AlphaModder, 2014

You should reread the first post and edit accordingly :P

— AlphaModder, 2014

2

@user1825860 Please don't change the rules after answers have been posted. It's very impolite...

— Doorknob

5

"bash" – (if not miscounted: 1000 - 381 + 170 + 140 = 929 points)
Earlier revisions: 1000 - 397 + 170 + 140 = 913 points, then 1000 - 386 + 170 + 140 = 924 points.

"bash" is deliberately in quotes, since this is a mix of tools used in *nix shells, just packed into a bash script.

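Edit 1:

Removed http://, since curl defaults to it.
Changed the match on the anchor from href= to f=, since <a> has no other standard attribute ending in f. (A custom attribute could break this. None seen so far.)
Set the not-found exit message to !Phil instead of NoPhil. This one is a bit of a quirk; one could equally say e.g. !, 0, N, !P or similar.
Quirk two: -s could be removed from curl to save three more bytes, but that gives messy output. Not sure whether that is a problem.
Updated the help on this page.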

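With the quirks, the code would end up at 379 bytes (931 points).

I could also implement @plannapus's matching of (hopefully) nav boxes by adding six bytes (minus six points), using (p|ul).*?<(\1).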

Edit 2:

Use ${#c[@]} to print the degrees of separation instead of an $i counter.

With the quirks, the code would end up at 374 bytes (936 points).

I summon Cthulhu and go for a regex + bash / shell / *nix solution.

Stolen:

Uses the <p> trick from @Doorknob of Snow.

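Implements:

Detecting loops and asking whether the next link should be taken.
(Optionally) taking the choice for the next link on duplicates from an option.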

Requires:

bash v.?
grep with -P (PCRE) support.
sed
curl
cut

Usage:

script PATH [OPTIONS]

Print separation of article from ``PATH'' to ``Philosophy'' on Wikipedia.
Degrees of separation, if found, is printed as last line. 
If not found last line yields ``!Phil''.

PATH    
     Absolute path to starting article, e.g: /wiki/Word 
OPTIONS
     y   Automatically select next link if already visited.
     n   (Or other) Quit if next link already visited.
BUGS
     1. On previous visit; "next link" is not checked. Thus if next link
     has already been visited we get eternal loop. Not sure if this
     disqualify +170 points.
     2. Sure there are.

The code as a one-liner. Copy it to a file, chmod +x filename, and run it as ./script /wiki/… from a bash shell.

u=($1);c=($1);while ! [[ "$u" =~ /Philosophy$ ]];do echo "$u";u=($(curl -s "en.wikipedia.org$u"|tr '\n' ' '|grep -Po '<p>.*?</p>'|sed 's/>[^<]*([^)]*)//g'|grep -o '<a [^>]*f="/wiki/[^":]*"'|cut -d\" -f2));for x in "${c[@]}";do if [ "$x" = "$u" ];then [ $2 ] &&s=$2||read -p "${u[0]}?" s;[ $s = y ] &&u[0]=${u[1]}||{ echo "!Phil";exit;} fi;done;c=("${c[@]}" "$u");done;echo ${#c[@]};

The code, expanded and explained:

u=($1); # Array of paths.
c=($1); # Array of visited paths.
# While $u != /Philosophy, ugly trick is to use $u instead of ${u[0]}.
while ! [[ "$u" =~ /Philosophy$ ]];do   
        echo "$u";      # Print current page.
        # curl   : prints retrieved page to stdout. "-s" could be skipped.
        # tr     : replace all newlines with spaces. This is a sanity thing when it comes to 
        #          twiddling with HTML using regex.
        # grep 1 : match <p> tags. Using -P's ungreedy *?.
        # sed    : remove all occurrences of "(" something ")".
        # grep 2 : match links where "href" attribute starts with /wiki/ and is not e.g. File:
        # cut    : match actual href value.
        # Result is added to array ``u''.
        u=($(curl -s "en.wikipedia.org$u" |
                tr '\n' ' ' | 
                grep -Po '<p>.*?</p>' | 
                sed 's/>[^<]*([^)]*)//g' | 
                grep -o '<a [^>]*f="/wiki/[^":]*"' | 
                cut -d\" -f2));

        # For previously visited pages as x.
        for x in "${c[@]}"; do 
                # If x equals to first page ...
                if [ "$x" = "$u" ]; then        
                        # Use option or ask.
                        [ $2 ] && s=$2 || read -p "${u[0]}?" s; 
                        # If response is "y" use next link, else exit with status.
                        [ $s = y ] && u[0]=${u[1]} || { 
                                echo "!Phil"; 
                                exit;
                        } 
                fi;
        done;
        # Append current link to "visited"
        c=("${c[@]}" "$u"); 
done;
# Print number of visited pages.
echo ${#c[@]}
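For readers who want to poke at the extraction stages without a shell, here is a hedged Python transliteration of the tr | grep | sed | grep | cut chain from the script above. It works on an HTML string instead of calling curl, the function name `pipeline_links` is invented for this sketch, and each line is annotated with the stage it mirrors:

```python
import re

def pipeline_links(page_html):
    flat = page_html.replace("\n", " ")                         # tr '\n' ' '
    paragraphs = re.findall(r"<p>.*?</p>", flat)                # grep -Po '<p>.*?</p>'
    hrefs = []
    for p in paragraphs:
        p = re.sub(r">[^<]*\([^)]*\)", "", p)                   # sed 's/>[^<]*([^)]*)//g'
        for tag in re.findall(r'<a [^>]*f="/wiki/[^":]*"', p):  # grep -o '<a [^>]*f="/wiki/[^":]*"'
            hrefs.append(tag.split('"')[1])                     # cut -d\" -f2
    return hrefs

page = ('<p><b>Snow</b> is (see <a href="/wiki/Ice">ice</a>)\n a '
        '<a href="/wiki/Precipitation">precipitation</a> and '
        '<a href="/wiki/File:S.jpg">a picture</a></p>')
print(pipeline_links(page))
```

As in the shell version, the cut step takes whatever sits between the first pair of double quotes, so an anchor with another quoted attribute before href would yield that attribute's value instead.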

— Runium

Dang, you beat me by a little! :P I'll have to golf mine down some more.

— Doorknob

Yes ;), but I'm not sure this counts as valid code, using tools this way.

— Runium, 2014

5

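JavaScript 726 (444 chars [556] + 170)

Now, I realise this might not work as a bookmarklet, but I enjoyed tinkering with it anyway.

Usage: navigate to the page you want to start from and run the following in the console: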

(function(a){c=0,o="";$(u="html")[u](f=$('<iframe src="'+location+'?">').on("load",function(){$=f.contentWindow.$;p=f.contentDocument.title[s="split"](" - ")[0];c++;p=="Philosophy"?document.write("<pre>"+o+p+"\n"+c):(i=RegExp("^"+p+"$","m").test(o)?a||confirm("Loop, try next?")?2:0:1)&&(f.src=$("p>a").filter(function(){return(t=$(this).parent()[u]()[s](this.outerHTML)[0])[s]("(").length==t[s](")").length})[--i].href);o+=p+"\n"})[0])})(true)

Starting from JavaScript, the output is as follows:

JavaScript
Interpreter (computing)
Computer science
Science
Knowledge
Fact
Proof (truth)
Argument
Logic
Reason
Consciousness
Quality (philosophy)
Property (philosophy)
Modern philosophy
Philosophy
15

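This solution assumes you want to jump to the next link when a loop is detected, but if you change the true at the end to false, a confirm box pops up instead (quite annoying...). I'm not sure whether that qualifies for the second bonus or not. I'm assuming not.

Indented: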

(function(l){
    c=0,o='';
    $(u='html')[u](f=$('<iframe src="'+location+'?">').on('load',function(){ // Firefox needs the ? to properly load the frame
        $=f.contentWindow.$; // reference repeated calls as strings to save more bytes
        p=f.contentDocument.title[s='split'](' - ')[0]; // get the title

        c++;
        p=='Philosophy'?
            document.write('<pre>'+o+p+'\n'+c): // pre for nice formatting
            (i=RegExp('^'+p+'$','m').test(o)?
                l||confirm('Loop, try next?')?
                    2: // desired index + 1 so we can use as a boolean
                    0
                :
                1)&&
            (f.src=$('p>a').filter(function(){
                return (t=$(this).parent()[u]()[s](this.outerHTML)[0])[s]('(').length == t[s](')').length // shorter, but still not overly happy with this...
            })[--i].href);
            o+=p+'\n' // update output
    })[0])
})(true) // change this to show confirm box when loop detected

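So I originally missed the part about ignoring items in parentheses, and adding it cost a lot more characters, so I'd like to trim that filter function down (or ideally replace it altogether).

Works in both Chrome and Firefox (tested in Firefox 26).

— Dom Hastings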

2

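Looks awesome, but it fails in Firefox 20.

— boothby

Ah! I only tested in Chrome. I'll look into it!

— Dom Hastings, 2014

@boothby It should work in Firefox now... I still want to work on my link selection though!

— Dom Hastings, 2014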

5

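C# - 813 chars

Score: 1000 - 813 + 50 + 170 + 140 = 547 :(

No external libraries. Loop detection.

The first argument is the source article, the second is the target article.

Golfed version: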

class Program
{
    static void Main(string[] a)
    {
        Func<XmlDocument,IList<string>> G=delegate(XmlDocument xd){return xd.SelectNodes("//p//a[starts-with(@href,'/wiki/') and not(contains(@href,':'))]").Cast<XmlNode>().Select(n=>n.Attributes["href"].InnerText).ToList();};Action<string> W=delegate(string s){Console.WriteLine(s);};var h=new HashSet<string>();var c=new WebClient();var x=new XmlDocument();var t=c.DownloadString(@"http://wikipedia.org/wiki/"+a[0]);int i=0,C=0;
    GO:
        x.LoadXml(t);var ns=G(x);
    COL:
        var f=ns[i];if(f.Equals("/wiki/"+a[1],StringComparison.OrdinalIgnoreCase)){goto END;}if(h.Contains(f)){W("loop: "+f);i++;goto COL;}else{h.Add(f);i=0;C++;}W(f);t=c.DownloadString(@"http://wikipedia.org"+f);goto GO;
    END:
        W("Found in "+C);
    }
}

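Readable version:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Xml;

class Program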
{
    // arg[0] source article. arg[1] target article
    static void Main(string[] arg)
    {
        Func<XmlDocument, IList<string>> G = delegate(XmlDocument xd)
        {
            return xd.SelectNodes("//p//a[starts-with(@href,'/wiki/') and not(contains(@href,':'))]").Cast<XmlNode>().Select(n => n.Attributes["href"].InnerText).ToList();
        };
        Action<string> W = delegate(string s) { Console.WriteLine(s); };
        var h = new HashSet<string>(); var c = new WebClient(); var x = new XmlDocument();
        var allText = c.DownloadString(@"http://wikipedia.org/wiki/" + arg[0]);
        int i = 0; int C = 0;
    GO:
        x.LoadXml(allText);
        var ns = G(x);
    COL:
        var f = ns[i];
        if (f.Equals("/wiki/" + arg[1], StringComparison.OrdinalIgnoreCase))
        {
            goto END;
        }
        if (h.Contains(f))
        {
            W("loop: " + f); i++; goto COL;
        }
        else
        {
            h.Add(f); i = 0; C++;
        }
        W(f);
        allText = c.DownloadString(@"http://wikipedia.org" + f);
        goto GO;
    END:
        W("Found in " + C);
    }
}

Sample run, from "Sky" to "Philosophy":

C:\>wiki.exe Sky Philosophy

/wiki/Earth
/wiki/Geometric_albedo
/wiki/Phase_angle_(astronomy)
/wiki/Observational_astronomy
/wiki/Astronomy
/wiki/Natural_science
/wiki/Sciences
/wiki/Latin_language
/wiki/Classical_antiquity
/wiki/History
/wiki/Ancient_Greek
/wiki/Greek_language
/wiki/Modern_Greek
loop: /wiki/Greek_language
/wiki/Colloquialism
/wiki/Word
/wiki/Linguistics
/wiki/Science
loop: /wiki/Latin_language
/wiki/Knowledge
/wiki/Fact
/wiki/Latin
loop: /wiki/Classical_antiquity
/wiki/Italic_languages
/wiki/Indo-European_languages
/wiki/Language_family
/wiki/Language
/wiki/Human
/wiki/Extinct
/wiki/Biology
loop: /wiki/Natural_science
/wiki/Life
loop: /wiki/Earth
/wiki/Physical_body
/wiki/Physics
loop: /wiki/Greek_language
loop: /wiki/Natural_science
/wiki/Matter
/wiki/Rest_mass
/wiki/Center_of_momentum_frame
loop: /wiki/Physics
/wiki/Inertial_frame
loop: /wiki/Physics
/wiki/Frame_of_reference
loop: /wiki/Physics
/wiki/Coordinate_system
/wiki/Geometry
loop: /wiki/Ancient_Greek
/wiki/Mathematics
/wiki/Quantity
/wiki/Property_(philosophy)
/wiki/Modern_philosophy
Found in 41

C:\>

— thepirat000

5

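Scala (294 chars => 1000 - 294 + 140 = 846 points)

Updated the solution to automatically take the next link (if the next link has already been used). Thanks for the 140 bonus points.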

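Logic: pick the first "/wiki" link that has no ":" in it (so "File:" links are ignored). Rinse and repeat recursively, returning count + 1 each time. I conveniently keep a list of everything output so far, so the program doesn't get stuck in an infinite loop.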
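The same recursion can be sketched in Python. Here `links_of` is a hypothetical stand-in for the download-and-regex step, reading from a toy dictionary, and `hops` is a made-up name. As in the Scala source, reaching Philosophy returns 1, so the final number counts every article printed, the starting one included:

```python
def hops(path, links_of):
    """Print the current article, then recurse into its first unvisited link."""
    current = path[-1]
    print(current)
    if current == "Philosophy":
        return 1
    # First candidate not already on the path; raises StopIteration if none is left.
    nxt = next(link for link in links_of(current) if link not in path)
    return hops(path + [nxt], links_of) + 1

# Toy graph, not real Wikipedia data: B links back to A first, then to C.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["Philosophy"]}
print(hops(["A"], lambda title: graph[title]))
```

Very long chains would hit Python's recursion limit, which a loop-based version avoids.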

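Regex: I have 2 forms of the regex.

"<p>.*?\"/wiki/([^:]*?)\".*?/p>" finds links inside <p> tags.
"p>.*?/wiki/([^:]*?)\"" is a slightly more experimental one that has been shown to work, but it gives different results because it sometimes picks up links from the info box on the right. Those are regular articles, so I think it's still valid. If it is ruled not to be, the OP (or anyone else) can leave me a comment and I can update the solution to use the better regex.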

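I'll keep using the second regex until I find a test case where it doesn't work, or until the OP disallows picking up links from the sidebar (I think the info box is still part of the actual article itself; it's more of a summary).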
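To see how the two patterns can diverge, here is a small made-up page run through Python's `re` (the markup is invented and is only one way the difference can show up). The second pattern only needs the two characters "p>", which also occur at the end of a tag such as `<sup>`, so it can start matching before the first real paragraph:

```python
import re

STRICT = re.compile(r'<p>.*?"/wiki/([^:]*?)".*?/p>')  # first form
LOOSE = re.compile(r'p>.*?/wiki/([^:]*?)"')           # second form

# Invented single-line markup: a footnote marker, an info box, then the first paragraph.
html = ('<sup>1</sup>'
        '<table class="infobox"><tr><td><a href="/wiki/Sidebar">s</a></td></tr></table>'
        '<p>See <a href="/wiki/Body">b</a>.</p>')

print(STRICT.search(html).group(1))
print(LOOSE.search(html).group(1))
```

Note that even the first form is not strictly confined to one paragraph: its lazy `.*?` can run past a `</p>` when the opening paragraph contains no link.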

Minified source:

object W extends App{print(x(Seq(args(0))));def x(s:Seq[Any]):Int={val? =s.last;println(?);?match{case "Philosophy"=>1;case _=>x(s:+"p>.*?/wiki/([^:]*?)\".*?/p>".r.findAllMatchIn(io.Source.fromURL("http://en.wikipedia.org/wiki/"+ ?).getLines.mkString).map(_ group 1).filter(!s.contains(_)).next)+1}}}

Readable source:

object W extends App {
  print(x(Seq(args(0))))

  def x(s: Seq[Any]): Int = {
    val ? = s.last
    println(?)
    ? match {
      case "Philosophy" => 1
      case _ => x(s :+ "p>.*?/wiki/([^:]*?)\"".r.findAllMatchIn(io.Source.fromURL("http://en.wikipedia.org/wiki/" + ?).getLines.mkString).map(_ group 1).filter(!s.contains(_)).next) + 1
    }
  }
}

Sample output:

Input

Space_toilet

Output

Space_toilet
Weightlessness
G-force
Weight
Force
SI_unit
French_language
Second_language
Language_acquisition
Word
Linguistics
Science
Latin_language
Pontifical_Academy_for_Latin
Pope_Benedict_XVI
Pope_Benedict_(disambiguation)
Regnal_name#Catholic_Church
Monarch
State_(polity)
Community
Commutative_property
Mathematics
Quantity
Property_(philosophy)
Modern_philosophy
Philosophy
26

— Javatarz

1

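Scala doesn't need a main object or method. You can run it with the interpreter as "scala <filename> [args..]". Use args(0) to get the first argument, get rid of your object and main definitions, and I think you can drop the :Int too. pastebin.com/YqywKcG8

— KChaloux, 2014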

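Turns out you can't drop the : Int. I didn't realize you were making a recursive call. Also, my pastebin is based on the old readable source, but the same concept applies.

— KChaloux, 2014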

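I'll try getting rid of the main method. Yes, the recursive call is what made me add :Int there. I'll also add a readable form of the 333-char solution later today. Thanks for the suggestions @KChaloux

— javatarz, 2014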

1

Like I said, the object Q extends App { ... } wrapper isn't needed at all if you run the code with the interpreter instead of compiling it with scalac. Just run scala <filename> [args..]

— KChaloux, 2014

4

R, 379 chars; 1000 - 379 + 170 = 791 points

Version that asks the user how to proceed when a loop is detected

library(XML);w="http://en.wikipedia.org";W="/wiki/";n=1;A=c(scan(,""));while(A[n]!="Philosophy"){a=paste0(w,W,A[n]);d=sapply(strsplit(grep(W,sapply(xpathApply(xmlParse(readLines(a)),"//p/a[@href]|//ul/li/a[@href]",xmlAttrs),`[`,'href'),v=T),"/"),`[`,3);B=d[-grep(":",d)];n=n+1;if(B[1]%in%A)if(readline("Loop!2nd link?")=="n")break;A[n]=head(B[!B%in%A],1);cat(A[n],"\n")};cat(n-1)

With indentation and comments:

library(XML) #Uses package XML
w="http://en.wikipedia.org"
W="/wiki/"
n=1
A=c(scan(,"")) #Stdin + makes it a vector so we can store each iteration
while(A[n]!="Philosophy"){
    a=paste0(w,W,A[n])
    d=sapply(strsplit(grep(W,sapply( #The heart of the program
             xpathApply(xmlParse(readLines(a)),"//p/a[@href]|//ul/li/a[@href]",xmlAttrs),
             `[`,'href'),v=T),"/"),`[`,3)
    B=d[-grep(":",d)] #get rid of Templates, Files ,etc...
    n=n+1
    #Ask user if should proceed when loop encountered 
    #(any answer other than "n" is considered agreement):
    if(B[1]%in%A)if(readline("Loop!2nd link?")=="n")break
    A[n]=head(B[!B%in%A],1) #Take the first link that is not redundant
    cat(A[n],"\n")
    }
cat(n-1)
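A rough Python counterpart of the XPath step "//p/a[@href]|//ul/li/a[@href]", for comparison. It assumes the input is well-formed XML (real Wikipedia HTML generally is not, which is why the R code uses a forgiving parser), and the function name `article_links` is invented for this sketch. ElementTree has no union operator or parent pointers, so the parent map below stands in for both:

```python
import xml.etree.ElementTree as ET

def article_links(xhtml):
    """Titles linked from <a> directly inside <p>, or inside <li> under <ul>."""
    root = ET.fromstring(xhtml)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    titles = []
    for a in root.iter("a"):  # document order
        href = a.get("href")
        parent = parent_of.get(a)
        if href is None or parent is None:
            continue
        in_p = parent.tag == "p"
        in_li = parent.tag == "li" and getattr(parent_of.get(parent), "tag", None) == "ul"
        # Keep /wiki/ links without a namespace colon (Templates, Files, etc.).
        if (in_p or in_li) and href.startswith("/wiki/") and ":" not in href:
            titles.append(href.split("/")[2])
    return titles

doc = ('<body><div><a href="/wiki/Hat">h</a></div>'
       '<p>An <a href="/wiki/Eight-bit">e</a> <a href="/wiki/File:X.png">f</a></p>'
       '<ul><li><a href="/wiki/Computer_architecture">c</a></li></ul></body>')
print(article_links(doc))
```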

Sample run:

> library(XML);w="http://en.wikipedia.org";W="/wiki/";n=1;A=c(scan(,""));while(A[n]!="Philosophy"){a=paste(w,W,A[n],sep="");d=sapply(strsplit(grep(W,sapply(xpathApply(xmlParse(readLines(a)),"//p/a[@href]|//ul/li/a[@href]",xmlAttrs),`[`,'href'),v=T),"/"),`[`,3);B=d[-grep(":",d)];n=n+1;if(B[1]%in%A)if(readline("Loop!2nd link?")=="n")break;A[n]=head(B[!B%in%A],1);cat(A[n],"\n")};cat(n-1)
1: Extended_ASCII
2: 
Read 1 item
Eight-bit 
Computer_architecture 
Computer_science 
Science 
Logic 
List_of_aestheticians 
Art 
Human_behavior 
Behavior 
Organism 
Biology 
Loop!2nd link?y
Mathematics 
Quantity 
Property_(philosophy) 
Modern_philosophy 
Philosophy 
16

R, 325 chars; ??? points

Version that takes the first non-redundant (i.e. non-looping) link by default.

library(XML);w="http://en.wikipedia.org";W="/wiki/";n=1;A=c(scan(,""));while(A[n]!="Philosophy"){a=paste0(w,W,A[n]);d=sapply(strsplit(grep(W,sapply(xpathApply(xmlParse(readLines(a)),"//p/a[@href]|//ul/li/a[@href]",xmlAttrs),`[`,'href'),v=T),"/"),`[`,3);B=d[-grep(":",d)];n=n+1;A[n]=head(B[!B%in%A],1);cat(A[n],"\n")};cat(n-1)

— plannapus

Wikipedia: Philosophy!
