PowerShell, 532 bytes
$1,$2=$args
$a={irm "api.stackexchange.com/2.2/questions/$args/answers?pagesize=100&site=codegolf&filter=!9YdnSMKKT"|% i*}
$r={$args.body-replace"(?sm).*?^(<pre.*?>)?<code>(.*?)</code>.*",'$2'}
$1=&$a $1;$2=&$a $2
(0..($1.count-1)|%{
$c=&$r $1[$_]
0..($2.count-1)|%{
$d=&$r $2[$_]
&{$c,$d=$args;$e,$f=$c,$d|% le*;$m=[object[,]]::new($f+1,$e+1);0..$e|%{$m[0,$_]=$_};0..$f|%{$m[$_,0]=$_};1..$e|%{$i=$_;1..$f|%{$m[$_,$i]=(($m[($_-1),$i]+1),($m[$_,($i-1)]+1),($m[($_-1),($i-1)]+((1,0)[($c[($i-1)]-eq$d[($_-1)])]))|sort)[0]}};$m[$f,$e]} $c $d
}
}|sort)[0]
I added line breaks for readability; they are still included in my byte count.
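I'm fairly confident in this one. The hardest part for me was actually computing the Levenshtein distance, since as far as I know PowerShell has no built-in for it. That gave me the chance to answer the Levenshtein distance challenge as well. Where my code references the anonymous LD function, you can refer to that answer for a more detailed explanation of how it works.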
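For readers who would rather not decode the golfed one-liner, here is a minimal ungolfed sketch of the same dynamic-programming table. The function name `Get-LevenshteinDistance` and its parameters are hypothetical, not part of the answer. It compares characters with `-ceq`, so case differences count as edits; the golfed version compares with `-eq`.

```powershell
# Hypothetical readable version of the golfed LD scriptblock.
# $d[$i, $j] holds the edit distance between the first $i characters of
# $Source and the first $j characters of $Target.
function Get-LevenshteinDistance {
    param(
        [string]$Source,
        [string]$Target
    )

    $n = $Source.Length
    $m = $Target.Length
    $d = [int[,]]::new($n + 1, $m + 1)

    # Turning a prefix into the empty string (or back) costs its length.
    for ($i = 0; $i -le $n; $i++) { $d[$i, 0] = $i }
    for ($j = 0; $j -le $m; $j++) { $d[0, $j] = $j }

    for ($i = 1; $i -le $n; $i++) {
        for ($j = 1; $j -le $m; $j++) {
            # Substitution is free when the two characters already match.
            $cost = if ($Source[$i - 1] -ceq $Target[$j - 1]) { 0 } else { 1 }

            $deletion     = $d[($i - 1), $j] + 1
            $insertion    = $d[$i, ($j - 1)] + 1
            $substitution = $d[($i - 1), ($j - 1)] + $cost

            # Keep the cheapest of the three ways to reach this cell.
            $d[$i, $j] = [Math]::Min([Math]::Min($deletion, $insertion), $substitution)
        }
    }

    return $d[$n, $m]
}

Get-LevenshteinDistance 'kitten' 'sitting'
```

`for` loops are used here instead of `1..$n` ranges because in PowerShell `1..0` counts down and yields `1, 0`, so the range form needs extra care when one of the strings is empty.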
Code with comments and progress indicators
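The code can be really slow (because of the LD calculations), so I built myself some progress indicators to follow along instead of assuming it was stuck in a loop somewhere. The progress-monitoring code is not in the golfed version at the top and is not counted in my byte count.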
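If you haven't used nested progress bars before, the sketch below shows the pattern in isolation: an outer bar with `-Id 1` and an inner bar tied to it with `-ParentId 1`, wrapped around a pairwise comparison. The function `Get-MinimumPairScore` and its parameters are hypothetical. Unlike the answer, which collects every distance and takes `(...|sort)[0]`, this sketch tracks a running minimum instead.

```powershell
# Hypothetical helper: score every item of $Left against every item of $Right,
# showing nested progress bars, and return the lowest score seen.
function Get-MinimumPairScore {
    param(
        [object[]]$Left,
        [object[]]$Right,
        [scriptblock]$Score
    )

    $best = $null
    for ($i = 0; $i -lt $Left.Count; $i++) {
        # Outer bar: which item from the first list is being processed.
        Write-Progress -Id 1 -Activity 'Outer loop' -Status "Item $($i + 1) of $($Left.Count)" -PercentComplete ([int](($i + 1) / $Left.Count * 100))

        for ($j = 0; $j -lt $Right.Count; $j++) {
            # Inner bar, displayed under the outer one via -ParentId.
            Write-Progress -Id 2 -ParentId 1 -Activity 'Inner loop' -Status "Item $($j + 1) of $($Right.Count)" -PercentComplete ([int](($j + 1) / $Right.Count * 100))

            $value = & $Score $Left[$i] $Right[$j]
            if ($null -eq $best -or $value -lt $best) { $best = $value }
        }
    }

    # Remove both bars once the work is finished.
    Write-Progress -Id 2 -Activity 'Inner loop' -Completed
    Write-Progress -Id 1 -Activity 'Outer loop' -Completed
    return $best
}

Get-MinimumPairScore -Left 10, 20, 30 -Right 3, 26 -Score { param($x, $y) [Math]::Abs($x - $y) }
```

Because the comparison is passed in as a scriptblock, the same loop could drive any scoring function, including a Levenshtein distance.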
# Assign the two integers into two variables.
$1,$2=$args
# Quick function to download up to 100 answer objects for a given question using the SE API
$a={irm "api.stackexchange.com/2.2/questions/$args/answers?pagesize=100&site=codegolf&filter=!9YdnSMKKT"|% i*}
# Quick function that takes the HTML body of an answer and parses out the most likely code block.
$r={$args.body-replace"(?sm).*?^(<pre.*?>)?<code>(.*?)</code>.*",'$2'}
# Get the array of answers from the two questions linked.
$1=&$a $1;$2=&$a $2
# Hash table of parameters used for Write-Progress.
# LD calculations can be really slow on larger strings, so I used this during testing
# to see how much longer I needed to wait.
$parentProgressParameters = @{
ID = 1
Activity = "Get LD of all questions"
Status = "Counting poppy seeds on the bagel"
}
$childProgressParameters = @{
ID = 2
ParentID = 1
Status = "Progress"
}
# Cycle each code block from each answer against each answer in the other question.
(0..($1.count-1)|%{
# Get the code block from this answer
$c=&$r $1[$_]
# Next line is only for displaying progress. Not part of the golfed code.
Write-Progress @parentProgressParameters -PercentComplete (($_+1) / $1.count * 100) -CurrentOperation "Answer $($_+1) from question 1"
0..($2.count-1)|%{
# Get the code block from this answer
$d=&$r $2[$_]
# Next two lines are only for displaying progress. Not part of the golfed code.
$childProgressParameters.Activity = "Comparing answer $($_+1) of $($2.count)"
Write-Progress @childProgressParameters -PercentComplete (($_+1) / $2.count * 100) -CurrentOperation "Answer $($_+1) from question 2"
# Anonymous function to calculate the Levenshtein distance
# Get a better look at that function here: /codegolf//a/123389/52023
&{$c,$d=$args;$e,$f=$c,$d|% le*;$m=[object[,]]::new($f+1,$e+1);0..$e|%{$m[0,$_]=$_};0..$f|%{$m[$_,0]=$_};1..$e|%{$i=$_;1..$f|%{$m[$_,$i]=(($m[($_-1),$i]+1),($m[$_,($i-1)]+1),($m[($_-1),($i-1)]+((1,0)[($c[($i-1)]-eq$d[($_-1)])]))|sort)[0]}};$m[$f,$e]} $c $d
}
# Collect results and sort leaving the smallest number on top.
}|sort)[0]
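For reference, a readable equivalent of the golfed `$a` helper might look like the following. The function names are hypothetical; the endpoint, `pagesize`, `site` and `filter` values are copied from the answer, with an explicit `https://` scheme added. As in the golfed code, only the first page of up to 100 answers is requested, and the `has_more` flag in the API's response wrapper is not checked.

```powershell
# Hypothetical helper: build the same Stack Exchange API URL the golfed $a uses.
function Get-AnswerUrl {
    param([int]$QuestionId)
    return "https://api.stackexchange.com/2.2/questions/$QuestionId/answers?pagesize=100&site=codegolf&filter=!9YdnSMKKT"
}

# Hypothetical helper: fetch the answers for one question.
function Get-Answers {
    param([int]$QuestionId)
    # Invoke-RestMethod (alias irm) parses the JSON response into objects.
    # The golfed '|% i*' expands the one member matching 'i*', which is 'items';
    # this line reads that property explicitly.
    $response = Invoke-RestMethod -Uri (Get-AnswerUrl $QuestionId)
    return $response.items
}

Get-AnswerUrl 97752
```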
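The logic for finding the code block takes the answer body as HTML and looks for a pair of code tags that starts on its own line, optionally wrapped in a pre tag. In testing, it found all the correct data across 6 different question sets.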
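As an illustration of that matching logic, here is the same regular expression wrapped in a hypothetical `Get-FirstCodeBlock` function and run against a made-up answer body (the sample HTML is invented for this example, not taken from the API). With `(?s)` the dot also matches newlines, and with `(?m)` the `^` anchor matches at the start of every line.

```powershell
# Hypothetical wrapper around the pattern used by the golfed $r helper:
# lazily skip to the first line that begins with an optional <pre ...> tag
# followed by <code>, capture up to the next </code>, and discard the rest.
function Get-FirstCodeBlock {
    param([string]$Html)
    $pattern = '(?sm).*?^(<pre.*?>)?<code>(.*?)</code>.*'
    return $Html -replace $pattern, '$2'
}

# Invented sample body: a header, a code block, then an inline code span.
$sampleBody = @'
<h1>Python, 20 bytes</h1>
<pre class="lang-py"><code>print(input()[::-1])
</code></pre>
<p>Uses <code>[::-1]</code> to reverse.</p>
'@

Get-FirstCodeBlock $sampleBody
```

If nothing matches, `-replace` returns the body unchanged, so an answer without a line-initial `<code>` element would be compared as raw HTML. The captured text is also still HTML-encoded (for example `&lt;` for `<`); the original code does not decode it.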
I tried working with the Markdown source instead, but it was hard to find the right code block.
Sample runs
Challenge-Similarity-Detector 97752 122740
57
Challenge-Similarity-Detector 115715 116616
23