The Smallest Web Browser in the World


72

Backstory:

You've just started a new programming job at a huge multinational corporation. However, you're not allowed to browse the web, since your computer only has a CLI. They also run scans of all employees' hard drives, so you can't simply download a large CLI web browser. You decide to make a textual browser that's as small as possible, so you can memorize it and type it into a temporary file every day.

The challenge:

Your task is to create a golfed web browser within a command-line interface. It should:

  • Take in a single URL via args or stdin
  • Split the URL into its directory and host components
  • Send a simple HTTP request to the host to request said directory
  • Print the contents of any <p>paragraph</p> tags
  • And either exit or ask for another page
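For readers following along, the URL-splitting step is plain string surgery; a rough illustration in Python (the helper name `split_url` is made up for this sketch, not part of any answer below):

```python
# Hypothetical helper: split a URL of the form http://host/path
# into its host and path ("directory") components.
def split_url(url):
    url = url.strip()
    if "//" in url:                      # drop an optional scheme prefix
        url = url.split("//", 1)[1]
    host, _, path = url.partition("/")   # first "/" separates host from path
    return host, "/" + path

print(split_url("http://example.com/foo/bar"))  # ('example.com', '/foo/bar')
print(split_url("example.com"))                 # ('example.com', '/')
```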

More info:

A simple HTTP request looks like this:

GET {{path}} HTTP/1.1
Host: {{host}}
Connection: close
\n\n

Emphasis on the ending newlines.
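Building that request is pure string assembly; a quick sketch in Python (`build_request` is a hypothetical helper, and the template copies the question's LF-only line endings as shown above):

```python
def build_request(host, path):
    # Mirrors the question's template; the blank line at the end
    # is what actually terminates the request.
    return "GET {} HTTP/1.1\nHost: {}\nConnection: close\n\n".format(path, host)

print(repr(build_request("example.com", "/")))
```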

A typical response looks like this:

HTTP/1.1 200 OK\n
<some headers separated by newlines>
\n\n
<html>
....rest of page
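A response like this can be handled by cutting the headers off at the first blank line and then regex-matching the paragraphs out of the body; a hedged Python sketch (`paragraphs` is a made-up helper, and yes, this is the regrettable regex-on-HTML approach the challenge all but demands):

```python
import re

def paragraphs(response):
    # Split headers from body at the first blank line,
    # tolerating both CRLF and bare-LF separators.
    for sep in ("\r\n\r\n", "\n\n"):
        if sep in response:
            response = response.split(sep, 1)[1]
            break
    # Non-greedy, DOTALL match so <p>...</p> may span lines.
    return re.findall(r"<p>(.*?)</p>", response, re.S | re.I)

demo = ("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
        "<html><p>hi</p><p>there\nworld</p></html>")
print(paragraphs(demo))  # ['hi', 'there\nworld']
```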

Rules:

  • It only needs to work on port 80 (no SSL needed)
  • You may NOT use netcat
  • Whatever programming language is used, only low-level TCP APIs are allowed (no netcat)
  • You may NOT use a GUI; remember, this is a CLI
  • You may not use HTML parsers, except built-in ones (BeautifulSoup is not a built-in)
  • Bonus!! -40 characters if your program loops back and asks for another URL instead of exiting (as long as you don't use recursion)
  • No third-party programs. Remember, you can't install anything.
  • This is code-golf, so the shortest byte count wins

7
Python, import webbrowser;webbrowser.open(url)
Blue

8
@muddyfish read the rules
TheDoctor 2015

4
Can you provide some kind of example web page to test with? It's hard to find places that use <p> :P
a spaghetto 2015


3
The restriction to low-level socket interfaces seems to prohibit the TCP-level APIs of most languages that have TCP-level APIs.
Peter Taylor

Answers:


63

Pure Bash (no utilities), 200 bytes - 40 bonus = 160

while read u;do
u=${u#*//}
d=${u%%/*}
exec 3<>/dev/tcp/$d/80
echo "GET /${u#*/} HTTP/1.1
host:$d
Connection:close
">&3
mapfile -tu3 A
a=${A[@]}
a=${a#*<p>}
a=${a%</p>*}
echo "${a//<\/p>*<p>/"
"}"
done

I think this meets the spec, though of course beware of parsing HTML with regex. The only thing I can think of that's worse than parsing HTML with regex is parsing HTML with shell pattern matching.

This now handles <p>...</p> spanning multiple lines. Each <p>...</p> goes on a separate output line:

$ echo "http://example.com/" | ./smallbrowse.sh
This domain is established to be used for illustrative examples in documents. You may use this     domain in examples without prior coordination or asking for permission.
<a href="http://www.iana.org/domains/example">More information...</a>
$ 

35
You need to have this memorized by tomorrow.
Conor O'Brien 2015

14
+∞ for "parsing HTML with shell pattern matching"
SztupY

76
-1 because your avatar is a subliminal message
TheDoctor 2015

1
... You can make a TCP connection from Bash? Now I'm truly terrified!
MathematicalOrchid

2
Note: /dev/tcp is an optional extension and may be absent from your build of bash. You need to compile with --enable-net-redirections to have it.
Chris Down

21

PHP, 175 bytes (215 - 40 bonus); previously 227 229 239 202 216 186 bytes

Have fun browsing the web:

for(;$i=parse_url(trim(fgets(STDIN))),fwrite($f=fsockopen($h=$i[host],80),"GET $i[path] HTTP/1.1
Host:$h
Connection:Close

");preg_match_all('!<p>(.+?)</p>!si',stream_get_contents($f),$r),print join("
",$r[1])."
");

Reads URLs like http://www.example.com/ from STDIN. Outputs the paragraphs separated by the newline "\n".


Ungolfed

for(; $i=parse_url(trim(fgets(STDIN))); ) {
    $h = $i['host'];
    $f = fsockopen($h, 80);

    fwrite($f, "GET " . $i['path'] . " HTTP/1.1\nHost:" . $h . "\nConnection:Close\n\n");

    $c = stream_get_contents($f);

    preg_match_all('!<p>(.+?)</p>!si', $c, $r);
    echo join("\n", $r[1]) . "\n";
}

The first version, supporting only one URL:

$i=parse_url($argv[1]);fwrite($f=fsockopen($h=$i[host],80),"GET $i[path] HTTP/1.1\nHost:$h\nConnection:Close\n\n");while(!feof($f))$c.=fgets($f);preg_match_all('!<p>(.+?)</p>!sim',$c,$r);foreach($r[1]as$p)echo"$p\n";



Edits

  • As briantist pointed out in the comments, I totally forgot the path. That's fixed now, thanks. Added 30 bytes.
  • Saved 3 bytes by resetting $c (holds the page content) with $c=$i=parse_url(trim(fgets(STDIN))); instead of $c=''.
  • Saved 12 bytes by replacing \n with actual new lines (5 bytes), one while-loop with for (2 bytes), putting nearly everything into the for expressions (2 bytes) and replacing foreach with join (3 bytes). Thanks to Blackhole.
  • Saved 3 bytes by replacing fgets with stream_get_contents. Thanks to bwoebi.
  • Saved 5 bytes by removing the re-initialization of $c since it's not needed anymore.
  • Saved 1 byte by removing the pattern modifier m from the regex. Thanks to manatwork.


1
@briantist Oh man, I totally missed that. :D Thanks, it's fixed now.
insertusernamehere 2015

1
I can't stand Perl beating PHP, so don't forget: while is forbidden when golfing (for is often shorter, and never longer), and to get a newline, just press Enter (1 byte instead of 2 for \n)! Here is your (untested) code a bit more golfed (227 bytes), with newlines shown as ↵: for(;$c=$i=parse_url(trim(fgets(STDIN))),fwrite($f=fsockopen($h=$i[host],80),"GET $i[path] HTTP/1.1↵Host:$h↵Connection:Close↵↵");preg_match_all('!<p>(.+?)</p>!sim',$c,$r),print join('↵',$r[1]).'↵')for(;!feof($f);)$c.=fgets($f);
Blackhole

1
I didn't mean "forbidden" as in "against the rules"; I meant it's simply not useful, since a for-loop is always at least as good as a while-loop ;).
Blackhole 2015

1
@MichaelDibbets Actually I did, as described in the edit. Hm. Let me see. Haha, I forgot to copy over the final snippet. Duh :D Things like that happen if you update your code before breakfast. Thanks for pointing it out.
2015

14

Perl, 132 bytes

155 bytes of code + 17 for -ln -MIO::Socket - 40 for continually asking for URLs

Regex-parses the HTML like @DigitalTrauma's answer does; let me know if that's not acceptable. The URL parsing isn't quite as thorough any more... I'll look at that later... close to Bash now though! Big thanks to @Schwern for saving me a whopping 59 (!) bytes and to @skmrx for fixing the bug that makes the bonus claimable!

m|(http://)?([^/]+)(/(\S*))?|;$s=new IO::Socket::INET"$2:80";print$s "GET /$4 HTTP/1.1
Host:$2
Connection:close

";local$/=$,;print<$s>=~m|<p>(.+?)</p>|gs

Usage

$perl -ln -MIO::Socket -M5.010 wb.pl 
example.com
This domain is established to be used for illustrative examples in documents. You may use this
    domain in examples without prior coordination or asking for permission.<a href="http://www.iana.org/domains/example">More information...</a>
example.org
This domain is established to be used for illustrative examples in documents. You may use this
    domain in examples without prior coordination or asking for permission.<a href="http://www.iana.org/domains/example">More information...</a>

I fixed a bug and shortened the code by removing the need to declare $h and $p or to have a default path. It also no longer requires a trailing / on the host.
Schwern 2015

1
We're the one to beat now. :)
Schwern

I think I'm done for the night. :)
Schwern

Since the script asks for another URL instead of exiting, you can claim an additional -40 bytes
svsd 2015

1
@DigitalTrauma you're quite right! I've claimed the bonus since skmrx fixed my bug with '$/', and I wouldn't be anywhere near you if it weren't for Schwern!
Dom Hastings

13

PowerShell, 315 294 268 262 254 bytes

355 334 308 302 294 - 40 for prompting

$u=[uri]$args[0]
for(){
$h=$u.Host
$s=[Net.Sockets.TcpClient]::new($h,80).GetStream()
$r=[IO.StreamReader]::new($s)
$w=[IO.StreamWriter]::new($s)
$w.Write("GET $($u.PathAndQuery) HTTP/1.1
HOST: $h

")
$w.Flush()
($r.ReadToEnd()|sls '(?s)(?<=<p>).+?(?=</p>)'-a).Matches.Value
[uri]$u=Read-Host
}

Requires PowerShell v5

All line endings (including those embedded in strings) are newline-only (\n) (thanks Blackhole), which PowerShell fully supports (but beware if you're testing: the ISE uses \r\n).


4
+1 makes my server-admin duties seem vastly more efficient
2015

HTTP requires CRLF, not LF! [HTTPSYNTAX]
toothbrush

2
@toothbrush Ha! Point taken, but the tolerance provision seems to be fully in effect. Clearly this task is about what works and not what's correct (otherwise we wouldn't be parsing HTML with regex and using low-level TCP libraries instead of well-tested existing ones).
briantist

1
@briantist greenbytes.de/tech/webdav/rfc7230.html#rfc.section.3.5 says "a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR". I read that as meaning most web servers would implement it, but the question definitely doesn't say it has to generate correct GET requests... :)
toothbrush 2015

8

Groovy script, 89, 61 bytes

Looping back for the bonus: 101 - 40 = 61

System.in.eachLine{l->l.toURL().text.findAll(/<p>(?s)(.*?)<\/p>/).each{println it[3..it.length()-5]}}

Args only, 89 bytes

this.args[0].toURL().text.findAll(/<p>(?s)(.*?)<\/p>/).each{println it[3..it.length()-5]}

1
Groovy out-golfing everybody. As it should.
a spaghetto

1
@quartata If it stays that way, it'll be the first time ever, so... ;)
Geobits 2015

11
"Only low-level TCP APIs are allowed"
Digital Trauma 2015

Yeah, I'm going to agree with @DigitalTrauma that this isn't using a low-level TCP API. The rules state that you have to split the host and path up yourself.
TheDoctor

6

Bash (might be cheating, but seems within the rules), 144 - 40 = 105

while read a;do
u=${a#*//}
d=${u%%/*}
e=www.w3.org
exec 3<>/dev/tcp/$d/80
echo "GET /services/html2txt?url=$a HTTP/1.1
Host:$d
">&3
cat <&3
done

Thanks to Digital Trauma.

Since I don't really need to split the URL, this also works: 122 - 40 = 82

while read a;do
d=www.w3.org
exec 3<>/dev/tcp/$d/80
echo "GET /services/html2txt?url=$a HTTP/1.1
Host:$d
">&3   
cat <&3
done

8
I think using this online html2txt converter is a standard loophole
Digital Trauma

1
Yes. And I also use cat, so your solution is safe.
philcolbourn 2015

5

C, 512 bytes

#include <netdb.h>
int main(){char i,S[999],b[99],*p,s=socket(2,1,0),*m[]={"<p>","</p>"};long n;
gets(S);p=strchr(S,'/');*p++=0;struct sockaddr_in a={0,2,5<<12};memcpy(&a.
sin_addr,gethostbyname(S)->h_addr,4);connect(s,&a,16);send(s,b,sprintf(b,
"GET /%s HTTP/1.0\r\nHost:%s\r\nAccept:*/*\r\nConnection:close\r\n\r\n",p,S),0);
p=m[i=0];while((n=recv(s,b,98,0))>0)for(char*c=b;c<b+n;c++){while(*c==*p &&*++p)
c++;if(!*p)p=m[(i=!i)||puts("")];else{while(p>m[i]){if(i)putchar(c[m[i]-p]);p--;}
if(i)putchar(*c);}}} 

Loosely based on my entry here, it takes the web address without a leading "https://". It won't handle nested <p> pairs correctly :(

Tested extensively with www.w3.org/People/Berners-Lee/. It works when compiled with Apple LLVM version 6.1.0 (clang-602.0.53) / Target: x86_64-apple-darwin14.1.1. It has enough undefined behavior that it may not work anywhere else.


I was going down much the same track (this segfaults when compiled with gcc), but it should be possible to get below 400 bytes in C. Not sure about clang, but you shouldn't have to declare the return type of main. You can also remove the includes and "access" the structs as integer arrays instead. I was also getting responses with "GET /%s HTTP/1.1\r\n\r\n", but mileage may vary depending on the site...
Comintern 2015

5

Ruby, 118

147 bytes of source; 11 bytes for '-lprsocket'; -40 bytes for looping.

*_,h,p=$_.split'/',4
$_=(TCPSocket.new(h,80)<<"GET /#{p} HTTP/1.1
Host:#{h}
Connection:close

").read.gsub(/((\A|<\/p>).*?)?(<p>|\Z)/mi,'
').strip

Example usage:

$ ruby -lprsocket wb.rb
http://example.org/
This domain is established to be used for illustrative examples in documents. You may use this
    domain in examples without prior coordination or asking for permission.
<a href="http://www.iana.org/domains/example">More information...</a>
http://www.xkcd.com/1596/
Warning: this comic occasionally contains strong language (which may be unsuitable for children), unusual humor (which may be unsuitable for adults), and advanced mathematics (which may be unsuitable for liberal-arts majors).

This work is licensed under a
<a href="http://creativecommons.org/licenses/by-nc/2.5/">Creative Commons Attribution-NonCommercial 2.5 License</a>.


This means you're free to copy and share these comics (but not to sell them). <a rel="license" href="/license.html">More details</a>.

4

AutoIt, 347 bytes

Func _($0)
$4=StringTrimLeft
$0=$4($0,7)
$3=StringSplit($0,"/")[1]
TCPStartup()
$2=TCPConnect(TCPNameToIP($3),80)
TCPSend($2,'GET /'&$4($0,StringLen($3))&' HTTP/1.1'&@LF&'Host: '&$3&@LF&'Connection: close'&@LF&@LF)
$1=''
Do
$1&=TCPRecv($2,1)
Until @extended
For $5 In StringRegExp($1,"(?s)\Q<p>\E(.*?)(?=\Q</p>\E)",3)
ConsoleWrite($5)
Next
EndFunc

Testing

Input:

_('http://www.autoitscript.com')

Output:

You don't have permission to access /error/noindex.html
on this server.

Input:

_('http://www.autoitscript.com/site')

Output:

The document has moved <a href="https://www.autoitscript.com/site">here</a>.

Remarks

  • Doesn't support nested <p> tags
  • Supports only <p> tags (case-insensitive); breaks on every other tag format
  • Loops indefinitely when any error occurs

4

C#, 727 bytes - 40 = 687 bytes

using System.Text.RegularExpressions;class P{static void Main(){a:var i=System.Console.ReadLine();if(i.StartsWith("http://"))i=i.Substring(7);string p="/",h=i;var l=i.IndexOf(p);
if(l>0){h=i.Substring(0,l);p=i.Substring(l,i.Length-l);}var c=new System.Net.Sockets.TcpClient(h,80);var e=System.Text.Encoding.ASCII;var d=e.GetBytes("GET "+p+@" HTTP/1.1
Host: "+h+@"
Connection: close

");var s=c.GetStream();s.Write(d,0,d.Length);byte[]b=new byte[256],o;var m=new System.IO.MemoryStream();while(true){var r=s.Read(b,0,b.Length);if(r<=0){o=m.ToArray();break;}m.Write(b,0,r);}foreach (Match x in new Regex("<p>(.+?)</p>",RegexOptions.Singleline).Matches(e.GetString(o)))System.Console.WriteLine(x.Groups[1].Value);goto a;}}

It takes some training, but it's certainly memorizable :)

Here's an ungolfed version:

using System.Text.RegularExpressions;
class P
{
    static void Main()
    {
    a:
        var input = System.Console.ReadLine();
        if (input.StartsWith("http://")) input = input.Substring(7);
        string path = "/", hostname = input;
        var firstSlashIndex = input.IndexOf(path);
        if (firstSlashIndex > 0)
        {
            hostname = input.Substring(0, firstSlashIndex);
            path = input.Substring(firstSlashIndex, input.Length - firstSlashIndex);
        }
        var tcpClient = new System.Net.Sockets.TcpClient(hostname, 80);
        var asciiEncoding = System.Text.Encoding.ASCII;
        var dataToSend = asciiEncoding.GetBytes("GET " + path + @" HTTP/1.1
Host: " + hostname + @"
Connection: close

");
        var stream = tcpClient.GetStream();
        stream.Write(dataToSend, 0, dataToSend.Length);
        byte[] buff = new byte[256], output;
        var ms = new System.IO.MemoryStream();
        while (true)
        {
            var numberOfBytesRead = stream.Read(buff, 0, buff.Length);
            if (numberOfBytesRead <= 0)
            {
                output = ms.ToArray();
                break;
            }
            ms.Write(buff, 0, numberOfBytesRead);
        }
        foreach (Match match in new Regex("<p>(.+?)</p>", RegexOptions.Singleline).Matches(asciiEncoding.GetString(output)))
        {
            System.Console.WriteLine(match.Groups[1].Value);
        }
        goto a;
    }
}

As you can see, there's a memory leak thrown in as a bonus :)


Where's the memory leak? I don't see any using statements around the streams, but that by itself doesn't make a leak.
Gusdor

You could trim a few more bytes: input = input.TrimStart("http://") would replace the "if" clause, and you should be able to use System.Text.Encoding.ASCII.GetBytes() directly instead of storing it in asciiEncoding first. I bet you'd even come out ahead with a "using System;" line and dropping the handful of "System."s.
minnmass 2015

3

JavaScript (NodeJS) - 187 166

s=require("net").connect(80,p=process.argv[2],_=>s.write("GET / HTTP/1.0\nHost: "+p+"\n\n")&s.on("data",d=>(d+"").replace(/<p>([^]+?)<\/p>/g,(_,g)=>console.log(g))));

187:

s=require("net").connect(80,p=process.argv[2],_=>s.write("GET / HTTP/1.1\nHost: "+p+"\nConnection: close\n\n")&s.on("data",d=>(d+"").replace(/<p>([^]+?)<\/p>/gm,(_,g)=>console.log(g))));

Usage:

node file.js www.example.com

Or formatted:

var url = process.argv[2];
s=require("net").connect(80, url ,_=> {
     s.write("GET / HTTP/1.1\nHost: "+url+"\nConnection: close\n\n");
     s.on("data",d=>(d+"").replace(/<p>([^]+?)<\/p>/gm,(_,g)=>console.log(g)))
});

1
Caveat: this will only work for smallish pages; bigger pages emit multiple data events.
Benjamin Gruenbaum

3

Python 2 - 212 209 bytes

import socket,re
h,_,d=raw_input().partition('/')
s=socket.create_connection((h,80))
s.sendall('GET /%s HTTP/1.1\nHost:%s\n\n'%(d,h))
p=''
while h:h=s.recv(9);p+=h
for g in re.findall('<p>(.*?)</p>',p):print g
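An aside for readers puzzling over the while h:h=s.recv(9);p+=h line above: it reuses the host string h as the receive buffer, looping until recv returns an empty string at EOF. Spelled out in Python 3 (using a local socket pair so this sketch runs without a network connection):

```python
import socket

a, b = socket.socketpair()          # stand-in for a real TCP connection
b.sendall(b"<p>hello</p>")
b.close()                           # closing makes recv() eventually return b""

chunks = b""
while True:
    data = a.recv(9)                # same tiny buffer size as the golfed loop
    if not data:                    # empty bytes == connection closed
        break
    chunks += data
a.close()
print(chunks)  # b'<p>hello</p>'
```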

You can save two bytes by stripping out the whitespace after the colon in while h: and before print g
Skyler

And another byte with 'GET /%s HTTP/1.1\nHost:%s\n\n'
Cees Timmerman

3

Python 2, 187 - 40 = 147 (141 in the REPL)

A compressed and looping version of Zac's answer:

import socket,re
while 1:h,_,d=raw_input().partition('/');s=socket.create_connection((h,80));s.sendall('GET /%s HTTP/1.1\nHost:%s\n\n'%(d,h));print re.findall('<p>(.*?)</p>',s.recv(9000))

Example:

dictionary.com
['The document has moved <a href="http://dictionary.reference.com/">here</a>.']
dictionary.reference.com
[]
paragraph.com
[]
rare.com
[]

What's actually useful:

207 - 40 = 167

import socket,re
while 1:h,_,d=raw_input().partition('/');s=socket.create_connection((h,80));s.sendall('GET /%s HTTP/1.1\nHost:%s\n\n'%(d,h));print'\n'.join(re.findall('<p>(.*?)</p>',s.recv(9000),re.DOTALL))

Example:

example.org
This domain is established to be used for illustrative examples in documents. You may use this
    domain in examples without prior coordination or asking for permission.
<a href="http://www.iana.org/domains/example">More information...</a>
www.iana.org/domains/example
The document has moved <a href="/domains/reserved">here</a>.
www.iana.org/domains/reserved

dictionary.com
The document has moved <a href="http://dictionary.reference.com/">here</a>.
dictionary.reference.com

catb.org

      <a href="http://validator.w3.org/check/referer"><img
          src="http://www.w3.org/Icons/valid-xhtml10"
          alt="Valid XHTML 1.0!" height="31" width="88" /></a>

This is catb.org, named after (the) Cathedral and the Bazaar. Most
of it, under directory esr, is my personal site.  In theory other
people could shelter here as well, but this has yet to occur.
catb.org/jargon
The document has moved <a href="http://www.catb.org/jargon/">here</a>.
www.catb.org/jargon/
This page indexes all the WWW resources associated with the Jargon File
and its print version, <cite>The New Hacker's Dictionary</cite>. It's as
official as anything associated with the Jargon File gets.
On 23 October 2003, the Jargon File achieved the
dubious honor of being cited in the SCO-vs.-IBM lawsuit.  See the <a
href='html/F/FUD.html'>FUD</a> entry for details.
www.catb.org/jargon/html/F/FUD.html
 Defined by Gene Amdahl after he left IBM to found his own company:
   &#8220;<span class="quote">FUD is the fear, uncertainty, and doubt that IBM sales people
   instill in the minds of potential customers who might be considering
   [Amdahl] products.</span>&#8221; The idea, of course, was to persuade them to go
   with safe IBM gear rather than with competitors' equipment.  This implicit
   coercion was traditionally accomplished by promising that Good Things would
   happen to people who stuck with IBM, but Dark Shadows loomed over the
   future of competitors' equipment or software.  See
   <a href="../I/IBM.html"><i class="glossterm">IBM</i></a>.  After 1990 the term FUD was associated
   increasingly frequently with <a href="../M/Microsoft.html"><i class="glossterm">Microsoft</i></a>, and has
   become generalized to refer to any kind of disinformation used as a
   competitive weapon.
[In 2003, SCO sued IBM in an action which, among other things,
   alleged SCO's proprietary control of <a href="../L/Linux.html"><i class="glossterm">Linux</i></a>.  The SCO
   suit rapidly became infamous for the number and magnitude of falsehoods
   alleged in SCO's filings.  In October 2003, SCO's lawyers filed a <a href="http://www.groklaw.net/article.php?story=20031024191141102" target="_top">memorandum</a>
   in which they actually had the temerity to link to the web version of
   <span class="emphasis"><em>this entry</em></span> in furtherance of their claims. Whilst we
   appreciate the compliment of being treated as an authority, we can return
   it only by observing that SCO has become a nest of liars and thieves
   compared to which IBM at its historic worst looked positively
   angelic. Any judge or law clerk reading this should surf through to
   <a href="http://www.catb.org/~esr/sco.html" target="_top">my collected resources</a> on this
   topic for the appalling details.&#8212;ESR]

1

gawk, 235 - 40 = 195 bytes

{for(print"GET "substr($0,j)" HTTP/1.1\nHost:"h"\n"|&(x="/inet/tcp/0/"(h=substr($0,1,(j=index($0,"/"))-1))"/80");(x|&getline)>0;)w=w RS$0
for(;o=index(w,"<p>");w=substr(w,c))print substr(w=substr(w,o+3),1,c=index(w,"/p>")-2)
close(x)}

Golfed it down, but this is a more unforgiving version: it needs the web address without http:// at the beginning. If you want to access the root directory you have to end the address with a /. Furthermore, the <p> tags have to be lowercase.

My earlier version didn't actually handle lines containing </p><p> correctly. That's fixed now.

Input/output for example.com/:

This domain is established to be used for illustrative examples in documents. You may use this
    domain in examples without prior coordination or asking for permission.
<a href="http://www.iana.org/domains/example">More information...</a>

Still doesn't work with Wikipedia. I think the reason is that Wikipedia uses https for everything. But I don't know.

The following version is much more forgiving with its input, and it can handle uppercase tags as well.

IGNORECASE=1{
    s=substr($0,(i=index($0,"//"))?i+2:0)
    x="/inet/tcp/0/"(h=(j=index(s,"/"))?substr(s,1,j-1):s)"/80"
    print"GET "substr(s,j)" HTTP/1.1\nHost:"h"\nConnection:close\n"|&x
    while((x|&getline)>0)w=w RS$0
    for(;o=index(w,"<p>");w=substr(w,c))
        print substr(w=substr(w,o+3),1,c=index(w,"/p>")-2)
    close(x)
}

I'm not sure about the "Connection:close" line. It doesn't seem to be mandatory. I couldn't find a page that behaved differently with or without it.


1

PowerShell (4), 240

$input=Read-Host ""
$url=[uri]$input
$dir=$url.LocalPath
Do{
$res=Invoke-WebRequest -URI($url.Host+"/"+$dir) -Method Get
$res.ParsedHtml.getElementsByTagName('p')|foreach-object{write-host $_.innerText}
$dir=Read-Host ""
}While($dir -NE "")

Ungolfed (proxy not required)

$system_proxyUri=Get-ItemProperty -Path "HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings" -Name ProxyServer
$proxy = [System.Net.WebRequest]::GetSystemWebProxy()
$proxyUri = $proxy.GetProxy($system_proxyUri.ProxyServer)
$input = Read-Host "Initial url"
#$input="http://stackoverflow.com/questions/tagged/powershell"
$url=[uri]$input
$dir=$url.LocalPath
Do{
$res=Invoke-WebRequest -URI($url.Host+"/"+$dir) -Method Get -Proxy($proxyUri)
$res.ParsedHtml.getElementsByTagName('p')|foreach-object{write-host $_.innerText}
$dir=Read-Host "next dir"
}While($dir -NE "")

Edit: * also not too hard to memorize ^^


-1

Java, 620 B

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class JavaApplication12 {

    public static void main(String[] args) {
        try {             
            BufferedReader i = new BufferedReader(new InputStreamReader(new URL(args[0]).openStream()));
            String l;
            boolean print = false;
            while ((l = i.readLine()) != null) {
                if (l.toLowerCase().contains("<p>")) {
                    print = true;
                }
                if (print) {
                    if (l.toLowerCase().contains("</p>")) {
                        print = false;
                    }
                    System.out.println(l);
                }
            }

        } catch (Exception e) {

        }
    }

}

2
Welcome to Programming Puzzles & Code Golf! Unfortunately, this submission is invalid. The question only allows low-level TCP APIs, so you can't use InputStreamReader.
Dennis

1
Oh, I'm sorry, thank you for pointing that out. I'll do better in the next answer.
Shalika Ashan 2015
Licensed under cc by-sa 3.0 with attribution required.