How can I determine the HTTP status without downloading the complete page?


26

I want to know the HTTP status of a website, using Ubuntu. I have used the curl and wget commands for this purpose. But the problem is that these commands download the complete website page, then search for the header and display it on the screen. For example:

$ curl -I trafficinviter.com
HTTP/1.1 200 OK
Date: Mon, 02 Jan 2017 14:13:14 GMT
Server: Apache
X-Pingback: http://trafficinviter.com/xmlrpc.php
Link: <http://trafficinviter.com/>; rel=shortlink
Set-Cookie: wpfront-notification-bar-landingpage=1
Content-Type: text/html; charset=UTF-8

Doing the same thing with the wget command downloads the entire page and unnecessarily consumes my bandwidth.

What I am looking for is: how to get the HTTP status code without actually downloading the page, so that I can save bandwidth. I have tried using curl, but I am not sure whether it downloads the complete page or just the headers to get the status code.


"tried using curl but am not sure whether the complete page or just the headers are downloaded": curl's -v (--verbose) option is a handy way to debug what curl actually sends and receives.
Beni Cherniavsky-Paskin

I'm afraid I have to downvote, since you already have the solution right there in your question.
Lightness Races with Monica

@LightnessRacesinOrbit I did not know whether my question already contained the answer. I came here for help to resolve my confusion. If you still find my question wrong.. I welcome your decision to downvote.. thank you
Jaffer Wilson


"these commands download the complete website page" - no, they don't.
Stop

Answers:


49

curl -I fetches only the HTTP headers; it does not download the whole page. From man curl:

-I, --head
      (HTTP/FTP/FILE) Fetch the HTTP-header only! HTTP-servers feature
      the command HEAD which this uses to get nothing but  the  header
      of  a  document. When used on an FTP or FILE file, curl displays
      the file size and last modification time only.
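
If only the numeric status code is wanted, curl's -s, -o and -w (--write-out) options can be combined with -I; a minimal sketch, with http://example.com as a placeholder URL:

$ curl -s -o /dev/null -I -w "%{http_code}\n" http://example.com
200

Here -s hides the progress meter, -o /dev/null discards the headers, and -w "%{http_code}\n" prints nothing but the response code.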

Another option is to install lynx and use lynx -head -dump.
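
For example, the first line of the dumped headers should be the status line; a rough sketch, assuming lynx is installed and with http://example.com again as a placeholder:

$ lynx -head -dump http://example.com | head -n 1
HTTP/1.1 200 OK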

The HEAD request is specified by the HTTP 1.1 protocol (RFC 2616):

9.4 HEAD

   The HEAD method is identical to GET except that the server MUST NOT
   return a message-body in the response. The metainformation contained
   in the HTTP headers in response to a HEAD request SHOULD be identical
   to the information sent in response to a GET request. This method can
   be used for obtaining metainformation about the entity implied by the
   request without transferring the entity-body itself. This method is
   often used for testing hypertext links for validity, accessibility,
   and recent modification.
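
To make the quoted behaviour concrete, a HEAD request can even be typed out by hand; a rough sketch using netcat over plain HTTP on port 80 (example.com and the response shown are illustrative only):

$ printf 'HEAD / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n' | nc example.com 80
HTTP/1.1 200 OK

The server replies with the status line and headers only, with no message body, exactly as the RFC requires.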

2
Is it possible (within the bounds of the standard.. obviously it is) for a HEAD request to return a different status code than a GET?
KutuluMike

1
@KutuluMike: Edited the answer to provide the requested information. In the words of the RFC, it SHOULD provide the same metainformation.
AlexP

@duskwuff: Then the HEAD request should return the same 405.
AlexP

@AlexP My mistake. Never mind!
duskwuff

18

With wget you need to use the --spider option to make it send a HEAD request, like curl does:

$ wget -S --spider https://google.com
Spider mode enabled. Check if remote file exists.
--2017-01-03 00:08:38--  https://google.com/
Resolving google.com (google.com)... 216.58.197.174
Connecting to google.com (google.com)|216.58.197.174|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 302 Found
  Cache-Control: private
  Content-Type: text/html; charset=UTF-8
  Location: https://www.google.co.jp/?gfe_rd=cr&ei=...
  Content-Length: 262
  Date: Mon, 02 Jan 2017 15:08:38 GMT
  Alt-Svc: quic=":443"; ma=2592000; v="35,34"
Location: https://www.google.co.jp/?gfe_rd=cr&ei=... [following]
Spider mode enabled. Check if remote file exists.
--2017-01-03 00:08:38--  https://www.google.co.jp/?gfe_rd=cr&ei=...
Resolving www.google.co.jp (www.google.co.jp)... 210.139.253.109, 210.139.253.93, 210.139.253.123, ...
Connecting to www.google.co.jp (www.google.co.jp)|210.139.253.109|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Mon, 02 Jan 2017 15:08:38 GMT
  Expires: -1
  Cache-Control: private, max-age=0
  Content-Type: text/html; charset=Shift_JIS
  P3P: CP="This is not a P3P policy! See https://www.google.com/support/accounts/answer/151657?hl=en for more info."
  Server: gws
  X-XSS-Protection: 1; mode=block
  X-Frame-Options: SAMEORIGIN
  Set-Cookie: NID=...; expires=Tue, 04-Jul-2017 15:08:38 GMT; path=/; domain=.google.co.jp; HttpOnly
  Alt-Svc: quic=":443"; ma=2592000; v="35,34"
  Transfer-Encoding: chunked
  Accept-Ranges: none
  Vary: Accept-Encoding
Length: unspecified [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.
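
If only the final status code is wanted from that output, it can be filtered out of wget's log, which goes to stderr; a small sketch (the awk filter and the google.com URL are just illustrative choices):

$ wget -S --spider https://google.com 2>&1 | awk '/HTTP\//{code=$2} END{print code}'
200

With -S each server response starts with an "HTTP/..." status line, so the awk pattern picks up every status line and the END block prints the last one (200 after the redirect above).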

Don't you think, my friend, that wget will fetch the complete page and then display the headers?
Jaffer Wilson

@JafferWilson Read the last few lines of the output.
muru