How to check if a URL exists or returns 404 with Java?


75
String urlString = "http://www.nbc.com/Heroes/novels/downloads/Heroes_novel_001.pdf";
URL url = new URL(urlString);
if(/* Url does not return 404 */) {
    System.out.println("exists");
} else {
    System.out.println("does not exists");
}
urlString = "http://www.nbc.com/Heroes/novels/downloads/Heroes_novel_190.pdf";
url = new URL(urlString);
if(/* Url does not return 404 */) {
    System.out.println("exists");
} else {
    System.out.println("does not exists");
}

This should print

exists
does not exists

TEST

public static String URL = "http://www.nbc.com/Heroes/novels/downloads/";

public static int getResponseCode(String urlString) throws MalformedURLException, IOException {
    URL u = new URL(urlString); 
    HttpURLConnection huc =  (HttpURLConnection)  u.openConnection(); 
    huc.setRequestMethod("GET"); 
    huc.connect(); 
    return huc.getResponseCode();
}

System.out.println(getResponseCode(URL + "Heroes_novel_001.pdf")); 
System.out.println(getResponseCode(URL + "Heroes_novel_190.pdf"));   
System.out.println(getResponseCode("http://www.example.com")); 
System.out.println(getResponseCode("http://www.example.com/junk"));           

Output

200
200
200
404

SOLUTION: adding the following line before .connect() changes the output to 200, 404, 200, 404:

huc.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)");

I don't see the problem in your test. In my browser, I get no content for the second URL, but I don't get a 404 either.
Brian Agnew

Actually, I seem to get a largely blank HTML page.
Brian Agnew

1
That site appears to serve valid content for almost anything, e.g. www.nbc.com/junk. Try with example.com/junk.html instead.
Brian Agnew

The URL nbc.com/Heroes/novels/downloads/Heroes_novel_190.pdf gives me a completely blank page (not even an <html> tag), but with a 404 header. Not very nice for users, but technically correct.
Michael Borgwardt

1
You should have split the solution out into an answer so I could upvote that too!
Kingsolmn

Answers:


59

You may want to add

HttpURLConnection.setFollowRedirects(false);
// note : or
//        huc.setInstanceFollowRedirects(false)

if you don't want to follow redirects (3xx).

Instead of doing a "GET", a "HEAD" is all you need:

huc.setRequestMethod("HEAD");
return (huc.getResponseCode() == HttpURLConnection.HTTP_OK);
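To put the two suggestions above together, here is a minimal sketch. The class name HeadCheck, the helper classify, and the 5-second timeouts are my own assumptions and not part of the answer. It uses the per-connection setInstanceFollowRedirects rather than the static setFollowRedirects, so other connections in the same JVM are not affected.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class HeadCheck {

    /**
     * Sends a HEAD request and returns the raw status code.
     * Assumes an http/https URL; other schemes would fail the cast.
     */
    static int headStatus(URL url) throws IOException {
        HttpURLConnection huc = (HttpURLConnection) url.openConnection();
        huc.setInstanceFollowRedirects(false); // only this connection, not JVM-wide
        huc.setRequestMethod("HEAD");          // status and headers only, no body
        huc.setConnectTimeout(5000);           // assumed values, tune as needed
        huc.setReadTimeout(5000);
        try {
            return huc.getResponseCode();      // connects implicitly if needed
        } finally {
            huc.disconnect();
        }
    }

    /** Hypothetical helper: maps a status code to a coarse label. */
    static String classify(int code) {
        if (code == HttpURLConnection.HTTP_OK) return "exists";
        if (code >= 300 && code < 400) return "redirect";
        if (code == HttpURLConnection.HTTP_NOT_FOUND) return "not found";
        return "other";
    }
}
```

With redirects disabled, a 301 or 302 shows up as "redirect" instead of being silently followed, so you can decide for yourself whether a redirected URL counts as existing.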

18
+1 for HEAD. People forget how HTTP works from time to time, so it's good that some still remember :)
Benjamin Gruenbaum, 2012

1
Handling HTTPS URLs is trickier, right? You have to manage certificates...
Jayy, 2015
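On the HTTPS question: javax.net.ssl.HttpsURLConnection is a subclass of HttpURLConnection, so the same cast and the same HEAD request should work for https URLs as long as the server certificate is trusted by the JVM's default trust store. Manual certificate handling normally only comes up for self-signed or private-CA certificates. A small sketch that shows the relationship without touching the network (the class and method names are made up for illustration):

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import javax.net.ssl.HttpsURLConnection;

final class HttpsCheck {

    /**
     * Returns true if the URL opens as an HttpsURLConnection.
     * openConnection() only creates the object; no request is sent here.
     */
    static boolean usesHttps(String urlString) {
        try {
            URLConnection conn = new URL(urlString).openConnection();
            return conn instanceof HttpsURLConnection;
        } catch (IOException e) { // also covers MalformedURLException
            return false;
        }
    }
}
```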

43

This worked for me:

URL u = new URL ( "http://www.example.com/");
HttpURLConnection huc =  ( HttpURLConnection )  u.openConnection (); 
huc.setRequestMethod ("GET");  //OR  huc.setRequestMethod ("HEAD"); 
huc.connect () ; 
int code = huc.getResponseCode() ;
System.out.println(code);

Thanks for the suggestions above.


23

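Use HttpURLConnection by calling openConnection() on your URL object.

getResponseCode() will then give you the HTTP response code. It sends the request itself if the connection has not been used yet.

e.g.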

   URL u = new URL("http://www.example.com/");
   HttpURLConnection huc = (HttpURLConnection) u.openConnection();
   huc.setRequestMethod("GET");
   huc.connect();
   // No getOutputStream() here: on a GET without setDoOutput(true) it throws ProtocolException
   int code = huc.getResponseCode();

(untested)


12

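There is nothing wrong with your code. It's NBC.com playing tricks on you. When NBC.com decides that your browser is not capable of displaying PDFs, it simply sends back a web page regardless of what you request, even if the resource doesn't exist.

You need to trick it back by telling it your browser is capable, something like: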

conn.setRequestProperty("User-Agent",
    "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13");
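For context, the header has to be set before the connection is used, because request properties cannot be changed once the request has been sent. A minimal sketch follows; the class name, the method name openAsBrowser, and the choice to wrap IOException are my own assumptions, and the User-Agent string is simply the one from this answer:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class UserAgentExample {

    // Browser-like value copied from the answer; any string the server accepts would do.
    static final String BROWSER_UA =
        "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13";

    /** Opens (but does not yet connect) an HTTP connection that will send a browser-like User-Agent. */
    static HttpURLConnection openAsBrowser(String urlString) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
            conn.setRequestProperty("User-Agent", BROWSER_UA); // must happen before connecting
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned connection can then be passed through setRequestMethod("HEAD") and getResponseCode() as in the other answers.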

11

Based on the given answers and the information in the question, this is the code you should use:

public static boolean doesURLExist(URL url) throws IOException
{
    // We want to check the current URL
    HttpURLConnection.setFollowRedirects(false);

    HttpURLConnection httpURLConnection = (HttpURLConnection) url.openConnection();

    // We don't need to get data
    httpURLConnection.setRequestMethod("HEAD");

    // Some websites don't like programmatic access so pretend to be a browser
    httpURLConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)");
    int responseCode = httpURLConnection.getResponseCode();

    // We only accept response code 200
    return responseCode == HttpURLConnection.HTTP_OK;
}

Tested and working, of course.
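One thing the method above leaves to the caller is error handling: a malformed URL, an unreachable host, or a non-HTTP URL all surface as exceptions rather than as "does not exist". If you would rather get a plain boolean, a wrapper along these lines is one option. This is only a sketch; the name UrlProbe, the timeouts, and the decision to treat every failure as false are my assumptions:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

final class UrlProbe {

    /** Returns true only for a 200 response; any failure or non-HTTP URL yields false. */
    static boolean exists(String urlString) {
        HttpURLConnection conn = null;
        try {
            URLConnection raw = new URL(urlString).openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return false; // e.g. file: or jar: URLs
            }
            conn = (HttpURLConnection) raw;
            conn.setInstanceFollowRedirects(false);
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000); // assumed values
            conn.setReadTimeout(5000);
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            // MalformedURLException, UnknownHostException, timeouts, ...
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

Note that this cannot distinguish "the server said 404" from "the network is down", so keep the exception-throwing version if that difference matters to you.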

Licensed under cc by-sa 3.0 with attribution required.