How do I check whether a URL exists or returns 404 in Java?

Date: 2022-03-30 10:29:19
String urlString = "http://www.nbc.com/Heroes/novels/downloads/Heroes_novel_001.pdf";
URL url = new URL(urlString);
if(/* Url does not return 404 */) {
    System.out.println("exists");
} else {
    System.out.println("does not exist");
}
urlString = "http://www.nbc.com/Heroes/novels/downloads/Heroes_novel_190.pdf";
url = new URL(urlString);
if(/* Url does not return 404 */) {
    System.out.println("exists");
} else {
    System.out.println("does not exist");
}

This should print

exists
does not exist

TEST

public static String URL = "http://www.nbc.com/Heroes/novels/downloads/";

public static int getResponseCode(String urlString) throws MalformedURLException, IOException {
    URL u = new URL(urlString); 
    HttpURLConnection huc =  (HttpURLConnection)  u.openConnection(); 
    huc.setRequestMethod("GET"); 
    huc.connect(); 
    return huc.getResponseCode();
}

System.out.println(getResponseCode(URL + "Heroes_novel_001.pdf")); 
System.out.println(getResponseCode(URL + "Heroes_novel_190.pdf"));   
System.out.println(getResponseCode("http://www.example.com")); 
System.out.println(getResponseCode("http://www.example.com/junk"));           

Output

200
200
200
404

SOLUTION

Add the following line before the call to connect(); the output then becomes 200, 404, 200, 404:

huc.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)");

5 solutions

#1


53  

You may want to add

HttpURLConnection.setFollowRedirects(false);
// Note: this static setter changes the default for every HttpURLConnection in the JVM.
// To change only this connection, use instead:
//        huc.setInstanceFollowRedirects(false);

if you don't want to follow redirects (3xx).

Instead of doing a "GET", a "HEAD" is all you need.

huc.setRequestMethod("HEAD");
return (huc.getResponseCode() == HttpURLConnection.HTTP_OK);
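
Putting the two suggestions together, a minimal sketch might look like the following. The class name `HeadCheck`, the helper `isRedirect`, and the 5-second timeouts are assumptions made for illustration; they do not come from the answer itself.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class HeadCheck {

    // Hypothetical helper: true for any 3xx status code.
    static boolean isRedirect(int statusCode) {
        return statusCode >= 300 && statusCode < 400;
    }

    // Sends a HEAD request without following redirects and returns the raw status code.
    static int headStatus(String urlString) throws IOException {
        HttpURLConnection huc = (HttpURLConnection) new URL(urlString).openConnection();
        try {
            huc.setInstanceFollowRedirects(false); // affects this connection only
            huc.setRequestMethod("HEAD");          // ask for headers only, no body
            huc.setConnectTimeout(5000);           // assumed value, in milliseconds
            huc.setReadTimeout(5000);              // assumed value, in milliseconds
            return huc.getResponseCode();          // connects and sends the request if needed
        } finally {
            huc.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeadCheck <url>");
            return;
        }
        int code = headStatus(args[0]);
        System.out.println(code + (isRedirect(code) ? " (redirect, not followed)" : ""));
    }
}
```

Returning the raw status code rather than a boolean lets the caller decide whether a 3xx answer should count as "exists".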

#2


37  

This worked for me:

URL u = new URL("http://www.example.com/");
HttpURLConnection huc = (HttpURLConnection) u.openConnection();
huc.setRequestMethod("GET"); // or huc.setRequestMethod("HEAD");
huc.connect();
int code = huc.getResponseCode();
System.out.println(code);

Thanks for the suggestions above.

#3


23  

Use HttpURLConnection by calling openConnection() on your URL object.

getResponseCode() returns the HTTP status code of the response; calling it sends the request if that has not happened yet, so there is no need to read from the connection first.

e.g.

   URL u = new URL("http://www.example.com/");
   HttpURLConnection huc = (HttpURLConnection) u.openConnection();
   huc.setRequestMethod("GET");
   huc.connect();
   // Do not call getOutputStream() here: on a GET without setDoOutput(true) it throws a ProtocolException.
   int code = huc.getResponseCode();

(not tested)

#4


12  

There is nothing wrong with your code. NBC.com is playing tricks on you: when it decides that your browser cannot display PDFs, it sends back a web page regardless of what you requested, even if the file doesn't exist.

You need to trick it back by telling it your browser is capable, something like:

conn.setRequestProperty("User-Agent",
    "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13");

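Because a server like this may answer 200 with an HTML page in place of the missing file, the status code alone can be misleading. One possible extra check, sketched below, is to compare the Content-Type response header with the type you expect. The class `PdfProbe` and the helper `looksLikePdf` are illustrative assumptions, and whether NBC.com sends a meaningful Content-Type header was not verified.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

final class PdfProbe {

    // Hypothetical helper: true if a Content-Type header value denotes a PDF.
    static boolean looksLikePdf(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=..." and ignore letter case.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("application/pdf");
    }

    // True only if the server answers 200 and labels the response as a PDF.
    static boolean pdfExists(String urlString, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
        try {
            conn.setRequestMethod("HEAD");
            conn.setRequestProperty("User-Agent", userAgent);
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK
                    && looksLikePdf(conn.getContentType());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PdfProbe <url>");
            return;
        }
        // User-Agent string copied from the answer above.
        String ua = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) "
                + "Gecko/2009073021 Firefox/3.0.13";
        System.out.println(pdfExists(args[0], ua) ? "exists" : "does not exist");
    }
}
```

This only helps when the server labels its fallback page as HTML; it is a heuristic, not a guarantee.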
#5


3  

Based on the given answers and information in the question, this is the code you should use:

public static boolean doesURLExist(URL url) throws IOException
{
    // We want to check the current URL
    HttpURLConnection.setFollowRedirects(false);

    HttpURLConnection httpURLConnection = (HttpURLConnection) url.openConnection();

    // We don't need to get data
    httpURLConnection.setRequestMethod("HEAD");

    // Some websites don't like programmatic access so pretend to be a browser
    httpURLConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)");
    int responseCode = httpURLConnection.getResponseCode();

    // We only accept response code 200
    return responseCode == HttpURLConnection.HTTP_OK;
}

Tested and working, of course.
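
As an alternative that avoids the JVM-wide static setFollowRedirects call, the same check can be sketched with the java.net.http.HttpClient API, which requires Java 11 or later. This is a different API from the one used in the answer, and the class name `HeadCheckClient`, the helper `buildHeadRequest`, and the 5-second timeouts are assumptions for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class HeadCheckClient {

    // Builds a HEAD request that carries the given User-Agent header.
    static HttpRequest buildHeadRequest(URI uri, String userAgent) {
        return HttpRequest.newBuilder(uri)
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .header("User-Agent", userAgent)
                .timeout(Duration.ofSeconds(5)) // assumed value
                .build();
    }

    // True only if the server answers 200 to the HEAD request; redirects are not followed.
    static boolean doesUrlExist(URI uri, String userAgent) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER) // per-client setting, not global
                .connectTimeout(Duration.ofSeconds(5))      // assumed value
                .build();
        HttpResponse<Void> response =
                client.send(buildHeadRequest(uri, userAgent), HttpResponse.BodyHandlers.discarding());
        return response.statusCode() == 200;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.out.println("usage: java HeadCheckClient <url>");
            return;
        }
        boolean exists = doesUrlExist(URI.create(args[0]), "Mozilla/5.0");
        System.out.println(exists ? "exists" : "does not exist");
    }
}
```

The redirect policy here belongs to one client instance, so other code in the same JVM keeps its own redirect behavior.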
