How can I programmatically check whether a page's URL redirects?

Time: 2022-02-17 22:40:20

I am trying to extract the content of a webpage A. Using Groovy, I tried the following:

......
String urlStr = "url-of-webpage-A"
String pageText = urlStr.toURL().text
//println pageText
.....

The above code retrieves the text of webpage A as long as it doesn't redirect to another webpage B. If A redirects to B, the pageText variable receives the content of webpage B instead. Is there a way to check in code whether webpage A redirects to another webpage (in Groovy or Java)?

PS: The above piece of code is not part of any server-side logic. I am executing it on the client side, within a desktop application.

2 Answers

#1


4  

In Groovy, you could do what Joachim suggests like this:

String location = "url-of-webpage-A"
boolean wasRedirected = false
String pageContent = null

while( location ) {
  new URL( location ).openConnection().with { con ->
    // We'll do redirects ourselves
    con.instanceFollowRedirects = false

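    // Get the location to jump to (null unless the response is a redirect)
    location = con.getHeaderField( "Location" )
    if( location ) {
      wasRedirected = true
      // The Location header may be relative, so resolve it against the current URL
      location = new URL( con.getURL(), location ).toString()
    }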

    // Read the HTML and close the inputstream
    pageContent = con.inputStream.withReader { it.text }
  }
}

println "wasRedirected:$wasRedirected contentLength:${pageContent.length()}"
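
For anyone who wants the same loop in plain Java, here is a rough sketch. It is not part of the original answer: the class name RedirectChain, the helper names and the cap of 10 hops are all made up for illustration, and the example.com default is just a placeholder. It treats any 3xx response that carries a Location header as a redirect, resolves relative Location values against the current URL, and gives up after a fixed number of hops so a redirect cycle cannot loop forever. It also assumes the URL is http or https, otherwise the cast fails.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

class RedirectChain {

    // Arbitrary cap chosen for this sketch, so a redirect cycle cannot loop forever
    static final int MAX_HOPS = 10;

    /** Resolves a possibly relative Location value against the URL that returned it. */
    static String resolve(String base, String location) {
        return URI.create(base).resolve(location).toString();
    }

    /** Follows redirects by hand and returns the last URL reached. */
    static String finalUrl(String start) throws IOException {
        String current = start;
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            HttpURLConnection con = (HttpURLConnection) new URL(current).openConnection();
            try {
                // We'll do redirects ourselves
                con.setInstanceFollowRedirects(false);
                int status = con.getResponseCode();
                String location = con.getHeaderField("Location");
                boolean redirect = status >= 300 && status < 400 && location != null;
                if (!redirect) {
                    return current;
                }
                current = resolve(current, location);
            } finally {
                con.disconnect();
            }
        }
        throw new IOException("More than " + MAX_HOPS + " redirects starting at " + start);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass the real one as the first argument
        String start = args.length > 0 ? args[0] : "http://example.com/";
        String end = finalUrl(start);
        System.out.println(end.equals(start) ? "No redirect" : "Redirected to " + end);
    }
}
```

If finalUrl returns something other than the URL you passed in, the page was redirected at least once.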

If you don't want to follow the redirect and just want the contents of the first page, you simply need to do:

String location = "url-of-webpage-A"
String pageContent = new URL( location ).openConnection().with { con ->
  // Don't follow redirects automatically
  con.instanceFollowRedirects = false

  // Get the location to jump to (in case of a redirect)
  location = con.getHeaderField( "Location" )

  // Read the HTML and close the inputstream
  con.inputStream.withReader { it.text }
}

if( location ) { 
  println "Page wanted to redirect to $location"
}
println "Content was:"
println pageContent    

#2


14  

In Java you can use URL.openConnection() to get an HttpURLConnection (you'll need to cast). On that connection you can call setInstanceFollowRedirects(false).

Then you can call getResponseCode() and check whether it returns HTTP_MOVED_PERM (301), HTTP_MOVED_TEMP (302) or HTTP_SEE_OTHER (303). They all indicate redirection.

If you need to know where you're being redirected to, then you can use getHeaderField("Location") to get the location header.
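
Putting those three steps together, a minimal sketch could look like the one below. The class name RedirectCheck, the method names and the example.com default are made up for illustration. Note that the check for 307 and 308 is an addition that the answer above does not mention: they are also redirect status codes, but HttpURLConnection has no named constants for them, so they appear as plain numbers. The cast assumes an http or https URL.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class RedirectCheck {

    /** Returns true for status codes that signal a redirect. */
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM  // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP  // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER   // 303
            || status == 307   // assumption beyond the answer: Temporary Redirect
            || status == 308;  // assumption beyond the answer: Permanent Redirect
    }

    /** Returns the Location header if the URL redirects, or null if it does not. */
    static String redirectTarget(String urlStr) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(urlStr).openConnection();
        try {
            // Stop the connection from silently following the redirect
            con.setInstanceFollowRedirects(false);
            int status = con.getResponseCode();
            return isRedirect(status) ? con.getHeaderField("Location") : null;
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass the real one as the first argument
        String urlStr = args.length > 0 ? args[0] : "http://example.com/";
        String target = redirectTarget(urlStr);
        System.out.println(target == null ? "No redirect" : "Redirects to " + target);
    }
}
```

The value returned by redirectTarget is the raw Location header, which may be a relative URL, so resolve it against the original URL before opening it.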

