Download a file from a webpage without its URL, using Java

Time: 2020-12-20 09:55:51

I need to be able to download a file from a webpage using Java. The problem is that I can't find the file's exact URL (something like www.something.com/file.xls).

The file I need is on this page: http://www.nasdaqomxnordic.com/aktier/Historiska_kurser/?Instrument=SSE837#divId (scroll down a bit and you will see the Excel logo).

I'd be glad if anyone knows of a library that could help with this, and of course any other guidance on the problem is welcome :)

1 solution

#1

There is no general solution to your problem: JavaScript can be used to obscure what is and isn't a link, and where a link leads. If you're interested in scraping one specific page, though, you may be able to reverse-engineer it.
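
A practical first step in reverse-engineering a page is to list the external scripts it loads, so you can download them and search them for an element ID or function name. The following is a rough sketch, assuming you already have the page's HTML as a string; `ScriptLister` and `scriptSources` are hypothetical names, and a regular expression is used instead of a real HTML parser, so unusual markup may be missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough exploration aid: list the external scripts a page loads. */
public class ScriptLister {

    // Matches src="..." or src='...' inside a <script ...> opening tag.
    // A regex is NOT a real HTML parser; this is only meant for quick exploration.
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the src attribute of every <script> tag in the HTML, in document order. */
    public static List<String> scriptSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = SCRIPT_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<script src=\"/js/core.js\"></script>"
                + "<script type='text/javascript' src='/js/history.js'></script>"
                + "<script>var inline = 1;</script>"
                + "</head><body><img id=\"exportExcel\"></body></html>";
        // Inline scripts have no src, so only the two external URLs are printed.
        scriptSources(html).forEach(System.out::println);
    }
}
```

Each URL it prints can then be fetched and searched for a string such as `exportExcel`; inline `<script>` blocks have no `src` and still need to be read from the page itself.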

On the page you linked, for example, the Excel logo has the ID exportExcel. Searching the page's JavaScript for #exportExcel eventually leads to this code fragment:

```javascript
if(to.match(/^\d{4}[-]\d{2}[-]\d{2}$/) && from.match(/^\d{4}[-]\d{2}[-]\d{2}$/)) {
    var query = webCore.createQuery( webCore.marketAction.getDataSeries, {}, {
            FromDate: from,
            ToDate: to,
            Instrument: webCore.getInstrument(),
            hi__a : "0,1,2,4,21,8,10,11,12,9",
            OmitNoTrade: "true",
            ext_xslt_lang: currentLanguage,
            ext_xslt_options: "," + $("#adjustedId:checked").val() + ",", //$("#unadjustedId:checked").val() + ",",
            ext_xslt: "hi_table_shares_adjusted.xsl",
            ext_contenttype : "application/ms-excel",
            ext_contenttypefilename : "_" + webCore.getInstrument() + ".xls",
            ext_xslt_hiddenattrs: ",ip,iv,",
            ext_xslt_tableId: "historicalTable"
        }
    );
    $("#excelQuery").val( query );
    $("#excelForm").attr( "action", webCore.proxyURL ).submit();
}
```

That code builds a query from these parameters, stores it in the #excelQuery field, and submits the hidden form #excelForm to webCore.proxyURL. By experimenting and tracing through the page's source code, you should be able to replicate this in Java and generate the request you need. You'll need some familiarity with JavaScript and jQuery.
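
As a starting point for that replication, the parameters in the fragment can be mirrored in Java. This is a minimal sketch under a loud assumption: it treats the parameters as ordinary `application/x-www-form-urlencoded` pairs, which is unverified. What `webCore.createQuery` really produces, and the name of the form field it fills, still have to be checked in the page source or in captured traffic. The `ext_xslt_lang` and `ext_xslt_options` values are caller-supplied guesses, because the JavaScript takes them from `currentLanguage` and a checkbox; `ExcelQuerySketch`, `params` and `formEncode` are hypothetical names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Pattern;

/**
 * Sketch: mirror the parameters from the page's JavaScript in Java.
 * ASSUMPTION: the server accepts them as plain form-encoded key=value pairs;
 * the real output of webCore.createQuery must be checked against actual traffic.
 */
public class ExcelQuerySketch {

    // Same shape check as the JavaScript regex (yyyy-MM-dd); it does not validate the calendar.
    private static final Pattern DATE = Pattern.compile("^\\d{4}-\\d{2}-\\d{2}$");

    /** Builds the parameter map seen in the JS fragment; lang and xsltOptions are guesses. */
    public static Map<String, String> params(String from, String to, String instrument,
                                             String lang, String xsltOptions) {
        if (!DATE.matcher(from).matches() || !DATE.matcher(to).matches()) {
            throw new IllegalArgumentException("Dates must look like yyyy-MM-dd");
        }
        Map<String, String> p = new LinkedHashMap<>(); // keeps the same order as the JS object
        p.put("FromDate", from);
        p.put("ToDate", to);
        p.put("Instrument", instrument);
        p.put("hi__a", "0,1,2,4,21,8,10,11,12,9");
        p.put("OmitNoTrade", "true");
        p.put("ext_xslt_lang", lang);
        p.put("ext_xslt_options", xsltOptions);
        p.put("ext_xslt", "hi_table_shares_adjusted.xsl");
        p.put("ext_contenttype", "application/ms-excel");
        p.put("ext_contenttypefilename", "_" + instrument + ".xls");
        p.put("ext_xslt_hiddenattrs", ",ip,iv,");
        p.put("ext_xslt_tableId", "historicalTable");
        return p;
    }

    /** Encodes the map as application/x-www-form-urlencoded text (UTF-8). */
    public static String formEncode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // "SSE837" comes from the question's URL; "en" and ",adjusted," are placeholder guesses.
        Map<String, String> p = params("2020-01-01", "2020-12-18", "SSE837", "en", ",adjusted,");
        System.out.println(formEncode(p));
    }
}
```

Comparing this output with the request your browser actually sends is a quick way to see where the guesses are wrong.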

Another approach is to click the download link while watching your network traffic (with Wireshark, for example) and note the request your browser sends, including the URL and any form data. You'll need some knowledge of HTTP.
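
Once you know the request, Java 11's `java.net.http.HttpClient` can replay it. The sketch below assumes you have already captured the URL (and, for a form POST, the URL-encoded body) from your trace; `Downloader`, `download`, `postForm` and the output filename `download.xls` are hypothetical. Real sites may also require cookies or headers the browser sent, which this sketch does not handle.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch: replay a request observed in a network trace and save the response body (Java 11+). */
public class Downloader {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL) // follow ordinary redirects like a browser
            .build();

    /** Plain GET, for when the trace shows the file behind a simple URL. */
    public static Path download(URI uri, Path target) throws IOException, InterruptedException {
        return send(HttpRequest.newBuilder(uri).GET().build(), target);
    }

    /** Form POST, for when the trace shows a form submission; formBody must already be URL-encoded. */
    public static Path postForm(URI uri, String formBody, Path target)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
        return send(request, target);
    }

    private static Path send(HttpRequest request, Path target) throws IOException, InterruptedException {
        // Stream the body straight to disk, truncating any previous file at that path.
        HttpResponse<Path> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofFile(
                target, StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING));
        if (response.statusCode() / 100 != 2) {
            Files.deleteIfExists(target); // don't keep an error page saved as the "download"
            throw new IOException("HTTP " + response.statusCode() + " for " + request.uri());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Downloader <url-seen-in-network-trace>");
            return;
        }
        Path saved = download(URI.create(args[0]), Path.of("download.xls"));
        System.out.println("Saved to " + saved.toAbsolutePath());
    }
}
```

Streaming to a file with `BodyHandlers.ofFile` avoids holding the whole spreadsheet in memory, and checking the status code keeps an HTML error page from being saved under an `.xls` name.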
