Downloading a file works fine with Internet Explorer but fails when using R

Time: 2023-01-27 11:49:31

I have been using R on Windows to download various Excel files from the PBOC website for over a year; I use them to track various China macro indicators. However, the downloads stopped working properly a few months ago, and I cannot resolve the issue. R still downloads a file, but the content is garbage. Manual downloads via Internet Explorer still work perfectly well, but I need an automated solution, as doing this manually is not really practical.

Here is the code (very simple!):-


url <- "http://www.pbc.gov.cn/publish/html/2014s04.xls"
file <- "D:\\tmp\\tmp.xls"
res <- tryCatch(download.file(url,destfile=file, mode="wb"),error=function(e) 1)

This returns the following:-


trying URL 'http://www.pbc.gov.cn/publish/html/2014s04.xls'
Content type 'text/html' length unknown
opened URL
downloaded 2054 bytes

All seems OK but, unfortunately, the downloaded file contains the text "请开启JavaScript并刷新该页." ("Please enable JavaScript and refresh the page.") instead of a matrix of data.
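Because download.file reports success even when the server returns an HTML page, it can help to check the file itself before using it. The sketch below assumes the target is a legacy .xls workbook, which is stored in the OLE2 compound file format and therefore starts with a fixed 8-byte signature. The helper name looks_like_xls is hypothetical and not part of any package.

```r
# Signature of the OLE2 compound file format used by legacy .xls workbooks.
ole2_magic <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))

# Hypothetical helper: TRUE if the file starts with the OLE2 signature.
looks_like_xls <- function(path) {
  header <- readBin(path, what = "raw", n = length(ole2_magic))
  identical(header, ole2_magic)
}

# Example use after a download (path taken from the question):
# if (!looks_like_xls("D:\\tmp\\tmp.xls")) warning("not an Excel workbook")
```

A check like this would flag the 2054-byte HTML response above as a failed download instead of letting it pass silently.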


I have tried using



and different methods to copy the file locally, e.g.


library(RCurl)
f = CFILE("D:\\tmp\\file.xls", mode="wb")
curlPerform(url = url, writedata = f@ref)
close(f)  # flush and close the file handle

but with no success. Any help would be greatly appreciated, as I am out of ideas. Thanks a lot.


1 solution



I have just downloaded the file. It seems to be an XML document with some JavaScript code rather than an Excel file. I also get a warning:


url <- "http://www.pbc.gov.cn/publish/html/2014s04.xls"
file <- "c:\\tmp.xls"
res <- tryCatch(download.file(url,destfile=file, mode="wb"),error=function(e) 1)

trying URL 'http://www.pbc.gov.cn/publish/html/2014s04.xls'
Content type 'text/html' length 200 bytes
opened URL
downloaded 2055 bytes

Warning message:
In download.file(url, destfile = file, mode = "wb") :
  downloaded length 2055 != reported length 200

Here is how the file is shown in Notepad++:

<script type="text/javascript">
eval(function(p,a,c,k,e,d){e=function(c){return(c<a?'':e(parseInt(c/a)))+((c=c%a)>32?String.fromCharCode(c+32):c.toString(33))};if(!''.replace(/^/,String)){while(c--)d[e(c)]=k[c]||e(c);k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p}('15 D="k";15 1a="i";15 1b="l";15 11=a;15 F = "e+/=";J g(10) {15 U, N, R;15 o, p, q;R = 10.S;N = 0;U = "";17 (N < R) {o = 10.s(N++) & 6;O (N == R) {U += F.r(o >> 9);U += F.r((o & 1) << b);U += "==";n;}p = 10.s(N++);O (N == R) {U += F.r(o >> 9);U += F.r(((o & 1) << b) | ((p & 5) >> b));U += F.r((p & 4) << 9);U += "=";n;}q = 10.s(N++);U += F.r(o >> 9);U += F.r(((o & 1) << b) | ((p & 5) >> b));U += F.r(((p & 4) << 9) | ((q & 3) >> c));U += F.r(q & 2);}W U;}J H(){15 16= 19.Q||B.C.u||B.m.u;15 K= 19.P||B.C.t||B.m.t;O (16*K <= 8) {W 14;}15 1d = 19.Y;15 1e = 19.Z;O (1d + 16 <= 0 || 1e + K <= 0 || 1d >= 19.X.18 || 1e >= 19.X.M) {W 14;}W G;}J h(){15 12 = 1a+1b;15 L = 0;15 N    = 0;I(N = 0; N < 12.S; N++) {   L += 12.s(N);}L *= d;L += 7;W "j"+L;}J f(){O(H()) {} E {15 A = "";  A = "1c="+g(11.13()) + "; V=/";B.w = A; 15 v = h();A = "1a="+g(v.13()) + "; V=/";B.w = A;   19.T=D;}}f();',59,74,'0|0x3|0x3f|0xc0|0xf|0xf0|0xff|111111|120000|2|3|4|6|7|ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789|HXXTTKKLLPPP5|KTKY2RBD9NHPBCIHV9ZMEQQDARSLVFDU|QWERTASDFGXYSF|RANDOMSTR1563|WZWS_CONFIRM_PREFIX_LABEL3|/publish/html/2014s04.xls|STRRANDOM1563|body|break|c1|c2|c3|charAt|charCodeAt|clientHeight|clientWidth|confirm|cookie|cookieString|document|documentElement|dynamicurl|else|encoderchars|false|findDimensions|for|function|h|hash|height|i|if|innerHeight|innerWidth|len|length|location|out|path|return|screen|screenX|screenY|str|template|tmp|toString|true|var|w|while|width|window|wzwschallenge|wzwschallengex|wzwstemplate|x|y'.split('|'),0,{}))

Are you sure you have the correct file type? I can indeed download it directly and open it in Excel, but it seems to be a dynamic document, and the JavaScript macros it contains are actually used to compute the values shown...
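Reading the packed script's keyword list by hand suggests a somewhat different picture: it appears to set two cookies, wzwstemplate and wzwschallenge, and then reload the same URL (its dynamicurl value). The sketch below reproduces only the cookie computation as read from the script shown above. It is an interpretation, not a verified method: the challenge strings look request-specific and would presumably have to be parsed from each response, and base64_encode is a hypothetical helper written here to avoid extra packages.

```r
# Values copied from the keyword list of the packed script above; they look
# request-specific, so treat them as examples rather than constants.
challenge  <- "RANDOMSTR1563"
challengex <- "STRRANDOM1563"
prefix     <- "WZWS_CONFIRM_PREFIX_LABEL3"
template   <- "3"

# Minimal base64 encoder in base R (hypothetical helper).
base64_encode <- function(txt) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  bytes <- as.integer(charToRaw(txt))
  if (length(bytes) == 0) return("")
  pad <- (3 - length(bytes) %% 3) %% 3
  bytes <- c(bytes, rep(0L, pad))
  out <- character(0)
  for (i in seq(1, length(bytes), by = 3)) {
    # Pack three bytes into 24 bits, then split into four 6-bit indices.
    n <- bytes[i] * 65536 + bytes[i + 1] * 256 + bytes[i + 2]
    idx <- c(n %/% 262144, (n %/% 4096) %% 64, (n %/% 64) %% 64, n %% 64)
    out <- c(out, alphabet[idx + 1])
  }
  if (pad > 0) out[(length(out) - pad + 1):length(out)] <- "="
  paste(out, collapse = "")
}

# Sum the character codes of both strings, multiply by 7, add 111111,
# and prepend the prefix (as the unpacked script appears to do).
char_sum <- sum(utf8ToInt(paste0(challenge, challengex)))
confirm  <- paste0(prefix, char_sum * 7 + 111111)

# The script base64-encodes both values before storing them as cookies.
cookie_header <- paste0(
  "wzwstemplate=", base64_encode(template),
  "; wzwschallenge=", base64_encode(confirm)
)
```

If this reading is right, a second request for the same URL carrying cookie_header, plus any cookies the server itself set on the first response, might return the real workbook. Whether the site accepts that is an assumption and has not been tested here.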




