AJAX POST: Download and Upload a PDF File

Time: 2021-02-03 21:22:38

I would like to write a Chrome extension that downloads a PDF file from a website that accepts POST requests, and then uploads the PDF file to my localhost server. Here's my attempt:

    $.ajax({
        url: 'http://example.com/download.action',
        data: data,
        type: 'POST',
        cache: false,
        crossDomain: true,
        success: function(response) {
            $.ajax({
                url: 'http://localhost/getpdf.php',
                data: response,
                type: 'POST',
                cache: false,
                contentType: 'application/octet-stream',
                processData: false,
                crossDomain: true
            });
        }
    });

In the console I can see the response to the download AJAX request: it is binary content beginning with "%PDF-1.7.%...", which seems reasonable. On the localhost server side, I use some simple PHP code to save the PDF file:

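<?php
// Read the raw request body exactly as it was sent.
$raw_data = file_get_contents('php://input');
// Open in binary mode ('wb') so no newline translation occurs on any platform.
$f = fopen('test.pdf', 'wb');
fwrite($f, $raw_data);
fclose($f);
?>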

The file is saved, but Adobe Reader can't open it (it reports the file as damaged), and the saved file is about twice the size of the original.

I compared the bytes of the saved PDF file and the original one with vim -b. Here are the first 10 lines of each:

The original file:

0000000: 2550 4446 2d31 2e37 0a25 e4e3 cfd2 0a36  %PDF-1.7.%.....6
0000010: 2030 206f 626a 0a3c 3c2f 5479 7065 2f58   0 obj.<</Type/X
0000020: 4f62 6a65 6374 0a2f 5375 6274 7970 652f  Object./Subtype/
0000030: 466f 726d 0a2f 4242 6f78 5b30 2030 2035  Form./BBox[0 0 5
0000040: 3935 2e32 3736 2038 3431 2e38 395d 0a2f  95.276 841.89]./
0000050: 5265 736f 7572 6365 733c 3c2f 584f 626a  Resources<</XObj
0000060: 6563 743c 3c2f 496d 3020 3720 3020 522f  ect<</Im0 7 0 R/
0000070: 496d 3120 3820 3020 522f 496d 3220 3920  Im1 8 0 R/Im2 9 
0000080: 3020 523e 3e2f 436f 6c6f 7253 7061 6365  0 R>>/ColorSpace
0000090: 3c3c 2f43 5330 2031 3020 3020 522f 4353  <</CS0 10 0 R/CS

The saved file:

0000000: 2550 4446 2d31 2e37 0a25 efbf bdef bfbd  %PDF-1.7.%......
0000010: efbf bdef bfbd 0a36 2030 206f 626a 0a3c  .......6 0 obj.<
0000020: 3c2f 5479 7065 2f58 4f62 6a65 6374 0a2f  </Type/XObject./
0000030: 5375 6274 7970 652f 466f 726d 0a2f 4242  Subtype/Form./BB
0000040: 6f78 5b30 2030 2035 3935 2e32 3736 2038  ox[0 0 595.276 8
0000050: 3431 2e38 395d 0a2f 5265 736f 7572 6365  41.89]./Resource
0000060: 733c 3c2f 584f 626a 6563 743c 3c2f 496d  s<</XObject<</Im
0000070: 3020 3720 3020 522f 496d 3120 3820 3020  0 7 0 R/Im1 8 0 
0000080: 522f 496d 3220 3920 3020 523e 3e2f 436f  R/Im2 9 0 R>>/Co
0000090: 6c6f 7253 7061 6365 3c3c 2f43 5330 2031  lorSpace<</CS0 1

It seems some bytes have been changed (maybe a charset problem?).

Any hints about this?

2 Answers

#1



You may want to use the approach from this question to read the PDF text and then write it out as a new PDF file:

how to get text from pdf file and save it into DB

#2



I finally found a solution that meets my requirements.

As mentioned by @mkl, some bytes of the original PDF binary data are being replaced with UTF-8 sequences, but we don't know at which step this replacement happens. So I started searching for ways to send and receive binary data instead of strings, and I found an article that introduced a feature called "arraybuffer".
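To illustrate the replacement, here is a minimal sketch that takes the first 16 bytes of the original file from the hex dump in the question and round-trips them through a UTF-8 string. It assumes the corruption comes from the response being decoded as UTF-8 text and then re-encoded on upload; that assumption is not confirmed, although the resulting bytes match the saved file's dump. The `toHex` helper is only for illustration.

```javascript
// First 16 bytes of the original file, copied from the hex dump in the question.
const original = new Uint8Array([
  0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37,
  0x0a, 0x25, 0xe4, 0xe3, 0xcf, 0xd2, 0x0a, 0x36,
]);

// Assumption: a text-typed response is decoded roughly like this.
// Every byte that is not valid UTF-8 becomes U+FFFD (the replacement character).
const asText = new TextDecoder('utf-8').decode(original);

// Re-encoding the string for upload writes each U+FFFD as the bytes ef bf bd.
const roundTripped = new TextEncoder().encode(asText);

// Illustrative helper: format bytes as space-separated hex pairs.
const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join(' ');

console.log(toHex(original));
console.log(toHex(roundTripped));
```

The four bytes e4 e3 cf d2 each turn into ef bf bd, as in the saved file. Since one invalid byte becomes three, this would also be consistent with the saved file being noticeably larger than the original.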

Following that article, I changed my JS code to this, and it works:

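// Serialize the POST parameters as application/x-www-form-urlencoded.
// $.param() URL-encodes each key and value, so no temporary <form> is needed.
var body = $.param(data);

var oReq = new XMLHttpRequest();
oReq.open('POST', 'http://example.com/download.action', true);
oReq.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
// Ask for the raw bytes instead of the default text response.
oReq.responseType = 'arraybuffer';

oReq.onload = function () {
    var arrayBuffer = oReq.response; // ArrayBuffer holding the PDF bytes
    if (oReq.status === 200 && arrayBuffer) {
        var xhr = new XMLHttpRequest();
        // Upload asynchronously; synchronous XHR blocks the page.
        xhr.open('POST', 'http://localhost/getpdf.php', true);
        xhr.setRequestHeader('Content-Type', 'application/octet-stream');
        // Sending an ArrayBuffer transmits the bytes unchanged.
        xhr.send(arrayBuffer);
    }
};

oReq.send(body);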
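For comparison, the same idea can probably be expressed with the Fetch API, which is a different technique from the XMLHttpRequest code above. This is only a sketch under assumptions: `relayPdf` and its `fetchImpl` parameter are hypothetical names introduced here, and the extension is assumed to have permission to reach both URLs.

```javascript
// Hypothetical helper: download a PDF via POST and forward the raw bytes.
// fetchImpl defaults to the global fetch; it is a parameter only so the
// function can be exercised without a network.
async function relayPdf(downloadUrl, uploadUrl, params, fetchImpl = fetch) {
  const download = await fetchImpl(downloadUrl, {
    method: 'POST',
    // A URLSearchParams body is sent as application/x-www-form-urlencoded.
    body: new URLSearchParams(params),
  });
  if (!download.ok) {
    throw new Error('Download failed with status ' + download.status);
  }
  // arrayBuffer() yields the response bytes without any text decoding.
  const bytes = await download.arrayBuffer();

  const upload = await fetchImpl(uploadUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/octet-stream' },
    body: bytes,
  });
  if (!upload.ok) {
    throw new Error('Upload failed with status ' + upload.status);
  }
  // Return the number of bytes forwarded, for logging or checks.
  return bytes.byteLength;
}
```

It could be called as `relayPdf('http://example.com/download.action', 'http://localhost/getpdf.php', data)`, where `data` is the same parameter object used above.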
