How to download a file with Puppeteer using headless: true?

Time: 2022-12-20 19:10:45

I've been running the following code in order to download a csv file from the website http://niftyindices.com/resources/holiday-calendar:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({headless: true});
    const page = await browser.newPage();

    await page.goto('http://niftyindices.com/resources/holiday-calendar');
    await page._client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath: '/tmp'});
    await page.click('#exportholidaycalender');
    await page.waitFor(5000);
    await browser.close();
})();

With headless: false it works: the file is downloaded into /Users/user/Downloads. With headless: true it does NOT work.

I'm running this on macOS Sierra (MacBook Pro) with Puppeteer 1.1.1, which pulls Chromium 66.0.3347.0 into the .local-chromium/ directory. I set it up with npm init and npm i --save puppeteer.

Any idea what's wrong?

Thanks in advance for your time and help,

3 solutions

#1

This page downloads a CSV by building a comma-delimited string and forcing the browser to download it through a data: URI with the CSV content type, like so:

let uri = "data:text/csv;charset=utf-8," + encodeURIComponent(content);
window.open(uri, "Some CSV");

In Chrome, this opens a new tab.

You can listen for the resulting targetcreated event and write the contents to a file yourself. I'm not sure this is the best way, but it works well.

const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
    const browser = await puppeteer.launch({
        headless: true
    });
    browser.on('targetcreated', async (target) => {
        let s = target.url();
        // the test opens an about:blank to start - ignore this
        if (s === 'about:blank') {
            return;
        }
        // strip the content-type prefix; what remains is the URL-encoded CSV
        s = s.replace("data:text/csv;charset=utf-8,", "");
        // clean up string by unencoding the %xx
        // ...
        fs.writeFile("/tmp/download.csv", s, function(err) {
            if (err) {
                console.log(err);
                return;
            }
            console.log("The file was saved!");
        });
    });

    const page = await browser.newPage();
    // .. open link ...
    // .. click on download link ..
})();
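
A self-contained sketch of the same idea. It uses decodeURIComponent for the unencoding step (the inverse of the encodeURIComponent the page uses) and skips any target whose URL is not a CSV data: URI, rather than checking only for about:blank. csvFromDataUri and saveCsvPopups are hypothetical helper names, and the sketch assumes, as the answer reports, that the new tab's target already carries the data: URL when targetcreated fires.

```javascript
const fs = require('fs');

// Prefix the page uses when it builds its CSV download link.
const CSV_PREFIX = 'data:text/csv;charset=utf-8,';

// Returns the decoded CSV text if `url` is a CSV data: URI, otherwise null.
function csvFromDataUri(url) {
  if (!url.startsWith(CSV_PREFIX)) return null;
  // Undo the page's encodeURIComponent (%2C -> ",", %0A -> newline, ...).
  return decodeURIComponent(url.slice(CSV_PREFIX.length));
}

// Hypothetical helper: write every new target whose URL is a CSV data: URI
// to `outPath`. `browser` is a Puppeteer Browser instance.
function saveCsvPopups(browser, outPath) {
  browser.on('targetcreated', (target) => {
    const csv = csvFromDataUri(target.url());
    if (csv === null) return; // about:blank and other targets are ignored
    fs.writeFile(outPath, csv, (err) => {
      if (err) console.error(err);
      else console.log(`Saved ${outPath}`);
    });
  });
}
```

It would be wired up with something like saveCsvPopups(browser, '/tmp/download.csv') right after puppeteer.launch(), before clicking the export link.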

#2

I needed to download a file from behind a login that was being handled by Puppeteer, and targetcreated was never triggered. In the end I downloaded the file with request after copying the cookies over from the Puppeteer instance.

In this case I'm streaming the file through to the client (res is the HTTP response of the surrounding route handler), but you could just as easily save it to disk.

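    // request is the (now deprecated) "request" npm package
    const request = require('request');

    // copy the session cookies from the logged-in Puppeteer page into a cookie jar
    let cookies = await page.cookies();
    let jar = request.jar();
    for (let cookie of cookies) {
        jar.setCookie(`${cookie.name}=${cookie.value}`, "http://secretsite.com");
    }

    res.writeHead(200, {
        "Content-Type": 'application/octet-stream',
        "Content-Disposition": `attachment; filename=secretfile.jpg`
    });
    // pipe() returns the destination stream, not a promise, so a try/catch around
    // it never sees download errors; listen for 'error' on the request instead
    request({ url: "http://secretsite.com/secretfile.jpg", jar })
        .on('error', (err) => {
            console.trace(err);
            // headers were already sent, so res.send() can no longer be used here
            res.end();
        })
        .pipe(res);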
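
The same cookie-copying approach can be sketched without the deprecated request package, assuming Node 18 or later for the global fetch: read the cookies with page.cookies(url) and send them as a Cookie header. cookieHeader and downloadWithPageCookies are hypothetical names, and the sketch only sends the cookies that apply to url itself.

```javascript
const fs = require('fs/promises');

// Build a Cookie request header from the objects returned by page.cookies().
function cookieHeader(cookies) {
  return cookies.map((c) => `${c.name}=${c.value}`).join('; ');
}

// Hypothetical helper: download `url` using the logged-in page's cookies and
// save the body to `outPath`. Assumes Node 18+, where fetch is global.
async function downloadWithPageCookies(page, url, outPath) {
  const cookies = await page.cookies(url); // cookies that apply to this URL
  const res = await fetch(url, { headers: { Cookie: cookieHeader(cookies) } });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  // arrayBuffer() keeps binary files (like the .jpg in the answer) intact
  await fs.writeFile(outPath, Buffer.from(await res.arrayBuffer()));
  return outPath;
}
```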

#3

I spent hours yesterday poring over this thread and Stack Overflow, trying to figure out how to get Puppeteer to download a CSV file by clicking a download link in headless mode within an authenticated session. The accepted answer here didn't work in my case because the download does not trigger targetcreated, and the next answer, for whatever reason, did not retain the authenticated session. This article saved the day. In short: call fetch from inside the page. Hopefully this helps someone else out.

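// runs inside the page, so the request carries the page's authenticated session
const res = await this.page.evaluate(() => {
    return fetch('https://example.com/path/to/file.csv', {
        method: 'GET',
        credentials: 'include' // send the session cookies with the request
    }).then(r => r.text()); // res ends up holding the file contents as a string
});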
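
A slightly more complete sketch of the fetch approach, assuming the file is text such as CSV (r.text() would corrupt binary data). The URL is passed as an argument to page.evaluate because the callback is serialized and run in the browser, so it cannot see Node-side variables. The status check and the write to disk are additions, and downloadViaPageFetch is a hypothetical name.

```javascript
const fs = require('fs/promises');

// Hypothetical helper: fetch `url` from inside the page so the browser attaches
// the session cookies, then write the response body to `outPath`.
async function downloadViaPageFetch(page, url, outPath) {
  // This callback runs in the browser; `u` arrives as a serialized argument.
  const result = await page.evaluate(async (u) => {
    const r = await fetch(u, { method: 'GET', credentials: 'include' });
    return { ok: r.ok, status: r.status, body: await r.text() };
  }, url);
  if (!result.ok) throw new Error(`HTTP ${result.status} for ${url}`);
  await fs.writeFile(outPath, result.body, 'utf8');
  return result.body.length; // number of characters written
}
```

Usage after logging in would look like await downloadViaPageFetch(page, 'https://example.com/path/to/file.csv', '/tmp/file.csv').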
