How best to generate multiple HTML files from RMarkdown based on one dataset?

Time: 2021-01-17 09:09:39

I have an RMarkdown report that is very useful and has grown to be several pages long with all the figures and tables in the HTML file.

It uses the same dataset for all the figures and tables.

What I would like to do is to keep generating this large html file and then several new subdirectories, each with their own html files and subdirectories within those, each with their own html files.

In this case, the full report contains data on a department, then each subdirectory would contain an html output related to each group within the department, and each of those would contain a subdirectory with html output for each person in each group. This way if someone is only interested in the metrics of one group, or one person, they look at the most appropriate output.

Parent dir: The same large html file with figures and tables generated with data for entire dept.
|
 __Subdir for each group: Output based on same data but only the group's metrics
    |
     __Subdir for each person: Output based on same data but only individual's metrics

What's the best way to arrange this?
1. Is there a code chunk option in RMarkdown where I can say, chunk a goes in this html output file, chunk b goes in another?
2. Do I need multiple RMarkdown files, one for each html output, with some sort of caching between them so I don't have to reprocess all the data? (this would seem silly because I need a lot of html files)
3. Should I give up RMarkdown for this task?

1 Answer

#1


3  

I do something like you're proposing with knitr, and it works very well.

Don't tell anyone, but I use a 'for' loop to cycle through a bunch of councils, each of whom get the same report but with their data. I then push the report into a directory structure, zip it and mail it.

I have an Rmd file that expects two datasets, setA (being the subject) and setB (being its peers)

The flow is something like:

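library(knitr)     # knit()
library(markdown)  # markdownToHTML()
library(dplyr)     # filter()
library(stringr)   # str_c()

set <- assemble_data() # loads whole set
for (report in report_list) {
    # the Rmd reads setA and setB by name from this environment
    setA <- filter(set, subject == report)
    setB <- filter(set, subject != report)
    output_html <- str_c('path/', report, '.html')
    knit_interim <- str_c('path/', report, '.md')
    knit_pattern <- 'name of RMd' # I generate more than one report for each place
    knit(knit_pattern, output = knit_interim) # write the interim markdown file
    markdownToHTML(file = knit_interim, output = output_html, stylesheet = stylesheet, encoding = 'windows-1252')
}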

In this way I can produce a report set in a few minutes. My case may be simpler than yours, because the report structure is the same - it's the datasets that change.
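
For the department / group / person layout in the question, the same loop could write each report to a nested path. A minimal sketch in base R, assuming hypothetical columns `group` and `person` and a made-up `out_root` directory (none of these names come from my real code):

```r
# Hypothetical toy data: one row per person, each belonging to a group.
set <- data.frame(
  group  = c("g1", "g1", "g2"),
  person = c("ann", "bob", "cat"),
  stringsAsFactors = FALSE
)

out_root <- file.path(tempdir(), "dept")  # assumed output root

# One HTML path per person, nested under that person's group directory.
person_html <- file.path(out_root, set$group, set$person, "index.html")
# One HTML path per group, plus the department-wide report at the root.
group_html  <- file.path(out_root, unique(set$group), "index.html")
dept_html   <- file.path(out_root, "index.html")

# Create every directory up front; recursive = TRUE builds the parents too.
for (d in unique(dirname(c(dept_html, group_html, person_html)))) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}
```

Inside the loop, each of these paths would take the place of output_html, with setA filtered down to the matching group or person.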

Note that this is not a paste of the code (it is slightly more complicated than this) so beware typos etc.

The point (as I understand it) is to write an Rmd that expects a dataset of a particular name, and the R code provides local scope for it. I struggled initially with it but it's all quite simple in its execution.

Update: 'How do you pass the data to the RMd files?'

You don't explicitly need to. In my code above the RMd is written expecting data in setA and setB.
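
This works because knit() evaluates the document in the environment it is called from (its envir argument defaults to parent.frame()). A small sketch of that behaviour, assuming knitr is installed; the inline template and the toy data frame are made up for illustration, and a real template would be a separate Rmd file:

```r
library(knitr)

# Hypothetical template, passed as text instead of a file to keep this self-contained.
# It never loads data itself; it only refers to setA and setB by name.
template <- c(
  "Subject rows: `r nrow(setA)`",
  "",
  "Peer rows: `r nrow(setB)`"
)

# Toy stand-in for the full dataset.
set <- data.frame(subject = c("north", "north", "south"), value = 1:3)

for (report in unique(set$subject)) {
  setA <- set[set$subject == report, ]
  setB <- set[set$subject != report, ]
  # knit() finds setA and setB in the calling environment (envir = parent.frame()).
  md <- knit(text = template, quiet = TRUE)
  cat(md, "\n")
}
```

Each pass through the loop rebinds setA and setB, so the same template produces different output without any explicit argument passing.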

It makes the workflow really easy - you write the template using a dataset (manually filter for one) and then when you're ready you can just run the loop. Like I said, I struggled a bit to understand at first but just jumped in and it all worked out quite nicely.
