read.xls - reading a variable-length list of sheets and their names

Time: 2022-09-10 20:19:00

Given several .xls files with a varying number of sheets, I am reading them into R using read.xls from the gdata package. I have two related issues (solving the second issue should solve the first):

  1. It is not known ahead of time how many sheets each .xls file will have; the count varies from one file to the next.
  2. I need to capture the name of each sheet, because the name is itself relevant data.

Right now, to resolve (1), I am using try() and iterating over sheet numbers until I hit an error.

How can I get a list of the sheet names so that I can iterate over them?

2 solutions

#1


9  

See the sheetCount and sheetNames functions in gdata (both are documented on the same help page). If, say, xls <- "a.xls", then reading all sheets of the spreadsheet into a list, one sheet per component, is just this:

sapply(sheetNames(xls), read.xls, xls = xls, simplify = FALSE)

Note that the components will be named using the names of the sheets. Depending on the content it might make sense to remove simplify = FALSE.
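
Because the list components carry the sheet names, those names can be turned into ordinary data. The following is only a sketch in base R: the `sheets` list is a hypothetical stand-in for the result of the sapply() call above (the sheet names "Jan" and "Feb" and all values are made up so that it runs without an .xls file), and it assumes every sheet has the same columns.

```r
# Stand-in for the result of
#   sapply(sheetNames(xls), read.xls, xls = xls, simplify = FALSE)
# Sheet names and values here are hypothetical.
sheets <- list(
  Jan = data.frame(id = 1:2, value = c(10, 20)),
  Feb = data.frame(id = 3:4, value = c(30, 40))
)

# Copy each component's name into a new "sheet" column.
tagged <- Map(
  function(df, nm) data.frame(sheet = nm, df, stringsAsFactors = FALSE),
  sheets, names(sheets)
)

# Stack the sheets into one data frame (assumes identical columns).
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL

print(combined)
```

If the sheets do not share the same columns, keeping them as a named list and looking names up with names(sheets) is probably the safer choice.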

#2


8  

For such tasks I use the XLConnect package. Its functions return the names of all sheets as a vector, so the number of sheets is simply the length of that vector.

# Load the package and read your workbook
library(XLConnect)
wb <- loadWorkbook("Your_workbook.xls")

# Save the sheet names as a character vector
lp <- getSheets(wb)

# Now read each sheet as a separate list element
dat <- lapply(seq_along(lp), function(i) readWorksheet(wb, sheet = lp[i]))

UPDATE

As suggested by @Martin Studer, XLConnect functions are already vectorized, so there is no need to use lapply(). Instead, just provide a vector of sheet names, or call getSheets() inside readWorksheet():

dat <- readWorksheet(wb, sheet = getSheets(wb))
