How can I use fwrite in R so that Excel reads character variables as text rather than scientific notation?

Date: 2022-01-14 09:21:48

I have a relatively simple issue: when I write out from R with fwrite from the data.table package, Excel interprets a character vector as scientific notation. You can run the following code to reproduce the issue:

library(data.table)

# create example
samp = data.table(id = c("7E39", "7G32", "5D99999"))
fwrite(samp, "test.csv", row.names = FALSE)

When you read this back into R, the values come back without a problem if you have scientific notation disabled. My less code-savvy colleagues work with the CSV directly in Excel, and they see this:

[Screenshot: the CSV opened in Excel]
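To make the R side of this concrete, here is a minimal sketch that writes the example and inspects it without Excel. The temp file path and the explicit `sep` and `colClasses` arguments are assumptions added for illustration; the original post writes to "test.csv" and does not show how the file is read back.

```r
library(data.table)

# Hypothetical temp file, used here instead of "test.csv"
f <- tempfile(fileext = ".csv")
samp <- data.table(id = c("7E39", "7G32", "5D99999"))
fwrite(samp, f)

# The raw lines on disk: the ids are stored as plain, unquoted text
raw_lines <- readLines(f)
print(raw_lines)

# Reading back with the column forced to character keeps the ids as written
back <- fread(f, sep = ",", colClasses = "character")
print(back$id)
```

The file itself holds the original strings, so the conversion happens when Excel opens the file, not when fwrite writes it.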

They can try changing the column to text, but by then Excel has already converted the value and displays it written out with all the zeros. I want them to see the original "7E39" from the data.table I created. Any ideas on how to avoid this issue?

PS: I'm working with millions of rows so write.csv is not really an option

EDIT:

One workaround I've found is to just create a mock variable with quotes:

samp = data.table(id = c("7E39", "7G32","5D99999"))[,id2:=shQuote(id)]

I prefer a tidyr solution (pun intended), as I hate unnecessary columns

EDIT2:

Following R2Evan's solution, I adapted it to data.table as follows (adding another numerical column to see whether anything else changed):

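# create example
samp = data.table(id = c("7E39", "7G32", "5D99999"))[, second_var := c(1, 2, 3)]

# prefix each id with "=" so Excel treats the cell as a formula returning a string;
# type = "cmd" makes shQuote() use double quotes on every platform
fwrite(samp[, id := sprintf("=%s", shQuote(id, type = "cmd"))],
       "foo.csv", row.names = FALSE)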

2 answers

#1


2  

It's a kludge, and dang Excel for forcing this (I've dealt with it before).

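# type = "cmd" makes shQuote() use double quotes on every platform;
# Excel needs double quotes for a string literal
write.csv(data.frame(id = sprintf("=%s", shQuote(c("7E39", "7G32", "5D99999"), type = "cmd"))),
          "foo.csv", row.names = FALSE)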

This is forcing Excel to consider that column a formula, and interpret it as such. You'll see that in Excel, it is a literal formula that assigns a static string.

[Screenshot: the resulting formula as shown in Excel]
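One caveat worth sketching: `shQuote()` picks its default quoting style by platform (single quotes on Unix-alikes, double quotes on Windows), while an Excel string literal needs double quotes. The helper below, `excel_text()`, is a hypothetical name introduced here for illustration and is not part of the original answer; it builds the formula text explicitly so the result does not depend on where the script runs.

```r
# Hypothetical helper: wrap a character vector as Excel string formulas.
# Double quotes are written explicitly, and any embedded double quote is
# doubled, which is how Excel escapes quotes inside a string literal.
excel_text <- function(x) {
  sprintf('="%s"', gsub('"', '""', x, fixed = TRUE))
}

ids <- c("7E39", "7G32", "5D99999")
wrapped <- excel_text(ids)
print(wrapped)
```

The wrapped vector can then be passed to whichever CSV writer is in use; the writer adds its own CSV-level quoting on top.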

This is obviously not portable and prone to all sorts of problems, but that is Excel's way in this regard.

(BTW: I used write.csv here, but frankly it doesn't matter which function you use, as long as it passes the string through.)
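As a rough check of that claim, the sketch below writes the same formula-wrapped ids with both `write.csv` and `fwrite`, then parses both files the same way. The temp files and the variable names are assumptions for illustration only.

```r
library(data.table)

ids <- sprintf('="%s"', c("7E39", "7G32", "5D99999"))
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")

write.csv(data.frame(id = ids), f1, row.names = FALSE)
fwrite(data.table(id = ids), f2)

# Parse both files the same way and compare the recovered strings
from_write_csv <- read.csv(f1, colClasses = "character")$id
from_fwrite    <- read.csv(f2, colClasses = "character")$id
print(identical(from_write_csv, from_fwrite))
```

If both writers pass the string through, a CSV parser recovers the same `="..."` text from either file, which is what Excel needs to see.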

#2


1  

Another option, but one that your consumers will need to do, not you.

If you export the file "as is", meaning the cell content is just "7E39", then an auto-import within Excel will always try to be smart about that cell's content. However, you can manually import the data.

Using Excel 2016 (32bit, on win10_64bit, if it matters):

  1. Open Excel first, with a worksheet (optionally empty) already open.
  2. On the ribbon: Data > Get External Data > From Text.
  3. Navigate to the appropriate CSV file.
  4. Select "Delimited" (file type), click Next, select "Comma" (and optionally deselect any others that may be selected by default), then click Next.
  5. Click on the specific column(s) and set the "Default data format" to "Text" (this needs to be done for every column where this is a problem). Multiple columns can be Shift-selected (for a range of columns), but not Ctrl-selected. Click Finish.
  6. Choose the top-left cell to import/paste the data into (or a new worksheet).
  7. Select Properties..., and deselect "Save query definition". Without this step, the data is considered a query into an external data source, which may not be a problem but makes some things a little annoying. (For example, try to highlight all data and delete it ... Excel really wants to make sure you know what you're doing there.)

This method provides a portable solution. It "punishes" the Excel users, but anybody or anything else can still consume the files directly without change. The biggest disadvantage of this method is that you won't know whether somebody has loaded the file incorrectly unless or until they get odd results when they try to use the data and some fields have been silently converted.
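Since the conversion is silent, one option is to flag at-risk values before the file goes out. The sketch below uses a regular expression as a stand-in for "looks like scientific notation"; the pattern and the function name `looks_scientific()` are assumptions made here, not Excel's actual parsing rule, so treat the result as a hint rather than a guarantee.

```r
# Heuristic pattern (an assumption, not Excel's actual rule):
# digits, optional decimal part, E or e, optional sign, digits.
looks_scientific <- function(x) {
  grepl("^[0-9]+(\\.[0-9]+)?[eE][+-]?[0-9]+$", x)
}

ids <- c("7E39", "7G32", "5D99999")
at_risk <- ids[looks_scientific(ids)]
print(at_risk)
```

A non-empty result suggests which columns the Excel users should import as "Text" using the steps above.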
