
Time: 2022-01-14 09:21:48

I have a relatively simple issue: when I write a file from R with fwrite from the data.table package, Excel interprets a character vector as scientific notation. You can run the following code to reproduce the issue:


# create example
library(data.table)
samp = data.table(id = c("7E39", "7G32", "5D99999"))
fwrite(samp, "test.csv", row.names = FALSE)

When you read this back into R, the values come back without a problem as long as scientific notation is disabled. My less code-capable colleagues work with the CSV directly in Excel, and they see this:



They can try to change the column to text, but Excel then shows the number written out with all its zeros. I want them to see the original "7E39" from the data.table I created. Any ideas how to avoid this issue?


PS: I'm working with millions of rows, so write.csv is not really an option.



One workaround I've found is to create a mock variable with quotes:


samp = data.table(id = c("7E39", "7G32", "5D99999"))[, id2 := shQuote(id)]

I'd prefer a tidyr solution (pun intended), as I hate unnecessary columns.



Following R2Evan's solution, I adapted it to data.table as follows (adding a second, numeric column to see whether any changes occurred):


# create example
samp = data.table(id = c("7E39", "7G32", "5D99999"))[, second_var := c(1, 2, 3)]

# type = "cmd" forces double quotes; shQuote defaults to single quotes on Unix-alikes
fwrite(samp[, id := sprintf("=%s", shQuote(id, type = "cmd"))],
       "foo.csv", row.names = FALSE)

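One thing to watch in the adaptation above: `:=` modifies a data.table by reference, so wrapping `id` directly on `samp` also changes the in-memory table. A minimal sketch of one way around that, assuming it is acceptable to hold a temporary copy in memory; the `out` and `csv_path` names are only illustrative, and the wrapping is written with a plain `sprintf` template rather than `shQuote`:

```r
library(data.table)

samp <- data.table(id = c("7E39", "7G32", "5D99999"), second_var = c(1, 2, 3))

# Work on a copy so that `:=` does not alter `samp` itself.
out <- copy(samp)[, id := sprintf('="%s"', id)]

# Hypothetical output location; replace with the real file path.
csv_path <- tempfile(fileext = ".csv")
fwrite(out, csv_path)

# Inspect the raw file: one header line plus one line per row.
raw_lines <- readLines(csv_path)
print(raw_lines)

# The original table still holds the plain identifiers.
print(samp$id)
```

For millions of rows the copy costs extra memory, so modifying in place and stripping the wrapper afterwards may be the better trade-off.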
2 Answers



It's a kludge, and dang it, Excel forces this (I've dealt with it before).


# type = "cmd" forces double quotes; shQuote defaults to single quotes on Unix-alikes
write.csv(data.frame(id = sprintf("=%s", shQuote(c("7E39", "7G32", "5D99999"), type = "cmd"))),
          "foo.csv", row.names = FALSE)

This forces Excel to treat that column as a formula and evaluate it as such. In Excel, you'll see that each cell holds a literal formula that returns a static string.



This is obviously not portable and prone to all sorts of problems, but that is Excel's way in this regard.


(BTW: I used write.csv here, but frankly it doesn't matter which function you use, as long as it passes the string through.)
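To make the trick concrete, here is a small sketch of what ends up in the file and how a non-Excel consumer could undo it. The helper names `as_excel_text` and `strip_excel_text` are made up for this illustration, and the file goes to a temporary path:

```r
ids <- c("7E39", "7G32", "5D99999")

# Hypothetical helper: wrap each value as the Excel formula ="value".
as_excel_text <- function(x) sprintf('="%s"', x)

# Hypothetical helper: undo the wrapping for consumers other than Excel.
strip_excel_text <- function(x) sub('^="(.*)"$', "\\1", x)

csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = as_excel_text(ids)), csv_path, row.names = FALSE)

# The raw file: write.csv quotes the field and doubles the embedded quotes.
raw_lines <- readLines(csv_path)
print(raw_lines)

# Reading back in R returns the wrapped strings, so strip the wrapper.
back <- read.csv(csv_path, colClasses = "character")
restored <- strip_excel_text(back$id)
print(restored)
```

Every other consumer of the file needs a step like `strip_excel_text`, which is the portability cost mentioned above.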




Another option, though it is one that your consumers will need to carry out, not you.


If you export the file "as is", meaning the cell content is just "7E39", then an auto-import within Excel will always try to be smart about that cell's content. However, you can manually import the data.


Using Excel 2016 (32-bit, on 64-bit Windows 10, if it matters):

  1. Open Excel first, with an (optionally empty) worksheet already open.

  2. On the ribbon: Data > Get External Data > From Text

  3. Navigate to the appropriate file (CSV).

  4. Select "Delimited" (file type), click Next, select "Comma" (and optionally deselect any others that may be selected by default), then click Next.

  5. Click on the specific column(s) and set the "Column data format" to "Text" (this needs to be done for every column where this is a problem). Multiple columns can be Shift-selected (for a range of columns), but not Ctrl-selected. Click Finish.

  6. Choose the top-left cell where the data should be imported/pasted (or choose a new worksheet).

  7. Select Properties..., and deselect "Save query definition". Without this step, the data is treated as a query into an external data source, which may not be a problem but makes some things a little annoying. (For example, try to highlight all the data and delete it ... Excel really wants to make sure you know what you're doing there.)

This method provides a portable solution. It "punishes" the Excel users, but anybody or anything else can still consume the files directly without change. The biggest disadvantage of this method is that you won't know whether somebody has loaded the file incorrectly unless/until they get odd results when they try to use the data and some fields have been silently converted.
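The same kind of silent conversion can also happen in other tools that guess column types, R included. A minimal sketch, using made-up identifiers that all happen to look numeric, of how forcing the column to character on import keeps the values intact:

```r
# Hypothetical sample file in which every id looks like a number.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id", "7E39", "1E5", "00123"), csv_path)

# Default type guessing: the whole column is converted to numeric.
guessed <- read.csv(csv_path)
print(class(guessed$id))

# Forcing character keeps the identifiers exactly as written.
as_text <- read.csv(csv_path, colClasses = "character")
print(as_text$id)
```

This is the R counterpart of setting the column format to "Text" in the import wizard.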




