I have a relatively simple issue: when writing out from R with fwrite from the data.table package, a character vector gets interpreted as scientific notation by Excel. You can run the following code to reproduce the issue:
# create example
library(data.table)
samp = data.table(id = c("7E39", "7G32", "5D99999"))
fwrite(samp, "test.csv", row.names = FALSE)
When you read this back into R, you get the values back with no problem if you have scientific notation disabled. My less code-savvy colleagues work with the CSV directly in Excel, where values such as 7E39 are displayed in scientific notation.
They can attempt to change the column to text, but Excel then spells out all the zeros of the converted number. I want them to see the original "7E39" from the data table I created. Any ideas on how to avoid this issue?
PS: I'm working with millions of rows, so write.csv is not really an option.
EDIT:
One workaround I've found is to just create a mock variable with quotes:
samp = data.table(id = c("7E39", "7G32","5D99999"))[,id2:=shQuote(id)]
I would prefer a tidyr solution (pun intended), as I hate unnecessary columns.
EDIT2:
Following R2Evan's solution, I adapted it to data.table as follows (adding another, numeric column to see whether anything changed):
# create example
samp = data.table(id = c("7E39", "7G32", "5D99999"))[, second_var := c(1, 2, 3)]
fwrite(samp[, id := sprintf("=%s", shQuote(id))],
       "foo.csv", row.names = FALSE)
2 Answers
#1
2
It's a kludge, and dang Excel for forcing this (I've dealt with it before).
write.csv(data.frame(id=sprintf("=%s", shQuote(c("7E39", "7G32","5D99999")))),
"foo.csv", row.names=FALSE)
This forces Excel to treat that column as a formula and evaluate it as such. In Excel you'll see that each cell holds a literal formula that returns a static string.
This is obviously not portable and is prone to all sorts of problems, but that is Excel's way in this regard.
(BTW: I used write.csv here, but frankly it doesn't matter which function you use, as long as it passes the string through.)
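As a rough sketch of the same formula trick, here is a variant that builds the formula with double quotes, on the assumption that Excel expects string literals in formulas to be double-quoted (shQuote defaults to single quotes). The helper name as_excel_text is hypothetical and not part of data.table or base R; the output path is a temporary file chosen only for illustration.

```r
# Hypothetical helper (not from any package): wrap each value in an
# Excel text formula of the form ="value".
as_excel_text <- function(x) {
  x <- as.character(x)
  # Assumption: Excel string literals use double quotes, with embedded
  # double quotes escaped by doubling them.
  sprintf('="%s"', gsub('"', '""', x, fixed = TRUE))
}

ids <- c("7E39", "7G32", "5D99999")
wrapped <- as_excel_text(ids)
print(wrapped)

# Write with fwrite when data.table is installed, else fall back to write.csv.
out <- file.path(tempdir(), "foo.csv")
df <- data.frame(id = wrapped, stringsAsFactors = FALSE)
if (requireNamespace("data.table", quietly = TRUE)) {
  data.table::fwrite(df, out)
} else {
  write.csv(df, out, row.names = FALSE)
}
```

Because the wrapped values contain double quotes, both writers should quote the field and double the embedded quotes in the CSV, so it is worth opening a small sample in Excel before exporting millions of rows this way.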
#2
1
Another option, but one that your consumers will need to carry out, not you.
If you export the file "as is", meaning the cell content is just "7E39", then an auto-import within Excel will always try to be smart about that cell's content. However, you can import the data manually.
Using Excel 2016 (32bit, on win10_64bit, if it matters):
- Open Excel (first), have an (optionally empty) worksheet already open
- On the ribbon: Data > Get External Data > From Text
- Navigate to the appropriate file (CSV)
- Select "Delimited" (file type), click Next, select "Comma" (and optionally deselect any others that may default to selected), Next
- Click on the specific column(s) and set the "Column data format" to "Text" (this will need to be done for any/all columns where this is a problem). Multiple columns can be Shift-selected (for a range of columns), but not Ctrl-selected. Finish.
- Choose the top-left cell to import/paste the data (or a new worksheet)
- Select Properties..., and deselect "Save query definition". Without this step, the data is considered a query into an external data source, which may not be a problem but makes some things a little annoying. (For example, try to highlight all data and delete it ... Excel really wants to make sure you know what you're doing there.)
This method provides a portable solution. It "punishes" the Excel users, but anybody/anything else will still be able to consume the files directly without change. The biggest disadvantage of this method is that you won't know if somebody loads the file incorrectly unless/until they get odd results when they try to use the data and some fields have been silently converted.
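To illustrate what "consume the files directly" can look like outside Excel, here is a minimal base R sketch that writes the ids unmodified and reads them back as text. The temporary file path is an assumption for the example, and write.csv is used only to keep the sketch dependency-free; the same check should apply to a file written with fwrite.

```r
ids <- c("7E39", "7G32", "5D99999")
path <- file.path(tempdir(), "test.csv")

# Write the values "as is", with no Excel-specific wrapping.
write.csv(data.frame(id = ids, stringsAsFactors = FALSE), path, row.names = FALSE)

# colClasses = "character" stops R from guessing numeric types on read.
back <- read.csv(path, colClasses = "character")

# The round trip should return exactly the original strings.
stopifnot(identical(back$id, ids))
```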