I am working with a CSV file that was originally formatted in Excel. I want to convert the Rate column to numeric and remove the "$" sign.
I read in the file with:
NImp <- read.csv("National_TV_Spots 6_30_14 to 8_31_14.csv", sep=",", header=TRUE, stringsAsFactors=FALSE, strip.white=TRUE, na.strings=c("Not Monitored"))
The data frame looks like this:
HH.IMP..000. ISCI Creative Program Rate
1 NA IT3896 Rising Costs30 (Opportunity Scholar - No Nursing) NUVO CINEMA $0.00
2 NA IT3896 Rising Costs30 (Opportunity Scholar - No Nursing) NUVO CINEMA $0.00
3 141 IT14429 Rising Costs30 (Opportunity Scholar - No Nursing) BONUS $0.00
4 476 ITES15443H Matthew Traina (B. EECT/A. CEET) :60 (no loc) Law & Order: SVU $0.00
5 NA IT3896 Rising Costs30 (Opportunity Scholar - No Nursing) NUVO CINEMA $0.00
When I do the conversion, I get a warning message and all values are coerced to NA:
NImp$Rate <- as.numeric(gsub("$","", NImp$Rate))
Warning message:
NAs introduced by coercion
I also tried:
NImp$Rate <- as.numeric(sub("\\$","", NImp$Rate))
but again got the same warning message. This time, however, not all values became NA, only specific ones. I opened the CSV in Excel to check and realized that Excel makes the column too narrow, so some cells display as "####". These are the cells that R is coercing to NA.
I tried opening the file in Notepad and reading the Notepad file into R, but I get the same results. The values display correctly both in Notepad and when I read the file into R. But when I convert to numeric, everything that shows as "####" in Excel becomes NA.
What should I do?
Here is the output of str(NImp):
'data.frame': 9859 obs. of 19 variables:
$ Spot.ID : int 13072903 13072904 13072898 13072793 13072905 13072899 13072397 13072476 13072398 13072681 ...
$ Date : chr "6/30/2014" "6/30/2014" "6/30/2014" "6/30/2014" ...
$ Hour : int 0 0 0 0 0 0 1 1 1 2 ...
$ Time : chr "12:08 AM" "12:20 AM" "12:29 AM" "12:30 AM" ...
$ Local.Date : chr "6/30/2014" "6/30/2014" "6/30/2014" "6/30/2014" ...
$ Broadcast.Week : int 1 1 1 1 1 1 1 1 1 1 ...
$ Local.Hour : int 0 0 0 0 0 0 1 1 1 2 ...
$ Local.Time : chr "12:08 AM" "12:20 AM" "12:29 AM" "12:30 AM" ...
$ Market : chr "NATIONAL CABLE" "NATIONAL CABLE" "NATIONAL CABLE" "NATIONAL CABLE" ...
$ Vendor : chr "NUVO" "NUVO" "AFAM" "USA" ...
$ Station : chr "NUVO" "NUVO" "AFAM" "USA" ...
$ M18.34.IMP..000.: int NA NA 3 88 NA 3 NA 53 NA 37 ...
$ W18.34.IMP..000.: int NA NA 86 66 NA 86 NA 70 NA 60 ...
$ A18.34.IMP..000.: int NA NA 89 154 NA 89 NA 123 NA 97 ...
$ HH.IMP..000. : int NA NA 141 476 NA 141 NA 461 NA 434 ...
$ ISCI : chr "IT3896" "IT3896" "IT14429" "ITES15443H" ...
$ Creative : chr "Rising Costs30 (Opportunity Scholar - No Nursing)" "Rising Costs30 (Opportunity Scholar - No Nursing)" "Rising Costs30 (Opportunity Scholar - No Nursing)" "Matthew Traina (B. EECT/A. CEET) :60 (no loc)" ...
$ Program : chr "NUVO CINEMA" "NUVO CINEMA" "BONUS" "Law & Order: SVU" ...
$ Rate : chr "$0.00" "$0.00" "$0.00" "$0.00" ...
When a column is formatted as "Currency" in Excel, values of 1,000 or more contain a thousands-separator comma in addition to the dollar-sign prefix; for example, $1,200.00. Your problem was that you removed the dollar signs but not the commas, so converting those values to numeric gave NA. (These longer values are also likely the ones Excel displays as "####" in a narrow column.)
as.numeric(c("0", "0", "1,200"))
[1] 0 0 NA
Warning message:
NAs introduced by coercion
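Your first attempt, gsub("$", "", NImp$Rate), fails for a different reason: in a regular expression "$" is the end-of-string anchor, so the pattern matches the empty string at the end of each value and removes nothing. That is why every value became NA in that attempt. The short sketch below uses made-up sample values to show the difference between the unescaped pattern, an escaped "\\$", and fixed = TRUE (which treats the pattern as a literal string).

```r
# Made-up sample values in Excel "Currency" format
x <- c("$0.00", "$1,200.00")

# "$" is a regex anchor (end of string), so nothing is removed:
# the result is still "$0.00" "$1,200.00"
gsub("$", "", x)

# Escaping the dollar sign, or using fixed = TRUE, matches a literal "$"
# and returns "0.00" "1,200.00"
gsub("\\$", "", x)
gsub("$", "", x, fixed = TRUE)

# The comma is still there, so "1,200.00" cannot be parsed as a number
# and as.numeric() returns NA for it (with a coercion warning)
suppressWarnings(as.numeric(gsub("\\$", "", x)))
```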
You can remove the dollar signs and commas in one step using gsub. I found an example of how to do this in a comment to this answer.
as.numeric(gsub("[$,]", "", c("$0", "$0", "$1,200")))
[1] 0 0 1200
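Inside square brackets "$" loses its special meaning, so "[$,]" is a character class that matches either a literal dollar sign or a comma. If you want to confirm which entries were causing the NAs before converting, a sketch like the following can help. The vector rate here is a hypothetical stand-in for NImp$Rate.

```r
# Hypothetical sample values standing in for NImp$Rate
rate <- c("$0.00", "$250.00", "$1,200.00", "$12,500.50")

# Strip only the dollar sign, then find the entries that still fail to parse
stripped <- gsub("$", "", rate, fixed = TRUE)
bad <- is.na(suppressWarnings(as.numeric(stripped)))
rate[bad]  # the values containing a thousands-separator comma

# Removing both characters with the character class [$,] parses every value
cleaned <- as.numeric(gsub("[$,]", "", rate))
cleaned    # 0, 250, 1200 and 12500.5
```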
So, for your data set, the following should work:
NImp$Rate <- as.numeric(gsub("[$,]", "", NImp$Rate))
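Because read.csv() was called with na.strings, the Rate column may already contain genuine NAs. A minimal sketch of a safer assignment, assuming that case, converts into a temporary vector first and checks that no new NAs were introduced before overwriting the column. The small NImp data frame below is a hypothetical stand-in for the real one.

```r
# Hypothetical stand-in for NImp; in practice it comes from read.csv()
NImp <- data.frame(
  Program = c("NUVO CINEMA", "BONUS", "Law & Order: SVU"),
  Rate = c("$0.00", "$1,200.00", NA),
  stringsAsFactors = FALSE
)

converted <- as.numeric(gsub("[$,]", "", NImp$Rate))

# Sanity check: the only NAs should be the ones that were already present
stopifnot(identical(is.na(converted), is.na(NImp$Rate)))

NImp$Rate <- converted
str(NImp$Rate)  # now a numeric column
```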