I'm quite new to R and I'm trying to write a function that normalizes my data in different dataframes.
The normalization process is quite easy: I just divide the numbers I want to normalize by the population size for each object (which is stored in the table population). To know which objects correspond to each other, I tried to use the IDs that are stored in the first column of each dataframe.
I chose this approach because some objects in the population dataframe have no corresponding objects in the dataframes to be normalized; that is, the dataframes sometimes contain fewer objects.
Normally one would build a relational database (which I tried), but that didn't work out for me. So I tried to relate the objects within the function, but the function didn't work. Maybe someone here has experience with this and can help me.
So my attempt to write this function was:
# Load Tables
# Agriculture, Annual Crops
table.annual.crops <-read.table ("C:\\Users\\etc", header=T,sep=";")
# Agriculture, Biannual and Perennial Crops
table.bianual.crops <-read.table ("C:\\Users\\etc", header=T,sep=";")
# Fishery
table.fishery <-read.table ("C:\\Users\\etc", header=T,sep=";")
# Population per Municipality
table.population <-read.table ("C:\\Users\\etc", header=T,sep=";")
# attach data
attach(table.annual.crops)
attach(table.bianual.crops)
attach(table.fishery)
attach(table.population)
# Create a function to normalize data
# Objects should be related by their ID in the first column
# Values to be normalized and the population appear in the second column
funktion.norm.percapita <- function(x, y) {
  if (x[, 1] == y[, 1]) {
    x[, 2] / y[, 2]
  } else {
    return("0")
  }
}
# execute the function
funktion.norm.percapita(table.annual.crops,table.population)
1 Answer
Let's start with the attach steps... why? It's usually unnecessary and can get you into trouble, especially since both your population data.frame and your crops data.frame have Geocode as a column!
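To make the attach problem concrete, here is a minimal sketch with two made-up data frames (crops and pop, with invented values) that share the column name Geocode. It only illustrates the masking; your real tables will differ.

```r
# Toy data, invented for illustration; both tables share the column name Geocode
crops <- data.frame(Geocode = c(11, 12), CropValue = c(100, 300))
pop   <- data.frame(Geocode = c(11, 12, 13), Population = c(50, 60, 70))

attach(crops)
attach(pop)    # R reports that Geocode from crops is now masked

# A bare Geocode now refers to the most recently attached table (pop)
masked <- Geocode

detach(pop)
detach(crops)

# Referring to the column explicitly avoids the ambiguity
explicit <- crops$Geocode
```

Using crops$Geocode (or with(crops, ...)) makes it obvious which table a column comes from, so there is nothing to gain from attaching.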
As suggested in the comments, you can use merge. By default, this combines data.frames using the columns that have the same name in both. You can specify which columns to merge on with the by parameters (by, by.x, by.y).
dat <- merge(table.annual.crops, table.population)
dat$crop.norm <- dat$CropValue / dat$Population
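As a rough sketch of how merge treats municipalities that appear in only one table, here is a toy example. The values are made up, and the column names Geocode, CropValue and Population are assumptions about your real tables.

```r
# Toy data, invented for illustration; Geocode 13 has no crop record
crops <- data.frame(Geocode = c(12, 11), CropValue = c(300, 100))
pop   <- data.frame(Geocode = c(11, 12, 13), Population = c(50, 60, 70))

# Default (inner) join on the shared ID column: unmatched rows are dropped
dat <- merge(crops, pop, by = "Geocode")
dat$crop.norm <- dat$CropValue / dat$Population

# all.y = TRUE keeps every row of pop; missing crop values become NA
dat.all <- merge(crops, pop, by = "Geocode", all.y = TRUE)
```

Row order does not matter here, because merge pairs rows by the value of Geocode rather than by position. If the ID column had different names in the two tables, by.x and by.y would name it on each side.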
The reason your function isn't working? Look at the result of the condition in your if statement.
table.annual.crops[,1] == table.population[,1]
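This gives a logical vector with one value per row, recycling the shorter vector when the lengths differ, whereas if expects a single TRUE or FALSE. It also compares the IDs position by position, so two rows only count as equal when they happen to sit at the same position in both tables.
If your data is quite large (on the order of millions of rows), the merge function can be slow. If this is the case, take a look at the data.table package and use its merge function instead.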
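If you would rather keep a function than use merge, one possible sketch is to look up the IDs with match from base R. The toy data and the name norm.percapita are hypothetical; the function assumes, as in your code, that IDs are in the first column and values in the second.

```r
# Toy data, invented for illustration; note the rows are in different orders
crops <- data.frame(Geocode = c(12, 11), CropValue = c(300, 100))
pop   <- data.frame(Geocode = c(11, 12, 13), Population = c(50, 60, 70))

# == compares position by position and returns one logical per row;
# the recycling warning is suppressed here only to keep the demo quiet
by.position <- suppressWarnings(crops[, 1] == pop[, 1])

# Hypothetical rewrite of the function: find each ID of x in y
norm.percapita <- function(x, y) {
  idx <- match(x[, 1], y[, 1])  # row of y for each row of x, NA if absent
  x[, 2] / y[idx, 2]
}

result <- norm.percapita(crops, pop)
```

In this toy case every element of by.position is FALSE even though both crop IDs exist in pop, which is why comparing by position cannot work. With match, an ID missing from the population table yields NA instead of a wrong value.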
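For the large-data case, a minimal sketch with data.table could look like the following. It assumes the data.table package is installed; the data and column names are again invented for illustration.

```r
library(data.table)

# Toy data, invented for illustration
crops <- data.table(Geocode = c(12, 11), CropValue = c(300, 100))
pop   <- data.table(Geocode = c(11, 12, 13), Population = c(50, 60, 70))

# For data.table inputs, merge dispatches to data.table's own merge method
dat <- merge(crops, pop, by = "Geocode")

# := adds the normalized column by reference, without copying the table
dat[, crop.norm := CropValue / Population]
```

The call looks the same as with data.frames, so switching mostly means converting the tables first, for example with as.data.table.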