Add an index (numeric ID) column to a large data frame [duplicate]

Time: 2022-04-18 16:32:57

I have read a large CSV file into a data frame. The data in the CSV file come from multiple web sites and represent user information. For example, here is the structure of the data frame:

user_id, number_of_logins, number_of_images, web
001, 34, 3, aa.com
002, 4, 4, aa.com
034, 3, 3, aa.com
001, 12, 4, bb.com
002, 1, 3, bb.com
034, 2, 2, cc.com

As you can see, once I bring the data into the data frame, user_id is no longer a unique ID, and this causes problems in all of the analysis. I am trying to add another column before user_id, something like "generated_uid", and fill that column with the index of the data.frame. What is the best way to accomplish this?

4 answers

#1


88  

You can add a sequence of numbers very easily with

data$ID <- seq.int(nrow(data))

Of course it will have no real meaning so it might not be of use in analysis.
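As a minimal sketch of the same idea, seq_len() can be swapped in for seq.int(); this is a substitution, not what the answer uses. The difference only matters for a data frame with zero rows, where seq.int(0) yields c(1, 0) and the assignment fails. The toy data below copies the values from the question, and the `empty` object is a hypothetical edge case added for illustration.

```r
# Toy stand-in for the poster's data (values copied from the question).
data <- data.frame(
  user_id = c("001", "002", "034", "001", "002", "034"),
  number_of_logins = c(34, 4, 3, 12, 1, 2),
  number_of_images = c(3, 4, 3, 4, 3, 2),
  web = c("aa.com", "aa.com", "aa.com", "bb.com", "bb.com", "cc.com"),
  stringsAsFactors = FALSE
)

# Hypothetical zero-row edge case, taken before the ID column is added.
empty <- data[0, ]

# One integer per row: 1, 2, ..., nrow(data).
data$ID <- seq_len(nrow(data))

# seq_len(0) is integer(0), so the same line also works with no rows.
empty$ID <- seq_len(nrow(empty))
```

For a non-empty data frame the two functions give the same result, so the choice only affects robustness.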

If you are already using library(tidyverse), you can use

data <- tibble::rowid_to_column(data, "ID")

#2


7  

Alternatively, using the dplyr package:

library("dplyr") # or library("tidyverse")

df <- df %>% mutate(id = row_number())

#3


5  

If I understand you correctly, you can do something like the following.

To demonstrate, I first create a data.frame from your example:

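# Read the sample rows as one flat character vector;
# strip.white = TRUE drops the space that follows each comma.
df <- 
scan(what = character(), sep = ",", strip.white = TRUE, text =
"001, 34, 3, aa.com
002, 4, 4, aa.com
034, 3, 3, aa.com
001, 12, 4, bb.com
002, 1, 3, bb.com
034, 2, 2, cc.com")

# Reshape into 6 rows x 4 columns (every column is character at this point).
df <- as.data.frame(matrix(df, 6, 4, byrow = TRUE))
colnames(df) <- c("user_id", "number_of_logins", "number_of_images", "web")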

You can then run either of the following lines to add a column (at the end of the data.frame) with the row number as the generated user ID. The second line simply adds leading zeros.

df$generated_uid  <- 1:nrow(df)
df$generated_uid2 <- sprintf("%03d", 1:nrow(df))
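Note that "%03d" pads to a fixed width of three, so in a large data frame IDs past row 999 come out longer than the rest. If equal-length IDs matter, one possible variation is to derive the width from the row count. The sketch below assumes that goal; `padded_ids` and `big` are hypothetical names introduced here, not part of the answer.

```r
# Hypothetical helper: zero-padded IDs whose width grows with the row count,
# so every ID has the same number of characters.
padded_ids <- function(n) {
  # Number of digits in the largest ID; sprintf("%d") avoids
  # scientific notation such as "1e+05".
  width <- nchar(sprintf("%d", as.integer(n)))
  formatC(seq_len(n), width = width, flag = "0")
}

# Stand-in for a large data frame with 1200 rows.
big <- data.frame(x = numeric(1200))
big$generated_uid <- padded_ids(nrow(big))
```

With 1200 rows the IDs run from "0001" to "1200"; the result is a character column, like the sprintf() version.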

If you absolutely want the generated user id to be the first column, you can add the column like so:

df <- cbind("generated_uid3" = sprintf("%03d", 1:nrow(df)), df)

Or simply rearrange the columns.
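Rearranging can be done with plain column indexing. This is a minimal sketch on a cut-down, hypothetical version of the example data frame, not code from the answer:

```r
# Cut-down stand-in for the example data (hypothetical, three rows only).
df <- data.frame(
  user_id = c("001", "002", "034"),
  web = c("aa.com", "aa.com", "aa.com"),
  stringsAsFactors = FALSE
)
df$generated_uid <- seq_len(nrow(df))

# Put generated_uid first; setdiff() keeps the remaining columns
# in their original order.
df <- df[, c("generated_uid", setdiff(names(df), "generated_uid"))]
```

Indexing by name rather than by position keeps the line valid if columns are later added or removed.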

#4


3  

If your data.frame is a data.table, you can use the special symbol .I:

data[, ID := .I]
