This question already has an answer here:
- Create an ID (row number) column 5 answers
I have read a large CSV file into a data frame. The data in the CSV file come from multiple web sites and represent user information. For example, here is the structure of the data frame:
user_id, number_of_logins, number_of_images, web
001, 34, 3, aa.com
002, 4, 4, aa.com
034, 3, 3, aa.com
001, 12, 4, bb.com
002, 1, 3, bb.com
034, 2, 2, cc.com
As you can see, once I bring the data into the data frame, user_id is no longer a unique id, and this causes problems for all of the analysis. I am trying to add another column before user_id, something like "generated_uid", and essentially fill that column with the index of the data.frame. What's the best way to accomplish this?
4 solutions
#1
88
You can add a sequence of numbers very easily with
data$ID <- seq.int(nrow(data))
Of course, the ID carries no real meaning, so it may not be useful in the analysis itself.
If you are already using library(tidyverse), you can use
data <- tibble::rowid_to_column(data, "ID")
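As a minimal base R sketch of the same idea, the snippet below rebuilds the sample data from the question (the numeric column types are an assumption) and uses seq_len() instead of seq.int(). The two behave the same for a non-empty data frame, but seq_len(0) returns an empty vector, whereas seq.int(0) returns c(1, 0), which cannot be assigned to a zero-row data frame.

```r
# Sample data mirroring the question; column types are assumed for illustration.
data <- data.frame(
  user_id = c("001", "002", "034", "001", "002", "034"),
  number_of_logins = c(34, 4, 3, 12, 1, 2),
  number_of_images = c(3, 4, 3, 4, 3, 2),
  web = c("aa.com", "aa.com", "aa.com", "bb.com", "bb.com", "cc.com"),
  stringsAsFactors = FALSE
)

# One integer per row: 1, 2, ..., nrow(data).
data$ID <- seq_len(nrow(data))

# The generated ID is unique even though user_id repeats across web sites.
stopifnot(!anyDuplicated(data$ID), anyDuplicated(data$user_id) > 0)
```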
#2
7
Alternatively, using the dplyr package:
library("dplyr") # or library("tidyverse")
df <- df %>% mutate(id = row_number())
#3
5
Well, if I understand you correctly, you can do something like the following.
To show it, I first create a data.frame with your example:
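# Read the sample rows as character fields; strip.white = TRUE drops
# the spaces that follow each comma.
fields <- scan(what = character(), sep = ",", strip.white = TRUE, text =
"001, 34, 3, aa.com
002, 4, 4, aa.com
034, 3, 3, aa.com
001, 12, 4, bb.com
002, 1, 3, bb.com
034, 2, 2, cc.com")
# Six rows of four fields each, filled row by row (all columns are character).
df <- as.data.frame(matrix(fields, 6, 4, byrow = TRUE))
colnames(df) <- c("user_id", "number_of_logins", "number_of_images", "web")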
You can then run either of the following lines to add a column (at the end of the data.frame) with the row number as the generated user id. The second line simply adds leading zeros.
df$generated_uid <- 1:nrow(df)
df$generated_uid2 <- sprintf("%03d", 1:nrow(df))
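One caveat for a large file: "%03d" pads to three digits, so ids from 1000 upward come out wider than the rest. A possible sketch, assuming you want every id to have the same width, derives the padding from the row count. The row count n below is a hypothetical stand-in for nrow(df).

```r
# Hypothetical row count, chosen to exceed what three digits can pad.
n <- 1200L
ids <- seq_len(n)

# Number of digits in the largest id.
width <- nchar(as.character(n))

# Left-pad every id with zeros to that common width.
padded <- formatC(ids, width = width, flag = "0")

head(padded, 3)
```

Because the result is a character vector, it sorts alphabetically in the same order as the underlying integers only when all ids share one width, which is what the computed padding ensures.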
If you absolutely want the generated user id to be the first column, you can add the column like so:
df <- cbind("generated_uid3" = sprintf("%03d", 1:nrow(df)), df)
or simply rearrange the columns.
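Rearranging the columns could look like the following base R sketch. The three-row data frame is a cut-down stand-in for the example above, and the column name generated_uid follows the question.

```r
# Cut-down stand-in for the example data.
df <- data.frame(
  user_id = c("001", "002", "034"),
  web = c("aa.com", "aa.com", "aa.com"),
  stringsAsFactors = FALSE
)

# Adding a column this way appends it at the end.
df$generated_uid <- seq_len(nrow(df))

# Put generated_uid first and keep the remaining columns in their original order.
df <- df[, c("generated_uid", setdiff(names(df), "generated_uid"))]

names(df)
```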
#4
3
If your data.frame is a data.table, you can use the special symbol .I:
data[, ID := .I]