This question already has an answer here:
- What you can do with data.frame that you can't in data.table (1 answer)
Apparently, in my last question I demonstrated confusion between data.frame and data.table. Admittedly, I didn't realize there was a distinction.
So I read the help for each, but in practical, everyday terms: what is the difference, what are the implications, and what is each used for? Knowing this would help guide me to their appropriate usage.
3 Answers
#1
48
While this is a broad question, if someone is new to R this can be confusing and the distinction can get lost.
All data.tables are also data.frames. Loosely speaking, you can think of data.tables as data.frames with extra features.
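To make the "every data.table is also a data.frame" point concrete, here is a minimal sketch. It assumes the data.table package is installed; the objects df and dt and their columns are made up for illustration.

```r
library(data.table)  # assumption: data.table is installed

# A plain base R data.frame (hypothetical example data)
df <- data.frame(id = 1:3, value = c(10, 20, 30))

# Convert a copy of it to a data.table
dt <- as.data.table(df)

class(df)          # "data.frame"
class(dt)          # "data.table" "data.frame"

is.data.frame(dt)  # TRUE: a data.table is also a data.frame
is.data.table(df)  # FALSE: the reverse does not hold
```

Because dt still carries the data.frame class, functions that expect a data.frame will generally accept it.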
data.frame is part of base R.
data.table is a package that extends data.frames. Two of its most notable features are speed and cleaner syntax.
However, that syntactic sugar differs from the standard R syntax for data.frame while being hard for the untrained eye to distinguish at a glance. Therefore, if you read a code snippet with no other context to indicate that it works with data.tables, and you try to apply the code to a data.frame, it may fail or produce unexpected results. (A clear giveaway that you are working with data.tables, besides the library/require call, is the presence of the assignment operator :=, which is unique to data.table.)
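As an illustration of how similar-looking code can behave differently on the two classes, here is a small sketch. It assumes data.table is installed; df, dt and the column name doubled are hypothetical.

```r
library(data.table)  # assumption: data.table is installed

df <- data.frame(id = 1:3, value = c(10, 20, 30))
dt <- as.data.table(df)  # a copy, so changes to dt do not touch df

# The same-looking call means different things:
df[2]  # data.frame: selects the second COLUMN (value)
dt[2]  # data.table: selects the second ROW

# := adds a column by reference and only works on a data.table
dt[, doubled := value * 2]
names(dt)  # "id" "value" "doubled"

# The same expression on a data.frame fails; capture the error to show it
res <- tryCatch(df[, doubled := value * 2], error = function(e) e)
inherits(res, "error")  # TRUE
```

So a snippet containing := is almost certainly written for a data.table, and a bare single index like x[2] is worth a second look before you reuse it.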
With all that being said, I think it is hard to actually appreciate the beauty of data.table without experiencing the shortcomings of data.frame (for example, see the first 3 bullet points of @eddi's answer). In other words, I would very much suggest learning how to work with and manipulate data.frames first, and then moving on to data.tables.
#2
28
A few differences in my day to day life that come to mind (in no particular order):
- not having to specify the data.table name over and over in expressions (which leads to clumsy syntax and silly mistakes); on the flip side, I sometimes miss the TAB-completion of names
- much faster and very intuitive by operations
- no more frantically hitting Ctrl-C after typing df, forgetting how large df was (also leading to almost never using head)
- faster and better file reading with fread
- the package also provides a number of other utility functions, like %between% or rbindlist, that make life better
- faster everything else, since a lot of data.frame operations copy the entire thing needlessly
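A few of the points above can be sketched roughly like this. It assumes data.table is installed; the table dt, its columns, and the inline CSV text are made up for illustration.

```r
library(data.table)  # assumption: data.table is installed

dt <- data.table(group = c("a", "a", "b", "b"),
                 value = c(1, 5, 10, 20))

# Columns are referenced directly inside [ ], without repeating the table name.
# The data.frame equivalent would be df[df$value > 3, ]
dt[value > 3]

# Grouped aggregation with by: one row per group
totals <- dt[, .(total = sum(value)), by = group]
totals  # a -> 6, b -> 30

# %between% is an inclusive range test
mid <- dt[value %between% c(5, 10)]  # keeps the rows with value 5 and 10

# rbindlist stacks a list of tables into one
stacked <- rbindlist(list(dt, dt))
nrow(stacked)  # 8

# fread normally takes a file path; the text argument is used here
# only to keep the example self-contained
tiny <- fread(text = "id,value\n1,10\n2,20\n")
nrow(tiny)  # 2
```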
#3
7
They are similar. Data frames are lists of vectors of equal length, while data tables (data.table) inherit from data frames. Therefore data tables are data frames, but data frames are not necessarily data tables. The data.table package and function were written to enhance the speed of indexing, ordered joins, assignment, grouping, listing columns, and so on.
See http://datatable.r-forge.r-project.org/datatable-intro.pdf for more information.