I really like data.frames in R because you can store different types of data in one data structure, there are many ways to modify the data (add a column, combine data.frames, ...), and it is really easy to extract a subset of the data.
Is there a Java library that offers the same functionality? I'm mostly interested in storing different types of data in a matrix-like fashion and being able to extract a subset of the data.
Using a two-dimensional array in Java can provide a similar structure, but it is much more difficult to add a column and afterwards extract the top k records.
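To make the two operations in question concrete (adding a column, then extracting the top k records), here is a minimal plain-Java sketch that stores each column as a list instead of using a two-dimensional array. The class `MiniFrame` and its methods are hypothetical names invented for illustration; they are not taken from any library mentioned in this thread, and the sketch assumes the sort column holds `Number` values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of any library named in this thread.
class MiniFrame {
    // Column name -> column values; LinkedHashMap keeps columns in insertion order.
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private int rowCount = -1; // -1 until the first column is added

    // Adds (or replaces) a column; every column must have the same length.
    MiniFrame addColumn(String name, List<?> values) {
        if (rowCount >= 0 && values.size() != rowCount) {
            throw new IllegalArgumentException(
                "expected " + rowCount + " values, got " + values.size());
        }
        columns.put(name, new ArrayList<>(values));
        rowCount = values.size();
        return this;
    }

    int rowCount() { return Math.max(rowCount, 0); }

    Object get(String column, int row) { return columns.get(column).get(row); }

    List<Object> column(String name) { return columns.get(name); }

    // Returns a new frame with the k rows that have the largest values
    // in the given column. Assumes that column contains Number values.
    MiniFrame topK(String numericColumn, int k) {
        List<Object> keys = columns.get(numericColumn);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < rowCount(); i++) order.add(i);
        // Sort row indices by the key column, largest first.
        order.sort(Comparator.comparingDouble(
            (Integer i) -> ((Number) keys.get(i)).doubleValue()).reversed());
        List<Integer> picked = order.subList(0, Math.min(k, order.size()));
        MiniFrame result = new MiniFrame();
        for (Map.Entry<String, List<Object>> e : columns.entrySet()) {
            List<Object> values = new ArrayList<>();
            for (int i : picked) values.add(e.getValue().get(i));
            result.addColumn(e.getKey(), values);
        }
        return result;
    }

    public static void main(String[] args) {
        MiniFrame df = new MiniFrame()
            .addColumn("name", Arrays.asList("a", "b", "c"))
            .addColumn("score", Arrays.asList(1.5, 9.0, 4.2));
        df.addColumn("passed", Arrays.asList(false, true, true)); // add a column later
        MiniFrame top2 = df.topK("score", 2);
        System.out.println(top2.column("name")); // prints [b, c]
    }
}
```

Storing values as `Object` gives up the type safety and primitive efficiency that a real data frame library would provide; the point here is only that a column-oriented layout makes "add a column" a single map insertion.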
6 solutions
#1
10
I have just open-sourced a first draft version of Paleo, a Java 8 library which offers data frames based on typed columns (including support for primitive values). Columns can be created programmatically (through a simple builder API) or imported from a text file.
Please refer to the README for further details.
The project is still wet from birth – I am very interested in feedback / PRs, tia!
#2
11
I also found myself in need of a data frame structure while working in Java recently. Fortunately, after writing a very basic implementation I was able to get approval to release it as open source. You can find my implementation here: Joinery -- Data frames for Java. Contributions and feature requests are welcome.
#3
11
Tablesaw (https://github.com/jtablesaw/tablesaw) is a Java dataframe library begun in 2015 and under active development as of 2017. It's designed to be as scalable as possible without sacrificing ease of use. Features include filtering by rows and columns, descriptive stats, map/reduce functions, cross-tabs, plots, and machine learning. Apache license.
In one query test it returned 500+ records from a 500,000,000 record table in 2 ms.
It also includes a column-oriented store that is much smaller and faster than working with CSV files. Contributions, feature requests, and just-plain feedback are welcome.
#4
5
I am not very proficient with R, but you should have a look at Guava, specifically Tables. They do not provide the exact functionality you want, but you could either extend them, or their specification could help you in writing your own Collection.
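If you go the "write your own Collection" route, a minimal sketch of a table keyed by row and column could look like the following. It is only loosely modeled on the row/column view that Guava's `Table` interface exposes; the class `SimpleTable` and everything in it is a hypothetical stand-in written with plain `java.util` maps, not Guava's implementation.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-key table for illustration; not Guava code.
class SimpleTable<R, C, V> {
    // Row key -> (column key -> value); LinkedHashMap preserves insertion order.
    private final Map<R, Map<C, V>> rows = new LinkedHashMap<>();

    // Stores a value and returns the previous value of that cell, or null.
    V put(R rowKey, C columnKey, V value) {
        return rows.computeIfAbsent(rowKey, k -> new LinkedHashMap<>())
                   .put(columnKey, value);
    }

    // Returns the cell value, or null if the cell is empty.
    V get(R rowKey, C columnKey) {
        Map<C, V> row = rows.get(rowKey);
        return row == null ? null : row.get(columnKey);
    }

    // Read-only view of one row (empty if the row does not exist).
    Map<C, V> row(R rowKey) {
        return Collections.unmodifiableMap(
            rows.getOrDefault(rowKey, Collections.<C, V>emptyMap()));
    }

    // Copy of one column, keyed by row; rows without that cell are skipped.
    Map<R, V> column(C columnKey) {
        Map<R, V> result = new LinkedHashMap<>();
        for (Map.Entry<R, Map<C, V>> e : rows.entrySet()) {
            if (e.getValue().containsKey(columnKey)) {
                result.put(e.getKey(), e.getValue().get(columnKey));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Using Object as the value type allows mixed column types.
        SimpleTable<String, String, Object> t = new SimpleTable<>();
        t.put("r1", "name", "Ann");
        t.put("r1", "age", 31);
        t.put("r2", "name", "Bob");
        System.out.println(t.column("name")); // prints {r1=Ann, r2=Bob}
    }
}
```

Note that this layout makes row lookups cheap but column extraction a full scan, which is the opposite trade-off from the column stores described in the other answers.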
#5
3
Morpheus (http://www.zavtech.com/morpheus/docs/) provides a DataFrame analogous to that of R. It is a high-performance column store data structure that enables data to be sorted, sliced, grouped, and aggregated in either the row or column dimension. It also supports parallel processing for many of these operations using the Fork & Join framework internally.
You can easily read and write data in CSV files, databases, and a proprietary JSON format. Adapters to load data from Quandl, Google Finance and others are also available.
It has built-in support for various styles of linear regression, principal component analysis, linear algebra, and other types of analytics. The feature set is still growing, but it is already a very capable framework.
#6
0
In R we have the data frame, in Python we have pandas, and in Java there is the Schema from deeplearning4j.
There is also a version for data analysis of the ubiquitous iris data set, if you just want to get started, here.
There are also other custom objects (from Weka, from TensorFlow) that are more or less the same.