
Time: 2021-01-07 16:16:24

I have a non-computer-related data logger that collects data in the field. The data is stored as text files, which I manually combine and organize. The current format is one CSV file per logger per year. Each file is around 4,000,000 lines, so 4,000,000 lines x 7 loggers x 5 years is a lot of data. Some of the fields are categorical bins, such as item_type, item_class, and item_dimension_class; others are more specific to each item, such as item_weight, item_color, date_collected, and so on.

Currently, I do statistical analysis on the data using a python/numpy/matplotlib program I wrote. It works fine, but the problem is that I'm the only one who can use it, since both the program and the data live on my computer.

I'd like to publish the data on the web using a Postgres database; however, I need to find or implement a statistical tool that takes a large Postgres table and returns statistical results within a reasonable time frame. I'm not familiar with Python for the web, but I'm proficient with PHP on the web side and Python on the offline side.

Users should be able to create their own histograms and data analyses. For example, one user might search for all blue items shipped between week x and week y, while another might look at the weight distribution of all items by hour over the whole year.


I was thinking of creating and indexing my own statistical tools, or somehow automating the process to emulate most queries. This seemed inefficient.


I'm looking forward to hearing your ideas.



1 Answer



I think you can keep using your current combination (python/numpy/matplotlib) in full if the number of users is not too large. I do similar work, and my data is a little over 10 GB. The data is stored in a few SQLite files; I use numpy to analyze it, PIL/matplotlib to generate chart files (PNG, GIF), CherryPy as the web server, and Mako as the template language.
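As a rough illustration of the SQLite-plus-numpy part of that setup, here is a minimal sketch. The `items` table, its columns (borrowed from the field names in the question), and the `weight_histogram` function are all hypothetical. It pulls the filtered rows with the standard `sqlite3` module and bins them with `numpy.histogram`; the resulting counts could then be passed to matplotlib to render a chart file.

```python
import sqlite3

import numpy as np


def weight_histogram(conn, color, start, end, bins=4, weight_range=(0.0, 4.0)):
    """Histogram of item_weight for one color with start <= date_collected < end."""
    rows = conn.execute(
        "SELECT item_weight FROM items "
        "WHERE item_color = ? AND date_collected >= ? AND date_collected < ?",
        (color, start, end),
    ).fetchall()
    weights = np.array([r[0] for r in rows], dtype=float)
    # A fixed range keeps the bin edges identical from one query to the next.
    counts, edges = np.histogram(weights, bins=bins, range=weight_range)
    return counts, edges


# Tiny in-memory data set standing in for the real logger files (made-up values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (item_color TEXT, item_weight REAL, date_collected TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("blue", 0.5, "2010-01-04 08:15:00"),
        ("blue", 1.5, "2010-01-05 09:30:00"),
        ("blue", 1.7, "2010-01-06 09:45:00"),
        ("red", 2.5, "2010-01-06 10:00:00"),
        ("blue", 3.2, "2010-02-01 11:00:00"),
    ],
)
# An index on the filter columns lets SQLite avoid scanning the whole table.
conn.execute("CREATE INDEX idx_items_color_date ON items (item_color, date_collected)")

counts, edges = weight_histogram(conn, "blue", "2010-01-04", "2010-01-11")
```

Dates are stored here as ISO-formatted text so that plain string comparison orders them correctly; that is an assumption of the sketch, not something the setup above requires.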

If you need a client/server database, you can migrate to PostgreSQL, and you can still use your current programs in full if you go with a Python web framework such as CherryPy.
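One way to keep the analysis code usable across such a migration is to have it accept any DB-API 2.0 connection instead of opening a specific database itself. The sketch below is illustrative only: the `items` table, the `hourly_mean_weight` function, and the `placeholder` argument are hypothetical names. It runs against SQLite here. A PostgreSQL driver such as psycopg2 uses `%s` rather than `?` as its parameter marker, which is why the marker is passed in, and the hour is extracted in Python to avoid database-specific date functions.

```python
import sqlite3
from collections import defaultdict


def hourly_mean_weight(conn, color, placeholder="?"):
    """Mean item_weight per hour of day for one color, on any DB-API connection."""
    cur = conn.cursor()
    # `placeholder` is a driver-specific marker chosen by the caller's code,
    # never user input; the user-supplied value travels as a bound parameter.
    cur.execute(
        "SELECT date_collected, item_weight FROM items "
        "WHERE item_color = " + placeholder,
        (color,),
    )
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of weights, row count]
    for stamp, weight in cur.fetchall():
        # Assumes 'YYYY-MM-DD HH:MM:SS' text; str() of a datetime has the same layout.
        hour = int(str(stamp)[11:13])
        totals[hour][0] += weight
        totals[hour][1] += 1
    return {hour: s / n for hour, (s, n) in sorted(totals.items())}


# Demo with SQLite standing in for the eventual server database (made-up values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (item_color TEXT, item_weight REAL, date_collected TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("blue", 1.0, "2010-01-04 08:15:00"),
        ("blue", 3.0, "2010-01-05 08:45:00"),
        ("blue", 2.0, "2010-01-05 09:30:00"),
        ("red", 5.0, "2010-01-05 09:00:00"),
    ],
)

result = hourly_mean_weight(conn, "blue")
```

With this shape, switching databases would mean changing only how `conn` is created and which placeholder is passed; the function body stays the same.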
