I want to help my friend analyze posts on social networks (Facebook, Twitter, LinkedIn, etc.) as well as several weblogs and websites.
When it comes to storing the data, I have no experience with large amounts of data. Which is best for several thousand posts, tweets, and articles per day: a database, XML files, or plain text? If a database, which one?
P.S. The language I am going to program in is Python.
1 solution
#1
2
That depends on how you want to work with the data. If you have structured data and want to exchange it between different programs, XML might be a good choice. If you do mass processing, plain text might be a good choice. If you want to filter the data, a database might be a good choice.
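To make the database option concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only one possible choice and is not named in the answer; the table layout, the column names, and the helper functions open_store, add_posts and posts_from are all assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the posts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               id        INTEGER PRIMARY KEY,
               source    TEXT NOT NULL,   -- e.g. 'twitter', 'facebook' (assumed labels)
               author    TEXT,
               published TEXT,            -- ISO 8601 timestamp stored as text
               body      TEXT NOT NULL
           )"""
    )
    # An index on the column used for filtering keeps lookups fast as the table grows.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_posts_source ON posts (source)")
    return conn


def add_posts(conn, posts):
    """Insert an iterable of (source, author, published, body) tuples in one transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO posts (source, author, published, body) VALUES (?, ?, ?, ?)",
            posts,
        )


def posts_from(conn, source):
    """Return (author, published, body) rows for one source, oldest first."""
    cur = conn.execute(
        "SELECT author, published, body FROM posts "
        "WHERE source = ? ORDER BY published",
        (source,),
    )
    return cur.fetchall()


# Hypothetical sample data, only to show the filtering step.
conn = open_store()
add_posts(conn, [
    ("twitter", "alice", "2024-01-02T10:00:00", "First tweet"),
    ("facebook", "bob", "2024-01-02T11:30:00", "A longer post"),
    ("twitter", "carol", "2024-01-01T09:15:00", "Earlier tweet"),
])
twitter_posts = posts_from(conn, "twitter")
```

Passing a file name instead of ":memory:" to open_store would keep the data on disk between runs. The ISO 8601 text timestamps sort correctly as plain strings, which is why ORDER BY published works here without a dedicated date type.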