I want to create a million two-column tables. I tried creating them from Java: about 100 MB of input data grew to 7 GB, and the run took 20 hours to complete. I am using PostgreSQL; before that I tried MySQL, which was even worse. Is there any way to create this many tables using less space and time? Would horizontal partitioning work well?
I am trying to index RDF data for fast query execution. The idea is to index the RDF data using an RDBMS and translate SPARQL queries into SQL queries. RDF is a collection of resources in the form of triples (subject, predicate, object). Existing methods use predicate tables: for each predicate, a table stores its subjects and objects. There are far fewer predicates than subjects or objects, so querying requires joining these predicate tables to get results, which are on the order of 100 MB in flat files. I was trying to create subject tables instead, for faster execution.
2 Answers
#1
5
If you need a million tables in your database, you're doing it wrong.
Tables are intended to represent structurally and conceptually different data. And I refuse to believe that you're operating with a million different concepts in your application.
Sometimes, beginners believe they should create a table per user, for example. But "a user" is one concept, and you store the same information for each user (name, email, username, password, for example), so it ought to be one table, where each user is just a separate row.
It sounds like you're making a similar mistake, perhaps not with users, but with some other abstraction which you have a lot of instances of. Each instance should be a row in one single table.
If you describe to us what it is you're trying to store in a database, we can almost certainly help you figure out how it should be mapped to tables.
Edit: After reading your comments (which should really be edited into the question itself), here are my thoughts:
If all the data is structured the same way (as triples), you could simply store everything in a single table with three columns, and then add the necessary indexes for efficient lookups.
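As a rough sketch of the single-table idea, the snippet below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The table name `triples`, the index names, and the `ex:` sample values are all made up for illustration, and the choice of three index orderings is an assumption about which lookups matter, not something the question specifies.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")

# One table for every triple; the primary key doubles as a
# (subject, predicate, object) index.
conn.execute("""
    CREATE TABLE triples (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL,
        PRIMARY KEY (subject, predicate, object)
    )
""")
# Extra indexes for lookups that start from the predicate or the object.
conn.execute("CREATE INDEX idx_pos ON triples (predicate, object, subject)")
conn.execute("CREATE INDEX idx_osp ON triples (object, subject, predicate)")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "ex:knows", "ex:carol"),
        ("ex:alice", "ex:age", "30"),
    ],
)


def objects_of(subject, predicate):
    """All objects for a given subject and predicate."""
    cur = conn.execute(
        "SELECT object FROM triples "
        "WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in cur]


def two_hop(predicate):
    """Pairs (x, z) such that x -predicate-> y and y -predicate-> z.

    This is the self-join a pattern like '?x p ?y . ?y p ?z' turns into.
    """
    cur = conn.execute(
        "SELECT a.subject, b.object "
        "FROM triples a JOIN triples b ON b.subject = a.object "
        "WHERE a.predicate = ? AND b.predicate = ? "
        "ORDER BY a.subject, b.object",
        (predicate, predicate),
    )
    return cur.fetchall()


print(objects_of("ex:alice", "ex:knows"))
print(two_hop("ex:knows"))
```

A query that would otherwise join several per-predicate or per-subject tables becomes a self-join on this one table, which the indexes can serve directly.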
If all the predicates are known in advance, you could make a table per predicate, but I'm not really sure how much sense even that would make.
The cleanest option would probably be to have 4 tables: (id, subject), (id, predicate), (id, object), and (subjectid, predicateid, objectid).
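One possible reading of the four-table layout, again sketched with Python's `sqlite3` rather than PostgreSQL: three lookup tables map each distinct string to an integer id, and a fourth table stores the triples as id tuples. The table names (`subjects`, `predicates`, `objects`, `triples`), the helper functions, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL (assumption)
conn.executescript("""
    CREATE TABLE subjects   (id INTEGER PRIMARY KEY, subject   TEXT NOT NULL UNIQUE);
    CREATE TABLE predicates (id INTEGER PRIMARY KEY, predicate TEXT NOT NULL UNIQUE);
    CREATE TABLE objects    (id INTEGER PRIMARY KEY, object    TEXT NOT NULL UNIQUE);
    CREATE TABLE triples (
        subjectid   INTEGER NOT NULL REFERENCES subjects(id),
        predicateid INTEGER NOT NULL REFERENCES predicates(id),
        objectid    INTEGER NOT NULL REFERENCES objects(id),
        PRIMARY KEY (subjectid, predicateid, objectid)
    );
""")


def intern(table, column, value):
    """Return the id for value, inserting it first if it is new.

    table and column come from the fixed names below, never from user input.
    """
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    return row[0]


def add_triple(s, p, o):
    """Store one triple as three integer ids."""
    conn.execute(
        "INSERT OR IGNORE INTO triples VALUES (?, ?, ?)",
        (
            intern("subjects", "subject", s),
            intern("predicates", "predicate", p),
            intern("objects", "object", o),
        ),
    )


def lookup(s, p):
    """Objects for a subject/predicate pair, resolved back to strings."""
    cur = conn.execute(
        "SELECT o.object FROM triples t "
        "JOIN subjects s   ON s.id = t.subjectid "
        "JOIN predicates p ON p.id = t.predicateid "
        "JOIN objects o    ON o.id = t.objectid "
        "WHERE s.subject = ? AND p.predicate = ? ORDER BY o.object",
        (s, p),
    )
    return [row[0] for row in cur]


# Hypothetical sample data.
add_triple("ex:alice", "ex:knows", "ex:bob")
add_triple("ex:alice", "ex:knows", "ex:carol")
add_triple("ex:bob", "ex:knows", "ex:carol")

print(lookup("ex:alice", "ex:knows"))
```

Each distinct string is stored once, so the large triples table holds only small integers; the trade-off is the extra joins needed to turn ids back into strings.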
#2
1
Database tables use up quite a bit of space for managing their indexes and schema and for reserving disk space.
In most cases you'll be better off with a single table that has 20 million rows than with a million tables of 20 rows each.
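To get a feel for the per-table overhead, here is a small experiment using SQLite through Python's `sqlite3` module. It is only an illustration: SQLite is not PostgreSQL or MySQL, which have their own, different per-table costs, and the table count, row count, and file names are arbitrary choices scaled down so the script finishes quickly.

```python
import os
import sqlite3
import tempfile

N_TABLES = 500        # scaled down from a million (arbitrary)
ROWS_PER_TABLE = 20


def build_many_tables(conn):
    """N_TABLES two-column tables with ROWS_PER_TABLE rows each."""
    for t in range(N_TABLES):
        conn.execute(f"CREATE TABLE t{t} (a INTEGER, b INTEGER)")
        conn.executemany(
            f"INSERT INTO t{t} VALUES (?, ?)",
            [(i, i) for i in range(ROWS_PER_TABLE)],
        )


def build_one_table(conn):
    """The same number of rows, all in a single table."""
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(i, i) for i in range(N_TABLES * ROWS_PER_TABLE)],
    )


def size_on_disk(build, filename):
    """Build a database file with the given function and return its size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, filename)
        conn = sqlite3.connect(path)
        build(conn)
        conn.commit()
        conn.close()
        return os.path.getsize(path)


many_bytes = size_on_disk(build_many_tables, "many.db")
one_bytes = size_on_disk(build_one_table, "one.db")
print(f"{N_TABLES} tables: {many_bytes} bytes; one table: {one_bytes} bytes")
```

In SQLite every table occupies at least one page of the file, so the many-tables file comes out larger even though both files hold the same rows.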
If the 20-million-row table got too big, you could then use vertical partitioning to make it perform better.
I do think you're mainly going to succeed in giving Stack Overflow users a mass aneurysm as they try to work out why you need to do what you're asking :)