I want to create a million two-column tables. I tried creating them from Java: about 100 MB of input data grew to 7 GB on disk, and the run took 20 hours to complete. I am using PostgreSQL; before that I tried MySQL, which was even worse. Is there any way to create this many tables using less space and time? Would horizontal partitioning work well?
I am trying to index RDF data for fast query execution. The idea is to store the RDF data in an RDBMS and translate SPARQL queries into SQL queries. RDF is a collection of resources expressed as triples (subject, predicate, object). Existing methods use predicate tables: one table per predicate, storing the subject and object of every triple with that predicate, because the number of predicates is very small compared to the number of subjects and objects. Answering a query then requires joining these predicate tables to get results, which are on the order of 100 MB in flat files. I was trying to create subject tables instead, for faster execution.
2 Solutions
If you need a million tables in your database, you're doing it wrong.
Tables are intended to represent structurally and conceptually different data. And I refuse to believe that you're operating with a million different concepts in your application.
Sometimes, beginners believe they should create a table per user, for example. But "a user" is one concept, and you store the same information for each user (name, email, username, password, for example), so it ought to be one table, where each user is just a separate row.
It sounds like you're making a similar mistake, perhaps not with users, but with some other abstraction which you have a lot of instances of. Each instance should be a row in one single table.
If you describe to us what it is you're trying to store in a database, we can almost certainly help you figure out how it should be mapped to tables.
After reading your comments (which should really be edited into the question itself), here are my thoughts:
If all the data is structured the same way (as triples), you could simply store everything in a single table with three columns, and then add the necessary indexes for efficient lookups.
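As a rough sketch of the single-table idea, the snippet below stores all triples in one three-column table and adds indexes for the common lookup directions. It uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL so that it runs anywhere; the table name `triples`, the index names, and the sample data are illustrative assumptions, not something from the question.

```python
import sqlite3


def build_triple_store(triples):
    """Load (subject, predicate, object) tuples into one indexed table."""
    conn = sqlite3.connect(":memory:")  # assumption: in-memory SQLite, not PostgreSQL
    conn.execute(
        "CREATE TABLE triples ("
        " subject TEXT NOT NULL,"
        " predicate TEXT NOT NULL,"
        " object TEXT NOT NULL)"
    )
    # Bulk-load first, then build the indexes once.
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    # One index per lookup direction (hypothetical names).
    conn.execute("CREATE INDEX idx_spo ON triples (subject, predicate, object)")
    conn.execute("CREATE INDEX idx_pos ON triples (predicate, object, subject)")
    conn.execute("CREATE INDEX idx_osp ON triples (object, subject, predicate)")
    conn.commit()
    return conn


conn = build_triple_store([
    ("alice", "knows", "bob"),
    ("alice", "age", "30"),
    ("bob", "livesIn", "Paris"),
    ("carol", "knows", "bob"),
    ("carol", "livesIn", "Rome"),
])

# What a "subject table" for alice would have held is just a filtered query.
about_alice = conn.execute(
    "SELECT predicate, object FROM triples WHERE subject = ? ORDER BY predicate",
    ("alice",),
).fetchall()

# A two-pattern query (?x knows ?y . ?y livesIn "Paris") becomes a self-join.
knows_parisian = conn.execute(
    "SELECT a.subject FROM triples a"
    " JOIN triples b ON b.subject = a.object"
    " WHERE a.predicate = 'knows'"
    " AND b.predicate = 'livesIn' AND b.object = 'Paris'"
    " ORDER BY a.subject"
).fetchall()
```

Each extra triple pattern in a query adds one more self-join on the same table, so the indexes do the work that a table per subject or per predicate was meant to do.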
If all the predicates are known in advance, you could make a table per predicate, though I'm not sure how much sense even that would make.
The cleanest option would probably be to have 4 tables: (id, subject), (id, predicate), (id, object), and (subjectid, predicateid, objectid).
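A minimal sketch of that four-table layout might look like the following. Again it uses Python's sqlite3 as a stand-in for PostgreSQL; the table names, the helper functions `get_or_create_id`, `add_triple` and `subjects_with`, and the sample triples are all hypothetical choices made for illustration.

```python
import sqlite3

# Three lookup tables map each distinct string to a small integer id;
# the triples table then stores only integer ids.
SCHEMA = """
CREATE TABLE subjects   (id INTEGER PRIMARY KEY, subject   TEXT NOT NULL UNIQUE);
CREATE TABLE predicates (id INTEGER PRIMARY KEY, predicate TEXT NOT NULL UNIQUE);
CREATE TABLE objects    (id INTEGER PRIMARY KEY, object    TEXT NOT NULL UNIQUE);
CREATE TABLE triples (
    subjectid   INTEGER NOT NULL REFERENCES subjects(id),
    predicateid INTEGER NOT NULL REFERENCES predicates(id),
    objectid    INTEGER NOT NULL REFERENCES objects(id),
    PRIMARY KEY (subjectid, predicateid, objectid)
);
CREATE INDEX idx_triples_pred_obj ON triples (predicateid, objectid);
"""


def get_or_create_id(conn, table, column, value):
    """Return the id for value, inserting it first if it is new."""
    # table/column come from this module only, never from user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    return row[0]


def add_triple(conn, subject, predicate, obj):
    """Store one triple as three integer ids."""
    ids = (
        get_or_create_id(conn, "subjects", "subject", subject),
        get_or_create_id(conn, "predicates", "predicate", predicate),
        get_or_create_id(conn, "objects", "object", obj),
    )
    conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", ids)


def subjects_with(conn, predicate, obj):
    """Return all subjects that have the given predicate and object."""
    rows = conn.execute(
        "SELECT s.subject FROM triples t"
        " JOIN subjects s ON s.id = t.subjectid"
        " JOIN predicates p ON p.id = t.predicateid"
        " JOIN objects o ON o.id = t.objectid"
        " WHERE p.predicate = ? AND o.object = ?"
        " ORDER BY s.subject",
        (predicate, obj),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for triple in [
    ("alice", "knows", "bob"),
    ("carol", "knows", "bob"),
    ("bob", "livesIn", "Paris"),
]:
    add_triple(conn, *triple)
conn.commit()

who_knows_bob = subjects_with(conn, "knows", "bob")
```

The point of the extra tables is that long URIs and literals are stored once each, while the large triples table and its indexes hold only integers. One caveat of this exact layout: a resource that appears both as a subject and as an object gets two unrelated ids, so joining an object to a subject has to go through the string value.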
Every database table carries a fair amount of overhead: space for managing its indexes and schema, plus reserved disk space.
In most cases you'll be better off with a single table that has 20 million rows than with a million tables of 20 rows each.
If the 20-million-row table got too big, you could then use horizontal partitioning (splitting the rows across partitions) to make it perform better.
I do think you're mainly going to succeed in giving Stack Overflow users a mass aneurysm trying to work out why you need to do what you're asking :)