I am working on a project where I need to store billions of rows of unstructured history data in a SQL database (Postgres) for 2-3 years. The data/columns may change from day to day.
For example, on day one the user might save {"user_id":"2223", "website":"www.mywebsite.org", "webpage":"mysubpageName"}.
And on the following day {"name":"username", "user_id":"2223", "bookclub_id":"1"}.
I worked on an earlier project where we used the classic entity key/value table model for this problem. We saved up to maybe 30 key/value pairs per entity. But once the table exceeded 70-100 million rows, the queries began to run slower and slower (too many inner joins).
Therefore I am wondering whether I should switch to the JSON model in Postgres. After searching the web and reading blogs, I am really confused. What are the pros and cons of changing this to JSON in Postgres?
1 Solution
#1
1
You can think about this in terms of query complexity. If you have an index on the JSON documents (on user_id, perhaps), a simple index scan gets you the whole JSON document very quickly.
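As a rough sketch of that idea: the table and column names (mytable, jsonvalue) follow the rest of this answer, while the id and created_at columns, the index name, and the choice of the jsonb type are assumptions made for illustration. It needs PostgreSQL 9.5 or later because of IF NOT EXISTS on the index.

```sql
-- Hypothetical schema: only mytable and jsonvalue come from the answer;
-- id and created_at are assumed helper columns.
CREATE TABLE IF NOT EXISTS mytable (
    id         bigserial   PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now(),
    jsonvalue  jsonb       NOT NULL
);

-- B-tree index on the user_id field inside each document.
-- An index expression must be wrapped in its own pair of parentheses.
CREATE INDEX IF NOT EXISTS idx_mytable_user_id
    ON mytable ((jsonvalue ->> 'user_id'));

-- The two sample documents from the question.
INSERT INTO mytable (jsonvalue) VALUES
    ('{"user_id": "2223", "website": "www.mywebsite.org", "webpage": "mysubpageName"}'),
    ('{"name": "username", "user_id": "2223", "bookclub_id": "1"}');

-- Fetch every stored document for one user. The WHERE expression is
-- written exactly like the index expression so the planner can match it.
SELECT id, created_at, jsonvalue
FROM mytable
WHERE jsonvalue ->> 'user_id' = '2223';
```

If user_id is present in every document, storing it in an ordinary column next to the JSON and indexing that column is a reasonable alternative.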
You then have to dissect it on the client side, or you can pass it to functions in Postgres if, for example, you want to extract only specific values.
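For the server-side route, a small sketch using PostgreSQL's built-in JSON operators on the sample documents from the question. The documents are inlined as literals so the statements run without any table; the alias names (t, doc) are made up for the example.

```sql
-- Pull individual fields out of each document on the server side.
SELECT doc ->> 'user_id'     AS user_id,      -- ->> returns the value as text
       doc ->  'website'     AS website_json, -- ->  returns it as jsonb
       doc ->> 'bookclub_id' AS bookclub_id   -- NULL when the key is absent
FROM (VALUES
        ('{"user_id": "2223", "website": "www.mywebsite.org", "webpage": "mysubpageName"}'::jsonb),
        ('{"name": "username", "user_id": "2223", "bookclub_id": "1"}'::jsonb)
     ) AS t(doc);

-- Expand one document into key/value rows, much like the old
-- entity key/value layout, but computed on demand.
SELECT key, value
FROM jsonb_each_text('{"name": "username", "user_id": "2223", "bookclub_id": "1"}'::jsonb);
```

Because keys that are missing from a document simply come back as NULL, documents whose fields change from day to day can share one query.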
One of the most important features of Postgres when dealing with JSON is functional (expression) indexes. In contrast to a "normal" index, which indexes the value of a column, a functional index applies a function to the values of one or more columns and indexes the return value. In PostgreSQL the ->> operator extracts a field from a JSON value as text, so suppose you want the users that have bookclub_id = 1. You can create an index like
create index idx_bookclub_id on mytable ((jsonvalue ->> 'bookclub_id'));
Afterwards queries like
select * from mytable where jsonvalue ->> 'bookclub_id' = '1';
can use that index instead of scanning the whole table, which makes them very fast.
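Whether the planner actually picks such an index can be checked with EXPLAIN. The sketch below assumes a table mytable with a jsonb column jsonvalue; the minimal table definition is only there so the statements run on their own, and it is skipped if the table already exists.

```sql
-- Minimal stand-in table (assumed definition; no-op if mytable already exists).
CREATE TABLE IF NOT EXISTS mytable (jsonvalue jsonb NOT NULL);

-- Expression index on the bookclub_id field.
CREATE INDEX IF NOT EXISTS idx_bookclub_id
    ON mytable ((jsonvalue ->> 'bookclub_id'));

-- Show the plan for the lookup. The comparison value is a string
-- because ->> returns text.
EXPLAIN
SELECT * FROM mytable WHERE jsonvalue ->> 'bookclub_id' = '1';
```

On a nearly empty table the planner may still prefer a sequential scan, so it is worth running this against a realistically sized copy of the data (after ANALYZE) before drawing conclusions.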