citus 官方文档很不错,资料很全,同时包含一个多租户应用的文档,所以运行下,方便学习
环境准备
使用docker-compose 运行,同时集成了graphql 引擎,很方便
- docker-compose 文件
version: '2.1'
services:
graphql-engine:
image: hasura/graphql-engine:v1.0.0-alpha26
ports:
- "8080:8080"
command: >
/bin/sh -c "
graphql-engine --database-url postgres://postgres@master/postgres serve --enable-console;
"
master:
container_name: "${COMPOSE_PROJECT_NAME:-citus}_master"
image: 'citusdata/citus:7.5.1'
ports: ["${MASTER_EXTERNAL_PORT:-5432}:5432"]
labels: ['com.citusdata.role=Master']
worker:
image: 'citusdata/citus:7.5.1'
labels: ['com.citusdata.role=Worker']
depends_on: { manager: { condition: service_healthy } }
manager:
container_name: "${COMPOSE_PROJECT_NAME:-citus}_manager"
image: 'citusdata/membership-manager:0.2.0'
volumes: ['/var/run/docker.sock:/var/run/docker.sock']
depends_on: { master: { condition: service_healthy } }
- 数据准备
curl https://examples.citusdata.com/tutorial/companies.csv > companies.csv
curl https://examples.citusdata.com/tutorial/campaigns.csv > campaigns.csv
curl https://examples.citusdata.com/tutorial/ads.csv > ads.csv
- 创建表
CREATE TABLE companies (
id bigint NOT NULL,
name text NOT NULL,
image_url text,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL
);
CREATE TABLE campaigns (
id bigint NOT NULL,
company_id bigint NOT NULL,
name text NOT NULL,
cost_model text NOT NULL,
state text NOT NULL,
monthly_budget bigint,
blacklisted_site_urls text[],
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL
);
CREATE TABLE ads (
id bigint NOT NULL,
company_id bigint NOT NULL,
campaign_id bigint NOT NULL,
name text NOT NULL,
image_url text,
target_url text,
impressions_count bigint DEFAULT 0,
clicks_count bigint DEFAULT 0,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL
);
- 添加表关系
ALTER TABLE companies ADD PRIMARY KEY (id);
ALTER TABLE campaigns ADD PRIMARY KEY (id, company_id);
ALTER TABLE ads ADD PRIMARY KEY (id, company_id);
citus 分布式处理
- 添加数据分布式表功能
很方便,就是select 语句,调用函数即可
SELECT create_distributed_table('companies', 'id');
SELECT create_distributed_table('campaigns', 'company_id');
SELECT create_distributed_table('ads', 'company_id');
- 导入数据
citus 环境起来之后就可以使用功能导入数据了
- 效果
- 一个json 查询
SELECT campaigns.id, campaigns.name, campaigns.monthly_budget,
sum(impressions_count) as total_impressions, sum(clicks_count) as total_clicks
FROM ads, campaigns
WHERE ads.company_id = campaigns.company_id
AND campaigns.company_id = 5
AND campaigns.state = 'running'
GROUP BY campaigns.id, campaigns.name, campaigns.monthly_budget
ORDER BY total_impressions, total_clicks;
- 效果
数据模型说明
实际上上面的核心是创建分布式表,使用的create_distributed_table,同时定义了,多租户的数据隔离id company_id
后边的操作都是基本的sql 操作,后边会有citus 多租户应用开发的一些好的实践介绍。
参考资料
https://docs.citusdata.com/en/v7.5/get_started/tutorial_multi_tenant.html
https://docs.citusdata.com/en/v7.5/sharding/data_modeling.html#distributing-by-tenant-id
https://github.com/hasura/graphql-on-various-pg
https://github.com/rongfengliang/citus-hasuar-graphql