A distributed configuration center based on etcd
etcd docs | etcd versus other key-value stores https://etcd.io/docs/v3.4.0/learning/why/
The name “etcd” originated from two ideas: the Unix “/etc” folder and “d” for distributed systems. The “/etc” folder is a place to store configuration data for a single system, whereas etcd stores configuration information for large-scale distributed systems. Hence, a distributed “/etc” is “etcd”.
etcd is designed as a general substrate for large scale distributed systems. These are systems that will never tolerate split-brain operation and are willing to sacrifice availability to achieve this end. etcd stores metadata in a consistent and fault-tolerant way. An etcd cluster is meant to provide key-value storage with best of class stability, reliability, scalability and performance.
Distributed systems use etcd as a consistent key-value store for configuration management, service discovery, and coordinating distributed work. Many organizations use etcd to implement production systems such as container schedulers, service discovery services, and distributed data storage. Common distributed patterns using etcd include leader election, distributed locks, and monitoring machine liveness.
Use cases
- Container Linux by CoreOS: Applications running on Container Linux get automatic, zero-downtime Linux kernel updates. Container Linux uses locksmith to coordinate updates. Locksmith implements a distributed semaphore over etcd to ensure only a subset of a cluster is rebooting at any given time.
- Kubernetes stores configuration data into etcd for service discovery and cluster management; etcd’s consistency is crucial for correctly scheduling and operating services. The Kubernetes API server persists cluster state into etcd. It uses etcd’s watch API to monitor the cluster and roll out critical configuration changes.
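The locksmith use case above relies on a distributed semaphore. The idea can be sketched roughly as follows, assuming a store that offers an atomic compare-and-swap on a versioned value. `ToyCASStore`, `acquire`, and `release` are hypothetical names for illustration only; this is an in-memory toy, not locksmith's actual code or the etcd client API.

```python
class ToyCASStore:
    """In-memory stand-in for one key in a consistent store (hypothetical)."""

    def __init__(self, value):
        self.value = value
        self.version = 1  # bumped on every successful write

    def read(self):
        return self.value, self.version

    def compare_and_swap(self, expected_version, new_value):
        """Write new_value only if nobody else wrote since expected_version."""
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True


def acquire(store, machine_id, max_holders):
    """Try to take one semaphore slot; False means all slots are in use."""
    while True:
        holders, version = store.read()
        if machine_id in holders:
            return True
        if len(holders) >= max_holders:
            return False
        # Retry from a fresh read if another writer got in first.
        if store.compare_and_swap(version, holders | {machine_id}):
            return True


def release(store, machine_id):
    """Give the slot back, retrying until the write lands."""
    while True:
        holders, version = store.read()
        if store.compare_and_swap(version, holders - {machine_id}):
            return
```

With `max_holders=1`, a second machine is refused until the first one releases its slot, which is the property that keeps only a subset of the cluster rebooting at once.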
Comparison chart
Perhaps etcd already seems like a good fit, but as with all technological decisions, proceed with caution. Please note this documentation is written by the etcd team. Although the ideal is a disinterested comparison of technology and features, the authors’ expertise and biases obviously favor etcd. Use only as directed.
The table below is a handy quick reference for spotting the differences among etcd and its most popular alternatives at a glance. Further commentary and details for each column are in the sections following the table.
Feature | etcd | ZooKeeper | Consul | NewSQL (Cloud Spanner, CockroachDB, TiDB) |
---|---|---|---|---|
Concurrency Primitives | Lock RPCs, Election RPCs, command line locks, command line elections, recipes in go | External curator recipes in Java | Native lock API | Rare, if any |
Linearizable Reads | Yes | No | Yes | Sometimes |
Multi-version Concurrency Control | Yes | No | No | Sometimes |
Transactions | Field compares, Read, Write | Version checks, Write | Field compare, Lock, Read, Write | SQL-style |
Change Notification | Historical and current key intervals | Current keys and directories | Current keys and prefixes | Triggers (sometimes) |
User permissions | Role based | ACLs | ACLs | Varies (per-table GRANT, per-database roles) |
HTTP/JSON API | Yes | No | Yes | Rarely |
Membership Reconfiguration | Yes | >3.5.0 | Yes | Yes |
Maximum reliable database size | Several gigabytes | Hundreds of megabytes (sometimes several gigabytes) | Hundreds of MBs | Terabytes+ |
Minimum read linearization latency | Network RTT | No read linearization | RTT + fsync | Clock barriers (atomic, NTP) |
ZooKeeper
ZooKeeper solves the same problem as etcd: distributed system coordination and metadata storage. However, etcd has the luxury of hindsight taken from engineering and operational experience with ZooKeeper’s design and implementation. The lessons learned from ZooKeeper certainly informed etcd’s design, helping it support large-scale systems like Kubernetes. The improvements etcd made over ZooKeeper include:
- Dynamic cluster membership reconfiguration
- Stable read/write under high load
- A multi-version concurrency control data model
- Reliable key monitoring which never silently drops events
- Lease primitives decoupling connections from sessions
- APIs for safe distributed shared locks
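To illustrate the lease item in the list above: a lease ties the lifetime of keys to a time-to-live that the client must renew, rather than to a network connection. The sketch below is a toy in-memory model under that assumption. `ToyLeaseStore` and its methods are hypothetical names, and time is passed in explicitly so the behavior is deterministic; it is not the etcd client API.

```python
class ToyLeaseStore:
    """Toy model: keys live as long as their lease, not a client connection."""

    def __init__(self):
        self.leases = {}   # lease_id -> (ttl_seconds, expires_at)
        self.keys = {}     # key -> (value, lease_id)
        self.next_id = 1

    def grant(self, ttl, now):
        """Create a lease that expires ttl seconds after now."""
        lease_id = self.next_id
        self.next_id += 1
        self.leases[lease_id] = (ttl, now + ttl)
        return lease_id

    def _expire(self, now):
        """Drop expired leases and every key attached to them."""
        dead = {i for i, (_, exp) in self.leases.items() if exp <= now}
        for lease_id in dead:
            del self.leases[lease_id]
        self.keys = {k: (v, i) for k, (v, i) in self.keys.items() if i not in dead}

    def keep_alive(self, lease_id, now):
        """Renew the lease for another full TTL; False if it already expired."""
        self._expire(now)
        if lease_id not in self.leases:
            return False
        ttl, _ = self.leases[lease_id]
        self.leases[lease_id] = (ttl, now + ttl)
        return True

    def put(self, key, value, lease_id, now):
        """Attach a key to a live lease."""
        self._expire(now)
        if lease_id not in self.leases:
            raise KeyError("lease not found or expired")
        self.keys[key] = (value, lease_id)

    def get(self, key, now):
        self._expire(now)
        entry = self.keys.get(key)
        return entry[0] if entry else None
```

A client that reconnects can keep renewing the same lease ID, so a brief connection drop does not by itself remove its keys; only a missed renewal does.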
Furthermore, etcd supports a wide range of languages and frameworks out of the box. Whereas ZooKeeper has its own custom Jute RPC protocol, which is unique to ZooKeeper and limits its supported language bindings, etcd’s client protocol is built on gRPC, a popular RPC framework with language bindings for Go, C++, Java, and more. Likewise, gRPC can be serialized into JSON over HTTP, so even general command line utilities like curl can talk to it. Since systems can select from a variety of choices, they are built on etcd with native tooling rather than around etcd with a single fixed set of technologies.
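As a rough illustration of the JSON-over-HTTP path, the sketch below builds the request bodies that etcd's gRPC gateway expects, where keys and values are base64-encoded strings. The endpoint `http://localhost:2379` and the `/v3/kv/put` path are assumptions about a local v3.4 deployment; the helper names are made up for this example. Nothing is sent over the network.

```python
import base64
import json


def b64(text: str) -> str:
    """Base64-encode a UTF-8 string, as the JSON gateway expects for keys and values."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def put_request(key: str, value: str) -> str:
    """Build the JSON body for a put request."""
    return json.dumps({"key": b64(key), "value": b64(value)})


def range_request(key: str) -> str:
    """Build the JSON body for reading a single key."""
    return json.dumps({"key": b64(key)})


if __name__ == "__main__":
    endpoint = "http://localhost:2379"  # assumed local etcd endpoint
    body = put_request("foo", "bar")
    # Print an equivalent curl command instead of sending the request.
    print(f"curl -L {endpoint}/v3/kv/put -X POST -d '{body}'")
```

The same bodies can be posted from any HTTP client, which is the point of the paragraph above: no etcd-specific library is required to talk to the store.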
When considering features, support, and stability, new applications planning to use ZooKeeper for a consistent key-value store would do well to choose etcd instead.
Consul
Consul is an end-to-end service discovery framework. It provides built-in health checking, failure detection, and DNS services. In addition, Consul exposes a key-value store with RESTful HTTP APIs. As it stands in Consul 1.0, the storage system does not scale as well as other systems like etcd or ZooKeeper in key-value operations; systems requiring millions of keys will suffer from high latencies and memory pressure. Most notably, the key-value API is missing multi-version keys, conditional transactions, and reliable streaming watches.
etcd and Consul solve different problems. If looking for a distributed consistent key value store, etcd is a better choice over Consul. If looking for end-to-end cluster service discovery, etcd will not have enough features; choose Kubernetes, Consul, or SmartStack.
NewSQL (Cloud Spanner, CockroachDB, TiDB)
Both etcd and NewSQL databases (e.g., CockroachDB, TiDB, Google Cloud Spanner) provide strong data consistency guarantees with high availability. However, their significantly different system design parameters lead to significantly different client APIs and performance characteristics.
NewSQL databases are meant to horizontally scale across data centers. These systems typically partition data across multiple consistent replication groups (shards), potentially distant, storing data sets on the order of terabytes and above. This sort of scaling makes them poor candidates for distributed coordination as they have long latencies from waiting on clocks and expect updates with mostly localized dependency graphs. The data is organized into tables, including SQL-style query facilities with richer semantics than etcd, but at the cost of additional complexity for processing, planning, and optimizing queries.
In short, choose etcd for storing metadata or coordinating distributed applications. If storing more than a few GB of data or if full SQL queries are needed, choose a NewSQL database.
Using etcd for metadata
etcd replicates all data within a single consistent replication group. For storing up to a few GB of data with consistent ordering, this is the most efficient approach. Each modification of cluster state, which may change multiple keys, is assigned a globally unique ID, called a revision in etcd, from a monotonically increasing counter so that ordering can be reasoned about. Since there is only a single replication group, the modification request only needs to go through the Raft protocol to commit. By limiting consensus to one replication group, etcd gets distributed consistency with a simple protocol while achieving low latency and high throughput.
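The revision model described above can be sketched with a toy in-memory store: one counter, bumped once per modification, stamps every key changed by that modification. `ToyStore` and its methods are hypothetical names, and details such as the starting revision are simplifications rather than etcd's actual behavior.

```python
class ToyStore:
    """In-memory toy model of a single-group key-value store with revisions."""

    def __init__(self):
        self.revision = 0   # global, monotonically increasing counter
        self.history = []   # (revision, key, value) events in commit order

    def txn_put(self, updates: dict) -> int:
        """Apply several puts as one modification; all share one new revision."""
        self.revision += 1
        for key, value in updates.items():
            self.history.append((self.revision, key, value))
        return self.revision

    def get(self, key, at_revision=None):
        """Return the value of key as of at_revision (default: latest)."""
        if at_revision is None:
            at_revision = self.revision
        result = None
        for rev, k, v in self.history:
            if rev > at_revision:
                break
            if k == key:
                result = v
        return result

    def events_since(self, revision):
        """Replay every change committed at or after the given revision."""
        return [event for event in self.history if event[0] >= revision]
```

Because every change carries a revision from the same counter, a reader can ask for a consistent snapshot at an older revision, and a watcher that reconnects can replay from the last revision it saw.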
The replication behind etcd cannot horizontally scale because it lacks data sharding. In contrast, NewSQL databases usually shard data across multiple consistent replication groups, storing data sets on the order of terabytes and above. However, to assign each modification a globally unique and increasing ID, each request must go through an additional coordination protocol among replication groups. This extra coordination step may conflict on the global ID, forcing ordered requests to retry. The result is a more complicated approach with typically worse performance than etcd for strict ordering.
If an application reasons primarily about metadata or metadata ordering, such as to coordinate processes, choose etcd. If the application needs a large data store spanning multiple data centers and does not heavily depend on strong global ordering properties, choose a NewSQL database.
Using etcd for distributed coordination
etcd has distributed coordination primitives such as event watches, leases, elections, and distributed shared locks out of the box. These primitives are both maintained and supported by the etcd developers; leaving these primitives to external libraries shirks the responsibility of developing foundational distributed software, essentially leaving the system incomplete. NewSQL databases usually expect these distributed coordination primitives to be authored by third parties. Likewise, ZooKeeper famously has a separate and independent library of coordination recipes. Consul, which provides a native locking API, goes so far as to apologize that it’s “not a bulletproof method”.
In theory, it’s possible to build these primitives atop any storage system providing strong consistency. However, the algorithms tend to be subtle; it is easy to develop a locking algorithm that appears to work, only to have it suddenly break due to thundering herd and timing skew. Furthermore, other primitives supported by etcd, such as transactional memory, depend on etcd’s MVCC data model; simple strong consistency is not enough.
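To make the dependence on per-key versions concrete, here is a minimal sketch of an optimistic transaction: a write commits only if the keys it read have not been modified since. `ToyKV`, `txn`, `try_acquire`, and `increment` are hypothetical names for an in-memory toy, not etcd's transaction API, and lock release and expiry are deliberately left out.

```python
class ToyKV:
    """Toy store that tracks the revision at which each key was last modified."""

    def __init__(self):
        self.revision = 0
        self.data = {}  # key -> (value, mod_revision)

    def get(self, key):
        """Return (value, mod_revision); a missing key reports mod_revision 0."""
        return self.data.get(key, (None, 0))

    def txn(self, compares, puts):
        """Commit puts only if every compared key still has the expected mod_revision."""
        for key, expected in compares.items():
            if self.get(key)[1] != expected:
                return False
        self.revision += 1
        for key, value in puts.items():
            self.data[key] = (value, self.revision)
        return True


def try_acquire(kv, lock_key, owner):
    """Take the lock only if the key has never been written (mod_revision 0)."""
    return kv.txn({lock_key: 0}, {lock_key: owner})


def increment(kv, key):
    """Optimistic read-modify-write: retry until no other writer interfered."""
    while True:
        value, mod_rev = kv.get(key)
        if kv.txn({key: mod_rev}, {key: (value or 0) + 1}):
            return
```

A plain "read, then write" against a store without such version checks would let two clients both observe the lock as free; the compare step is what turns the pair of operations into one atomic decision.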
For distributed coordination, choosing etcd can help prevent operational headaches and save engineering effort.
https://mp.weixin.qq.com/s/86LN9l1hdviquFT8gwy0oA
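etcd compared with ZooKeeper, Consul, and other key-value components
About etcd
The subject of this article is etcd. The name “etcd” comes from two ideas: the Unix “/etc” folder and “d” for distributed systems. The “/etc” folder is where a single system stores its configuration data, whereas etcd stores configuration information for large-scale distributed systems. Hence, a distributed “/etc” is “etcd”.
etcd is designed as a general substrate for large distributed systems. These systems must avoid split-brain and are willing to sacrifice availability to do so. etcd stores metadata in a consistent, fault-tolerant way. An etcd cluster aims to provide key-value storage with stability, reliability, scalability, and performance.
Distributed systems use etcd as a consistent key-value store for configuration management, service discovery, and coordinating distributed work. Many organizations use etcd in production systems such as container schedulers, service discovery services, and distributed data storage. Common distributed patterns built on etcd include leader election, distributed locks, and monitoring machine liveness.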
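Use cases
- Container Linux by CoreOS: applications running on Container Linux get automatic, zero-downtime Linux kernel updates. Container Linux uses locksmith to coordinate updates. Locksmith implements a distributed semaphore over etcd to ensure that only a subset of the cluster is rebooting at any given time.
- Kubernetes stores configuration data in etcd for service discovery and cluster management; etcd’s consistency is crucial for container orchestration. The Kubernetes API server persists cluster state to etcd. It uses etcd’s watch API to monitor the cluster and roll out critical configuration changes.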
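Multi-dimensional comparison
etcd may already look like a good fit, but as with every technology choice, we should proceed with caution. Although the ideal is an objective comparison of technology and features, the authors’ expertise and biases clearly favor etcd (the experiments and documentation were written by the etcd authors).
The comparison table is an at-a-glance reference for the differences between etcd and its most popular alternatives. The sections that follow give further explanation and detail for each column.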
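Compared with ZooKeeper
ZooKeeper solves the same problem as etcd: distributed system coordination and metadata storage. However, etcd stands on the shoulders of its predecessor and draws on the experience of ZooKeeper’s design and implementation. The lessons learned from ZooKeeper certainly informed etcd’s design, helping it support large-scale systems such as Kubernetes. The improvements etcd made over ZooKeeper include: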
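- Dynamic cluster membership reconfiguration
- Stable reads and writes under high load
- A multi-version concurrency control data model
- Reliable key monitoring
- Lease primitives that decouple connections from sessions
- APIs for distributed shared locks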
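In addition, etcd supports many languages and frameworks out of the box. ZooKeeper has its own custom Jute RPC protocol, which is unique to ZooKeeper and limits its supported language bindings, whereas etcd’s client protocol is built on gRPC, a popular RPC framework with bindings for Go, C++, Java, and other languages. Likewise, gRPC can be serialized into JSON over HTTP, so even general-purpose command line utilities such as curl can talk to it. Because systems can choose from a variety of options, they are built on etcd with native tooling rather than around etcd with a single fixed set of technologies.
Considering features, support, and stability, etcd is a better fit than ZooKeeper as a consistent key-value store.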
Consul
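Consul is an end-to-end service discovery framework. It provides built-in health checking, failure detection, and DNS services. In addition, Consul exposes a key-value store through RESTful HTTP APIs. As of Consul 1.0, the storage system does not scale as well as other components such as etcd or ZooKeeper in key-value operations; systems with millions of keys will suffer from high latency and memory pressure. Most notably, Consul lacks multi-version keys, conditional transactions, and reliable streaming watches.
etcd and Consul solve different problems. If you are looking for a distributed consistent key-value store, etcd is a better choice than Consul. If you are looking for end-to-end cluster service discovery, etcd will not have enough features; choose Kubernetes, Consul, or SmartStack.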
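NewSQL (Cloud Spanner, CockroachDB, TiDB)
Both etcd and NewSQL databases (e.g., CockroachDB, TiDB, Google Cloud Spanner) provide strong data consistency guarantees with high availability. However, their different design goals lead to significantly different client APIs and performance characteristics.
NewSQL databases are meant to scale horizontally across data centers. These systems typically partition data across multiple consistent replication groups (shards), which may be far apart, and store data sets of terabytes and above. This kind of scaling makes them poor candidates for distributed coordination, because they incur long latencies from waiting on clocks and expect updates with mostly localized dependency graphs. NewSQL data is organized into tables, with SQL-style query facilities that have richer semantics than etcd, but at the cost of additional complexity in processing and optimizing queries.
In short, choose etcd for storing metadata or coordinating distributed applications. If you store more than a few GB of data or need full SQL queries, choose a NewSQL database.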
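Using etcd to store metadata
etcd replicates all data within a single replication group. For storing up to a few GB of data with consistent ordering, this is the most efficient approach. Each modification of cluster state, which may change multiple keys, is assigned a globally unique ID (called a revision in etcd) from a monotonically increasing counter, so that modifications can be ordered. Since there is only one replication group, a modification request only needs to go through the Raft protocol to commit. By limiting consensus to one replication group, etcd gets distributed consistency with a simple protocol while achieving low latency and high throughput.
The replication behind etcd cannot scale horizontally because it lacks data sharding. In contrast, NewSQL databases usually shard data across multiple consistent replication groups and store data sets of terabytes and above. However, to assign each modification a globally unique and increasing ID, each request must go through an additional coordination protocol among replication groups. This extra coordination step may conflict on the global ID, forcing ordered requests to retry. As a result, for strict ordering, the NewSQL approach is more complicated than etcd and typically performs worse.
If an application reasons primarily about metadata or metadata ordering, for example to coordinate processes, choose etcd. If the application needs a large data store spanning multiple data centers and does not depend heavily on strong global ordering, choose a NewSQL database.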
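Using etcd as a distributed coordination component
etcd provides distributed coordination primitives such as event watches, leases, elections, and distributed locks out of the box. These primitives are maintained and supported by the etcd developers; leaving them to external libraries shirks the responsibility of developing foundational distributed software and essentially leaves the system incomplete. NewSQL databases usually expect these coordination primitives to be written by third parties. Likewise, ZooKeeper has a separate, independent library of coordination recipes. Consul, which provides a native lock API, even apologizes that it is “not a bulletproof method” (after one client releases the lock, other clients cannot acquire it immediately, which may be caused by the lock-delay setting).
In theory, these primitives can be built on any storage system that provides strong consistency. However, the algorithms tend to be subtle; it is easy to develop a locking algorithm that appears to work but then breaks because of thundering herd and timing skew. Furthermore, other primitives supported by etcd, such as transactional memory, depend on etcd’s MVCC data model; simple strong consistency is not enough.
For distributed coordination, choosing etcd can help avoid operational headaches and reduce engineering effort.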