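1.1 Introduction: Kafka as a Messaging System, a walkthrough of the official documentation (blogger's recommendation)

Date: 2023-04-11 23:02:14

 No preamble, straight to the substance!

  Everything here comes from the official site: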

http://kafka.apache.org/documentation/

Kafka as a Messaging System

How does Kafka's notion of streams compare to a traditional enterprise messaging system?

  Messaging traditionally has two models: queuing and publish-subscribe. In a queue, a pool of consumers may read from a server and each record goes to one of them; in publish-subscribe the record is broadcast to all consumers. Each of these two models has a strength and a weakness. The strength of queuing is that it allows you to divide up the processing of data over multiple consumer instances, which lets you scale your processing. Unfortunately, queues aren't multi-subscriber: once one process reads the data, it's gone. Publish-subscribe allows you to broadcast data to multiple processes, but has no way of scaling processing since every message goes to every subscriber.

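Traditional messaging has two models: queuing and publish-subscribe.
In the queuing model, a pool of consumers reads from the server, and each message is read by only one of them.
In the publish-subscribe model, each message is broadcast to all consumers. Both models have strengths and weaknesses.
The strength of queuing is that multiple consumers can split up the processing of the data, which lets processing scale. However, a queue is not multi-subscriber: once one consumer process has read a message, it is gone and no other consumer can read it.
Publish-subscribe lets you broadcast data to multiple consumers, but because every message goes to every subscriber, there is no way to scale processing.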
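
The difference between the two models can be sketched in a few lines of Python. This is only a toy in-memory illustration, not Kafka code: the function names `deliver_queue` and `deliver_pubsub` are made up for this post, and round-robin is just one assumed way a queue might pick the consumer for each record.

```python
from collections import defaultdict
from itertools import cycle


def deliver_queue(records, consumers):
    """Queue model: each record goes to exactly one consumer (round-robin here)."""
    inbox = defaultdict(list)
    for record, consumer in zip(records, cycle(consumers)):
        inbox[consumer].append(record)
    return dict(inbox)


def deliver_pubsub(records, consumers):
    """Publish-subscribe model: every record is broadcast to every consumer."""
    return {consumer: list(records) for consumer in consumers}


records = ["r0", "r1", "r2", "r3"]
consumers = ["c0", "c1"]

queue_view = deliver_queue(records, consumers)    # work is split across consumers
pubsub_view = deliver_pubsub(records, consumers)  # work is duplicated on each consumer
print(queue_view)
print(pubsub_view)
```

In the queue view each consumer handles half of the records, so adding consumers adds capacity. In the publish-subscribe view every consumer handles all of them, so adding consumers adds subscribers but no capacity.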

  The consumer group concept in Kafka generalizes these two concepts. As with a queue the consumer group allows you to divide up processing over a collection of processes (the members of the consumer group). As with publish-subscribe, Kafka allows you to broadcast messages to multiple consumer groups.

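Kafka's consumer group generalizes both models. Like a queue: the members of one consumer group (consumers sharing the same group name) divide up the processing among themselves. Like publish-subscribe: messages are broadcast to multiple consumer groups (groups with different names).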
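
A rough sketch of how consumer groups combine the two behaviours, again as a plain Python toy rather than real Kafka client code. The group names `billing` and `audit`, the member names, and the `deliver` helper are all hypothetical, and choosing the member by record index is only a stand-in for Kafka's partition-based assignment.

```python
from collections import defaultdict


def deliver(records, groups):
    """Deliver every record to every group, but to only one member within each group.

    `groups` maps a group name to its list of members. Picking the member by
    record index is a simplification used only in this sketch.
    """
    inbox = defaultdict(list)
    for index, record in enumerate(records):
        for group, members in groups.items():
            member = members[index % len(members)]
            inbox[(group, member)].append(record)
    return dict(inbox)


records = ["r0", "r1", "r2", "r3"]
groups = {
    "billing": ["b0", "b1"],  # two members share the work, like a queue
    "audit": ["a0"],          # a second group gets its own full copy
}
inbox = deliver(records, groups)
for (group, member), received in sorted(inbox.items()):
    print(group, member, received)
```

Each group as a whole sees all four records (the publish-subscribe side), while inside `billing` the records are split between `b0` and `b1` (the queue side).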

  The advantage of Kafka's model is that every topic has both these properties—it can scale processing and is also multi-subscriber—there is no need to choose one or the other.

  Kafka has stronger ordering guarantees than a traditional messaging system, too.

  A traditional queue retains records in-order on the server, and if multiple consumers consume from the queue then the server hands out records in the order they are stored. However, although the server hands out records in order, the records are delivered asynchronously to consumers, so they may arrive out of order on different consumers. This effectively means the ordering of the records is lost in the presence of parallel consumption. Messaging systems often work around this by having a notion of "exclusive consumer" that allows only one process to consume from a queue, but of course this means that there is no parallelism in processing.

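A traditional messaging system stores records in order, and when multiple consumers read from the queue, the server hands the records out in the order they are stored.
However, even though the server sends them in order, the records are delivered to consumers asynchronously, so they may arrive out of order across the consumers.
In other words, once records are consumed in parallel, their ordering can no longer be guaranteed.
Messaging systems often work around this by allowing only a single consumer per queue (the "exclusive consumer"), but that means giving up parallel processing.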
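
To make the ordering problem concrete, here is a small deterministic simulation. It is a sketch under stated assumptions, not a model of any particular broker: the server hands out record i at time i, picks consumers round-robin, and each consumer has a fixed delivery latency. The `arrival_order` function and the consumer names are hypothetical.

```python
def arrival_order(records, latency_by_consumer):
    """Return records in the order they arrive across all consumers.

    The server hands out record i at time i, round-robin over the consumers;
    each consumer has a fixed delivery latency (an assumption of this sketch).
    """
    consumers = list(latency_by_consumer)
    arrivals = []
    for handout_time, record in enumerate(records):
        consumer = consumers[handout_time % len(consumers)]
        arrival_time = handout_time + latency_by_consumer[consumer]
        arrivals.append((arrival_time, handout_time, record))
    return [record for _, _, record in sorted(arrivals)]


records = ["r0", "r1", "r2", "r3"]

# Two parallel consumers, one slow: records are handed out in order but arrive shuffled.
parallel = arrival_order(records, {"slow": 5, "fast": 0})
# One exclusive consumer: order is preserved, but nothing runs in parallel.
exclusive = arrival_order(records, {"only": 5})
print(parallel)
print(exclusive)
```

With one slow and one fast consumer, the records handed to the fast consumer overtake the earlier ones still in flight to the slow consumer. With a single exclusive consumer the order survives, at the cost of all parallelism.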

  Kafka does it better. By having a notion of parallelism—the partition—within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. Since there are many partitions this still balances the load over many consumer instances. Note however that there cannot be more consumer instances in a consumer group than partitions.

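Kafka does better. Its unit of parallelism within a topic is the partition, which lets Kafka provide both ordering guarantees and load balancing.
Each partition is consumed by exactly one consumer within a given consumer group.
That consumer is therefore the only reader of the partition within its group, and it consumes the data in order.
Because a topic has many partitions, the load is still balanced across many consumer instances.
Note, however, that a consumer group cannot usefully have more consumers than partitions: any extra consumers sit idle and never receive messages.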
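
The assignment rule can be sketched as follows. This is a simplified round-robin illustration written for this post: Kafka's real assignment strategies are more involved, and the `assign_partitions` function and the consumer names are hypothetical.

```python
def assign_partitions(num_partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# 4 partitions, 2 consumers: every consumer owns some partitions.
balanced = assign_partitions(4, ["c0", "c1"])
# 2 partitions, 3 consumers: c2 owns nothing and stays idle.
oversized = assign_partitions(2, ["c0", "c1", "c2"])
print(balanced)
print(oversized)
```

Since every partition has a single owner within the group, that owner reads its partition in order, and the partitions spread the load across the group. Once the consumers outnumber the partitions, the extra consumer ends up with an empty list and has nothing to read.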