One of the main highlights of Apache Storm is that it is a fast, fault-tolerant distributed application with no “Single Point of Failure” (SPOF). Apache Storm can be installed on as many systems as needed to increase the capacity of the application.
Let’s look at how an Apache Storm cluster is designed and at its internal architecture. The following diagram depicts the cluster design.
Apache Storm has two types of nodes: Nimbus (master node) and Supervisor (worker node). Nimbus is the central component of Apache Storm, and its main job is to run the Storm topology. Nimbus analyzes the topology and gathers the tasks to be executed. It then distributes those tasks to an available supervisor.
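To make the Nimbus workflow concrete, here is a minimal Python sketch of the two steps just described: expanding a topology into tasks and handing those tasks to available supervisors. It assumes a topology is simply a mapping from component name to parallelism, that each supervisor advertises a count of free worker slots, and that tasks are dealt out round-robin. The names `gather_tasks` and `assign_tasks` are hypothetical, and Storm's real schedulers are pluggable and considerably more involved.

```python
from itertools import cycle


def gather_tasks(topology):
    """Hypothetical: expand {component: parallelism} into (task_id, component) pairs."""
    tasks = []
    next_id = 1
    for component, parallelism in topology.items():
        for _ in range(parallelism):
            tasks.append((next_id, component))
            next_id += 1
    return tasks


def assign_tasks(tasks, supervisors):
    """Hypothetical round-robin assignment of tasks to supervisor worker slots.

    supervisors maps a supervisor name to its number of free worker slots.
    Returns a dict mapping (supervisor, slot_index) -> list of tasks.
    """
    slots = [(name, i) for name, free in supervisors.items() for i in range(free)]
    if not slots:
        raise RuntimeError("no available supervisor slots")
    assignment = {slot: [] for slot in slots}
    # Deal tasks out one slot at a time, wrapping around when slots run out.
    for task, slot in zip(tasks, cycle(slots)):
        assignment[slot].append(task)
    return assignment


tasks = gather_tasks({"sentence-spout": 2, "split-bolt": 3})
plan = assign_tasks(tasks, {"supervisor-a": 2, "supervisor-b": 1})
print(plan)
```

With three slots and five tasks, the first slot on `supervisor-a` receives tasks 1 and 4, the second receives tasks 2 and 5, and `supervisor-b` receives task 3. Adding another supervisor simply adds slots to the rotation, which is the sense in which installing Storm on more systems increases capacity.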
A supervisor has one or more worker processes and delegates tasks to them. Each worker process spawns as many executors as needed to run its tasks. Apache Storm uses an internal distributed messaging system for communication between Nimbus and the supervisors.
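The worker process, executor, and task hierarchy can be sketched with Python threads. This is an illustration under assumptions, not Storm code: a real worker is a separate JVM process whose executors are Java threads, whereas here a hypothetical `worker_process` function starts one thread per executor, and each executor runs its assigned tasks one after another, all belonging to a single component.

```python
import threading


def run_task(component, task_id, results, lock):
    # Stand-in for real spout/bolt processing: just record which task ran.
    with lock:
        results.append((component, task_id))


def executor(component, task_ids, results, lock):
    # One executor thread runs its tasks serially, all for the same component.
    for task_id in task_ids:
        run_task(component, task_id, results, lock)


def worker_process(executor_plan):
    """Hypothetical worker: spawn one thread per executor and wait for all of them.

    executor_plan is a list of (component, [task_ids]) pairs, one per executor.
    """
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=executor, args=(component, task_ids, results, lock))
        for component, task_ids in executor_plan
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Three executors: one for the spout (2 tasks) and two for the bolt (1 and 2 tasks).
ran = worker_process([("sentence-spout", [1, 2]), ("split-bolt", [3]), ("split-bolt", [4, 5])])
print(sorted(ran, key=lambda r: r[1]))
```

Note that the bolt has two executors but three tasks: the number of tasks can exceed the number of executors, because one executor may run several tasks of the same component.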
Components | Description |
---|---|
Nimbus | Nimbus is the master node of a Storm cluster. All other nodes in the cluster are called worker nodes. The master node is responsible for distributing the topology code among the worker nodes, assigning tasks to them, and monitoring for failures. |
Supervisor | The nodes that follow the instructions given by Nimbus are called supervisors. A supervisor has multiple worker processes and governs them to complete the tasks assigned by Nimbus. |
Worker process | A worker process executes tasks related to a specific topology. A worker process does not run a task by itself; instead, it creates executors and asks them to perform particular tasks. A worker process has multiple executors. |
Executor | An executor is simply a single thread spawned by a worker process. An executor runs one or more tasks, but only for a specific spout or bolt. |
Task | A task performs the actual data processing. Each task is an instance of either a spout or a bolt. |
ZooKeeper framework | Apache ZooKeeper is a service used by a cluster (group of nodes) to coordinate among its members and maintain shared data with robust synchronization techniques. Nimbus is stateless, so it depends on ZooKeeper to monitor the status of the worker nodes. ZooKeeper helps the supervisors interact with Nimbus and maintains the state of both Nimbus and the supervisors. |
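One way to picture how Nimbus relies on ZooKeeper to track node status is a heartbeat check. The sketch below is a simplification under stated assumptions: an in-memory `HeartbeatStore` class stands in for ZooKeeper (no real ZooKeeper API is used), the `/supervisors/...` paths and the 30-second timeout are hypothetical, and timestamps are passed in explicitly so the result is deterministic.

```python
class HeartbeatStore:
    """Hypothetical in-memory stand-in for ZooKeeper: path -> last heartbeat time."""

    def __init__(self):
        self._beats = {}

    def write(self, path, timestamp):
        # A supervisor periodically overwrites its own entry with the current time.
        self._beats[path] = timestamp

    def read_all(self):
        return dict(self._beats)


def find_dead_supervisors(store, now, timeout_secs):
    """Nimbus-side check: a node whose last heartbeat is older than timeout_secs is presumed dead."""
    return sorted(
        path
        for path, last_beat in store.read_all().items()
        if now - last_beat > timeout_secs
    )


store = HeartbeatStore()
store.write("/supervisors/supervisor-a", 100.0)
store.write("/supervisors/supervisor-b", 70.0)
dead = find_dead_supervisors(store, now=105.0, timeout_secs=30)
print(dead)  # supervisor-b last reported 35 seconds ago, past the 30-second timeout
```

Because Nimbus only reads the shared store, it never needs to hold node status in its own memory, which matches the table's point that a stateless Nimbus depends on ZooKeeper for this information.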
Storm is stateless by design. Although statelessness has its own disadvantages, it helps Storm process real-time data as quickly as possible.
Storm is not entirely stateless, though: it stores its state in Apache ZooKeeper. Because the state is kept in ZooKeeper, a failed Nimbus can be restarted and resume from where it left off. Usually, a service monitoring tool such as monit watches Nimbus and restarts it if it fails.
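The restart behavior follows from keeping state outside Nimbus. In this minimal sketch, a hypothetical `StateStore` object plays the role of ZooKeeper and a hypothetical `Nimbus` class writes every assignment to it; a second `Nimbus` instance created after the first is discarded recovers the same assignments without any local state. Real Storm persists more than assignments, and the `/assignments/...` path layout here is invented for illustration.

```python
class StateStore:
    """Hypothetical in-memory stand-in for ZooKeeper that outlives any Nimbus instance."""

    def __init__(self):
        self.data = {}


class Nimbus:
    """Hypothetical stateless Nimbus: it keeps no durable state of its own."""

    def __init__(self, store):
        self.store = store

    def submit(self, topology_name, assignment):
        # Persist the assignment externally before acting on it.
        self.store.data[f"/assignments/{topology_name}"] = dict(assignment)

    def recover(self):
        # A freshly started Nimbus rebuilds its view purely from the store.
        prefix = "/assignments/"
        return {
            path[len(prefix):]: assignment
            for path, assignment in self.store.data.items()
            if path.startswith(prefix)
        }


store = StateStore()
first = Nimbus(store)
first.submit("word-count", {"supervisor-a": [1, 2], "supervisor-b": [3]})
del first  # simulate a crash: only the store survives

restarted = Nimbus(store)
print(restarted.recover())
```

The design choice is that `recover` reads only from the store, so nothing lost with the crashed instance is needed to resume; a process monitor only has to start a new Nimbus pointed at the same store.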
Apache Storm also offers an advanced topology, called Trident topology, that maintains state and provides a high-level API similar to Pig. We will discuss these features in the coming chapters.