[File attributes]:
File name: Getting Started with Storm
File size: 5.65MB
File format: PDF
Updated: 2017-06-14 11:01:52
Storm, Getting Started, Distributed Computing
Storm is a distributed, reliable, fault-tolerant system for processing streams of data.
The work is delegated to different types of components that are each responsible for a
simple specific processing task. The input stream of a Storm cluster is handled by a
component called a spout. The spout passes the data to a component called a bolt,
which transforms it in some way. A bolt either persists the data in some sort of storage,
or passes it to some other bolt. You can imagine a Storm cluster as a chain of bolt
components that each make some kind of transformation on the data exposed by the
spout.
To illustrate this concept, here’s a simple example. Last night I was watching the news
when the announcers started talking about politicians and their positions on various
topics. They kept repeating different names, and I wondered if each name was mentioned
an equal number of times, or if there was a bias in the number of mentions.
Imagine the subtitles of what the announcers were saying as your input stream of data.
You could have a spout that reads this input from a file (or a socket, via HTTP, or some
other method). As lines of text arrive, the spout hands them to a bolt that separates
lines of text into words. This stream of words is passed to another bolt that compares
each word to a predefined list of politicians’ names. With each match, the second bolt
increases a counter for that name in a database. Whenever you want to see the results,
you just query that database, which is updated in real time as data arrives. The arrangement
of all the components (spouts and bolts) and their connections is called a topology.
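
The flow described above can be sketched in plain Python to make the roles concrete. This is only an illustration of the spout, bolt, and topology idea, not Storm's actual API: the function names (subtitle_spout, split_bolt, count_bolt, run_topology) and the sample names in POLITICIANS are hypothetical, and an in-memory Counter stands in for the database that the second bolt would update.

```python
from collections import Counter
from typing import Iterable, Iterator, Set

# Hypothetical list of names to watch for; not taken from the source text.
POLITICIANS = {"smith", "jones"}


def subtitle_spout(lines: Iterable[str]) -> Iterator[str]:
    """Stand-in for a spout: emits one line of subtitle text at a time."""
    for line in lines:
        yield line


def split_bolt(lines: Iterable[str]) -> Iterator[str]:
    """First bolt: splits each line into lowercase words without punctuation."""
    for line in lines:
        for word in line.split():
            cleaned = word.strip(".,;:!?\"'").lower()
            if cleaned:
                yield cleaned


def count_bolt(words: Iterable[str], names: Set[str], counts: Counter) -> Counter:
    """Second bolt: increments a counter for every word that matches a name."""
    for word in words:
        if word in names:
            counts[word] += 1
    return counts


def run_topology(lines: Iterable[str], names: Set[str] = POLITICIANS) -> Counter:
    """Wires the components together: spout -> split bolt -> count bolt."""
    counts = Counter()  # stands in for the database of mention counts
    return count_bolt(split_bolt(subtitle_spout(lines)), names, counts)
```

Calling run_topology with a list of subtitle lines returns the per-name counts. In a real Storm cluster each component would run in parallel across machines and the counts would live in external storage, but the wiring of one spout feeding a chain of two bolts is the same shape.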
Reader comments
- There are too few books about Storm
- A good book, bookmarked!