Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
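Storm is a free, open source framework for distributed realtime computation. Where Hadoop handles batch processing of data, Storm reliably processes massive volumes of streaming data. Storm is simple and easy to use, works with many programming languages, and is enjoyable to learn and use!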
Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
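Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. A single Storm node can process on the order of a million tuples per second. Storm is reliable, easy to use, scalable, and highly fault-tolerant.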
Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed. Read more in the tutorial.
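Storm integrates with queueing and database technologies. A Storm topology can process data streams in arbitrarily complex ways, and it can repartition those streams during the computation.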
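To make the idea of repartitioning a stream between stages more concrete, here is a minimal sketch in plain Python. It does not use the Storm API: every name in it (`sentence_source`, `shuffle_partition`, `fields_partition`, `run_topology`, `merge_counts`) is hypothetical, the input is a short finite list rather than an unbounded stream, and the round-robin partitioner is only a deterministic stand-in for an even, content-blind distribution. It shows a two-stage word count in which the stream is first spread evenly across the splitting tasks and then repartitioned by word, so that every occurrence of a given word reaches the same counting task.

```python
import zlib
from collections import Counter


def sentence_source():
    """Stand-in for a stream source (finite here; real streams are unbounded)."""
    yield from ["the cow jumped", "the moon", "cow moon cow"]


def shuffle_partition(items, num_tasks):
    """Spread items evenly across tasks, ignoring their content (round-robin)."""
    partitions = [[] for _ in range(num_tasks)]
    for i, item in enumerate(items):
        partitions[i % num_tasks].append(item)
    return partitions


def fields_partition(items, num_tasks):
    """Route items by a stable hash so equal items always reach the same task."""
    partitions = [[] for _ in range(num_tasks)]
    for item in items:
        # crc32 is used because Python's built-in str hash varies between runs.
        index = zlib.crc32(item.encode("utf-8")) % num_tasks
        partitions[index].append(item)
    return partitions


def run_topology(num_split_tasks=2, num_count_tasks=3):
    """Run both stages and return one Counter per counting task."""
    # Stage 1: sentences are distributed evenly, then split into words.
    sentence_parts = shuffle_partition(sentence_source(), num_split_tasks)
    words = [w for part in sentence_parts for s in part for w in s.split()]

    # Repartition between stages: group the word stream by the word itself.
    word_parts = fields_partition(words, num_count_tasks)

    # Stage 2: each counting task counts only the words routed to it.
    return [Counter(part) for part in word_parts]


def merge_counts(per_task_counts):
    """Combine the per-task counters into one dict for inspection."""
    total = Counter()
    for counts in per_task_counts:
        total.update(counts)
    return dict(total)


if __name__ == "__main__":
    per_task = run_topology()
    for task_id, counts in enumerate(per_task):
        print(task_id, dict(counts))
    print(merge_counts(per_task))
```

Because the second stage is partitioned by word, no word's count is ever split across two tasks, so each task's local count is already final for the words it owns. Changing `num_count_tasks` changes which task owns which word but not the merged totals.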