HBase: The Definitive Guide (English Edition)

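Date: 2015-09-26 03:08:49
[File properties]:

File name: HBase: The Definitive Guide (English Edition)

File size: 8.59 MB

File format: DOCX

Last updated: 2015-09-26 03:08:49

HBase: The Definitive Guide, English edition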

There may be many reasons that brought you here. It could be because you heard all about Hadoop and what it can do to crunch petabytes of data in a reasonable amount of time. While reading up on Hadoop you found that for random access to the accumulated data there is something called HBase. Or it was the hype that is prevalent these days around a new kind of data storage architecture, one that strives to solve large-scale data problems where traditional solutions may be either too involved or cost-prohibitive. A common term used in this area is NoSQL.

No matter how you arrived here, I presume you want to know and learn, like me not too long ago, how you can use HBase in your company or organization to store a virtually endless amount of data. You may have a background in relational database theory, or you want to start fresh and this "column-oriented thing" is something that seems to fit your bill. You may also have heard that HBase can scale without much effort, and that alone is reason enough to look at it since you are building the next web-scale system.

I was at that point in late 2007, facing the task of storing millions of documents in a system that needed to be fault-tolerant and scalable while still being maintainable by just me. I had decent skills in managing a MySQL database system and was using it to store data that would ultimately be served to our website users. This database was running on a single server, with another as a backup. The issue was that it would not be able to hold the amount of data I needed to store for this new project. I could either invest in serious RDBMS scalability skills or find something else instead. Obviously I went the latter route, and since my mantra always was (and still is) "How does someone like Google do it?", I came across Hadoop.

After a few attempts at using Hadoop directly, I was faced with implementing a random access layer on top of it. But that problem had been solved already: in 2006 Google had published a paper called BigTable [1], and the Hadoop developers had an open-source implementation of it called HBase (the Hadoop Database). That was the answer to all my problems. Or so it seemed...

What follows is a blur to me. Looking back, I realize that I wish this customer project were starting today. HBase is now mature, nearing a 1.0 release, and is used by many high-profile companies, such as Facebook, Adobe, Twitter, and StumbleUpon. Mine was one of the very first clusters in production (and is still in use today!), and my use case triggered a few very interesting issues (let me refrain from saying more). But that was to be expected when betting on a 0.1x version of a community project. Over the years I had the opportunity to contribute back and stay close to the development team, so that eventually I was humbled by being asked to become a full-time committer as well.

I learned a lot over the last few years from my fellow HBase developers and am still learning more every day. My belief is that we are nowhere near the peak of this technology and that it will evolve further over the years to come. Let me pay my respects to the entire HBase community with this book, which strives to cover not just the internal workings of HBase or how to get it going, but more specifically how to apply it to your use case. In fact, I strongly assume that this is why you are here right now. You want to learn how HBase can solve your problem. Let me help you try to figure this out.


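User comments

  • Pretty good; worth reading more when you have time.
  • Good stuff. The Chinese translation has some flaws that get in the way of understanding; reading the English edition made things click. Well worth the 9 points.
  • A fairly complete Word document version, convenient for copying and reformatting. Very good.
  • Thanks for sharing. I can practice my English while studying.
  • Starting my HBase journey.
  • Thanks for sharing. Very nice, very complete document!
  • A very good book that dissects the internals of HBase in depth and in detail. However, HBase is a constantly evolving project, so some of the content can only serve as a reference because it has since been superseded. It is best read alongside the documentation published by the HBase project.
  • An excellent book, highly recommended.
  • A fairly complete DOC file; good for studying NoSQL at your own pace.
  • Nice. It is the full text.
  • Word document, text version; almost all of the markup from the original book is preserved. Thanks for sharing.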