What are the disadvantages of the Hadoop distribution MapR compared to Cloudera and Hortonworks?

Cloudera and Hortonworks use HDFS, one of the basic concepts of Apache Hadoop. MapR uses its own concept/implementation: instead of HDFS, you use the native file system directly. The MapR website lists many advantages of this approach.

I wonder what the disadvantages of this approach are.

4 answers

#1

I would define MapR a bit differently. It does not use HDFS; instead it provides its own distributed file system with an NFS interface, which, like HDFS, is built on top of the local file system.
The main differences come from the fact that HDFS is not POSIX-compliant, and from other design choices:
1. HDFS is not mutable, while MapR is. This can be viewed as an advantage, especially if you need it.
2. HDFS is not mountable, while MapR is, so you can use any existing tools that work with a Linux file system.
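Points 1 and 2 are easiest to see with plain file I/O. The following is a minimal Python sketch, assuming the cluster is exposed as an ordinary mounted directory; the mount path mentioned in the comment is hypothetical, and a temporary directory stands in for it so the snippet runs anywhere. It overwrites a few bytes in the middle of an existing file using standard open/seek/write calls. HDFS does not support this kind of update: its files can be written sequentially and appended to, but not modified at an arbitrary offset.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes at `offset` without rewriting the rest of the file.

    Uses ordinary POSIX-style I/O (open read/write, seek, write), which a
    mounted, mutable file system permits.
    """
    with open(path, "r+b") as f:  # read/write mode, does not truncate
        f.seek(offset)
        f.write(data)


# Hypothetical: on a mounted MapR cluster this directory might be something
# like /mapr/<cluster-name>/...; a temporary directory stands in for it here.
with tempfile.TemporaryDirectory() as mount_point:
    path = os.path.join(mount_point, "events.log")
    with open(path, "wb") as f:
        f.write(b"status=PENDING;id=42")

    # Replace the 7-byte value "PENDING" with the 7-byte value "DONE   ".
    overwrite_in_place(path, offset=7, data=b"DONE   ")

    with open(path, "rb") as f:
        result = f.read()
    print(result)  # b'status=DONE   ;id=42'
```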

Unrelated to POSIX: MapR has a small block size and no single point of failure (the NameNode). MapR also has multi-site replication.

Let's look at the dark side as well:
a) Having mutable data (instead of immutable HDFS files) makes the system more complicated.
b) It is not known (at least to me) to work on huge clusters. (I have heard of hundreds of nodes.)
c) From an architectural point of view (with small blocks), I am not sure how good data locality can be achieved.

#2

David, the minute-sort record was set by MapR on the Google Compute Engine in the Google Cloud on 1/30/2013. See our blog at http://www.mapr.com/blog/hadoop-minutesort-record. The record was set on a 2103-node cluster and 1.5 TB of data was sorted in 59 seconds.

Also see an earlier blog post about the TeraSort record set by MapR, which sorted 1 TB of data in 54 seconds on a 1003-node cluster on the Google Compute Engine in the Google Cloud. The post is at http://www.mapr.com/blog/record-setting-hadoop-in-the-cloud.
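To put the two reported records on a common footing, the quoted figures can be turned into throughput numbers. This is a minimal Python sketch of that arithmetic, assuming "TB" means a decimal terabyte (10**12 bytes) and taking the data sizes, times and node counts exactly as quoted; it is not an independent benchmark.

```python
from typing import Tuple

# Assumption: decimal terabytes; the cited blog posts may define TB differently.
TB = 10**12


def sort_throughput(data_bytes: float, seconds: float, nodes: int) -> Tuple[float, float]:
    """Return (aggregate bytes/s, per-node bytes/s) for one sort run."""
    aggregate = data_bytes / seconds
    return aggregate, aggregate / nodes


# Figures as quoted in the answer: (data size, seconds, node count).
records = {
    "MinuteSort": (1.5 * TB, 59, 2103),
    "TeraSort": (1 * TB, 54, 1003),
}

for name, (size, secs, nodes) in records.items():
    agg, per_node = sort_throughput(size, secs, nodes)
    print(f"{name}: {agg / 1e9:.1f} GB/s aggregate, {per_node / 1e6:.1f} MB/s per node")
```

Per-node rates depend heavily on hardware, network and job configuration, so numbers like these only describe the quoted runs.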

Also see answers.mapr.com for many questions/answers on this topic.

#3

Until some impartial source does extensive benchmarking (under varying workloads) of Apache Hadoop vs. MapR's version, I think we cannot categorically say that one is faster than the other. If records are going to determine your opinion, then you should know that the current TeraSort record is held by Yahoo, with Apache Hadoop. Details here and here.

#4

The main disadvantage of MapR compared with Hortonworks/Cloudera is that MapR-FS (the file system) and MapR-DB (the NoSQL database) are proprietary (not open source). If MapR were to cease to exist, these products would presumably no longer be developed or supported.

There is less risk of HDFS/HBase ceasing to be developed and supported, since Hortonworks, Cloudera and other Hadoop distributions use and support HDFS/HBase together with the open-source community.
