I am re-configuring a Hadoop cluster to use the High Availability (HA) features for both the shared filesystem and the MR1 jobtracker.
It seems I can't get the automatic failover features for both to work at the same time. Instead, one of the services ends up with both (all) of its daemons stuck in standby.
How do I get automatic failover to work for all my HA services?
I'm using:
- Cloudera CDH 4.5.0
- JDK 7
- Ubuntu 12.04
- ...without the Cloudera Manager
1 Answer
#1
The namenode and jobtracker share a similar HA implementation, so much so that they both extend the same base class. They both use a backing zookeeper cluster to decide which available node is active.
The location used in zookeeper is constructed by appending the failover group name (i.e. the values given in dfs.nameservices and mapred.job.tracker) to a configurable prefix.
For both services the configurable prefix is /hadoop-ha by default.
This means that if the two services are configured with the same failover group name (say, my-application), the final zookeeper paths used by the two services will collide. If they also share the same zookeeper cluster, one service will fail to acquire its zookeeper node and its automatic failover will be broken.
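The collision can be seen by spelling out how the election znode path is derived. This is a minimal sketch of the path construction (the helper name is illustrative, not Hadoop's actual API; the ActiveStandbyElectorLock leaf is the lock node Hadoop's elector creates):

```python
# Sketch of how the zookeeper election path is derived (simplified;
# elector_lock_znode is a hypothetical helper, not a Hadoop API).
def elector_lock_znode(parent_znode, failover_group):
    # Each HA service creates its ephemeral lock node under
    # <parent-znode>/<failover-group>/ActiveStandbyElectorLock
    return f"{parent_znode}/{failover_group}/ActiveStandbyElectorLock"

# Same default prefix (/hadoop-ha) and same failover group name for
# both services...
hdfs_path = elector_lock_znode("/hadoop-ha", "my-application")  # dfs.nameservices
mr_path = elector_lock_znode("/hadoop-ha", "my-application")    # mapred.job.tracker

# ...yields an identical znode, so the two elections interfere.
assert hdfs_path == mr_path
```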
The solution is to avoid the collision. The easiest way is to ensure that mapred.job.tracker in mapred-site.xml and dfs.nameservices in hdfs-site.xml do not contain common values.
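For instance, giving each service its own failover group name might look like the following (the names my-hdfs and my-jobtracker are illustrative):

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.nameservices</name>
  <value>my-hdfs</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>my-jobtracker</value>
</property>
```

With distinct names, the two services land on /hadoop-ha/my-hdfs and /hadoop-ha/my-jobtracker respectively, and no longer contend for the same znode.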
One might also configure the /hadoop-ha prefix on a per-service basis. It is controlled by the ha.zookeeper.parent-znode configuration property.
For example, in hdfs-site.xml one might have:
<property>
  <name>ha.zookeeper.parent-znode</name>
  <value>/hdfs-ha</value>
</property>
...while mapred-site.xml contains:
<property>
  <name>ha.zookeeper.parent-znode</name>
  <value>/mapred-ha</value>
</property>
However, be aware that in this configuration hdfs-site.xml and mapred-site.xml can't be loaded into the same configuration at the same time: since the property key is identical, the value of one file will clobber the other's.
Either way the zookeeper paths will have changed, requiring the respective -formatZK commands to be re-run.
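Concretely, the re-initialization might look like this (command names follow the CDH4 documentation; verify against your install, and note -formatZK discards existing ZK failover state):

```shell
# HDFS: re-create the ZKFC state under the (new) ha.zookeeper.parent-znode
hdfs zkfc -formatZK

# MR1 jobtracker HA (CDH4): re-create the jobtracker's zookeeper state
hadoop mrzkfc -formatZK
```

Run each command once, from any node configured for the respective service, while the failover controllers are stopped.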