Google Dataproc: unable to access the Spark history page

I created a Google Dataproc cluster. After logging in to the master node, I started spark-shell and then tried to access the Spark history page at

http://<external_ip_masternode>:4040

It gets redirected to

http://<hostname_mastername>:8088/proxy/application_1487485713573_0002/

The browser rejects this with the error "DNS address could not be found," which is understandable, since that hostname is not resolvable from my machine.

These are the VM instance settings:

Public IP type: Ephemeral

Firewall: tcp:4040 opened

IP forwarding: Off (unable to edit this setting)

I tried the following troubleshooting steps, but none of them helped:

Telnet to :4040 -> working

Access from Ubuntu host, Chrome browser: redirected, then name lookup failure

Access from Ubuntu host, Firefox browser: redirected, then name lookup failure

Access from Mac OS X host, Safari browser: redirected, then name lookup failure

Access from Mac OS X host, Chrome browser: redirected, then name lookup failure

1 answer

#1

To view Hadoop web interfaces in Dataproc, the recommended approach is to follow the instructions for running an SSH-based SOCKS proxy: https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces

Those instructions also have you run a separate browser session that goes through your SSH tunnel, with hostname resolution set to happen on the VM side of the tunnel. That way, all the links in the Hadoop pages work automatically: the pages reference each other by internal hostname and intentionally avoid any dependency on external IP addresses.
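As a rough illustration of the tunnel-plus-browser setup described above, here is a minimal Python sketch that only builds the two command lines and prints them. The master node name `my-cluster-m`, the zone `us-central1-a`, local port 1080, and the `google-chrome` binary name are all placeholders I picked for the example, and the flags follow the general pattern in the linked documentation as I understand it, so check them against the current docs before relying on them.

```python
import shlex


def tunnel_command(master, zone, port=1080):
    """Build an SSH command that opens a SOCKS proxy on localhost:<port>.

    Everything after "--" is passed to ssh itself: -D opens the dynamic
    (SOCKS) port forward and -N tells ssh not to run a remote command.
    """
    return ["gcloud", "compute", "ssh", master, f"--zone={zone}",
            "--", "-D", str(port), "-N"]


def browser_command(master, port=1080, chrome="google-chrome"):
    """Build a Chrome command that sends its traffic through the SOCKS proxy.

    With a socks5 proxy, hostnames are resolved on the far side of the
    tunnel, so internal names such as the master's hostname can be reached.
    A separate --user-data-dir keeps this session apart from normal browsing.
    """
    return [chrome,
            f"--proxy-server=socks5://localhost:{port}",
            f"--user-data-dir=/tmp/{master}",
            f"http://{master}:8088"]


if __name__ == "__main__":
    # Placeholder names; substitute your own master node and zone.
    print(shlex.join(tunnel_command("my-cluster-m", "us-central1-a")))
    print(shlex.join(browser_command("my-cluster-m")))
```

Run the first printed command in one terminal and leave it open, then run the second in another terminal. The redirect to port 8088 on the internal hostname should then resolve inside the proxied browser session.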

Using the SSH tunnel is also much more secure than opening firewall rules to reach the Hadoop HTTP servers directly, because their traffic is unencrypted. If you accidentally open your firewall rules too broadly, other people on the internet will be able to reach those ports on your external IP addresses. Even if you don't, attackers could still see the unencrypted web traffic served by the ApplicationMaster, HistoryServer, and so on.
