Environment:
Spark 2.0.0, Anaconda2
1. Installing and configuring IPython and Notebook for Spark
Method 1:
With this method you access IPython Notebook through a web page, and can still open pyspark in a separate terminal. If you have Anaconda installed, you can get the IPython interface directly as follows; if not, see the link at the bottom and install the relevant IPython packages yourself.
vi ~/.bashrc
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
source ~/.bashrc
Restart pyspark. You should see the behavior described in the Cloudera guide:
Starting a Notebook with PySpark
On the driver host, choose a directory notebook_directory to run the Notebook. notebook_directory contains the .ipynb files that represent the different notebooks that can be served.
In notebook_directory, run pyspark with your desired runtime options. You should see output like the following:
Reference:
IPython and Jupyter on Spark 2.0.0
http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html
Method 2:
Method 2 works with IPython, but I ran into problems with Jupyter; I am not sure whether that is specific to my setup.
It is also possible to launch the PySpark shell in IPython, the enhanced Python interpreter. PySpark works with IPython 1.0.0 and later. To use IPython, set the PYSPARK_DRIVER_PYTHON variable to ipython when running bin/pyspark:
$ PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark
To use the Jupyter notebook (previously known as the IPython notebook),
$ PYSPARK_DRIVER_PYTHON=jupyter ./bin/pyspark
You can customize the ipython or jupyter commands by setting PYSPARK_DRIVER_PYTHON_OPTS.
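For example, the two variables from Method 1 can be combined into a single one-off command, so nothing has to be written into ~/.bashrc (the port and options below are just the same illustrative values used above):

```shell
# One-off launch of pyspark with the Jupyter notebook as the driver;
# the environment variables apply only to this invocation.
PYSPARK_DRIVER_PYTHON=jupyter \
PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880" \
$SPARK_HOME/bin/pyspark
```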
root@py-server:/server/bin# PYSPARK_DRIVER_PYTHON=ipython $SPARK_HOME/bin/pyspark
Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jul 2 2016, 17:42:40)
Type "copyright", "credits" or "license" for more information.
IPython 4.2.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/03 22:24:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/
Using Python version 2.7.12 (default, Jul 2 2016 17:42:40)
SparkSession available as 'spark'.
In [1]:
2. Usage:
Open http://notebook_host:8880/ in a browser.
For example: http://spark01:8880/
Click New -> Python to open a Python notebook.
Press Shift+Enter or Shift+Return to run a cell.
Note:
Once PYSPARK_DRIVER_PYTHON is set to IPython, pyspark will always launch IPython, unless you restore the environment variable.
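If you need the plain pyspark shell again without editing ~/.bashrc, one option is to clear the variable just for that invocation, or unset it for the current session:

```shell
# Launch the plain shell once (empty value is treated as unset):
PYSPARK_DRIVER_PYTHON= $SPARK_HOME/bin/pyspark

# Or remove the settings for the current session:
unset PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS
```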
3. Test example
Adapted from Spark for Python Developers.
Replace file_in with your own file: for a local file use the commented-out line, for HDFS keep the default and just adjust the concrete address.