I am trying to import and use pyspark with Anaconda.
After installing Spark and setting the $SPARK_HOME variable, I tried:
$ pip install pyspark
This won't work (of course), because I discovered that I need to tell Python to look for pyspark under $SPARK_HOME/python/. The problem is that to do that, I need to set $PYTHONPATH, but Anaconda doesn't use that environment variable.
I tried to copy the contents of $SPARK_HOME/python/ to ANACONDA_HOME/lib/python2.7/site-packages/, but that didn't work either.
Is there any way to use pyspark with Anaconda?
4 Answers
#1
8
You can simply set the PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON environment variables to use either the root Anaconda Python or a specific Anaconda environment. For example:
export ANACONDA_ROOT=~/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python
or
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/envs/foo/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/envs/foo/bin/python
When you use $SPARK_HOME/bin/pyspark / $SPARK_HOME/bin/spark-submit, it will choose the correct environment. Just remember that PySpark has to use the same Python version on all machines.
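If you want to verify which interpreter was actually picked up, a minimal check like the following (my own sketch, not part of the original answer) can be run inside the shell started by $SPARK_HOME/bin/pyspark:
# Run inside the pyspark shell: prints the driver's Python interpreter,
# which should point at $ANACONDA_ROOT/bin/python (or the chosen env's python).
import sys
print(sys.executable)
print(sys.version)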
On a side note, using PYTHONPATH should work just fine, even if it is not recommended.
#2
1
I don't believe that you need to, or even can, install pyspark as a module. Instead, I extended my $PYTHONPATH in my ~/.bash_profile as follows:
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
After that, I was able to import pyspark as ps. Hope that works for you too.
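As a quick smoke test (my own sketch, assuming the py4j sources shipped with Spark are also reachable on the path, e.g. via $SPARK_HOME/python/build or the py4j zip under $SPARK_HOME/python/lib):
# Minimal end-to-end check after extending PYTHONPATH as above.
import pyspark as ps

print(ps.__file__)  # should resolve to a file under $SPARK_HOME/python/pyspark/
sc = ps.SparkContext("local[1]", "pythonpath-check")  # local mode, single core
print(sc.parallelize(range(10)).sum())  # prints 45 if everything is wired up
sc.stop()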
#3
1
Here is the complete set of environment variables I had to put in my .bashrc to get this to work in both scripts and the notebook:
export ANACONDA_ROOT=~/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python
export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
export PYLIB=/opt/spark-2.1.0-bin-hadoop2.7/python/lib
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
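One thing to watch out for: the py4j zip name is version-specific (0.10.4 matches Spark 2.1.0), so a small sanity check like this (my own sketch) can confirm that the zip referenced on PYTHONPATH actually exists in your Spark build:
# Lists the py4j source zip(s) shipped with this Spark build; the printed path
# should match the one added to PYTHONPATH above.
import glob
import os

spark_home = os.environ["SPARK_HOME"]
print(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))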
#4
0
Perhaps this can help someone. According to the Anaconda documentation, you install findspark as follows:
conda install -c conda-forge findspark
It was only after installing it as shown above that I was able to import findspark. No export statements were required.
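For reference, typical findspark usage looks roughly like this (my sketch; findspark.init() locates Spark via SPARK_HOME, or a path you pass explicitly, and adds the pyspark and py4j directories to sys.path):
# Run after: conda install -c conda-forge findspark
import findspark
findspark.init()  # or findspark.init("/opt/spark-2.1.0-bin-hadoop2.7")

import pyspark
print(pyspark.__file__)  # now resolves to the Spark installation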