How to import pyspark in Anaconda

Time: 2022-07-20 23:11:44

I am trying to import and use pyspark with anaconda.

After installing Spark and setting the $SPARK_HOME variable, I tried:

$ pip install pyspark

This won't work (of course), because I discovered that I need to tell Python to look for pyspark under $SPARK_HOME/python/. The problem is that, to do that, I need to set $PYTHONPATH, and Anaconda doesn't use that environment variable.

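For reference, this is roughly what "telling Python to look under $SPARK_HOME/python/" would mean at runtime, as a minimal sketch (the py4j zip name is version-dependent, so adjust it to your install):

import os, sys

spark_home = os.environ["SPARK_HOME"]
# Make the pyspark package and its bundled py4j importable without touching $PYTHONPATH.
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.10.4-src.zip"))

import pyspark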

I tried to copy the contents of $SPARK_HOME/python/ to ANACONDA_HOME/lib/python2.7/site-packages/, but that didn't work either.

Is there any way to use pyspark in Anaconda?

4 solutions

#1


8  

You can simply set the PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON environment variables to use either the root Anaconda Python or a specific Anaconda environment. For example:

export ANACONDA_ROOT=~/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python

or

export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/envs/foo/bin/ipython 
export PYSPARK_PYTHON=$ANACONDA_ROOT/envs/foo/bin/python 

When you use $SPARK_HOME/bin/pyspark / $SPARK_HOME/bin/spark-submit, it will choose the correct environment. Just remember that PySpark has to use the same Python version on all machines.

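As a quick sanity check, here is a sketch assuming you started $SPARK_HOME/bin/pyspark with the variables above (sc is predefined by the shell):

import sys
print(sys.version)  # driver interpreter, i.e. PYSPARK_DRIVER_PYTHON
print(sc.parallelize([1]).map(lambda _: sys.version).first())  # worker interpreter, i.e. PYSPARK_PYTHON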

On a side note, using PYTHONPATH should work just fine, even if it is not recommended.

#2


1  

I don't believe that you need to (or can) install pyspark as a module. Instead, I extended $PYTHONPATH in my ~/.bash_profile as follows:

export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH

After that, I was able to import pyspark as ps. Hope that works for you too.

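For example, a minimal local-mode check (just a sketch; "test" is an arbitrary app name):

import pyspark as ps
sc = ps.SparkContext("local", "test")
print(sc.parallelize(range(10)).sum())  # prints 45 if the import and the context work
sc.stop()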

#3


1  

Here is the complete set of environment variables I had to put in my .bashrc to get this to work in both scripts and notebooks:

export ANACONDA_ROOT=~/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python

export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
export PYLIB=/opt/spark-2.1.0-bin-hadoop2.7/python/lib

export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
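With those variables in place, pyspark should be importable from a plain python script as well as from a notebook; a minimal sketch (spark_test.py is a hypothetical file name):

# spark_test.py -- run with: python spark_test.py
from pyspark import SparkContext

sc = SparkContext("local[*]", "bashrc-test")
print(sc.range(100).count())  # should print 100
sc.stop()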

#4


0  

Perhaps this can help someone. According to the Anaconda documentation, you install findspark as follows:

conda install -c conda-forge findspark 

It was only after installing it as shown above that I was able to import findspark. No export statements were required.

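Once it is installed, the usual pattern looks something like this (a sketch; findspark.init() locates Spark via SPARK_HOME, or you can pass the path explicitly):

import findspark
findspark.init()  # e.g. findspark.init("/opt/spark-2.1.0-bin-hadoop2.7")
import pyspark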
