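Background
xgboost is an implementation of the GBDT (gradient boosted decision tree) algorithm. It handles regression, classification, and ranking, can be called from a range of languages, runs both on a single machine and in distributed mode, and is well suited to large datasets.
- Project homepage
- Installation guide
Installation
I chose to call xgboost from Python.
Get the source code from the project homepage. Here I cloned the repository with git; the --recursive option also fetches the dmlc-core and rabit submodules.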
[root@biostacs qgzang]# git clone --recursive https://github.com/dmlc/xgboost
Cloning into 'xgboost'...
remote: Counting objects: 17097, done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 17097 (delta 3), reused 0 (delta 0), pack-reused 17068
Receiving objects: 100% (17097/17097), 5.60 MiB | 948.00 KiB/s, done.
Resolving deltas: 100% (10496/10496), done.
Submodule 'dmlc-core' (https://github.com/dmlc/dmlc-core) registered for path 'dmlc-core'
Submodule 'rabit' (https://github.com/dmlc/rabit) registered for path 'rabit'
Cloning into 'dmlc-core'...
remote: Counting objects: 3545, done.
remote: Total 3545 (delta 0), reused 0 (delta 0), pack-reused 3544
Receiving objects: 100% (3545/3545), 789.33 KiB | 181.00 KiB/s, done.
Resolving deltas: 100% (2099/2099), done.
Submodule path 'dmlc-core': checked out '9fd3b48462a7a651e12a197679f71e043dcb25a2'
Cloning into 'rabit'...
remote: Counting objects: 3085, done.
remote: Total 3085 (delta 0), reused 0 (delta 0), pack-reused 3085
Receiving objects: 100% (3085/3085), 881.50 KiB | 197.00 KiB/s, done.
Resolving deltas: 100% (2004/2004), done.
Submodule path 'rabit': checked out '8f61535b83e650331459d7f33a1615fa7d27b7bd'
[root@biostacs qgzang]# cd xgboost/
[root@biostacs xgboost]# ll
total 80
drwxr-xr-x. 2 root root 4096 6月 22 15:18 amalgamation
-rwxr-xr-x. 1 root root 759 6月 22 15:18 build.sh
-rw-r--r--. 1 root root 3548 6月 22 15:18 CONTRIBUTORS.md
drwxr-xr-x. 12 root root 4096 6月 22 15:18 demo
drwxr-xr-x. 12 root root 4096 6月 22 15:18 dmlc-core
drwxr-xr-x. 12 root root 4096 6月 22 15:18 doc
drwxr-xr-x. 3 root root 4096 6月 22 15:18 include
drwxr-xr-x. 6 root root 4096 6月 22 15:18 jvm-packages
-rw-r--r--. 1 root root 559 6月 22 15:18 LICENSE
drwxr-xr-x. 2 root root 4096 6月 22 15:18 make
-rw-r--r--. 1 root root 4988 6月 22 15:18 Makefile
-rw-r--r--. 1 root root 4087 6月 22 15:18 NEWS.md
drwxr-xr-x. 5 root root 4096 6月 22 15:18 plugin
drwxr-xr-x. 3 root root 4096 6月 22 15:18 python-package
drwxr-xr-x. 10 root root 4096 6月 22 15:18 rabit
-rw-r--r--. 1 root root 3843 6月 22 15:18 README.md
drwxr-xr-x. 9 root root 4096 6月 22 15:18 R-package
drwxr-xr-x. 9 root root 4096 6月 22 15:18 src
drwxr-xr-x. 5 root root 4096 6月 22 15:18 tests
In the top-level source directory, run make to build the library:
make -j4
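Before moving on, it can help to confirm that the build actually produced the shared library. The sketch below is only an illustration: the helper name find_built_library is hypothetical, and the lib/libxgboost.so location is taken from the "Install libxgboost from" line in the setup.py log further down, so it may differ for other versions or platforms.

```python
import os


def find_built_library(repo_root):
    """Return the path to lib/libxgboost.so under repo_root, or None if absent.

    The lib/libxgboost.so location is an assumption based on the install log
    in this post; other xgboost versions or platforms may place it elsewhere.
    """
    candidate = os.path.join(repo_root, "lib", "libxgboost.so")
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    # "." assumes the script is run from the top of the xgboost checkout.
    path = find_built_library(".")
    if path:
        print("found " + path)
    else:
        print("libxgboost.so not found; the make step may have failed")
```

If the file is missing, the output of make is the place to look for the failing step.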
Installing the Python package
In the python-package subdirectory, run python setup.py install.
[root@biostacs xgboost]# cd python-package/
[root@biostacs python-package]# ll
total 32
-rw-r--r--. 1 root root 4483 6月 22 15:18 build_trouble_shooting.md
-rw-r--r--. 1 root root 372 6月 22 15:18 MANIFEST.in
-rw-r--r--. 1 root root 2481 6月 22 15:18 README.rst
-rw-r--r--. 1 root root 41 6月 22 15:18 setup.cfg
-rw-r--r--. 1 root root 2277 6月 22 15:18 setup_pip.py
-rw-r--r--. 1 root root 1559 6月 22 15:18 setup.py
drwxr-xr-x. 2 root root 4096 6月 22 15:18 xgboost
[root@biostacs python-package]# python setup.py install
Install libxgboost from: ['/home/storage2T/qgzang/xgboost/python-package/xgboost/../../lib/libxgboost.so']
running install
running bdist_egg
running egg_info
creating xgboost.egg-info
writing requirements to xgboost.egg-info/requires.txt
writing xgboost.egg-info/PKG-INFO
writing top-level names to xgboost.egg-info/top_level.txt
writing dependency_links to xgboost.egg-info/dependency_links.txt
writing manifest file 'xgboost.egg-info/SOURCES.txt'
reading manifest file 'xgboost.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*' under directory 'xgboost/include'
warning: no files found matching '*' under directory 'xgboost/src'
warning: no previously-included files matching 'xgboost/build/*' found anywhere in distribution
warning: no previously-included files matching 'xgboost/*.o' found anywhere in distribution
warning: no previously-included files matching '*.pyo' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
writing manifest file 'xgboost.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/xgboost
copying xgboost/training.py -> build/lib/xgboost
copying xgboost/core.py -> build/lib/xgboost
copying xgboost/plotting.py -> build/lib/xgboost
copying xgboost/callback.py -> build/lib/xgboost
copying xgboost/__init__.py -> build/lib/xgboost
copying xgboost/libpath.py -> build/lib/xgboost
copying xgboost/compat.py -> build/lib/xgboost
copying xgboost/rabit.py -> build/lib/xgboost
copying xgboost/sklearn.py -> build/lib/xgboost
copying xgboost/VERSION -> build/lib/xgboost
copying xgboost/build-python.sh -> build/lib/xgboost
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/xgboost
copying build/lib/xgboost/training.py -> build/bdist.linux-x86_64/egg/xgboost
copying build/lib/xgboost/core.py -> build/bdist.linux-x86_64/egg/xgboost
copying build/lib/xgboost/plotting.py -> build/bdist.linux-x86_64/egg/xgboost
copying build/lib/xgboost/build-python.sh -> build/bdist.linux-x86_64/egg/xgboost
copying build/lib/xgboost/callback.py -> build/bdist.linux-x86_64/egg/xgboost
copying build/lib/xgboost/VERSION -> build/bdist.linux-x86_64/egg/xgboost
copying build/lib/xgboost/__init__.py -> build/bdist.linux-x86_64/egg/xgboost
copying build/lib/xgboost/libpath.py -> build/bdist.linux-x86_64/egg/xgboost
copying build/lib/xgboost/compat.py -> build/bdist.linux-x86_64/egg/xgboost
copying build/lib/xgboost/rabit.py -> build/bdist.linux-x86_64/egg/xgboost
copying build/lib/xgboost/sklearn.py -> build/bdist.linux-x86_64/egg/xgboost
byte-compiling build/bdist.linux-x86_64/egg/xgboost/training.py to training.pyc
byte-compiling build/bdist.linux-x86_64/egg/xgboost/core.py to core.pyc
byte-compiling build/bdist.linux-x86_64/egg/xgboost/plotting.py to plotting.pyc
byte-compiling build/bdist.linux-x86_64/egg/xgboost/callback.py to callback.pyc
byte-compiling build/bdist.linux-x86_64/egg/xgboost/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/xgboost/libpath.py to libpath.pyc
byte-compiling build/bdist.linux-x86_64/egg/xgboost/compat.py to compat.pyc
byte-compiling build/bdist.linux-x86_64/egg/xgboost/rabit.py to rabit.pyc
byte-compiling build/bdist.linux-x86_64/egg/xgboost/sklearn.py to sklearn.pyc
installing package data to build/bdist.linux-x86_64/egg
running install_data
copying /home/storage2T/qgzang/xgboost/python-package/xgboost/../../lib/libxgboost.so -> build/bdist.linux-x86_64/egg/xgboost
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying xgboost.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying xgboost.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying xgboost.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying xgboost.egg-info/not-zip-safe -> build/bdist.linux-x86_64/egg/EGG-INFO
copying xgboost.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying xgboost.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
creating dist
creating 'dist/xgboost-0.4-py2.7.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing xgboost-0.4-py2.7.egg
creating /root/anaconda2/lib/python2.7/site-packages/xgboost-0.4-py2.7.egg
Extracting xgboost-0.4-py2.7.egg to /root/anaconda2/lib/python2.7/site-packages
Adding xgboost 0.4 to easy-install.pth file
Installed /root/anaconda2/lib/python2.7/site-packages/xgboost-0.4-py2.7.egg
Processing dependencies for xgboost==0.4
Searching for scipy==0.17.0
Best match: scipy 0.17.0
Adding scipy 0.17.0 to easy-install.pth file
Using /root/anaconda2/lib/python2.7/site-packages
Searching for numpy==1.10.4
Best match: numpy 1.10.4
Adding numpy 1.10.4 to easy-install.pth file
Using /root/anaconda2/lib/python2.7/site-packages
Finished processing dependencies for xgboost==0.4
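Some dependencies may be missing and have to be installed first. For example, the make step requires g++, and the Python install step requires some of Python's numerical libraries (numpy and scipy in the log above).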
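The prerequisites mentioned here can be checked up front. This is a minimal sketch, not part of xgboost: missing_prerequisites is a hypothetical helper, the default tool and module names are taken from this post (g++ and make for the build, numpy and scipy for the Python package), and it assumes Python 3, whereas the session in this post uses Python 2.7.

```python
import importlib.util
import shutil


def missing_prerequisites(modules=("numpy", "scipy"), tools=("g++", "make")):
    """Return the names of command-line tools and Python modules not found.

    The defaults mirror the dependencies mentioned in this post; adjust them
    to whatever your own build actually needs.
    """
    missing = []
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            missing.append(tool)
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all prerequisites found")
```

Anything it reports can then be installed with the system package manager or pip before rerunning the failed step.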
Verifying the installation
Import the xgboost package in IPython:
[root@biostacs python-package]# ipython
Python 2.7.11 |Anaconda custom (64-bit)| (default, Dec 6 2015, 18:08:32)
Type "copyright", "credits" or "license" for more information.
IPython 4.1.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import xgboost as xgb
The import completes without an error, so the installation succeeded.
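A clean import shows that the package and its shared library load. To go one step further, a tiny training run exercises the native code as well. The following is a sketch under assumptions: smoke_test is a hypothetical helper, the four-row dataset is made up purely for illustration, and the parameter values are arbitrary. It returns None when numpy or xgboost cannot be imported.

```python
def smoke_test():
    """Train a tiny booster and return its predictions, or None if the
    required packages cannot be imported."""
    try:
        import numpy as np
        import xgboost as xgb
    except ImportError:
        return None

    # Made-up toy data: one feature, four rows, binary labels.
    features = np.array([[0.0], [1.0], [2.0], [3.0]])
    labels = np.array([0, 0, 1, 1])

    dtrain = xgb.DMatrix(features, label=labels)
    # Arbitrary illustrative parameters, not tuned values.
    params = {"objective": "binary:logistic", "max_depth": 2, "eta": 1.0}
    booster = xgb.train(params, dtrain, num_boost_round=2)

    # One predicted probability per input row.
    return booster.predict(dtrain)


if __name__ == "__main__":
    predictions = smoke_test()
    if predictions is None:
        print("numpy or xgboost is not importable")
    else:
        print(predictions)
```

If this prints four probabilities, both the Python wrapper and the compiled library are working.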