Using the Sphinx Full-Text Search Engine on Linux

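Date: 2022-01-20 08:27:08
Sphinx is an efficient full-text search engine; comparable projects include Lucene and Xapian. As an aside, I recall a talk from douban mentioning that they switched to Xapian because Sphinx could not meet their needs. In my own experience with Sphinx, though, it handles typical applications with ease.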

I. Downloading Sphinx

The current release is 0.9.9. It has a minor compile bug (http://sphinxsearch.com/bugs/view.php?id=453) that is easy to fix; alternatively, you can use the newer 1.10-beta.
wget http://sphinxsearch.com/files/sphinx-0.9.9.tar.gz
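
Before building, it may be worth checking the downloaded tarball against a known checksum. The following is a minimal sketch, assuming GNU coreutils `md5sum` is available; `verify_md5` is a hypothetical helper name, and the checksum in the usage line is the one given in the `md5sums` entry of the PKGBUILD in section II.

```shell
# verify_md5 FILE EXPECTED: succeed only if the MD5 of FILE equals EXPECTED.
verify_md5() {
    # md5sum prints "<hash>  <file>"; keep only the hash field
    actual=$(md5sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# Usage:
# verify_md5 sphinx-0.9.9.tar.gz 7b9b618cb9b378f949bb1b91ddcc4f54 && echo "checksum OK"
```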

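II. Installing Sphinx

I may need PostgreSQL support, so I pass --with-pgsql. If you do not need pgsql support, leave the flag out to avoid dependency problems.

1. Generic installation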

./configure --with-pgsql
make
make install
cd api/libsphinxclient
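# the sed line applies to 0.9.9 only; "-i -e" avoids the stray backup file that GNU sed's "-ie" (suffix "e") leaves behind
sed -i -e '280s/^/static /' sphinxclient.c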
./configure
make
make install
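
The sed expression used above, `280s/^/static /`, combines a line address (280) with a substitution that inserts `static ` at the start of that one line and leaves every other line untouched. The sketch below previews the same kind of edit on a scratch file so the effect can be seen before patching a real source file. The file contents and the line number 2 are placeholders for illustration, not the actual contents of sphinxclient.c.

```shell
# Build a three-line scratch file (placeholder C declarations).
demo=$(mktemp)
printf 'int a;\nvoid helper(void);\nint b;\n' > "$demo"

# Apply the substitution to line 2 only and print the result to stdout;
# the scratch file itself is not modified because -i is not used.
patched=$(sed '2s/^/static /' "$demo")
printf '%s\n' "$patched"

rm -f "$demo"
```

Only the addressed line gains the `static ` prefix; once the preview looks right, the same expression can be run in place with `-i -e`.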

2. Installing on ArchLinux

My machine runs ArchLinux, so to keep the system tidy and easy to manage I wrote a PKGBUILD for it:
# Contributor: Jiang Miao
pkgname=sphinx
pkgver=0.9.9
pkgrel=1
pkgdesc='Sphinx full search engine'
arch=(i686 x86_64)
license=('GPL')
source=(http://sphinxsearch.com/files/$pkgname-$pkgver.tar.gz)
md5sums=('7b9b618cb9b378f949bb1b91ddcc4f54')
# avoid make[1]: *** No rule to make target `.libs/libsphinxclient.a', needed by `test'. Stop.
# see https://bbs.archlinux.org/viewtopic.php?id=77214
options=('!makeflags')

build() {
  cd $startdir/src/$pkgname-$pkgver
  ./configure --prefix=/usr \
    --sysconfdir=/etc/sphinx \
    --localstatedir=/var/lib/sphinx \
    --with-pgsql
  make || return 1
  make DESTDIR=$startdir/pkg install || return 1

  cd api/libsphinxclient
  # fix bug 'error: static declaration of 'sock_close' follows non-static declaration'
  # see http://sphinxsearch.com/bugs/view.php?id=453
  sed -i -e '280s/^/static /' sphinxclient.c
  ./configure --prefix=/usr
  make || return 1
  make DESTDIR=$startdir/pkg install || return 1
}
Build the package and install it:
makepkg -c
sudo pacman -U sphinx-0.9.9-1-i686.pkg.tar.xz

III. Testing Sphinx

1. Import the Sphinx sample data into the test database

$ mysql -uroot -p
mysql> CREATE DATABASE IF NOT EXISTS test;
Query OK, 1 row affected (0.00 sec)

mysql> source /etc/sphinx/example.sql
Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.13 sec)

Query OK, 4 rows affected (0.10 sec)
Records: 4 Duplicates: 0 Warnings: 0

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.14 sec)

Query OK, 10 rows affected (0.13 sec)
Records: 10 Duplicates: 0 Warnings: 0

2. Create sphinx.conf

The relevant columns are:
id int
title VARCHAR(255)
content TEXT

sphinx.conf:
source test {
type = mysql
sql_host = localhost
sql_user = test
sql_pass = test_password
sql_db = test

sql_query = SELECT id, title, content FROM documents
sql_query_info = SELECT * FROM documents WHERE id = $id
}

index test {
source = test
path = ./data/test
}

searchd {
pid_file = ./run/searchd.pid
log = ./log/searchd.log
query_log = ./log/query.log
max_matches = 1000
}
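
All paths in this sphinx.conf are relative, so the Sphinx tools are presumably meant to be run from the directory that holds the config, with the `data`, `log`, and `run` directories already in place. The sketch below lists the directories a config of this shape expects. It assumes a POSIX shell and a `sed` that supports `-E`; `conf_dirs` is a hypothetical helper name, not part of Sphinx, and it only looks at the four path-like settings used in this article.

```shell
# conf_dirs CONFIG: print the unique parent directories of the values of
# path, pid_file, log and query_log found in CONFIG.
conf_dirs() {
    sed -n -E 's/^[[:space:]]*(path|pid_file|log|query_log)[[:space:]]*=[[:space:]]*([^[:space:]]+).*$/\2/p' "$1" |
        while read -r p; do
            dirname "$p"
        done | sort -u
}

# Usage: create whatever the config in the current directory expects.
# conf_dirs sphinx.conf | xargs mkdir -p
```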

3. Build the index with indexer

$ mkdir data log run
$ indexer test
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file './sphinx.conf'...
indexing index 'test'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.019 sec, 10074 bytes/sec, 208.80 docs/sec
total 1 reads, 0.000 sec, 0.2 kb/call avg, 0.0 msec/call avg
total 5 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg

4. Run a test query for "my test"

$ search my test
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file './sphinx.conf'...
index 'test': query 'my test ': returned 2 matches of 2 total in 0.000 sec

displaying matches:
1. document=1, weight=3
id=1
group_id=1
group_id2=5
date_added=2011-02-24 19:49:41
title=test one
content=this is my test document number one. also checking search within phrases.
2. document=2, weight=3
id=2
group_id=1
group_id2=6
date_added=2011-02-24 19:49:41
title=test two
content=this is my test document number two

words:
1. 'my': 2 documents, 2 hits
2. 'test': 3 documents, 5 hits

IV. Calling the Sphinx API from PHP

1. Install the Sphinx PECL extension for PHP

pecl install sphinx
# add the sphinx extension to the PHP configuration
echo 'extension=sphinx.so' > /etc/php/conf.d/sphinx.ini
# restart php-cgi; how to do this depends on your environment, mine is lighttpd
/etc/rc.d/lighttpd restart
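
Before writing the test script, it can help to confirm that PHP actually loaded the extension. This is a small sketch, assuming the `php` command-line binary reads the same configuration directory as the web server's PHP, which may not hold on every setup; `has_module` is a hypothetical helper name.

```shell
# has_module NAME: read a module list (one name per line, as printed by
# `php -m`) on stdin and succeed if NAME is in it, ignoring case.
has_module() {
    grep -qixF "$1"
}

# Usage:
# php -m | has_module sphinx && echo "sphinx extension loaded"
```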

2. Write the test file test.php

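<?php
// Requires the PECL sphinx extension and a running searchd (see step 3).
$cl = new SphinxClient();
// Connect explicitly; 9312 is the port searchd reports listening on.
$cl->setServer('localhost', 9312);
$result = $cl->query("my test");
if ($result === false) {
    // query() returns false on failure; print the client's error message
    die($cl->getLastError());
}
print_r($result);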

3. Start searchd

$ searchd
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file './sphinx.conf'...
listening on all interfaces, port=9312

4. Run test.php and check the result

$ php test.php
Array
(
[error] =>
[warning] =>
[status] => 0
[fields] => Array
(
[0] => title
[1] => content
)

[attrs] => Array
(
)

[matches] => Array
(
[1] => Array
(
...

V. Related links

Sphinx official site
Sphinx documentation
Sphinx 0.9.9 documentation
Sphinx PHP API documentation