我在python和mysql中间加了一层memcached中间层,缓存sql查询的结果,以期望获得更好的系统性能。
参考:
http://www.cnblogs.com/rollenholt/archive/2012/05/29/2524327.html
http://www.the5fire.com/python-opt-mysql.html
python连接mysql需要先安装一些lib,我是ubuntu,比较easy,直接apt-get
sudo apt-get install libmysqld-dev sudo apt-get install libmysqlclient-dev sudo apt-get install python-mysqldb
然后就可以写python去connect mysql啦,当然,一开始,mysql的数据库里面是神马也没有的,要自己先去create一个数据库,然后再create table,insert data。下面是建表和插入数据的python代码
1 #!/usr/bin/env python 2 3 # 20140105, create_table.py 4 5 import MySQLdb 6 7 try: 8 conn=MySQLdb.connect(host='192.168.1.6',user='dba',passwd='111111',port=3306) 9 cur=conn.cursor() 10 11 #cur.execute('create database if not exists testdb') 12 conn.select_db('testdb') 13 cur.execute('create table id_info(id int,info varchar(20))') 14 15 # value=[1,'hi rollen'] 16 # cur.execute('insert into id_info values(%s,%s)',value) 17 18 rg = 100000 19 values=[] 20 for i in range(rg): 21 values.append((i,'aaa'+str(i))) 22 23 cur.executemany('insert into id_info values(%s,%s)',values) 24 25 values=[] 26 for i in range(rg): 27 values.append((i+rg,'bbb'+str(i))) 28 29 cur.executemany('insert into id_info values(%s,%s)',values) 30 31 values=[] 32 for i in range(rg): 33 values.append((i+2*rg,'ccc'+str(i))) 34 35 cur.executemany('insert into id_info values(%s,%s)',values) 36 37 # cur.execute('update id_info set info="I am rollen" where id=3') 38 39 conn.commit() 40 cur.close() 41 conn.close() 42 43 except MySQLdb.Error,e: 44 print "Mysql Error %d: %s" % (e.args[0], e.args[1])
经过以上代码的运行,testdb这个数据库里面就有了一个id_info表,并且这个表里面还有了300000行数据。然后是连接mysql并执行select,我比较喜欢用面向对象的方式来写代码,所以就把连接mysql的程序做了一点封装
1 #!/usr/bin/env python 2 3 # 20140105, conn_mysql.py 4 5 import MySQLdb 6 7 class conn_mysql(object): 8 def __init__(self): 9 print "init mysql" 10 11 def __del__(self): 12 print "quit mysql" 13 14 def connect_db(self): 15 self.conn=MySQLdb.connect(host='192.168.1.6',user='dba',passwd='111111',db='testdb',port=3306) 16 self.cur=self.conn.cursor() 17 self.conn.select_db('testdb') 18 19 def test_select(self): 20 count=self.cur.execute('select * from id_info') 21 print 'there has %s rows record' % count 22 23 result=self.cur.fetchone() 24 print result 25 print 'ID: %s info %s' % result 26 27 results=self.cur.fetchmany(5) 28 for r in results: 29 print r 30 31 print '=='*10 32 self.cur.scroll(0,mode='absolute') 33 34 results=self.cur.fetchall() 35 for r in results: 36 print r[1] 37 38 self.conn.commit() 39 40 def test_count(self, str_sql): 41 count=self.cur.execute(str_sql) 42 # print 'there has %s rows record' % count 43 44 result=self.cur.fetchone() 45 # print 'id_info has %s rows' % result 46 str_rows = '%s' % result 47 return str_rows 48 49 def disconnect_db(self): 50 self.cur.close() 51 self.conn.close()
用test_select方法,来测试是否连接上,talbe里面数据很多,我是在只有30条数据的时候运行这个测试,之后实验中就一直是用test_count。test_count这个函数的意思是,对输入的sql,其类似格式是“select count(*) from ......”这样的时候,就把结果以字符串形式返回。下面是测试程序
1 #!/usr/bin/env python 2 3 # 20140105, conn_mysql_raw.py 4 5 import conn_mysql 6 7 str_sql = 'select count(*) from id_info' 8 9 db_connect = conn_mysql.conn_mysql() 10 db_connect.connect_db() 11 12 for i in range(10000): 13 str_rows = db_connect.test_count('select count(*) from id_info where info like \'bbb%\'') 14 15 print(str_rows + ' rows selected.') 16 17 db_connect.disconnect_db()
针对之前的数据,运行10000次sql,select count(*) from id_info where info like 'bbb%',当然,每次的返回结果都是100000,主要是测试这个程序的运行时间。在我的机器上,时间是1.9s。
然后是在mysql前面加入一层memcached,需要先下载python-memcached-latest.tar.gz,这个自行google吧,apt-get源里面似乎是没有。我拿到的版本是python-memcached-1.53。安装python-memcached之前要先安装python-setuptools,不然会报错“ImportError: No module named 'setuptools'”
tar zxvf python-memcached-latest.tar.gz cd python-memcached-1.53/ sudo apt-get install python-setuptools sudo python setup.py install
然后,把memcached启动,run一段python代码测试一下
1 #!/usr/bin/env python 2 # 20140105, test_memcached.py 3 4 import memcache 5 6 mc = memcache.Client(['localhost:11211'],debug=0) 7 mc.set("foo","bar") 8 value = mc.get("foo") 9 print value
看到输出是“bar”就说明已经连上memcached了。下面就要用memcached做mysql的缓存,看性能能提升到什么程度。先改写mysql连接的封装类
1 #!/usr/bin/env python 2 3 # 20140105, conn_mysql.py 4 5 import MySQLdb 6 import memcache 7 import hashlib 8 9 class conn_mysql(object): 10 def __init__(self): 11 print "init mysql" 12 13 def __del__(self): 14 print "quit mysql" 15 16 def connect_db(self): 17 self.conn=MySQLdb.connect(host='192.168.1.6',user='dba',passwd='111111',db='testdb',port=3306) 18 self.cur=self.conn.cursor() 19 self.conn.select_db('testdb') 20 21 def test_select(self): 22 count=self.cur.execute('select * from id_info') 23 print 'there has %s rows record' % count 24 25 result=self.cur.fetchone() 26 print result 27 print 'ID: %s info %s' % result 28 29 results=self.cur.fetchmany(5) 30 for r in results: 31 print r 32 33 print '=='*10 34 self.cur.scroll(0,mode='absolute') 35 36 results=self.cur.fetchall() 37 for r in results: 38 print r[1] 39 40 self.conn.commit() 41 42 def test_count(self, str_sql): 43 count=self.cur.execute(str_sql) 44 # print 'there has %s rows record' % count 45 46 result=self.cur.fetchone() 47 # print 'id_info has %s rows' % result 48 str_rows = '%s' % result 49 return str_rows 50 51 def connect_cache(self): 52 self.mc = memcache.Client(['localhost:11211'],debug=0) 53 54 def test_count_cached(self, str_sql): 55 str_hash = hashlib.md5(str_sql).hexdigest() 56 #str_hash = myhash(str_sql) 57 58 result = self.mc.get(str_hash) 59 if result != None: 60 # str_org_sql = self.mc.get('SQL'+str_hash) 61 # if str_org_sql == str_sql: 62 str_rows = '%s' % result 63 return str_rows 64 65 count = self.cur.execute(str_sql) 66 # print 'there has %s rows record' % count 67 68 result = self.cur.fetchone() 69 self.mc.set(str_hash, result) 70 self.mc.set('SQL'+str_hash, str_sql) 71 # print(str_hash) 72 # print 'id_info has %s rows' % result 73 str_rows = '%s' % result 74 return str_rows 75 76 def disconnect_db(self): 77 self.cur.close() 78 self.conn.close()
增加memcached相关的配置信息,增加测试函数test_count_cached,先对输入的sql做字符串hash(我用的md5),以这个hash值为key去memcached中查找有没有结果,如果有就直接返回;否则再去mysql中查询,并把查询的结果做value,sql的hash值做key,存在memcached中。run这个测试函数
1 #!/usr/bin/env python 2 3 # 20140105, conn_mysql_memcached.py 4 5 import conn_mysql 6 7 str_sql = 'select count(*) from id_info where info like \'bbb%\'' 8 9 10 db_connect = conn_mysql.conn_mysql() 11 db_connect.connect_db() 12 db_connect.connect_cache() 13 14 for i in range(10000): 15 str_rows = db_connect.test_count_cached(str_sql) 16 17 print(str_rows + ' rows selected.') 18 19 db_connect.disconnect_db()
经过改进之后的test_count_cached的运行时间是1.0s,改进并不如我期望的大。可能的原因,我的mysql是装在本机上的,没有网络通讯的开销,一般情况下,mysql是在单独的数据库服务器上,而memcached是在业务服务器上,做一次sql查询是有网络开销的,所以在这种场景下,效果应该会更明显。
实验中还有一个小问题,我在对比sql文本的时候只对比了hash值,并没有对比sql文本本身,如果进行这样的对比,势必会造成性能下降。事实也是如此,我加入这段对比之后,test_count_cached的运行时间变为1.8s,不考虑误差的话,运行时间基本上是刚才的case的2倍。这也很显然,因为主要的开销都是在获取memcached中的结果,加入sql文本对比的同时也多了1次获取memcached结果的消耗。