2、主要着手修改parse.py以前是解析了写到postgresql,现在我要改了写到mysql,这个过程让我发现了,mysql和postgresql的诸多的区别,回顾一下
0x01. python用于mysql的库叫MySQLdb,psycopg2十分类似,包括他们的方法也很类似,都有connect、然后用conn生成一个游标cur,然后用cur.execute去执行SQL语句,用commit去提交变化(但是我发现创建表的话mysql不需要调用这个方法,而postgresql则需要),关闭等等方法也是相同的;
0x02. MySQLdb.connect的参数名称略微有一些不同,像这样
MySQLdb.connect(host=options.db_host, port=int(options.db_port), db=options.db_database, user=options.db_user, passwd=options.db_password)
port还必须转成int
psycopg2.connect形如这样:
psycopg2.connect(host=options.db_host, port=options.db_port, database=options.db_database, user=options.db_user, password=options.db_password)
要捕获的错误也不同了,但是都是类似的,MySQLdb对应的异常捕获就是MySQLdb.Error
0x03. 当然两个数据库的系统表上也有一些差异,
select ordinal_position,column_name,data_type from information_schema.columns where table_schema = '%s' and table_name = '%s'information_schema.columns这个表的三个字段在两个数据库都是相同的
0x04. 创建表涉及到字段类型和自增字段的设置等有比较多的不同,postgresql是这样
create table {table_schema}.{table_name} (
id serial primary key,
created_at timestamp with time zone default now() not null,
national_id text not null,
requested_at timestamp with time zone not null,
responded_at timestamp with time zone not null,
filename text
);
create index {table_schema}_{table_name}__national_id on {table_schema}.{table_name} (national_id);
create index {table_schema}_{table_name}__created_at on {table_schema}.{table_name} (created_at);
create index {table_schema}_{table_name}__filename on {table_schema}.{table_name} (filename);
create index {table_schema}_{table_name}__national_id__responded_at on {table_schema}.{table_name} (national_id, responded_at);
create index {table_schema}_{table_name}__national_id__requested_at on {table_schema}.{table_name} (national_id, requested_at);
改成mysql对应是这样
create table {table_schema}.{table_name} (
id int auto_increment primary key,
created_at timestamp not null default now(),
national_id VARCHAR(20) not null,
requested_at datetime not null,
responded_at datetime not null,
filename VARCHAR(100)
);
create index {table_schema}_{table_name}__national_id on {table_schema}.{table_name} (national_id);
create index {table_schema}_{table_name}__created_at on {table_schema}.{table_name} (created_at);
create index {table_schema}_{table_name}__filename on {table_schema}.{table_name} (filename);
create index {table_schema}_{table_name}__national_id__responded_at on {table_schema}.{table_name} (national_id, responded_at);
create index {table_schema}_{table_name}__national_id__requested_at on {table_schema}.{table_name} (national_id, requested_at);
可以看到serial = auto_increment , timestamp with time zone = timestamp ,一般来说mysql表中只有一个timestamp字段,听说是系统缺省取当前时间,我并没测试,其他又含日期和时间的字段就用datetime,mysql中也有test这个字段,但是一般是存大文本的,而且我偷懒用text建立索引的时候会报错,原因不明,我最后只好用更合适的varchar跟长度,not null仍然是not null,default也是相同
对于另外一个系统information_schema.tables,mysql有各个表的类型,如base_table,但是没有这个表是否可写入的字段,我直接省去了这个检测,只检测teble_type,两个数据库,我昨天发现的不同以上!