Node embedded database with faster search speed

Time: 2021-10-27 08:48:14

Actually I'm using SQLite to store a list of "files" with some attributes. There is no relational data, only one kind of record with multiple fields, so I think any relational or NoSQL DB would work for me. The problem right now is the speed of the searches. I need to embed the DB with Node, storing a single file in the project folder. Currently I'm using node sqlite3, but yesterday I also tested the better-sqlite3 module, and the results are similar.

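For reference, a minimal sketch of the two drivers being compared (the files.db file name and the query are illustrative, matching the table described below):

const sqlite3 = require('sqlite3')
const Database = require('better-sqlite3')

// node-sqlite3: asynchronous, callback-based API
const asyncDb = new sqlite3.Database('./files.db')
asyncDb.all('SELECT * FROM files WHERE name LIKE ?', ['%string%'], function (err, rows) {
    if (err) throw err
    console.log('node-sqlite3 rows: ' + rows.length)
})

// better-sqlite3: synchronous API with prepared statements
const syncDb = new Database('./files.db')
const rows = syncDb.prepare('SELECT * FROM files WHERE name LIKE ?').all('%string%')
console.log('better-sqlite3 rows: ' + rows.length)
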
My table structure is like this:

╔════╦══════════════╦══════════╦════════════════╦═════════╦═══════════════╦══════════════════╗
║ id ║     hash     ║   name   ║   description  ║   date  ║      tags     ║    languages     ║
╠════╬══════════════╬══════════╬════════════════╬═════════╬═══════════════╬══════════════════╣
║INT ║     TEXT     ║   TEXT   ║     TEXT       ║  NUMBER ║     JSON      ║       JSON       ║
║  2 ║ b2b22b2b2bb2 ║ two test ║  lorem ipsum b ║ 1233123 ║ ["d","e","f"] ║ ["ko","en","tk"] ║
║  3 ║ asdasdasdsad ║ 333 test ║  lorem ipsum c ║ 1233123 ║ ["a","d","c"] ║ ["es","de","fr"] ║
║  4 ║ 4s342s423424 ║ 444 test ║  lorem ipsum d ║ 1233123 ║ ["a","b","g"] ║ ["es","pt","fr"] ║
╚════╩══════════════╩══════════╩════════════════╩═════════╩═══════════════╩══════════════════╝
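
A possible schema for this table (a sketch: SQLite has no native JSON column type, so tags and languages are stored as JSON-encoded TEXT, and NUMBER resolves to NUMERIC affinity):

const sqlite3 = require('sqlite3')
const db = new sqlite3.Database('./files.db')

db.exec(`CREATE TABLE IF NOT EXISTS files (
    id          INTEGER PRIMARY KEY,
    hash        TEXT,
    name        TEXT,
    description TEXT,
    date        NUMBER,
    tags        TEXT, -- JSON array stored as text, e.g. '["a","b","g"]'
    languages   TEXT  -- JSON array stored as text, e.g. '["es","pt","fr"]'
)`)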

The results with around 300,000 rows are:

SELECT * FROM files WHERE name LIKE "%string%" : 300ms

SELECT * FROM files WHERE (tags LIKE '%"music"%' OR tags LIKE '%"banana"%') AND (languages LIKE '%"esp"%' OR languages LIKE '%"ger"%') : 400ms

SELECT id FROM files : 130ms (I also tried "SELECT count(id) AS counter FROM files"; it is slower than selecting the ids and counting the results in code, around 30ms vs 150ms)

The results are not bad... but this is only one search operation, and my program allows multiple users to search at the same time, so the search times become unacceptable (10 clients, ~4 seconds per reply). I'm running the tests on a Core i7 4820K with a 500 GB SSD (550R/450W); moving to an HDD RAID0 increases the query times a lot.

I tried creating an index for every search column. Inserts in this project are occasional, so I don't care much about insert speed, but it is strange that putting an index on the name, tags or languages fields barely improves the speed (only around 50 ms, while obviously increasing the table size a lot).

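For what it's worth, a plain B-tree index can only serve LIKE patterns without a leading wildcard (LIKE 'string%'); a '%string%' pattern still forces a full table scan, which would explain why the indexes barely help here. A sketch of the indexes tested (index names are illustrative):

const sqlite3 = require('sqlite3')
const db = new sqlite3.Database('./files.db')

// One index per searched column; these only help prefix searches
// (LIKE 'string%'), not the '%string%' patterns timed above
db.exec(`
    CREATE INDEX IF NOT EXISTS idx_files_name ON files(name);
    CREATE INDEX IF NOT EXISTS idx_files_tags ON files(tags);
    CREATE INDEX IF NOT EXISTS idx_files_languages ON files(languages);
`)
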
So... I'm looking for alternatives. I need an embedded DB for Node with extreme search speed and no DB locking (I think the DB can grow to 2M rows over time), but without consuming an enormous amount of memory; I don't care whether it is relational or not.

EDIT: I'm running many, many tests, and here are my results:

For node-lmdb the creation speed is insanely fast, like working with memcache: around 100,000 inserts in 4 seconds, and reading the data works well. But because it is a key-value database, I have to parse the stored data as JSON and then apply the "search" logic myself, and this slows the results down a lot. Here is an example:

const lmdb = require('node-lmdb')

const env = new lmdb.Env()

env.open({
    path: __dirname + "/mydata",
    mapSize: 2*1024*1024*1024, // maximum database size 
    maxDbs: 3
})

var dbi = env.openDbi({
    name: "myPrettyDatabase",
    create: true // will create if database did not exist 
})

// Begin transaction
var txn = env.beginTxn()

let t0 = new Date().getTime()


// Create cursor
let cursor = new lmdb.Cursor(txn, dbi)
let counter = 0
let find = 0

for (var found = cursor.goToFirst(); found !== null; found = cursor.goToNext()) {
    cursor.getCurrentString(function(key, data) {

        let js
        try {
            js = JSON.parse(data)
            counter++
        } catch (e) { js = null }

        if (js && String(js.name).indexOf('Lorem') !== -1) {
            find++
        }
    })
}

console.log('counter: ' + counter)
console.log('find: ' + find)

// Close cursor
cursor.close();


let t1 = new Date().getTime()
console.log('time: ' + (t1-t0))


// Commit transaction
txn.commit()

dbi.close()

The results are:

$ node index.js
counter: 215548
find: 113073
time: 1516

Listing the rows takes around 200 ms, but the JSON parsing and the little "search" logic make it slower than sqlite overall (or I'm doing something wrong).

I did other experiments with Tingodb, an embedded DB with a MongoDB-like API. I inserted 200K objects like this one:

{ hash: '3736b5da857a4c7b9b046f326004803a',
  name: 'inia, looked up one of the more obscure Latin words, consectetur, from a Lorem I',
  description: ', looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of ',
  tags: [ 'pc', 'pc', 'hd', 'mp4' ],
  languages: [ 'fre', 'jap', 'deu' ] }
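
For context, this is roughly how such a batch can be inserted with Tingodb's Mongo-style API (the document generator below is made up for illustration):

const Db = require('tingodb')().Db
const db = new Db('./', {})
const collection = db.collection('batch_document_insert_collection_safe')

// Build a batch of sample documents (hypothetical generator)
const docs = []
for (let i = 0; i < 100000; i++) {
    docs.push({
        hash: 'hash' + i,
        name: 'lorem name ' + i,
        description: 'lorem ipsum description ' + i,
        tags: ['pc', 'hd', 'mp4'],
        languages: ['fre', 'jap', 'deu']
    })
}

const t0 = Date.now()
collection.insert(docs, { w: 1 }, function (err) {
    if (err) throw err
    console.log('insert time: ' + (Date.now() - t0) + 'ms')
})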

The insertion speed was incredible, around 100K in 2 seconds, but... here is the experiment:

const Db = require('tingodb')({cacheSize: 60000, cacheMaxObjSize: 4096}).Db
var db = new Db('./', {})

// Fetch the collection to query
var collection = db.collection("batch_document_insert_collection_safe")

let t0 = new Date().getTime()
collection.find({ tags: 'video' }).toArray(function(err, docs) {
    console.log(err)  
    console.log(docs)  
    let t1 = new Date().getTime()
    console.log('time: ' + (t1-t0))
})

Running this against the 200K-row DB takes a total of 38 SECONDS; I don't know if that is normal or not...

And about aladb: I tested it and it works well. I ran other experiments (which I don't have at hand right now) and the performance is good, similar to sqlite3, with some nice extras, but some searches are about 2x slower than sqlite (using LIKE %string% kills the engine).

EDIT 2: After a lot of research and testing with the ab command (ab -n 10000 -c 50 http://machine.tst:13375/library/search?tags=lorem) on a Linux machine to simulate multiple requests, I finally kept using the sqlite3 library, but creating one additional table at startup (an in-memory one) and storing the processed request responses in it (id(INT), hash(VARCHAR), object(TEXT), last(NUMBER)).

The first time a request arrives, I use the request data to create a unique hash ("GET" + "/a/b/c" + JSON(requestData)) and JSON-encode the response. The first query still returns at normal speed, but identical subsequent ones behave like memcache or a similar DB: I went from 10 requests/s to ~450 requests/s at 10% CPU usage.

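A minimal sketch of this caching layer, assuming node-sqlite3 and an in-memory handle (the helper names are illustrative, not the original code):

const crypto = require('crypto')
const sqlite3 = require('sqlite3')
const cache = new sqlite3.Database(':memory:')
cache.serialize() // queue the statements in order

cache.run(`CREATE TABLE cache (
    id     INTEGER PRIMARY KEY,
    hash   VARCHAR UNIQUE,
    object TEXT,
    last   NUMBER
)`)

// "GET" + "/a/b/c" + JSON(requestData) -> unique key for this request
function requestHash(method, path, requestData) {
    return crypto.createHash('md5')
        .update(method + path + JSON.stringify(requestData))
        .digest('hex')
}

function cachedSearch(method, path, requestData, runQuery, done) {
    const key = requestHash(method, path, requestData)
    cache.get('SELECT object FROM cache WHERE hash = ?', [key], function (err, row) {
        if (row) return done(null, JSON.parse(row.object)) // cache hit
        runQuery(function (err, result) { // cache miss: run the real sqlite query
            if (err) return done(err)
            cache.run('INSERT OR REPLACE INTO cache (hash, object, last) VALUES (?, ?, ?)',
                [key, JSON.stringify(result), Date.now()])
            done(null, result)
        })
    })
}
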
Anyway, I added a watcher event that checks the "last" column of the cached rows to remove old requests and prevent memory problems. I verified that only one request has its params changed many times; all the other requests are always the same, so I think the memory usage won't grow too much.

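A sketch of such a watcher, reusing the cache handle from the sketch above (the interval and TTL values are examples, not from the original code):

const TTL = 10 * 60 * 1000 // keep cached replies for 10 minutes (example value)

setInterval(function () {
    cache.run('DELETE FROM cache WHERE last < ?', [Date.now() - TTL])
}, 60 * 1000)
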
If in the future I find a better embedded option than sqlite3, I'll try it and switch the DB engine.

1 Answer

#1

Try LMDB with Node-LMDB

The performance is quite good, and for your use case it looks ideal. I could achieve 1,000,000 rows/sec per client.

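For illustration, a minimal read loop of the kind that can reach such throughput (a sketch; the env/db names reuse the ones from the question, and the read-only transaction is an assumption):

const lmdb = require('node-lmdb')

const env = new lmdb.Env()
env.open({ path: __dirname + '/mydata', mapSize: 2 * 1024 * 1024 * 1024, maxDbs: 3 })
const dbi = env.openDbi({ name: 'myPrettyDatabase', create: true })

const txn = env.beginTxn({ readOnly: true }) // readers don't block writers
const t0 = Date.now()
let rows = 0
const cursor = new lmdb.Cursor(txn, dbi)
for (let k = cursor.goToFirst(); k !== null; k = cursor.goToNext()) {
    cursor.getCurrentString(function (key, data) { rows++ })
}
cursor.close()
txn.abort() // read-only transactions are aborted, not committed
console.log(rows + ' rows in ' + (Date.now() - t0) + 'ms')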