Sensitive Word Filter

Date: 2019-12-06 03:33:55
[File properties]:

File name: 敏感词过滤 (Sensitive Word Filter)

File size: 4 KB

File format: RAR

Last updated: 2019-12-06 03:33:55

profanity, sensitive words, chat, filtering

Author: Richard Zhang. Mail: 89205975@qq.com

This library filters sensitive phrases according to the user's configuration. It currently supports only UTF-8 and ANSI encoded strings.

The matching rule is maximum-length matching: the library tries to match the longest possible sensitive phrase. For example, if both "damn fucker" and "damn" are in the sensitive dictionary, the sentence "he's a damn fucker" is processed to "he's a ***********".

The library also copes with spaces or non-letter characters inserted between sensitive words. For example, if "Bad boy" is in the sensitive dictionary, then "Bad.boy", "Bad boy", and "Bad/boy" are filtered as well. If "你去死" (Chinese for "go die") is in the sensitive dictionary, then "你 去 死", "你/去 死", and "你 去 .死" are filtered as well.

Compiling requirements:
1. C++11 with the STL
2. Boost multi_index_container

Performance test conditions:
1. A sentence of about 100 bytes (mixed English and Chinese)
2. About 10,000 dirty phrases
3. A test loop of 1,000 iterations
4. Intel i7 CPU

Test result: each iteration takes about 100 µs.
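The matching rule described above (longest match, with separators tolerated between letters) can be illustrated with a small sketch. This is not the code in PhraseFilter.h or PhraseFilter.cpp, whose API is not shown on this page. The class name `SketchFilter`, its `add` and `filter` methods, and the byte-level trie are all assumptions made for illustration, and the sketch uses `std::map` rather than Boost multi_index_container. It also assumes UTF-8 input, treats any non-alphanumeric ASCII byte as a separator, and emits one `*` per masked code point.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical illustration only; NOT the PhraseFilter API from the archive.
class SketchFilter {
public:
    // Store a phrase with ASCII separators removed, so "Bad boy" is kept as "Badboy".
    void add(const std::string& phrase) {
        Node* node = &root_;
        for (unsigned char c : phrase) {
            if (isSeparator(c)) continue;
            std::unique_ptr<Node>& next = node->children[c];
            if (!next) next.reset(new Node());  // C++11 has no std::make_unique
            node = next.get();
        }
        if (node != &root_) node->terminal = true;
    }

    // Return a copy of text with every longest dictionary match masked by '*'.
    std::string filter(const std::string& text) const {
        std::string out;
        std::size_t i = 0;
        while (i < text.size()) {
            const std::size_t end = longestMatch(text, i);
            if (end == i) {          // no phrase starts here: copy the byte as-is
                out += text[i++];
                continue;
            }
            for (; i < end; ++i) {
                // Assumption: one '*' per code point, so UTF-8 continuation
                // bytes (10xxxxxx) produce no extra star.
                if ((static_cast<unsigned char>(text[i]) & 0xC0) != 0x80) out += '*';
            }
        }
        return out;
    }

private:
    struct Node {
        std::map<unsigned char, std::unique_ptr<Node>> children;
        bool terminal = false;
    };

    // Assumption: a separator is any ASCII byte that is not a letter or digit.
    static bool isSeparator(unsigned char c) {
        return c < 0x80 && !std::isalnum(c);
    }

    // Walk the trie from `start`, skipping separators, and return the end
    // offset (exclusive) of the longest phrase found, or `start` if none.
    std::size_t longestMatch(const std::string& text, std::size_t start) const {
        const Node* node = &root_;
        std::size_t best = start;
        for (std::size_t j = start; j < text.size(); ++j) {
            const unsigned char c = static_cast<unsigned char>(text[j]);
            if (isSeparator(c)) {
                if (j == start) break;  // a match never begins on a separator
                continue;               // separators inside a match are skipped
            }
            const auto it = node->children.find(c);
            if (it == node->children.end()) break;
            node = it->second.get();
            if (node->terminal) best = j + 1;  // remember the longest hit so far
        }
        return best;
    }

    Node root_;
};
```

With "damn fucker" and "damn" added, `filter("he's a damn fucker")` in this sketch returns "he's a ***********", while `filter("damn fool")` falls back to the shorter phrase and returns "**** fool". Recording only the last terminal node reached is what gives the longest-match behaviour, and it also keeps trailing separators after a match from being masked.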


[File preview]:
PhraseFilter.h
PhraseFilter.cpp
