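I discovered by accident that a string containing emoji cannot be inserted into a MySQL database when the database uses the utf8 character set. MySQL's utf8 stores at most three bytes per character, while most emoji need four bytes in UTF-8.
Since I only use the data to display WeChat official account article titles and do not really care about the emoji, they can simply be filtered out.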
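Whether a given title will be rejected can be checked before the insert. The characters that cause the failure are code points above U+FFFF, which take four bytes in UTF-8. The sketch below assumes PHP 7 or later (for the "\u{...}" escape); the helper name hasFourByteChars is made up for illustration and is not part of the original post.

```php
<?php
// Hypothetical helper (not from the original post): reports whether $text
// contains any code point above U+FFFF, i.e. one that takes 4 bytes in UTF-8.
function hasFourByteChars(string $text): bool
{
    // The /u modifier makes PCRE treat the pattern and the subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

var_dump(strlen("\u{1F600}"));                  // int(4): the emoji alone is 4 bytes
var_dump(hasFourByteChars("Hello \u{1F600}"));  // bool(true)
var_dump(hasFourByteChars("\u{4E2D}\u{6587}")); // bool(false): CJK takes 3 bytes
var_dump(hasFourByteChars("\u{2600}"));         // bool(false): U+2600 takes 3 bytes
```

Symbols such as U+2600 sit below U+FFFF, so they fit in three bytes; filtering them is a matter of taste rather than a storage requirement.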
The following methods can be used to strip the emoji.
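/**
 * Strip emoji from a string.
 * @param string $oriStr
 * @return string
 */
public static function remove_emoji($oriStr) {
    // json_encode() escapes non-ASCII characters as \uXXXX, so emoji show up
    // as \ud83d\ude00-style surrogate pairs (or \ueXXX private-use codes).
    // Caution: [ed] also matches \ud000-\ud7ff, which includes Hangul syllables.
    $regex = '/(\\\u[ed][0-9a-f]{3})/i';
    $oriStr = json_encode($oriStr);
    $oriStr = preg_replace($regex, '', $oriStr);
    return json_decode($oriStr);
}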
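To see why the json_encode approach works, it helps to look at the intermediate JSON text. The following is a minimal sketch, assuming PHP 7 or later and the default json_encode flags (no JSON_UNESCAPED_UNICODE); the sample string is made up for illustration.

```php
<?php
$text = "Hi \u{1F600}!";

// With default flags, json_encode() writes the emoji as a surrogate pair.
$json = json_encode($text);
echo $json, PHP_EOL;              // "Hi \ud83d\ude00!"

// Same pattern as in remove_emoji(): a literal backslash, "u", then e or d
// followed by three hex digits.
$stripped = preg_replace('/(\\\u[ed][0-9a-f]{3})/i', '', $json);
echo $stripped, PHP_EOL;          // "Hi !"

var_dump(json_decode($stripped)); // string(4) "Hi !"
```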
public static function removeEmoji($text) {
    $clean_text = "";
    // Match Emoticons
    $regexEmoticons = '/[\x{1F600}-\x{1F64F}]/u';
    $clean_text = preg_replace($regexEmoticons, '', $text);
    // Match Miscellaneous Symbols and Pictographs
    $regexSymbols = '/[\x{1F300}-\x{1F5FF}]/u';
    $clean_text = preg_replace($regexSymbols, '', $clean_text);
    // Match Transport And Map Symbols
    $regexTransport = '/[\x{1F680}-\x{1F6FF}]/u';
    $clean_text = preg_replace($regexTransport, '', $clean_text);
    // Match Miscellaneous Symbols
    $regexMisc = '/[\x{2600}-\x{26FF}]/u';
    $clean_text = preg_replace($regexMisc, '', $clean_text);
    // Match Dingbats
    $regexDingbats = '/[\x{2700}-\x{27BF}]/u';
    $clean_text = preg_replace($regexDingbats, '', $clean_text);
    return $clean_text;
}
Methods from other references:
function remove_emoji($text) {
    return preg_replace('/([0-9#][\x{20E3}])|[\x{00ae}\x{00a9}\x{203C}\x{2047}\x{2048}\x{2049}\x{3030}\x{303D}\x{2139}\x{2122}\x{3297}\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u', '', $text);
}
function removeEmojis( $string ) {
    $string = str_replace( "?", "{%}", $string );
    $string = mb_convert_encoding( $string, "ISO-8859-1", "UTF-8" );
    $string = mb_convert_encoding( $string, "UTF-8", "ISO-8859-1" );
    $string = str_replace( array( "?", "? ", " ?" ), array(""), $string );
    $string = str_replace( "{%}", "?", $string );
    return trim( $string );
}
Explanation:
Temporarily replace each "?" in the original string with the placeholder "{%}".
Convert the string from UTF-8 to ISO-8859-1.
Convert it back to UTF-8. The mb_ functions substitute "?" for every character that cannot be represented in ISO-8859-1.
Remove those "?" substitutes.
Restore the original "?" characters from the placeholder.
The input must be UTF-8 for this to work.
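Because ISO-8859-1 only covers U+0000 to U+00FF, the round trip affects far more than emoji: Chinese text is lost as well, which makes this method unsuitable for Chinese article titles. A minimal sketch of the check, assuming the mbstring extension is loaded and PHP 7 or later; the helper name latin1RoundTrip is made up for illustration.

```php
<?php
// Hypothetical helper: performs the same UTF-8 -> ISO-8859-1 -> UTF-8
// round trip that removeEmojis() relies on, without the "?" clean-up.
function latin1RoundTrip(string $s): string
{
    $latin1 = mb_convert_encoding($s, "ISO-8859-1", "UTF-8");
    return mb_convert_encoding($latin1, "UTF-8", "ISO-8859-1");
}

$latin = "caf\u{E9}";          // e-acute is U+00E9, inside ISO-8859-1
$cjk   = "\u{4E2D}\u{6587}";   // two Chinese characters
$emoji = "\u{1F600}";

var_dump(latin1RoundTrip($latin) === $latin); // bool(true): survives
var_dump(latin1RoundTrip($cjk) === $cjk);     // bool(false): lost
var_dump(latin1RoundTrip($emoji) === $emoji); // bool(false): lost
```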
In fact, an open-source conversion library for this already exists:
http://code.iamcal.com/php/emoji/
https://github.com/iamcal/php-emoji
Original article: http://blog.csdn.net/aerchi/article/details/68485987