My application uses Spring Integration to poll email from an Outlook mailbox.
Because it receives the string (the email body) from an external system (Outlook), I have no control over its content.
For example:
String emailBodyStr= "rejected by sundar14-\u200B.";
Now I am trying to remove the Unicode character \u200B from this string.
What I have already tried:
Try #1:
emailBodyStr = emailBodyStr.replaceAll("\u200B", "");
Try #2:
emailBodyStr = emailBodyStr.replaceAll("\u200B", "").trim();
Try #3 (using Apache Commons):
StringEscapeUtils.unescapeJava(emailBodyStr);
Try #4:
StringEscapeUtils.unescapeJava(emailBodyStr).trim();
Nothing has worked so far.
When I print this string using the code below:
logger.info("Comment BEFORE:{}",emailBodyStr);
logger.info("Comment AFTER :{}",emailBodyStr);
In the Eclipse console, it does NOT print the Unicode character:
Comment BEFORE:rejected by sundar14-.
But the same code prints the Unicode character in the Linux console, as below:
Comment BEFORE:rejected by sundar14-\u200B.
I have read some examples where str.replace() is recommended, but note that those examples use JavaScript and PHP, not Java.
1 Solution
#1
Finally, I was able to remove the zero-width space character by using a Unicode regex.
// \p{Cf} matches the Unicode "Format" category, which includes U+200B (zero-width space).
String plainEmailBody = emailBodyStr.replaceAll("\\p{Cf}", "");
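As a minimal, self-contained sketch of the same idea (the class name FormatCharStripper and the method stripFormatChars are my own, made up for illustration), the pattern can be compiled once and reused for every polled email body:

```java
import java.util.regex.Pattern;

public class FormatCharStripper {
    // Compiled once; \p{Cf} is the Unicode general category "Format",
    // which covers invisible characters such as U+200B (zero-width space).
    private static final Pattern FORMAT_CHARS = Pattern.compile("\\p{Cf}");

    // Hypothetical helper: returns the input with all format characters removed.
    static String stripFormatChars(String input) {
        return FORMAT_CHARS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String emailBodyStr = "rejected by sundar14-\u200B.";
        String plainEmailBody = stripFormatChars(emailBodyStr);
        // The cleaned string is one char shorter than the original.
        System.out.println(emailBodyStr.length() + " -> " + plainEmailBody.length());
    }
}
```

Note that this removes every character in the Format category, not only U+200B, so it is only appropriate if none of the other format characters matter for the email text.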
References for finding the category of a Unicode character:
- The Character class from Java, which lists all of these Unicode categories.
- Website: http://www.fileformat.info/
- Website: http://www.regular-expressions.info/ => Unicode Regular Expressions
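To check which category an invisible character really belongs to, something like the following sketch may help. It relies on Character.getType, Character.getName and Character.FORMAT from the JDK; the class name CodePointInspector and its methods are hypothetical names chosen for this example.

```java
public class CodePointInspector {
    // Builds one line per code point: its U+XXXX value, the numeric
    // general-category constant from Character.getType, and its Unicode name.
    static String describe(String input) {
        StringBuilder sb = new StringBuilder();
        input.codePoints().forEach(cp ->
            sb.append(String.format("U+%04X type=%d name=%s%n",
                cp, Character.getType(cp), Character.getName(cp))));
        return sb.toString();
    }

    // True if the code point is in the "Format" (Cf) category,
    // i.e. it would be matched by the regex \p{Cf}.
    static boolean isFormatChar(int codePoint) {
        return Character.getType(codePoint) == Character.FORMAT;
    }

    public static void main(String[] args) {
        String emailBodyStr = "rejected by sundar14-\u200B.";
        // Dumping the code points shows what the string actually contains,
        // independent of how a particular console renders it.
        System.out.print(describe(emailBodyStr));
    }
}
```

Logging the code points this way avoids relying on the Eclipse or Linux console to display characters that have no visible glyph.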
Note 1: Because I received this string from an Outlook email body, none of the approaches listed in my question worked.
My application receives the string from an external system (Outlook), so I have no control over it.
Note 2: This SO answer helped me learn about Unicode regular expressions.