How to remove \u200B (Zero Width Space Unicode character) from a String in Java?

Date: 2021-07-27 20:16:52

My application uses Spring Integration to poll email from an Outlook mailbox.

Because it receives the String (the email body) from an external system (Outlook), I have no control over its content.

For example:

String emailBodyStr= "rejected by sundar14-\u200B.";

Now I am trying to remove the Unicode character \u200B from this String.

What I have already tried:

Try #1:

emailBodyStr = emailBodyStr.replaceAll("\u200B", "");

Try #2:

emailBodyStr = emailBodyStr.replaceAll("\u200B", "").trim();

Try #3 (using Apache Commons):

StringEscapeUtils.unescapeJava(emailBodyStr);

Try #4:

StringEscapeUtils.unescapeJava(emailBodyStr).trim();

Nothing has worked so far.

When I print this String using the code below:

logger.info("Comment BEFORE:{}",emailBodyStr);
logger.info("Comment AFTER :{}",emailBodyStr);

In the Eclipse console, it does NOT print the Unicode character:

Comment BEFORE:rejected by sundar14-​.


But the same code prints the Unicode character in the Linux console, as below:

Comment BEFORE:rejected by sundar14-\u200B.
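Because the two consoles render the string differently, it can help to look at the code points directly rather than at the rendered text. The following is a minimal sketch of such a diagnostic; the class name CodePointDump and the method escapeInvisible are hypothetical names chosen for illustration, not part of Spring Integration or any library.

```java
final class CodePointDump {

    /** Returns the string with every code point outside printable ASCII written as <U+XXXX>. */
    static String escapeInvisible(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp >= 0x20 && cp < 0x7F) {
                out.appendCodePoint(cp);                    // printable ASCII: keep as-is
            } else {
                out.append(String.format("<U+%04X>", cp)); // anything else: make it visible
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String emailBodyStr = "rejected by sundar14-\u200B.";
        System.out.println(escapeInvisible(emailBodyStr));
        // prints: rejected by sundar14-<U+200B>.
    }
}
```

If the dump shows a single <U+200B>, the string holds the real character. If it instead shows a backslash followed by the letters u200B as plain ASCII, the string holds the escape sequence as text, which is the case unescapeJava is meant for. Note also that Java strings are immutable, so the result of unescapeJava or replaceAll has to be assigned to a variable to have any effect.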


I read some examples where str.replace() is recommended, but please note that those examples use JavaScript and PHP, not Java.

1 Solution

#1


3  

Finally, I was able to remove the 'Zero Width Space' character by using a Unicode regex.

// \p{Cf} matches every character in the Unicode "Format" category, which includes U+200B
String plainEmailBody = emailBodyStr.replaceAll("\\p{Cf}", "");
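For anyone who wants to try this in isolation, here is a small self-contained sketch of the same regex. The class name ZeroWidthCleaner and the method stripFormatChars are made up for the example; the sample string is the one from the question.

```java
final class ZeroWidthCleaner {

    /** Removes every character in Unicode general category Cf (Format). */
    static String stripFormatChars(String s) {
        return s.replaceAll("\\p{Cf}", "");
    }

    public static void main(String[] args) {
        String emailBodyStr = "rejected by sundar14-\u200B.";

        // java.lang.Character reports the zero width space as a FORMAT character
        System.out.println(Character.getType('\u200B') == Character.FORMAT);

        String plainEmailBody = stripFormatChars(emailBodyStr);

        // the cleaned string is one char shorter than the original
        System.out.println(emailBodyStr.length() + " -> " + plainEmailBody.length());
        System.out.println(plainEmailBody);
    }
}
```

The brackets in the original pattern are optional: "[\\p{Cf}]" and "\\p{Cf}" match the same characters.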

References for finding the category of a Unicode character:

  1. The Character class from Java, which lists all of these Unicode categories.
  2. Website: http://www.fileformat.info/
  3. Website: http://www.regular-expressions.info/ => Unicode Regular Expressions

Note 1: Because I received this string from an Outlook email body, none of the approaches listed in my question worked. My application receives the String from an external system (Outlook), so I have no control over it.

Note 2: This SO answer helped me learn about Unicode regular expressions.
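One caveat worth knowing: \p{Cf} covers the whole Format category, so it also strips characters such as the soft hyphen (U+00AD), the zero width joiner (U+200D) and the byte order mark (U+FEFF). If that is broader than wanted, and assuming the input really contains the single code point U+200B, a literal replace is a narrower option. The question reports that a similar attempt did not work on the Outlook data, which may mean a different invisible character was involved, so treat this only as a sketch under that assumption. The names TargetedRemoval and stripZeroWidthSpace are hypothetical.

```java
final class TargetedRemoval {

    private static final String ZERO_WIDTH_SPACE = "\u200B";

    /** Removes only U+200B; other format characters are left untouched. */
    static String stripZeroWidthSpace(String s) {
        return s.replace(ZERO_WIDTH_SPACE, ""); // literal replace, no regex involved
    }

    public static void main(String[] args) {
        // the soft hyphen (U+00AD) is also category Cf, so the Cf regex would remove it too
        String input = "co\u00ADoperate\u200B";

        System.out.println(stripZeroWidthSpace(input).length());          // soft hyphen kept
        System.out.println(input.replaceAll("\\p{Cf}", "").length());     // soft hyphen removed
    }
}
```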
