I need to display a Word document on a web page. I am using a library named Docx4j to convert the .doc file to HTML, and that works fine. However, the hyperlinks come out in the following format:
To search on google go to this link [#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google[#?] and type the text.
I'm able to convert it to
To search on google go to this link (http://www.google.com) google and type the text.
using the below code
String myText = "To search on google go to this link [#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google[#?] and type the text.";
System.out.println(myText);
String firstReplace = myText.replaceAll("\\[", "").replaceAll("\\]", "").replaceAll("#\\?", "");
System.out.println(firstReplace);
String secondReplace = firstReplace.replaceAll("HYPER\\S+\\s+\"", "(");
System.out.println(secondReplace);
String finalReplace = secondReplace.replaceAll("/*\".", ")");
System.out.println("\n" + finalReplace);
Can someone please provide a regex that converts the above string to
To search on google go to this link google (http://www.google.com) and type the text.
--EDIT--
There are some links which show up as
[#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google page[#?]
I should change them to
google page (http://www.google.com)
How do I do this?
2 Solutions
#1
2
You can use a group reference to match the word google, which comes after the parentheses.
You can replace whatever the following regex matches:
'(\([^)]*\))\s?(\w+)'
with the following:
'$2 $1'
You can use the str.replaceAll() function for this.
Elaboration:
The first capture group, (\([^)]*\)), matches the part between parentheses; [^)]* is a negated character class that matches any run of characters other than a closing parenthesis.
The second capture group, (\w+), matches the word after that part; \w+ matches one or more word characters.
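For reference, here is a minimal Java sketch of how the pattern and replacement above could be applied to the intermediate string from the question. The class name SwapLinkAndLabel and the method swap are made up for this illustration. Note that \w+ captures a single word only, so a two-word label such as "google page" would come out as "google (http://www.google.com) page".

```java
public class SwapLinkAndLabel {
    // Pattern from this answer, escaped for a Java string literal:
    // group 1 = the parenthesised URL, group 2 = the single word that follows it.
    private static final String PATTERN = "(\\([^)]*\\))\\s?(\\w+)";

    // Hypothetical helper: puts the word before the parenthesised URL.
    public static String swap(String text) {
        return text.replaceAll(PATTERN, "$2 $1");
    }

    public static void main(String[] args) {
        String input = "To search on google go to this link (http://www.google.com) google and type the text.";
        // prints: To search on google go to this link google (http://www.google.com) and type the text.
        System.out.println(swap(input));
    }
}
```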
#2
0
Removing the [#?] markers as early as you do in your question means that you lose information you need to make the required text adjustments later. The basic template of your input is:
[#?] HYPERLINK *target* [#?] [#?] *clickable textual description of link* [#?]
So why don't you use those markers to your advantage?
A regexp like this (NOTE: not tested, probably wrong, but just to give you the basic idea):
mystring.replaceAll("\\[#\\?\\] HYPERLINK (.*) \\[#\\?\\] \\[#\\?\\] (.*) \\[#\\?\\]", "$2 ($1)");
The above is designed to give you "google page (http://www.google.com)". But I would also question why you want to display it like that. Normally, for HTML web pages, you want it to be <a href="http://www.google.com">google page</a>. To do that, just change the replacement in the above code.
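Here is a slightly more defensive sketch of the same marker-based idea, checked only against the two sample strings in the question. It makes a few assumptions that are mine rather than anything Docx4j guarantees: the target is always wrapped in double quotes, whitespace between markers is optional (the samples have "[#?][#?]" with no space), a single trailing slash on the URL is dropped to match the desired output, and each field sits on one line. The class and method names are hypothetical, and the HTML variant does no escaping of the URL or label.

```java
import java.util.regex.Pattern;

public class HyperlinkFieldRewriter {
    // The literal "[#?]" marker, escaped for use inside a regex.
    private static final String MARK = "\\[#\\?\\]";

    // [#?] HYPERLINK "target" [#?][#?] label [#?]
    // group 1 = target without its quotes or one trailing slash, group 2 = label.
    // Lazy quantifiers keep each match inside a single field.
    private static final Pattern FIELD = Pattern.compile(
            MARK + "\\s*HYPERLINK\\s+\"([^\"]*?)/?\"\\s*"
            + MARK + "\\s*" + MARK + "\\s*(.*?)\\s*" + MARK);

    // Plain-text form: label (target)
    public static String toPlainText(String text) {
        return FIELD.matcher(text).replaceAll("$2 ($1)");
    }

    // HTML form: <a href="target">label</a> (no escaping, illustration only)
    public static String toHtmlAnchor(String text) {
        return FIELD.matcher(text).replaceAll("<a href=\"$1\">$2</a>");
    }

    public static void main(String[] args) {
        String sentence = "To search on google go to this link [#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google[#?] and type the text.";
        String field = "[#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google page[#?]";
        // prints: To search on google go to this link google (http://www.google.com) and type the text.
        System.out.println(toPlainText(sentence));
        // prints: google page (http://www.google.com)
        System.out.println(toPlainText(field));
        // prints: <a href="http://www.google.com">google page</a>
        System.out.println(toHtmlAnchor(field));
    }
}
```

Because the label group is delimited by the closing marker rather than by \w+, multi-word labels such as "google page" stay intact.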