I have an HTML Table of Contents page containing a list of book chapters with hyperlinks:
<a href="final/main.html">Multimedia Implementation</a><br/>
<a href="final/toc.html">Table of Contents</a><br/>
<a href="final/pref01.html">About the Author</a><br/>
<a href="final/pref02.html">About the Technical Reviewers</a><br/>
<a href="final/pref03.html">Acknowledgments</a><br/>
<a href="final/part01.html">Part I: Introduction and Overview</a><br/>
<a href="final/ch01.html">Chapter 1. Technical Overview</a><br/>
...
I want to create an NCX file for a Kindle book, which must contain entries like the following:
<navPoint id="n1" playOrder="1">
<navLabel>
<text>Multimedia Implementation</text>
</navLabel>
<content src="final/main.html"/>
</navPoint>
<navPoint id="n2" playOrder="2">
<navLabel>
<text>Table of Contents</text>
</navLabel>
<content src="final/toc.html"/>
</navPoint>
<navPoint id="n3" playOrder="3">
<navLabel>
<text>About the Author</text>
</navLabel>
<content src="final/pref01.html"/>
</navPoint>
...
I'm using Notepad++. Is it possible to automate this process with a regular expression?
2 Solutions
#1
1
You cannot do everything using regex, but you can split the problem into two parts:
- generate strings like
<navPoint id="n1" playOrder="1">
using program logic (an incrementing variable)
- do the rest with regex
Use the following regex to match:
<a\shref="([^"]*)">([^<]*)<\/a><br\/>
And replace with:
(generated string)\n<navLabel>\n<text>\2</text>\n</navLabel>\n<content src="\1"/>\n</navPoint>
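The two-part split described above (a counter for the numbering, a regex for everything else) can be sketched outside Notepad++ in a few lines of Python. This is only an illustration under assumptions: the function name `links_to_navpoints` is made up for this example, and the input is assumed to contain one link per entry in exactly the `<a href="...">...</a><br/>` form shown in the question.

```python
import re
from itertools import count

# Pattern from the answer: group 1 is the href, group 2 is the link text.
LINK_RE = re.compile(r'<a\shref="([^"]*)">([^<]*)</a><br/>')


def links_to_navpoints(html):
    """Replace each matching link with a numbered navPoint element."""
    counter = count(1)  # the "increment variable" part of the answer

    def build(match):
        n = next(counter)
        href, title = match.group(1), match.group(2)
        return (
            f'<navPoint id="n{n}" playOrder="{n}">\n'
            f'<navLabel>\n<text>{title}</text>\n</navLabel>\n'
            f'<content src="{href}"/>\n'
            f'</navPoint>'
        )

    # re.sub calls build() once per match, in document order.
    return LINK_RE.sub(build, html)


if __name__ == "__main__":
    sample = (
        '<a href="final/main.html">Multimedia Implementation</a><br/>\n'
        '<a href="final/toc.html">Table of Contents</a><br/>\n'
    )
    print(links_to_navpoints(sample))
```

Using a replacement function instead of a replacement string is what lets the numbering and the regex substitution happen in a single pass; lines that do not match the pattern are left untouched.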
#2
0
Yes, it is possible to replace the links with <navPoint> tags. The only thing I found no solution for is the incremental numbering of the <navPoint> attributes id and playOrder ...
The following regex will do most of the work:
/^<a[^>]*href="([^"]+)"[^>]*>([^<]+).*$/gm
Substitute with:
<navPoint id="n" playOrder="">\n<navLabel><text>$2</text></navLabel>\n<content src="$1" />\n</navPoint>\n
Regex details
/^<a        .. only parse lines that start with an `<a` tag
[^>]*href=" .. find the first occurrence of `href="` inside the tag
([^"]+)     .. capture the text and stop when a " is found
"[^>]*>     .. find the end of the <a> tag
([^<]+)     .. capture the text and stop when a < is found (i.e. the </a> tag)
.*$/        .. continue to the end of the line
gm          .. search the whole string and parse each line individually
A more detailed (but also more confusing) explanation is here: https://regex101.com/r/gA0yJ2/1
This link also demonstrates how the regex works. You can test changes there if you like.
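For the numbering that the regex alone leaves open, one option is a second pass over the result. The sketch below is a hedged illustration in Python, not part of the original answer: it assumes the pattern includes the `>` that closes the opening `<a>` tag (as the regex details describe), and the names `convert` and `number_navpoints` are hypothetical.

```python
import re

# The answer's pattern. re.MULTILINE plays the role of the "m" flag
# (^ and $ match per line); re.sub replaces every match, like "g".
LINE_RE = re.compile(r'^<a[^>]*href="([^"]+)"[^>]*>([^<]+).*$', re.MULTILINE)

# Same substitution as in the answer, with \1 and \2 instead of $1 and $2.
TEMPLATE = (
    '<navPoint id="n" playOrder="">\n'
    '<navLabel><text>\\2</text></navLabel>\n'
    '<content src="\\1" />\n'
    '</navPoint>\n'
)

# Matches the unnumbered opening tag produced by TEMPLATE.
PLACEHOLDER_RE = re.compile(r'<navPoint id="n" playOrder="">')


def number_navpoints(ncx):
    """Second pass: fill in id and playOrder with a running counter."""
    n = 0

    def fill(_match):
        nonlocal n
        n += 1
        return f'<navPoint id="n{n}" playOrder="{n}">'

    return PLACEHOLDER_RE.sub(fill, ncx)


def convert(html):
    """First pass: links to unnumbered navPoints; second pass: numbering."""
    return number_navpoints(LINE_RE.sub(TEMPLATE, html))


if __name__ == "__main__":
    print(convert('<a href="final/ch01.html">Chapter 1. Technical Overview</a><br/>'))
```

Keeping the two passes separate mirrors the workflow in the answer: the first substitution can still be done in Notepad++, and only the numbering step needs a script.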