Creating an NCX file with Notepad++ and regular expressions

Date: 2022-09-13 23:36:51

I have an HTML table of contents page containing a list of book chapters with hyperlinks:


<a href="final/main.html">Multimedia Implementation</a><br/>
<a href="final/toc.html">Table of Contents</a><br/>
<a href="final/pref01.html">About the Author</a><br/>
<a href="final/pref02.html">About the Technical Reviewers</a><br/>
<a href="final/pref03.html">Acknowledgments</a><br/>
<a href="final/part01.html">Part I: Introduction and Overview</a><br/>
<a href="final/ch01.html">Chapter 1. Technical Overview</a><br/>

I want to create an NCX file for a Kindle book, which must contain entries like the following:


<navPoint id="n1" playOrder="1">
<text>Multimedia Implementation</text>
<content src="final/main.html"/>
<navPoint id="n2" playOrder="2">
<text>Table of Contents</text>
<content src="final/toc.html"/>
<navPoint id="n3" playOrder="3">
<text>About the Author</text>
<content src="final/pref01.html"/>

I'm using Notepad++. Is it possible to automate this process with a regular expression?

2 solutions



You cannot do everything with a regex alone, but you can split the problem into two parts:


  • generate strings like <navPoint id="n1" playOrder="1"> using program logic (an incrementing variable)

  • do the rest with a regex

Use the following regex to match:



And replace with:


(generated string)\n<navLabel><text>\2</text></navLabel>\n<content src="\1"/>\n</navPoint>
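The two-part split described above can be sketched in Python. This is only an illustration under stated assumptions: the matching regex from the answer is not shown, so `LINK_RE` below is a hypothetical pattern written for the sample links in the question, and `build_navpoints` is a made-up helper name. Part 1 builds the opening tag with a counter; part 2 uses a regex replacement with `\1` (the href) and `\2` (the link text).

```python
import re

# Hypothetical pattern (the answer's own regex is not shown, so this is an
# assumption). Group 1 = href value, group 2 = link text.
LINK_RE = re.compile(r'<a href="([^"]+)">([^<]+)</a><br/>')

# Replacement template for everything after the opening <navPoint ...> tag.
BODY = r'<navLabel><text>\2</text></navLabel>\n<content src="\1"/>\n</navPoint>'


def build_navpoints(toc_lines):
    """Turn a list of TOC link lines into numbered navPoint entries."""
    # Keep only lines that are exactly one link, ignoring blank lines.
    links = [line.strip() for line in toc_lines if LINK_RE.fullmatch(line.strip())]
    result = []
    for n, link in enumerate(links, start=1):
        opening = f'<navPoint id="n{n}" playOrder="{n}">'  # part 1: program logic
        body = LINK_RE.sub(BODY, link)                      # part 2: regex replace
        result.append(opening + "\n" + body)
    return "\n".join(result)


if __name__ == "__main__":
    sample = [
        '<a href="final/main.html">Multimedia Implementation</a><br/>',
        '<a href="final/toc.html">Table of Contents</a><br/>',
    ]
    print(build_navpoints(sample))
```

Keeping the counter outside the regex is the point of the split: the replacement template stays a plain string, and only the opening tag depends on the position of the link in the list.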




Yes, it is possible to replace the links with <navPoint> tags. The only thing I found no solution for is the incremental numbering of the <navPoint> attributes id and playOrder.

The following regex will do most of the work:

/^<a.*href="([^"]+)"[^>]*>([^<]+).*$/gm



Substitute with:

<navPoint id="n" playOrder="">\n<navLabel><text>$2</text></navLabel>\n<content src="$1" />\n</navPoint>\n

Regex details

/^<a     .. only parse lines that start with an `<a` tag
.*href=" .. skip ahead to `href="` (with one link per line, the only occurrence)
([^"]+)  .. capture the link target and stop when a " is found
"[^>]*>  .. find the end of the opening <a> tag
([^<]+)  .. capture the link text and stop when a < is found (i.e. the </a> tag)
.*$/     .. continue to the end of the line
gm       .. search the whole string and treat each line individually

A more detailed (but also more confusing) explanation is here: https://regex101.com/r/gA0yJ2/1 The link also demonstrates how the regex works, and you can test changes there if you like.
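For the numbering that the regex alone does not cover, one option is to run the same pattern from a short script. The sketch below is an assumption-laden illustration, not part of the original answer: it rebuilds the pattern from the breakdown above in Python's verbose regex mode and fills in `id` and `playOrder` from a counter inside a replacement callback. The function name `number_navpoints` is hypothetical.

```python
import re
from itertools import count

# The pattern from the breakdown above, written in verbose mode so each
# piece can carry its own comment. re.MULTILINE makes ^ and $ work per line.
LINK_RE = re.compile(r'''
    ^<a        # only lines that start with an <a tag
    .*href="   # skip ahead to href="
    ([^"]+)    # group 1: the link target, up to the closing quote
    "[^>]*>    # the rest of the opening <a ...> tag
    ([^<]+)    # group 2: the link text, up to the next <
    .*$        # the remainder of the line
''', re.VERBOSE | re.MULTILINE)


def number_navpoints(toc_html):
    """Replace each link line with a navPoint numbered 1, 2, 3, ..."""
    counter = count(1)

    def repl(match):
        n = next(counter)  # one number per matched line, in document order
        return (
            f'<navPoint id="n{n}" playOrder="{n}">\n'
            f'<navLabel><text>{match.group(2)}</text></navLabel>\n'
            f'<content src="{match.group(1)}" />\n'
            f'</navPoint>\n'
        )

    return LINK_RE.sub(repl, toc_html)


if __name__ == "__main__":
    sample = (
        '<a href="final/main.html">Multimedia Implementation</a><br/>\n'
        '<a href="final/ch01.html">Chapter 1. Technical Overview</a><br/>'
    )
    print(number_navpoints(sample))
```

Lines that do not start with `<a` are left untouched, so the script can be run on the whole TOC page rather than on a pre-filtered list of links.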



