Converting source ASCII files to JPEGs

I publish technical books in print, PDF, and Kindle/MOBI formats, with EPUB on the way.

The Kindle does not support monospace fonts, which are kinda useful for source code listings. The only way to get monospaced text is to convert the text (Java source, HTML, XML, etc.) into JPEG images. More specifically, because of pagination issues, each input ASCII file has to be split into slices of about 6 lines, with each slice turned into its own JPEG, so that a long listing can flow across screens. This is a royal pain.
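
For illustration, the slicing step on its own can be sketched in a few lines of Python. This is a minimal sketch, not the author's tooling: slice_listing is a hypothetical helper name, the 6-line slice size and 2-space tab stops come from the description here, and str.expandtabs(2) stands in for running expand.

```python
def slice_listing(text: str, lines_per_slice: int = 6, tab_size: int = 2) -> list[str]:
    """Expand tabs and cut a source listing into slices of at most lines_per_slice lines.

    Hypothetical helper: str.expandtabs() plays the role of `expand`, and each
    returned string would become one image.
    """
    if lines_per_slice < 1:
        raise ValueError("lines_per_slice must be at least 1")
    # expandtabs() resets its column count at every newline, like `expand -t 2`.
    lines = text.expandtabs(tab_size).splitlines()
    return [
        "\n".join(lines[start:start + lines_per_slice])
        for start in range(0, len(lines), lines_per_slice)
    ]
```

Each returned string would then be fed to whatever renderer produces the image, one file per slice.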

My current mechanism to do that involves:

Running expand to set a consistent 2-space tab size, which pipes to...

a2ps, which pipes to...

A small Perl snippet to add a "%%LanguageLevel: 3\n" line, which pipes to...

ImageMagick's convert, to take the (E)PS and make a JPEG out of it, with an appropriate background, cropped to 575x148+5+28, etc.
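
The Perl snippet in the third step is described but not shown. A Python equivalent might look like the following sketch; it assumes the comment belongs directly after the first "%!PS" header line, which is where DSC header comments normally start, and add_language_level is a hypothetical name.

```python
def add_language_level(ps: str, level: int = 3) -> str:
    """Insert a '%%LanguageLevel: N' DSC comment right after the '%!PS' header line.

    Assumes the PostScript begins with its '%!' header (as a2ps output does) and
    leaves the text untouched if a LanguageLevel comment is already present.
    """
    if "%%LanguageLevel:" in ps:
        return ps
    header, _, rest = ps.partition("\n")
    if not header.startswith("%!"):
        raise ValueError("input does not start with a '%!' PostScript header")
    return f"{header}\n%%LanguageLevel: {level}\n{rest}"
```

To use it in the pipe, it could be wrapped in a small filter that reads sys.stdin and writes the result to sys.stdout.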

That used to work 100% of the time. Now it works 95% of the time. The rest of the time I get "convert: geometry does not contain image" errors, which I cannot seem to get rid of, in part because I don't understand what the problem is.

Before this process, I used a pretty-print engine (source-highlight) to turn the source code into HTML, but the only way I could find to convert that HTML into JPEGs was to automate screen grabs from an embedded Gecko engine. Reliability stank, which is why I switched to my current mechanism.

So, if you were me and needed to turn source listings into JPEG images in an automated fashion, how would you do it? Bonus points if it offers some sort of pretty-printing (e.g., bolded keywords)!

Or, if you know what typically causes "convert: geometry does not contain image", that might help too. My current process is ugly, but if I could get it back to 100% reliability, that would be just fine for now.
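
As far as I know, ImageMagick reports "geometry does not contain image" when a -crop rectangle lies entirely outside the image it is applied to. Because the crop here is fixed (575x148+5+28) while the size of the rendered (E)PS can vary with its bounding box, one plausible (unverified) cause is that some slices render smaller than the +5+28 offset. The sketch below checks a simple WxH+X+Y geometry against an image size before cropping; parse_crop and crop_overlaps are hypothetical helpers, and the width and height could come from ImageMagick's identify (for example identify -format '%w %h' file).

```python
import re

# Simple WxH+X+Y form only; ImageMagick also accepts %, !, <, > and other flags.
_CROP = re.compile(r"^(\d+)x(\d+)([+-]\d+)([+-]\d+)$")


def parse_crop(geometry: str) -> tuple[int, int, int, int]:
    """Split a crop geometry such as '575x148+5+28' into (width, height, x, y)."""
    match = _CROP.match(geometry)
    if match is None:
        raise ValueError(f"unsupported geometry: {geometry!r}")
    width, height, x, y = (int(g) for g in match.groups())
    return width, height, x, y


def crop_overlaps(geometry: str, image_width: int, image_height: int) -> bool:
    """Return True if the crop rectangle overlaps an image of the given size at all.

    False means the crop region lies entirely outside the image.
    """
    width, height, x, y = parse_crop(geometry)
    return x < image_width and y < image_height and x + width > 0 and y + height > 0
```

If the check fails for the failing slices, that would point at the rendered page size rather than at convert itself.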

Thanks in advance!

5 answers

#1

You might consider html2ps followed by ImageMagick's convert.

A thought: if your target (Kindle?) supports PNG, use that in preference to JPEG for this text rendering.

#2

html2ps is an excellent program (I once used it to produce a 1,300-page book), but it's overkill if you just want plain text to PostScript. Consider enscript instead.

#3

Since the HTML-to-JPG question has already been answered, I'll offer a suggestion for the pretty printer. I've found Pygments to be pretty awesome. It supports different themes and has lexers for pretty much any language out there (they advertise that it even highlights brainfuck). There's a command-line tool (pygmentize), and it's available on most Linux distros.
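
Pygments also ships an ImageFormatter (it needs Pillow) that renders highlighted code straight to PNG, JPEG, GIF, or BMP, so it could replace the whole a2ps/convert chain. The sketch below lexes the whole file once and then cuts the token stream into 6-line slices, so multi-line comments keep their colors across slice boundaries. split_lines, render_slices, and the output file naming are hypothetical; it assumes a monospaced font Pygments can find (it raises an error otherwise), and whether keywords come out bold depends on the style and on a bold variant of that font being installed.

```python
import sys


def split_lines(tokens):
    """Regroup a stream of (token_type, value) pairs into one token list per line.

    Values spanning several lines (e.g. block comments) are cut at each newline,
    so highlighting stays correct when a slice boundary falls inside them.
    """
    lines, current = [], []
    for ttype, value in tokens:
        parts = value.split("\n")
        for i, part in enumerate(parts):
            if part:
                current.append((ttype, part))
            if i < len(parts) - 1:  # this part was followed by a newline
                current.append((ttype, "\n"))
                lines.append(current)
                current = []
    if current:  # last line without a trailing newline
        lines.append(current)
    return lines


def render_slices(path, lines_per_slice=6, image_format="png"):
    """Highlight a source file with Pygments and save it as numbered image slices."""
    # Imported here so split_lines() works even without Pygments/Pillow installed.
    from pygments import format as pygments_format
    from pygments.formatters import ImageFormatter
    from pygments.lexers import get_lexer_for_filename

    with open(path, encoding="utf-8") as f:
        code = f.read().expandtabs(2)  # 2-space tabs, like `expand -t 2`
    lexer = get_lexer_for_filename(path)
    # Lex the whole file once, so multi-line constructs are tokenized correctly.
    lines = split_lines(lexer.get_tokens(code))
    outputs = []
    for n, start in enumerate(range(0, len(lines), lines_per_slice), 1):
        chunk = [tok for line in lines[start:start + lines_per_slice] for tok in line]
        out_path = f"{path}-{n:02d}.{image_format}"
        # A fresh formatter per slice, to be safe, so no drawing state carries over.
        formatter = ImageFormatter(image_format=image_format, line_numbers=False)
        with open(out_path, "wb") as out:
            pygments_format(chunk, formatter, out)
        outputs.append(out_path)
    return outputs


if __name__ == "__main__":
    for source in sys.argv[1:]:
        print("\n".join(render_slices(source)))
```

Run as a script with one or more source files (e.g. Foo.java), it would write Foo.java-01.png, Foo.java-02.png, and so on, printing each path.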

#4

Your Linux distribution may include pango-view and an assortment of fonts. This works on my FC6 system:

pango-view --font=DejaVuLGCSansMono --dpi=200 --output=/tmp/text.jpg -q /tmp/text

You'll need to identify a monospaced font that is installed on your system. Look around /usr/share/fonts/.

Pango supports Unicode.

Leave off the -q while you're experimenting; pango-view will then display the result in a window instead of only writing it to a file.
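
To run the pango-view command above once per slice, a small wrapper could write each slice to a temporary file and invoke it with the same flags. This is a sketch under assumptions: pango_view_args and render_slices_with_pango are hypothetical names, the font and DPI defaults simply copy this answer's example, and whether a given output extension (PNG here, JPEG in the example) is supported depends on how your pango-view was built.

```python
import subprocess
import tempfile
from pathlib import Path


def pango_view_args(text_file, out_file, font="DejaVuLGCSansMono", dpi=200):
    """Build the pango-view command shown in this answer for one text file."""
    return ["pango-view", f"--font={font}", f"--dpi={dpi}",
            f"--output={out_file}", "-q", str(text_file)]


def render_slices_with_pango(slices, prefix="listing", ext="png"):
    """Write each text slice to a temp file and render it to prefix-NN.ext."""
    outputs = []
    with tempfile.TemporaryDirectory() as tmp:
        for n, text in enumerate(slices, 1):
            src = Path(tmp) / f"slice-{n:02d}.txt"
            src.write_text(text + "\n", encoding="utf-8")
            out = f"{prefix}-{n:02d}.{ext}"
            # check=True raises CalledProcessError if pango-view fails on a slice.
            subprocess.run(pango_view_args(src, out), check=True)
            outputs.append(out)
    return outputs
```

Failing loudly per slice, rather than continuing, makes an intermittent failure like the 5% case in the question easy to pin to a specific slice.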

#5

Don't use JPEG. It's optimized for photographs and does a terrible job with text and line art. Use GIF or PNG instead. My understanding is that GIF is now patent-free, so I would just use that.

不要使用jpeg。它针对照片进行了优化,并且在文本和线条艺术方面做得非常糟糕。请改用gif或png。我的理解是gif现在没有专利,所以我会用它。

#1