In a web spider/crawler, how can I get the actual initial rendered size of the font a user sees in an HTML document, taking CSS into account?
2 solutions
#1
Rendered text size? A user can change the text size at will through browser settings, and different browsers render the same content slightly differently.
#2
If you are satisfied with an answer for the 'default' view, with no user customization (which seems likely for this purpose), I believe you are looking at a fairly painful scenario:
- Embed a rendering engine with CSS support in your spider. Prefer an engine that matches most of your users, or alternatively use all three common engines and store the results for each of them. The ease of embedding varies widely with the technology your spider is built on.
- Load the URI being spidered in the rendering engine(s).
- Using the engine's API, query its font metrics for an element containing what you consider representative text (choosing that element is an exercise for which I won't even begin to predict a strategy). How you access this will depend entirely on the embedding scenario for your engine.
I expect this is the 'hard way', but I'm not sure there is an 'easy' way.
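For illustration only, the three steps above might look roughly like this in Python. This is a sketch under several assumptions that are not part of the answer: it uses Playwright's headless Chromium as the embedded engine, it treats "representative text" as whichever computed font size covers the most visible characters (just one possible strategy), and the names `COLLECT_JS`, `parse_px`, `dominant_font_size`, and `rendered_font_size` are made up for this example.

```python
from collections import Counter

# JavaScript run inside the loaded page: for every non-empty text node that
# is actually laid out, record the computed font-size of its parent element
# together with the length of the text.
COLLECT_JS = """
() => {
  const out = [];
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  for (let node = walker.nextNode(); node; node = walker.nextNode()) {
    const text = node.textContent.trim();
    const parent = node.parentElement;
    if (!text || !parent) continue;
    // Elements inside display:none subtrees produce no client rects.
    if (parent.getClientRects().length === 0) continue;
    const style = window.getComputedStyle(parent);
    if (style.visibility === 'hidden') continue;
    out.push([style.fontSize, text.length]);
  }
  return out;
}
"""


def parse_px(value):
    """Convert a computed font-size string such as '16px' to a float."""
    value = value.strip().lower()
    if value.endswith("px"):
        value = value[:-2]
    return float(value)


def dominant_font_size(samples):
    """Return the size (in px) that covers the most characters, or None.

    `samples` is an iterable of (font_size_string, text_length) pairs.
    Weighting by character count is an assumption of this sketch.
    """
    weights = Counter()
    for size, length in samples:
        weights[parse_px(size)] += length
    if not weights:
        return None
    return weights.most_common(1)[0][0]


def rendered_font_size(url):
    """Load `url` in headless Chromium and return its dominant font size."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url, wait_until="load")
            samples = page.evaluate(COLLECT_JS)
        finally:
            browser.close()
    return dominant_font_size(samples)
```

Calling `rendered_font_size("https://example.com")` would need Playwright and its browser binaries installed. The result reflects that engine's default settings only, so the caveat from the first answer still applies: a real user's zoom or minimum font size settings are not captured.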