I'm using an MVVM framework (Angular) and having trouble getting my site's data indexed. All the static content is crawled fine, but the dynamic data fetched from a cloud database is missed.
Is there any way to politely ask a crawler to wait a few hundred milliseconds before going at it?
2 Answers
#1
1
There is no way to tell a spider to wait. This would be counter-productive, since a crawler's job is to index data as quickly as possible, and each wait would accumulate into days, weeks, or months of delays. (Note that Google has explored some JavaScript rendering, but that won't help with XHR content.)
The correct answer is to explore Making AJAX Applications Crawlable. The gist of this approach is that you use a tool like prerender.io during your deploy process to pre-render dynamic content. You then list that content in your sitemap and either use the _escaped_fragment_ rewrites at your server or the meta tag explained here (from Getting Started):
In order to make pages without hash fragments crawlable, you include a special meta tag in the head of the HTML of your page. The meta tag takes the following form:
<meta name="fragment" content="!">
In both cases, you still have to pre-render dynamic content to cached HTML pages and make those available to the search engine when it requests content from your server.
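To make the rewrite side concrete, here is a minimal sketch (not the prerender.io API; `snapshotPathFor` and the `snapshots/` directory are illustrative assumptions) of how a server could map a crawler's `_escaped_fragment_` request to a pre-rendered HTML snapshot, while normal browser requests fall through to the live single-page app:

```typescript
// Map an AJAX-crawling request to a pre-rendered snapshot path.
// Returns null for normal browser requests (serve the SPA instead).
function snapshotPathFor(requestUrl: string, snapshotDir = "snapshots"): string | null {
  const parsed = new URL(requestUrl, "http://example.invalid");
  const fragment = parsed.searchParams.get("_escaped_fragment_");
  if (fragment === null) return null; // normal browser: serve the live app
  // An empty fragment is what the meta-tag case produces for the page root.
  const page = fragment === "" ? "index" : fragment;
  return `${snapshotDir}/${page}.html`;
}

console.log(snapshotPathFor("/app?_escaped_fragment_=users/42"));
// → snapshots/users/42.html
```

The same mapping is often done in the web server config (e.g. a rewrite rule) rather than in application code; either way, the snapshot files themselves still have to be generated at deploy time.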
#2
1
The best approach is to add `noindex, nofollow` meta tags while the page is still loading.
Once the data has fully loaded, you can remove those tags.
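This answer's idea can be sketched as a small pure helper (`robotsMetaFor` is an illustrative name, not a real API): while the dynamic data is still loading, the page carries a noindex/nofollow directive; once the data is in place, the directive is dropped so crawlers may index the page.

```typescript
// Decide which robots directive (if any) the page should carry,
// based on whether the dynamic data has finished loading.
function robotsMetaFor(dataLoaded: boolean): string {
  if (!dataLoaded) {
    // Ask crawlers not to index the half-empty page yet.
    return '<meta name="robots" content="noindex, nofollow">';
  }
  return ""; // no directive: the fully loaded page may be indexed
}

console.log(robotsMetaFor(false));
// → <meta name="robots" content="noindex, nofollow">
```

One caveat: a crawler that fetches the page before the tag is removed may capture the noindex snapshot and simply drop the page rather than revisit it, so this is generally less reliable than pre-rendering the content.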