Does Googlebot retain a session when crawling?

Date: 2021-06-05 15:32:44

Does Googlebot keep a session when it crawls pages? For example, I store some variables in the session and use them in my site's pages. When Googlebot crawls these pages, will those session variables still be available? In my global.asax I store some variables in the session at session start. Will this cause any problems with Googlebot?

4 answers

#1


2  

The answer to one of your questions is: yes, you will have problems with Googlebot.

In general, we have encountered two types of issues with Googlebot:

  1. It sometimes does not retain HTTP cookies between requests. Our application relies on custom cookies, and we caught plenty of Googlebot requests that carried no cookies at all.

  2. It takes long breaks between consecutive requests. For example, it retrieves your page and requests its scripts later on.

Both will cause trouble with your session. First, the exact ASP.NET_SessionId cookie has to be passed back on every request, and Googlebot will probably fail to do that at least some of the time. Second, if a long time passes between requests, your session will expire even if the cookie is there.
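
To make the two failure modes concrete, here is a minimal sketch of a cookie-keyed session store with a sliding timeout. It is written in Python purely for illustration and is not how ASP.NET implements sessions; the class name `SessionStore`, the `handle_request` method, and the `initialised_at` key are all hypothetical. The only value taken from ASP.NET is the default timeout of 20 minutes.

```python
import uuid

SESSION_TIMEOUT = 20 * 60  # seconds; 20 minutes is the ASP.NET default


class SessionStore:
    """Toy in-memory session store keyed by a session-id cookie (hypothetical)."""

    def __init__(self, timeout=SESSION_TIMEOUT):
        self.timeout = timeout
        self._sessions = {}  # session id -> (last access time, data dict)

    def handle_request(self, cookie_sid, now):
        """Return (session_id, data, is_new) for a single request.

        cookie_sid is the id the client sent back, or None if it sent no cookie.
        now is the request time in seconds.
        """
        entry = self._sessions.get(cookie_sid) if cookie_sid else None
        if entry is not None and now - entry[0] <= self.timeout:
            data = entry[1]
            self._sessions[cookie_sid] = (now, data)  # sliding expiration
            return cookie_sid, data, False
        # No cookie, unknown id, or expired session: start from scratch.
        # This is the point where "session start" initialisation would run.
        self._sessions.pop(cookie_sid, None)
        sid = uuid.uuid4().hex
        data = {"initialised_at": now}
        self._sessions[sid] = (now, data)
        return sid, data, True


store = SessionStore()
sid, data, _ = store.handle_request(None, now=0)
data["currency"] = "EUR"  # a value set while browsing

# A browser returns the cookie promptly and sees the same session.
_, browser_data, browser_new = store.handle_request(sid, now=60)

# A client that sends no cookie gets a brand-new session on every request.
_, crawler_data, crawler_new = store.handle_request(None, now=120)

# A client that returns the cookie too late finds the session expired.
_, late_data, late_new = store.handle_request(sid, now=60 + 21 * 60)
```

In this sketch only the values written at session start survive for the cookieless or late client, which suggests that pages should not depend on session values set during earlier requests.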

#2


9  

Googlebot actively tries to avoid sessions and does not support cookies. From "First date with the Googlebot: Headers and compression" (March 2008):

I usually avoid cookies (so no "Cookie:" header) since I don't want the content affected too much by session-specific info. And, if a server uses a session id in a dynamic URL rather than a cookie, I can usually figure this out, so that I don't end up crawling your same page a million times with a million different session ids.

I imagine most regular search engine bots are similar in this respect. Google is trying to build an index of unique URLs. The URL is the unique key that identifies a unique page of content. Cookies (and sessions) are not passed along when a user clicks a link in the SERPs. Google is primarily indexing pages, not sites.
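
Since the URL is the key, one way to picture the session-id problem is a normalisation step that maps URLs differing only in a session id onto a single URL. The sketch below is an illustration in Python, not Google's actual algorithm; the function name `strip_session_id` and the parameter names in `SESSION_PARAMS` are assumptions, and it only handles ids carried in the query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameter names that carry a session id.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_id(url):
    """Return url without query parameters whose name looks like a session id."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


first = strip_session_id("http://example.com/page?id=5&sid=abc123")
second = strip_session_id("http://example.com/page?id=5&sid=zzz999")
```

Both calls yield the same URL, so a crawler applying a step like this would treat the two links as one page rather than two.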

#3


2  

Generally the answer is no, although other crawlers (of which there are plenty) behave differently.

I should note that I have seen an instance of a Google crawler for AdWords (not the normal Googlebot) which DID present a session cookie.

#4


0  

It's very unlikely, I think. It should create a new session every time it crawls your website.

