GAE app gets socket errors when communicating with BigQuery

Date: 2022-09-24 15:25:53

Our GAE python application communicates with BigQuery using the Google Api Client for Python (currently we use version 1.3.1) with the GAE-specific authentication helpers. Very often we get a socket error while communicating with BigQuery.

More specifically, we build a Python Google API client as follows:

1. bq_scope = 'https://www.googleapis.com/auth/bigquery'
2. credentials = AppAssertionCredentials(scope=bq_scope)
3. http = credentials.authorize(httplib2.Http())
4. bq_service = build('bigquery', 'v2', http=http)

We then interact with the BQ service and get the following error:

File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/gae_override/httplib.py", line 536, in getresponse
    'An error occured while connecting to the server: %s' % e)
error: An error occured while connecting to the server: Unable to fetch URL: [api url...]

The error raised is of type google.appengine.api.remote_socket._remote_socket_error.error, not an exception that wraps the error.


Initially we thought that it might be timeout-related, so we also tried setting a timeout by changing line 3 in the above snippet to


3. http = credentials.authorize(httplib2.Http(timeout=60))

However, according to the log output of the client library, the API call takes less than 1 second to fail, and explicitly setting the timeout did not change the behavior.


Note that the error occurs in various API calls, not just a single one, and usually on very light operations: for example, we often see it while polling BQ for a job status and rarely while fetching data. When we re-run the operation, it works.


Any idea why this might happen and, perhaps, a best practice for handling it?

1 Solution


All HTTP(s) requests will be routed through the urlfetch service.


Beneath that, the Google Api Client for Python uses httplib2 to make HTTP(s) requests, and under the hood httplib2 uses the socket module.

Since the error is coming from socket, you might try setting the timeout there.


import socket
timeout = 30  # seconds
socket.setdefaulttimeout(timeout)  # applies to sockets created after this call

If we continue up the stack, httplib2 falls back to the socket-level default timeout when its own timeout parameter is not set.
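To check that a socket-level default really reaches newly created sockets, a small standard-library experiment can help. This is only a sketch of the plain `socket` mechanism; `timeout_of_new_socket` is a hypothetical helper name, and whether App Engine's socket override behaves the same way is an assumption you would need to verify in your own environment.

```python
import socket


def timeout_of_new_socket(default_timeout):
    """Return the timeout a freshly created socket inherits from the default.

    Hypothetical helper for illustration; it restores the previous
    process-wide default before returning.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(default_timeout)
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # New sockets pick up the default that was set above.
            return sock.gettimeout()
        finally:
            sock.close()
    finally:
        socket.setdefaulttimeout(previous)
```

Calling `timeout_of_new_socket(30)` should report the value that was set, which confirms the default is inherited at socket creation time rather than applied retroactively to existing sockets.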



Moving further up the stack you can set the timeout and retries for BigQuery.


    timeout = 30000  # milliseconds, sent to BigQuery as timeoutMs
    num_retries = 5  # passed to execute() when the request is sent
    query_request = bigquery_service.jobs()
    query_data = {
        'query': (query_var),
        'timeoutMs': timeout,
    }
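Since the question notes that re-running the operation succeeds, another option is to retry the call yourself. Below is a minimal sketch that is not part of the client library: `call_with_retries` and its parameters are hypothetical names, and which exception classes to pass as `retry_on` (for instance the remote-socket error type from the traceback) is an assumption you would need to confirm in your environment.

```python
import logging
import random
import time


def call_with_retries(func, retry_on, max_attempts=5, base_delay=0.5,
                      sleep=time.sleep):
    """Call func() and retry with exponential backoff on the given errors.

    Hypothetical helper: retry_on is a tuple of exception classes chosen
    by the caller; any other exception propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                # Out of attempts: re-raise the last error unchanged.
                raise
            # Double the delay on each attempt and add a little jitter.
            delay = base_delay * (2 ** (attempt - 1))
            delay += random.uniform(0, base_delay)
            logging.warning('Attempt %d failed (%s); retrying in %.2f s',
                            attempt, exc, delay)
            sleep(delay)
```

A light call such as a job-status poll could then be wrapped as `call_with_retries(lambda: request.execute(), (SomeSocketError,))`, where `request` and `SomeSocketError` stand in for your own request object and error type.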

And finally you can set the timeout for urlfetch.


from google.appengine.api import urlfetch
urlfetch.set_default_fetch_deadline(60)  # seconds; 60 is an example value

If you believe it's timeout related you might want to test each library / level to make sure the timeout is being passed correctly. You can also use a basic timer to see the results.


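import logging
import time
start_query = time.time()
# project_id is a placeholder for your own project ID
query_response = query_request.query(
    projectId=project_id, body=query_data).execute(num_retries=num_retries)
end_query = time.time()
logging.info(end_query - start_query)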
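Because the failure described in the question raises an exception, a timer that only logs after a successful call would miss the failing case. A small context manager, sketched below, logs the elapsed time either way; `log_elapsed` and its `log` parameter are hypothetical names introduced here for illustration.

```python
import contextlib
import logging
import time


@contextlib.contextmanager
def log_elapsed(label, log=logging.info):
    """Log how long the wrapped block took, even if it raised."""
    start = time.time()
    try:
        yield
    finally:
        # Runs on success and on failure, so failed calls are timed too.
        log('%s took %.3f s', label, time.time() - start)
```

You would wrap each API call in `with log_elapsed('jobs.query'):` and compare the logged durations against the timeouts configured at each level.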

There are dozens of questions about timeout and deadline exceeded for GAE and BigQuery on this site so I wouldn't be surprised if you're hitting something weird.


Good luck!


