I have a site that I want to extract data from. The data retrieval is very straightforward.
It takes the parameters via HTTP POST and returns a JSON object. So, I have a list of queries that I want to run and then repeat at certain intervals to update a database. Is Scrapy suitable for this, or should I be using something else?
I don't actually need to follow links, but I do need to send multiple requests at the same time.
3 Answers
#1
6
What does the POST request look like? There are many variations, like simple query parameters (?a=1&b=2), a form-like payload (the body contains a=1&b=2), or any other kind of payload (the body contains a string in some format, like JSON or XML).
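For illustration, here is how each variant might be expressed as a Scrapy request; the endpoint URL and values below are hypothetical:

import json
from scrapy import FormRequest, Request

# 1. Simple query parameters: the data rides in the URL itself.
Request("https://example.com/api?a=1&b=2", method="POST")

# 2. Form-like payload: FormRequest encodes the dict as a=1&b=2 in the body
#    and defaults to the POST method when formdata is given.
FormRequest("https://example.com/api", formdata={"a": "1", "b": "2"})

# 3. JSON payload: serialize the dict yourself and set the Content-Type.
Request("https://example.com/api", method="POST",
        body=json.dumps({"a": 1, "b": 2}),
        headers={"Content-Type": "application/json"})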
In Scrapy it is fairly straightforward to make POST requests; see: http://doc.scrapy.org/en/latest/topics/request-response.html#request-usage-examples
For example, you may need something like this:
# Warning: take care of the undefined variables (e.g. url)!
import json
from urllib.parse import urlencode

from scrapy import Request

def start_requests(self):
    payload = {"a": 1, "b": 2}
    # POST the form-encoded payload in the request body
    yield Request(url, self.parse_data, method="POST", body=urlencode(payload))

def parse_data(self, response):
    # the endpoint returns JSON, so decode the response body
    data = json.loads(response.body)
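Note that start_requests can yield one Request per query in your list; Scrapy downloads them concurrently (bounded by the CONCURRENT_REQUESTS setting), which covers the requirement to send multiple requests at the same time.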
#2
0
For handling requests and retrieving the response, Scrapy is more than enough. And to parse JSON, just use the json module in the standard library:
import json

data = ...  # e.g. response.body from your Scrapy callback
json_data = json.loads(data)
Hope this helps!
#3
0
Based on my understanding of the question, you just want to fetch/scrape data from a web page at certain intervals. Scrapy is generally used for crawling.
If you just want to make HTTP POST requests, you might consider using the Python requests library, as sketched below.
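A minimal sketch of that approach, assuming a hypothetical endpoint URL and query list; the interval and database step are placeholders:

import time
import requests

URL = "https://example.com/api"                 # hypothetical endpoint
QUERIES = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]  # your POST parameters

while True:
    for params in QUERIES:
        # POST the parameters and parse the JSON reply
        data = requests.post(URL, data=params).json()
        # ... update your database with `data` here ...
    time.sleep(3600)  # repeat at the desired interval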