I would like to parse a webpage to gather data for scientific use. The text I need is located within a <span>. Simply parsing the page's HTML will not work, because this text is generated by JavaScript and changes constantly, sometimes as often as 10 times per second. I know for a fact that it is possible, because a scientific paper I read did it.
The webpage I need to gather the data from is http://realtime.springer.com/map. Each time a paper is downloaded, a marker appears on the map. I want to data-mine the city/location of each marker in real time as it pops up; these locations are listed below the map on the left side.
Questions:
1) How can I parse this text when it changes in real time and is generated by JavaScript code? Parsing webpages isn't new to me, but parsing text that changes in real time is.
2) Since speed matters for both parsing and writing the data, which language would be best for my project? I plan to write to a SQL database, and speed is very much an issue, so please consider the entire operation, as well as how easily it can be done, when comparing languages. I would prefer Python if there are well-documented libraries I can use.
Thank you very much in advance for any advice.
Answer:
It looks like they are making a JSON call to get the map data. Assuming you have their permission (there is a copyright notice), you could call the same URL to get the raw data directly rather than parsing it from the map.
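```javascript
// Request the same JSON endpoint the map page uses (jQuery, runs in the page's context).
// The "rand" parameter is a cache-buster so the browser does not reuse a stale response.
$.getJSON('/ip2location/lookupMulti.php', { "rand": Math.random() }, function (data) {
    for (var i = 0; i < data.length; i++) {
        var lat = data[i].lat;   // latitude of the marker
        var lng = data[i].lng;   // longitude of the marker
        var name = data[i].name; // city/location label shown below the map
    }
    // Etc...
});
```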
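The snippet above runs inside the page's own jQuery context. Outside the browser, the same request could be made directly, for example from Node.js 18+, which provides a global fetch. This is a minimal sketch under several assumptions: the absolute URL is formed by joining the map's host with the relative path from the jQuery call, the endpoint answers requests that do not come from the page itself, and the response is a JSON array of objects with lat, lng and name fields, as the loop above suggests. ENDPOINT, extractLocations and fetchLocations are hypothetical names chosen for illustration.

```javascript
// ASSUMPTION: absolute URL built from the map's host plus the relative path in the jQuery call.
const ENDPOINT = "http://realtime.springer.com/ip2location/lookupMulti.php";

// Convert the raw response into plain {lat, lng, name} records,
// skipping entries whose coordinates are missing or not numeric.
function extractLocations(data) {
  if (!Array.isArray(data)) return [];
  const out = [];
  for (const item of data) {
    if (item == null) continue;
    const lat = parseFloat(item.lat); // parseFloat(null) and parseFloat("") give NaN
    const lng = parseFloat(item.lng);
    if (!Number.isFinite(lat) || !Number.isFinite(lng)) continue;
    out.push({ lat, lng, name: String(item.name ?? "") });
  }
  return out;
}

// Fetch one batch; "rand" mirrors the cache-busting parameter used by the page.
// ASSUMPTION: the response body is a JSON array as described above.
async function fetchLocations() {
  const url = `${ENDPOINT}?rand=${Math.random()}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return extractLocations(await res.json());
}
```

Usage would be along the lines of `fetchLocations().then((rows) => console.log(rows));`. Keeping the parsing in a separate pure function makes it easy to check against a saved response without hitting the server.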
Many companies have policies against clients hitting their servers frequently (whether by loading the main page or by calling lookupMulti.php). If you do not have permission, you may well find that your IP is quickly banned.
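If you do have permission, a polite way to poll is to keep a fixed minimum gap between requests and back off when requests fail. The sketch below assumes that each response returns a batch of recent locations (the loop over data.length in the page's code hints at this, but it is not confirmed). The interval values, and the names nextDelay and poll, are illustrative assumptions rather than limits published by the site.

```javascript
// Delay before the next request: baseMs after a success, doubling with each
// consecutive failure (1 -> 2*baseMs, 2 -> 4*baseMs, ...), never above maxMs.
function nextDelay(baseMs, failures, maxMs) {
  return Math.min(maxMs, baseMs * 2 ** failures);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fetchBatch: async function returning an array of records (e.g. a wrapper around the JSON endpoint).
// onBatch: called with each successfully fetched array.
// rounds: how many requests to make before stopping; wait: injectable delay function.
// ASSUMPTION: 2 s base and 60 s cap are placeholder values, not the site's policy.
async function poll(fetchBatch, onBatch, { baseMs = 2000, maxMs = 60000, rounds = Infinity, wait = sleep } = {}) {
  let failures = 0;
  for (let i = 0; i < rounds; i++) {
    try {
      onBatch(await fetchBatch());
      failures = 0; // reset the backoff after a successful request
    } catch (err) {
      failures += 1;
      console.error(`request failed (${failures} in a row): ${err.message}`);
    }
    await wait(nextDelay(baseMs, failures, maxMs));
  }
}
```

Passing the fetch function in as a parameter keeps the loop independent of how the endpoint is called, and the injectable wait makes the timing behaviour easy to test.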
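On the question of writing to a SQL database quickly: a common approach is to write each fetched batch with one multi-row, parameterized INSERT instead of one statement per row. This is a minimal sketch assuming a hypothetical table downloads(lat, lng, name, seen_at) and a driver that uses "?" placeholders, as SQLite and MySQL drivers typically do (PostgreSQL drivers use $1, $2, ... instead). buildBatchInsert is a hypothetical helper name.

```javascript
// Build one parameterized multi-row INSERT for a batch of {lat, lng, name} records.
// ASSUMPTIONS: hypothetical table downloads(lat, lng, name, seen_at); "?" placeholder syntax.
function buildBatchInsert(rows, seenAt = new Date().toISOString()) {
  if (rows.length === 0) return null; // nothing to write for an empty batch
  const placeholders = rows.map(() => "(?, ?, ?, ?)").join(", ");
  const params = [];
  for (const r of rows) params.push(r.lat, r.lng, r.name, seenAt);
  return {
    sql: `INSERT INTO downloads (lat, lng, name, seen_at) VALUES ${placeholders}`,
    params,
  };
}
```

The resulting sql and params would be handed to whichever database driver you use. Drivers and databases cap the number of placeholders per statement, so very large batches may need to be split.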