Extracting data from an HTML table and converting it to JSON

Time: 2022-10-24 17:23:38

I have an HTML table that I want to parse and convert to JSON.

<table cellspacing="0" style="height: 24px;">
 <tr class="tr-hover">
  <th rowspan="15" scope="row">Network</th>
  <td class="ttl"><a href="network-bands.php3">Technology</a></td>
  <td class="nfo"><a href="#" class="link-network-detail collapse">GSM</a></td>
 </tr>
 <tr class="tr-toggle">
  <td class="ttl"><a href="network-bands.php3">2G bands</a></td>
  <td class="nfo">GSM 900 / 1800 - SIM 1 & SIM 2</td>
 </tr>  
 <tr class="tr-toggle">
  <td class="ttl"><a href="glossary.php3?term=gprs">GPRS</a></td>
  <td class="nfo">Class 12</td>
 </tr>  
 <tr class="tr-toggle">
  <td class="ttl"><a href="glossary.php3?term=edge">EDGE</a></td>
  <td class="nfo">Yes</td>
 </tr>
</table>

In the table above, given the header cell

<th rowspan="15" scope="row">Network</th> 

The JSON array name should be "Network".

<td class="ttl"><a href="network-bands.php3">Technology</a></td>

Technology is a subheading of Network, so it must be an element inside that JSON array. The value stored under Technology should be taken from

<td class="nfo"><a href="#" class="link-network-detail collapse">GSM</a></td>

I hope my question is clear. How can I do that?

3 solutions

#1


4  

Here is an answer that uses Jsoup and org.json as dependencies:

import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TableToJson {
    public static void main(String[] args) {
        final String HTML = "<table cellspacing=\"0\" style=\"height: 24px;\">\r\n<tr class=\"tr-hover\">\r\n<th rowspan=\"15\" scope=\"row\">Network</th>\r\n<td class=\"ttl\"><a href=\"network-bands.php3\">Technology</a></td>\r\n<td class=\"nfo\"><a href=\"#\" class=\"link-network-detail collapse\">GSM</a></td>\r\n</tr>\r\n<tr class=\"tr-toggle\">\r\n<td class=\"ttl\"><a href=\"network-bands.php3\">2G bands</a></td>\r\n<td class=\"nfo\">GSM 900 / 1800 - SIM 1 & SIM 2</td>\r\n</tr>   \r\n<tr class=\"tr-toggle\">\r\n<td class=\"ttl\"><a href=\"glossary.php3?term=gprs\">GPRS</a></td>\r\n<td class=\"nfo\">Class 12</td>\r\n</tr>   \r\n<tr class=\"tr-toggle\">\r\n<td class=\"ttl\"><a href=\"glossary.php3?term=edge\">EDGE</a></td>\r\n<td class=\"nfo\">Yes</td>\r\n</tr>\r\n</table>";
        Document document = Jsoup.parse(HTML);
        Element table = document.select("table").first();
        // The text of the first <th> names the JSON array
        String arrayName = table.select("th").first().text();
        JSONObject jsonObj = new JSONObject();
        JSONArray jsonArr = new JSONArray();
        // Cells with class "ttl" supply the keys, cells with class "nfo" the values
        Elements ttls = table.getElementsByClass("ttl");
        Elements nfos = table.getElementsByClass("nfo");
        JSONObject jo = new JSONObject();
        // Pair cells by index; this assumes each row holds exactly one ttl and one nfo cell
        for (int i = 0, l = Math.min(ttls.size(), nfos.size()); i < l; i++) {
            String key = ttls.get(i).text();
            String value = nfos.get(i).text();
            jo.put(key, value);
        }
        jsonArr.put(jo);
        jsonObj.put(arrayName, jsonArr);
        System.out.println(jsonObj.toString());
    }
}

Output (formatted):

{
    "Network": [
        {
            "2G bands": "GSM 900 / 1800 - SIM 1 & SIM 2",
            "Technology": "GSM",
            "GPRS": "Class 12",
            "EDGE": "Yes"
        }
    ]
}
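
Note that the keys in the output above are not in the same order as the table rows: org.json documents JSONObject as an unordered collection of name/value pairs. If the row order matters, one option is to collect the pairs in a LinkedHashMap and serialize them by hand. The following is only a minimal sketch using the JDK alone; the class name OrderedTableJson and its methods are made up for illustration and are not part of Jsoup or org.json, and the pairs are typed in directly instead of being read from the parsed table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OrderedTableJson {

    // Escapes the characters that may not appear raw inside a JSON string.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds {"<arrayName>":[{...}]} with the keys in the map's iteration order.
    public static String toJson(String arrayName, Map<String, String> row) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"").append(escape(arrayName)).append("\":[{");
        boolean first = true;
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("}]}").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the pairs in the order they were inserted.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("Technology", "GSM");
        row.put("2G bands", "GSM 900 / 1800 - SIM 1 & SIM 2");
        row.put("GPRS", "Class 12");
        row.put("EDGE", "Yes");
        System.out.println(toJson("Network", row));
    }
}
```

In a real program the map would be filled inside the loop over ttls and nfos instead of calling jo.put(key, value).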

#2


0  

import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class Test1
{
    public static void main(String[] args)
    {
        TableElements("https://www.w3schools.com/html/html_tables.asp", "customers");
    }

    public static void TableElements(String link, String id)
    {
        // Provide the ChromeDriver location
        System.setProperty("webdriver.chrome.driver", "C:/Users/xyz/Desktop/Ecllipse/chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        try
        {
            driver.get(link);
            // Look the table up by id first, then fall back to the class name
            WebElement element;
            try
            {
                element = driver.findElement(By.id(id));
            }
            catch (NoSuchElementException e)
            {
                element = driver.findElement(By.className(id));
            }
            // Wrap the table's inner HTML so that Jsoup parses it as a table
            String html = "<html><body><table>" + element.getAttribute("innerHTML") + "</table></body></html>";
            Document document = Jsoup.parse(html);
            Element table = document.select("table").first();

            // Header cells supply the keys; each row of td cells becomes one JSON object
            Elements ttls = table.getElementsByTag("th");
            JSONArray jsonArr = new JSONArray();
            for (Element row : table.getElementsByTag("tr"))
            {
                Elements nfos = row.getElementsByTag("td");
                if (nfos.isEmpty())
                {
                    continue; // header row: no data cells
                }
                JSONObject jo = new JSONObject();
                try
                {
                    for (int i = 0; i < nfos.size() && i < ttls.size(); i++)
                    {
                        jo.put(ttls.get(i).text(), nfos.get(i).text());
                    }
                }
                catch (JSONException e)
                {
                    System.out.println("Unable to add objects to Json Array!");
                }
                jsonArr.put(jo);
            }
            System.out.println(jsonArr.toString());
        }
        finally
        {
            // Shut the browser down even if parsing fails
            driver.quit();
        }
    }
}
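
The core of this answer is pairing a flat list of td texts with the th texts, starting a new record every time the headers run out. That step can be tried without a browser. The sketch below assumes the cells arrive row by row and that every row has one cell per header; the class name CellGrouper, the method groupCells and the sample values are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CellGrouper {

    // Splits a flat list of cell texts into records of headers.size() cells,
    // keying each cell by the header at the same column position.
    public static List<Map<String, String>> groupCells(List<String> headers, List<String> cells) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (headers.isEmpty()) {
            return rows; // nothing to key the cells by
        }
        Map<String, String> current = null;
        for (int i = 0; i < cells.size(); i++) {
            int column = i % headers.size();
            if (column == 0) {
                // First column: start a new record
                current = new LinkedHashMap<>();
                rows.add(current);
            }
            current.put(headers.get(column), cells.get(i));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the th and td texts
        List<String> headers = Arrays.asList("Company", "Contact", "Country");
        List<String> cells = Arrays.asList("A Corp", "Ann", "Germany", "B Ltd", "Bob", "Mexico");
        for (Map<String, String> row : groupCells(headers, cells)) {
            System.out.println(row);
        }
    }
}
```

Each resulting map corresponds to one JSON object in the array. A table with rowspan or colspan cells would break the one-cell-per-header assumption and needs separate handling.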

#3


-1  

OK, here you go. It's not very pretty and it relies on your data having a specific structure, but here's a simple bit of JavaScript (using jQuery) to extract the JSON. I hope this gives you a place to start, at least.

// Collect each title/info pair into a plain object,
// then let JSON.stringify take care of quoting and escaping
var network = {};
$('td.ttl').each(function () {
    var title = $(this).find('a').first().text();
    var info = $(this).next('td.nfo').text();
    network[title] = info;
});
var json = JSON.stringify({ network: network }, null, '\t');
console.log(json);

...and here's the JSFiddle - http://jsfiddle.net/0hhkqq1d/

Enjoy!
