learnhtml: Web Content Extraction with Machine Learning (Download)

[File properties]:

File name: learnhtml: Web content extraction with machine learning

File size: 10.39 MB

File format: ZIP

Last updated: 2024-05-27 11:33:43

html deep-learning content-extraction HTML

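learnhtml

An HTML web content extraction library that relies mainly on DOM features, supplemented by some text features. It achieves a token-level F1 score of 0.96 on the Dragnet dataset.

Requirements

First, install the dependencies. Binary dependencies:

sudo apt-get install recode libxml2-dev libxslt1-dev unzip

Python dependencies:

pip install -r requirements.txt

Build the project and install it locally:

pip install -e .

Running the scripts

./learnhtml/cli/prepare_data.sh << WHERE>> <>

Copyright (C) 2018 Nichita Uțiu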
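The description reports a token-level F1 score on the Dragnet dataset. The sketch below illustrates what that metric measures. It compares the tokens in the extracted text against the gold-standard content. Precision is the share of extracted tokens that also appear in the gold text, and recall is the share of gold tokens that were extracted. This is a minimal sketch under two assumptions: lowercase whitespace tokenization and bag-of-words (multiset) matching. That is a common convention for content-extraction benchmarks, but it is not necessarily learnhtml's exact evaluation code. The names `tokenize` and `token_f1` are hypothetical.

```python
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase + whitespace split; learnhtml's tokenizer may differ.
    return text.lower().split()


def token_f1(predicted: str, gold: str) -> float:
    """Hypothetical token-level F1 between extracted text and gold content.

    Tokens are matched as a multiset: a token that occurs twice in the gold
    text can be credited at most twice.
    """
    pred_counts = Counter(tokenize(predicted))
    gold_counts = Counter(tokenize(gold))
    # Counter & Counter keeps the minimum count of each shared token.
    overlap = sum((pred_counts & gold_counts).values())
    if overlap == 0:
        return 0.0  # also covers empty predictions or empty gold text
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = "the quick brown fox jumps"
    predicted = "the quick brown fox menu login"  # boilerplate tokens leaked in
    print(round(token_f1(predicted, gold), 3))  # → 0.727
```

In the example, 4 of the 6 extracted tokens are in the gold text, so precision is 2/3. The extraction also recovers 4 of the 5 gold tokens, so recall is 4/5. That gives F1 = 8/11 ≈ 0.727. Leaked navigation text lowers precision, and missed article text lowers recall.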


[File preview]:
learnhtml-master
----MANIFEST.in(324B)
----requirements.txt(178B)
----LICENSE(11KB)
----setup.py(3KB)
----README.md(737B)
----learnhtml()
--------compat.py(9KB)
--------utils()
--------log.py(152B)
--------__init__.py(47B)
--------dataset_conversion()
--------cli()
--------data()
--------model_selection.py(18KB)
--------extractor.py(2KB)
--------features.py(13KB)
----tests()
--------test_utils_module.py(645B)
--------dataset_dragnet()
--------dataset_cleaneval()
--------test_heightDepthSelector.py(1KB)
--------__init__.py(0B)
--------test_features.py(20KB)
--------test_HTMLExtractor.py(2KB)
--------test_multiColumnTransformer.py(4KB)
--------test_itemSelector.py(6KB)
--------test_sparse_generator.py(3KB)
--------test_convert_dataset.py(29KB)
----.gitignore(1KB)
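The description says learnhtml relies mainly on DOM features. The listing also includes `features.py` and `test_heightDepthSelector.py`, which hints at structural features such as node depth and subtree height. The sketch below is a hypothetical illustration of such per-node features, built only on Python's standard `html.parser`. It is not learnhtml's own `features.py`. The feature names (`depth`, `height`, `n_children`, `text_len`) and the helper `dom_features` are assumptions chosen for this example.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, depth):
        self.tag = tag
        self.depth = depth  # distance from the top-level element (depth 0)
        self.children = []
        self.text = ""      # text directly inside this element

    def height(self):
        # Height of the subtree rooted here; a leaf has height 0.
        return 1 + max((c.height() for c in self.children), default=-1)


class TreeBuilder(HTMLParser):
    """Builds a simple element tree, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", -1)  # synthetic root above <html>
        self.stack = [self.root]
        self.nodes = []                    # elements in document order

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1]
        node = Node(tag, parent.depth + 1)
        parent.children.append(node)
        self.nodes.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag (never the synthetic root);
        # stray end tags with no match are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text += data


def dom_features(html):
    """Hypothetical per-element structural features, one dict per element."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return [
        {"tag": n.tag, "depth": n.depth, "height": n.height(),
         "n_children": len(n.children), "text_len": len(n.text.strip())}
        for n in builder.nodes
    ]


if __name__ == "__main__":
    page = "<html><body><div><p>Hello world</p><br></div></body></html>"
    for row in dom_features(page):
        print(row)
```

For the sample page, depths run 0 to 3 from `html` down to `p`/`br`, and heights run the other way (3 for `html`, 0 for the leaves). A classifier can then combine such structural values with text features like `text_len` to label each node as content or boilerplate. The recursive `height` call is recomputed per node, which is fine for a sketch but would be memoized for large pages.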
