TextCorpusFetcher: A project for automatically extracting text data for language-modeling tasks (download)

[File Properties]:

File name: TextCorpusFetcher: A project for automatically extracting text data for language-modeling tasks

File size: 10KB

File format: ZIP

Last updated: 2024-05-02 15:42:17

Python

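TextCorpusFetcher is a project for automatically extracting text data for language-modeling tasks. It currently works on *. Extraction is automated in three steps: visit an article on a given topic, extract its text, and then iterate over the sub-articles referenced on that page.

Getting Started
Instructions for running the project locally on your machine.

Prerequisites
You need python3 installed.

Installing Requirements
To install the requirements, follow these steps. First, create a virtual environment with a name of your choice and activate it:
python3 -m venv envname
source envname/bin/activate
Then upgrade pip and install the requirements:
pip install -U pip
pip install -r requirements.txt

Running
To run the fetcher, run the main.py script. For example, to fetch all articles related to "cuisine" with a depth of 3:
python CorpusFetcher/main.py --category cu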
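The "visit article, extract text, follow sub-articles up to a given depth" procedure described above is a depth-limited graph traversal. The following is a minimal sketch of that idea, not the project's actual main.py. It makes several assumptions. First, `fetch_corpus`, `get_text`, and `get_links` are hypothetical names, and the two callables stand in for whatever page-fetching API the real script uses. (The file listing includes pywikibot.lwp and user-config.py, which hints at the pywikibot library, but that is an inference.) Second, the root article counts as depth 0. Third, the traversal is breadth-first, with a `seen` set so that circular links are not revisited.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Tuple


def fetch_corpus(
    root: str,
    max_depth: int,
    get_text: Callable[[str], str],            # hypothetical: title -> article text
    get_links: Callable[[str], Iterable[str]],  # hypothetical: title -> linked sub-articles
) -> Dict[str, str]:
    """Depth-limited breadth-first crawl (sketch; root is assumed to be depth 0)."""
    corpus: Dict[str, str] = {}
    queue: Deque[Tuple[str, int]] = deque([(root, 0)])
    seen = {root}  # avoid re-visiting pages reachable through cycles
    while queue:
        title, depth = queue.popleft()
        corpus[title] = get_text(title)  # extract the article's text
        if depth >= max_depth:
            continue  # stop expanding once the depth limit is reached
        for child in get_links(title):  # iterate over referenced sub-articles
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return corpus


# Hypothetical toy link graph used instead of real pages.
LINKS = {
    "Cuisine": ["French cuisine", "Italian cuisine"],
    "French cuisine": ["Cheese", "Cuisine"],
    "Italian cuisine": ["Pasta"],
    "Cheese": ["Milk"],
    "Pasta": [],
    "Milk": [],
}

if __name__ == "__main__":
    corpus = fetch_corpus(
        "Cuisine",
        max_depth=2,
        get_text=lambda t: f"Text of {t}",
        get_links=lambda t: LINKS.get(t, []),
    )
    print(sorted(corpus))  # ['Cheese', 'Cuisine', 'French cuisine', 'Italian cuisine', 'Pasta']
```

With `max_depth=2`, "Milk" is not collected. It sits three links below the root, so under the assumed depth convention it falls outside the limit. Breadth-first order is one reasonable choice because every page at depth d is collected before any page at depth d + 1. A depth-first traversal with the same limit would collect the same set of pages here, but in a different order.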


[File Preview]:
TextCorpusFetcher-main
----user-config.py(35B)
----download_html.sh(345B)
----.github()
--------workflows()
----throttle.ctrl(0B)
----tests()
--------test_fetch_articles.py(995B)
----pywikibot.lwp(0B)
----CorpusFetcher()
--------main.py(4KB)
--------__init__.py(0B)
----LICENSE(11KB)
----requirements.txt(121B)
----.gitignore(1KB)
----Makefile(138B)
----README.md(1KB)
