tatoeba_tinysegmenter download

[File properties]:

File name: tatoeba_tinysegmenter

File size: 2KB

File format: ZIP

Last updated: 2024-04-14 18:41:06

Python

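tatoeba_tinysegmenter: uses the Tatoeba Japanese corpus and the Python port of TinySegmenter to create a tokenized list of Japanese sentences in JSON.

Input: a CSV file with a column of Japanese sentences. The first row (header) should be "sentences".

Output: a JSON object with the key "sentences" that holds an array of arrays. Each inner array contains the tokens of one sentence.

Directions: Download the CSV file of the Tatoeba Japanese corpus from tatoeba.org/eng/downloads. Add a first row (header) and enter "sentences" in the topmost cell. Install TinySegmenter (in a terminal): pip install tinysegmenter

Retrieve an example sentence: with open('sentences_jp_tokenized.json') as f: data = json.load(f), then print(data['sentences'][0])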
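The conversion described above can be sketched roughly as follows. This is not the repository's own script, only a minimal illustration under a few assumptions: the input file name `sentences.csv` and the helper names `tokenize_csv`, `write_json`, and `main` are hypothetical, and the CSV is assumed to have a single "sentences" column. The tokenizer is passed in as a function, so the same helper would work with `tinysegmenter.TinySegmenter().tokenize` or any other tokenizer.

```python
import csv
import json


def tokenize_csv(csv_path, tokenize, column="sentences"):
    """Read the CSV and return {"sentences": [[token, ...], ...]}."""
    tokenized = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        # DictReader uses the first row ("sentences") as the column name.
        for row in csv.DictReader(f):
            sentence = row.get(column)
            if sentence:  # skip empty cells
                tokenized.append(list(tokenize(sentence)))
    return {"sentences": tokenized}


def write_json(data, json_path):
    """Write the tokenized sentences as UTF-8 JSON, keeping Japanese text readable."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)


def main():
    # Requires: pip install tinysegmenter
    import tinysegmenter

    segmenter = tinysegmenter.TinySegmenter()
    # "sentences.csv" is a hypothetical input name, not taken from the source.
    data = tokenize_csv("sentences.csv", segmenter.tokenize)
    write_json(data, "sentences_jp_tokenized.json")
    print(data["sentences"][0])  # tokens of the first sentence
```

Calling `main()` after installing tinysegmenter would produce `sentences_jp_tokenized.json`, which can then be loaded with `json.load` as in the retrieval step above.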


[File preview]:
tatoeba_tinysegmenter-main
----convert_csv_tinysegmenter.py(875B)
----convert_csv_to_json_jp_mecab.py(909B)
----README.md(1KB)
