Chinese Word Segmentation and Part-of-Speech Tagging Corpus

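Time: 2021-01-07 10:32:07
[File properties]:

File name: 中文分词及词性标注语料 (Chinese Word Segmentation and Part-of-Speech Tagging Corpus)

File size: 40.61 MB

File format: RAR

Updated: 2021-01-07 10:32:07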

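Tags: Chinese word segmentation, part-of-speech tagging, corpus

A corpus for Chinese word segmentation and part-of-speech tagging, including corpora from Microsoft Research Asia, Sogou, Peking University, and others.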


[File preview]:
icwb2-data
----gold()
--------as_testing_gold.utf8(920KB)
--------msr_training_words.txt(723KB)
--------as_training_words.utf8(1.33MB)
--------msr_test_gold.txt(569KB)
--------cityu_test_gold.txt(171KB)
--------pku_training_words.utf8(479KB)
--------pku_training_words.txt(339KB)
--------as_testing_gold.txt(624KB)
--------cityu_training_words.utf8(571KB)
--------cityu_training_words.txt(412KB)
--------msr_training_words.utf8(1.02MB)
--------pku_test_gold.utf8(701KB)
--------cityu_test_gold.utf8(235KB)
--------as_training_words.txt(951KB)
--------pku_test_gold.txt(539KB)
--------msr_test_gold.utf8(749KB)
----scripts()
--------score(7KB)
--------mwseg.pl(3KB)
----doc()
--------result_instructions.txt(4KB)
--------instructions.txt(7KB)
----testing()
--------as_test.txt(412KB)
--------pku_test.utf8(498KB)
--------pku_test.txt(335KB)
--------cityu_test.utf8(197KB)
--------as_test.utf8(604KB)
--------cityu_test.txt(133KB)
--------msr_test.txt(367KB)
--------msr_test.utf8(547KB)
----training()
--------as_training.utf8(38.86MB)
--------cityu_training.txt(5.94MB)
--------cityu_training.utf8(8.15MB)
--------pku_training.txt(5.63MB)
--------as_training.b5(26.36MB)
--------pku_training.utf8(7.37MB)
--------msr_training.txt(12.25MB)
--------msr_training.utf8(16.11MB)
----README(2KB)
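
The listing pairs raw test files (for example testing/pku_test.utf8) with segmented gold files (for example gold/pku_test_gold.utf8). As a minimal sketch of how such files might be read, the Python below assumes each line holds one sentence with words separated by whitespace; that layout is an assumption based on the icwb2-data naming, not something this page states. The helper names `parse_segmented_lines`, `unsegment`, and `load_segmented_file` are hypothetical.

```python
from typing import Iterable, List


def parse_segmented_lines(lines: Iterable[str]) -> List[List[str]]:
    """Split each non-empty line into its whitespace-delimited words.

    Assumption: one sentence per line, words separated by one or more
    whitespace characters (str.split() also treats U+3000 as whitespace).
    """
    sentences = []
    for line in lines:
        words = line.split()
        if words:  # skip blank lines
            sentences.append(words)
    return sentences


def unsegment(words: List[str]) -> str:
    """Join words back into the raw, unsegmented sentence."""
    return "".join(words)


def load_segmented_file(path: str, encoding: str = "utf-8") -> List[List[str]]:
    """Read a segmented file; the path and encoding are supplied by the caller."""
    with open(path, encoding=encoding) as f:
        return parse_segmented_lines(f)


if __name__ == "__main__":
    # Illustrative sample lines, not taken from the archive.
    sample = ["共同  创造  美好  的  新  世纪\n", "\n"]
    for words in parse_segmented_lines(sample):
        print(words, "<-", unsegment(words))
```

For a real file, a call such as `load_segmented_file("icwb2-data/training/pku_training.utf8")` would be the natural usage, assuming the archive is extracted under that path. The `.txt` and `.b5` variants may use other encodings, so the `encoding` argument would need to match.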

User comments

  • Misleading... there is no corpus inside, and no part-of-speech tagging.