WikiQA dataset

Date: 2021-04-23 03:19:32
[File properties]:

File name: WikiQA dataset

File size: 6.77 MB

File format: ZIP

Last updated: 2021-04-23 03:19:32

NLP

We describe the WikiQA dataset, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering. Most previous work on answer sentence selection focuses on a dataset created using the TREC-QA data, which includes editor-generated questions and candidate answer sentences selected by matching content words in the question. WikiQA is constructed using a more natural process and is more than an order of magnitude larger than the previous dataset. In addition, the WikiQA dataset also includes questions for which there are no correct sentences, enabling researchers to work on answer triggering, a critical component in any QA system. We compare several systems on the task of answer sentence selection on both datasets and also describe the performance of a system on the problem of answer triggering using the WikiQA dataset.
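The distinction drawn above between answer sentence selection and answer triggering can be illustrated with a short Python sketch. It groups candidate sentences by question and separates questions that have at least one correct sentence from those that have none. The column names (`QuestionID`, `Question`, `SentenceID`, `Sentence`, `Label`) and the sample rows are assumptions made for illustration; check README.txt in the archive for the actual layout of the .tsv files before relying on them.

```python
import csv
import io
from collections import defaultdict

# Tiny inline sample standing in for a file such as WikiQA-train.tsv.
# Column names and rows are assumptions for illustration, not copied from the archive.
SAMPLE_TSV = (
    "QuestionID\tQuestion\tSentenceID\tSentence\tLabel\n"
    "Q1\thow are glacier caves formed ?\tD1-0\t"
    "A glacier cave is a cave formed within the ice of a glacier .\t1\n"
    "Q1\thow are glacier caves formed ?\tD1-1\t"
    "Ice caves are sometimes confused with glacier caves .\t0\n"
    "Q2\twho wrote the example book ?\tD2-0\t"
    "The book was published in 1990 .\t0\n"
)


def group_by_question(tsv_file):
    """Map each question ID to its list of (sentence, label) candidates."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    groups = defaultdict(list)
    for row in reader:
        groups[row["QuestionID"]].append((row["Sentence"], int(row["Label"])))
    return dict(groups)


def split_by_answerability(groups):
    """Separate questions with at least one correct sentence from those with none."""
    answerable = {
        qid: cands
        for qid, cands in groups.items()
        if any(label == 1 for _, label in cands)
    }
    unanswerable = {
        qid: cands for qid, cands in groups.items() if qid not in answerable
    }
    return answerable, unanswerable


groups = group_by_question(io.StringIO(SAMPLE_TSV))
answerable, unanswerable = split_by_answerability(groups)
print("with a correct sentence:", sorted(answerable))
print("without a correct sentence:", sorted(unanswerable))
```

To run this against the real data, the `io.StringIO(SAMPLE_TSV)` argument could be replaced with an opened file handle, for example `open("WikiQA-train.tsv", encoding="utf-8", newline="")`, assuming the file carries a header row with these names. Answer sentence selection would then only use the first group, while answer triggering needs both.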


[File preview]:
WikiQACorpus
----WikiQA.tsv(5.98MB)
----WikiQA-train.ref(213KB)
----WikiQASent.pos.ans.tsv(528KB)
----README.txt(5KB)
----eval.py(2KB)
----LICENSE.pdf(246KB)
----Guidelines_Phase2.pdf(349KB)
----WikiQA-test-filtered.ref(23KB)
----WikiQA-dev.tsv(564KB)
----WikiQA-dev.ref(26KB)
----Guidelines_Phase1.pdf(773KB)
----WikiQA-test.tsv(1.24MB)
----WikiQA-train.txt(3.5MB)
----WikiQA-test.ref(61KB)
----WikiQA-dev-filtered.ref(11KB)
----WikiQA-dev.txt(472KB)
----emnlp-table()
--------WikiQA.CNN-Cnt.dev.rank(56KB)
--------WikiQA.CNN.dev.rank(74KB)
--------WikiQA.CNN.test.rank(169KB)
--------WikiQA.CNN-Cnt.test.rank(127KB)
----WikiQA-train.tsv(4.16MB)
----WikiQA-test.txt(1.05MB)
