Chinese Entity Linking Comprehensive

Time: 2022-06-16 05:15:25
[File properties]
File name: Chinese Entity Linking Comprehensive
File size: 23.64 MB
File format: TGZ
Updated: 2022-06-16 05:15:25

LDC Chinese Entity (Chinese entity linking)

TAC KBP Chinese Entity Linking Comprehensive
Training and Evaluation Data 2011-2014

LDC2015E17

March 20, 2015
Linguistic Data Consortium

1. Overview

Text Analysis Conference (TAC) is a series of workshops organized by the National Institute of Standards and Technology (NIST). TAC was developed to encourage research in natural language processing (NLP) and related applications by providing a large test collection, common evaluation procedures, and a forum for researchers to share their results. Through its various evaluations, the Knowledge Base Population (KBP) track of TAC encourages the development of systems that can match entities mentioned in natural texts with those appearing in a knowledge base, extract novel information about entities from a document collection, and add it to a new or existing knowledge base.

The goal of Entity Linking is to determine whether or not the entity referred to in each query has a matching entity node in the reference Knowledge Base (KB) (LDC2014T16). If there is a matching node for a query, annotators create a link between the two. If there is not a matching node for a query, the entity is marked as 'NIL' and then clustered with other NIL entities into equivalence classes. For more information, please refer to the Entity Linking section of NIST's 2014 TAC KBP website (2014 was the last year in which the Chinese Entity Linking evaluation was conducted as of the time this package was created) at

  http://nlp.cs.rpi.edu/kbp/2014/

This package contains all evaluation and training data developed in support of TAC KBP Chinese Entity Linking during the four years since the task's inception in 2011. This includes queries, KB links, equivalence class clusters for NIL entities (those that could not be linked to an entity in the knowledge base), and entity type information for each of the queries.

The data included in this package were originally released by LDC to TAC KBP coordinators and performers under the following ecorpora catalog IDs and titles:

  LDC2011E46:  TAC 2011 KBP Cross-lingual Sample Entity Linking Queries V1.1
  LDC2011E55:  TAC 2011 KBP Cross-lingual Training Entity Linking V1.1
  LDC2012E34:  TAC 2011 KBP Cross-Lingual Evaluation Entity Linking Annotation
  LDC2012E66:  TAC 2012 KBP Chinese Entity Linking Web Training Queries and Annotations
  LDC2012E103: TAC 2012 KBP Chinese Entity Linking Evaluation Annotations V1.2
  LDC2013E96:  TAC 2013 KBP Chinese Entity Linking Evaluation Queries and Knowledge Base Links V1.2
  LDC2014E47:  TAC 2014 KBP Chinese Entity Linking Discussion Forum Training Data
  LDC2014E83:  TAC 2014 KBP Chinese Entity Linking Evaluation Queries and Knowledge Base Links V2.0

2. Contents

./README.txt

  This file

./data/2011/eval/tac_kbp_2011_chinese_entity_linking_evaluation_queries.xml

  This file contains 2176 queries. Each query entry consists of the following fields:

    - A query ID formatted as the letters "EL_CLCMN_" (if a Chinese language query) or "EL_CLENG_" (if an English language query) plus a five-digit zero-padded, sequentially assigned integer (e.g. "EL_CLCMN_00001").
    - The full namestring of the query entity.
    - An ID for a document in ./data/2011/eval/source_documents/ from which the namestring was extracted.

  The queries are distributed by language and type as follows:

    KB-Link            GPE   ORG   PER   Total
    ------------------------------------------
    CMN NW NIL:        120   291   420     831
    CMN NW Non-NIL:    279   150   221     650
    ENG NW NIL:         90   129    20     239
    ENG NW Non-NIL:     93    72   104     269
    ENG WB NIL:         16     0     5      21
    ENG WB Non-NIL:     44    68    54     166
    ------------------------------------------
    Total:             642   710   824    2176

./data/2011/eval/tac_kbp_2011_chinese_entity_linking_evaluation_KB_links.tab

  This file contains the responses for each query as identified by human annotators at LDC. This file is tab delimited, with 4 fields total. The column descriptions are as follows:

    1. query ID - The ID for the query detailed in tac_kbp_2011_chinese_entity_linking_evaluation_queries.xml to which the subsequent information pertains
    2. entity ID - A unique entity node ID or NIL ID, corresponding to entity linking annotation and NIL-coreference (clustering) annotation respectively. If the entity node ID begins with "E", the text refers to an entity in the Knowledge Base (TAC KBP Reference Knowledge Base - LDC2014T16). If the given query is not linked to an entity in the Knowledge Base (KB), then it is given a NIL ID, which consists of "NIL" plus a four-digit zero-padded sequentially assigned integer (e.g. NIL-0001, NIL-0002). Queries with either kind of ID are assumed to be co-referenced (clustered): queries that refer to the same entity share the same "E" type ID or the same NIL ID. Each "E" type ID and each NIL ID denotes a distinct entity.
    3. entity-type - GPE, ORG, or PER type indicator for the entity
    4. genre - WB/NW/DF indicating the source genre of the document for the query (WB for web data, NW for newswire data, or DF for discussion forum data).

./data/2011/eval/source_documents/*

  This directory contains all of the source documents listed in the "docid" element of each query in tac_kbp_2011_chinese_entity_linking_evaluation_queries.xml. See section 5 for more information about source documents.

./data/2011/training/tac_kbp_2011_chinese_entity_linking_sample_and_training_queries.xml

  This file is a concatenation of the queries files originally released in LDC2011E46 (sample) and LDC2011E55 (training). This file contains 2171 queries. Each query entry consists of the following fields:

    - A query ID formatted as the letters "EL_CLCMN_" (if a Chinese language query) or "EL_CLENG_" (if an English language query) plus a five-digit zero-padded, sequentially assigned integer (e.g. "EL_CLCMN_00001").
    - The full namestring of the query entity.
    - An ID for a document in ./data/2011/training/source_documents/ from which the namestring was extracted.

  The queries are distributed by language and type as follows:

    KB-Link            GPE   ORG   PER   Total
    ------------------------------------------
    CMN NW NIL:        124   293   426     843
    CMN NW Non-NIL:    284   149   227     660
    ENG NW NIL:        143   116    63     322
    ENG NW Non-NIL:    122   100   100     322
    ENG WB NIL:          0     1     0       1
    ENG WB Non-NIL:     14     3     6      23
    ------------------------------------------
    Total:             687   662   822    2171

./data/2011/training/tac_kbp_2011_chinese_entity_linking_sample_and_training_KB_links.tab

  This file is a concatenation of the KB_links files originally released in LDC2011E46 (sample) and LDC2011E55 (training). This file contains the responses for each query as identified by human annotators at LDC. This file is tab delimited, with 4 fields total. The column descriptions are as follows:

    1. query ID - The ID for the query detailed in tac_kbp_2011_chinese_entity_linking_sample_and_training_queries.xml to which the subsequent information pertains
    2. entity ID - A unique entity node ID or NIL ID, corresponding to entity linking annotation and NIL-coreference (clustering) annotation respectively. If the entity node ID begins with "E", the text refers to an entity in the Knowledge Base (TAC KBP Reference Knowledge Base - LDC2014T16). If the given query is not linked to an entity in the Knowledge Base (KB), then it is given a NIL ID, which consists of "NIL" plus a four-digit zero-padded sequentially assigned integer (e.g. NIL-0001, NIL-0002). Queries with either kind of ID are assumed to be co-referenced (clustered): queries that refer to the same entity share the same "E" type ID or the same NIL ID. Each "E" type ID and each NIL ID denotes a distinct entity.
    3. entity-type - GPE, ORG, or PER type indicator for the entity
    4. genre - WB/NW/DF indicating the source genre of the document for the query (WB for web data, NW for newswire data, or DF for discussion forum data).
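
The KB_links column layout described above can be read with a few lines of Python. The following is a minimal sketch, not an official LDC tool: the KBLink and read_kb_links names are hypothetical, the sample entity ID "E0000001" is made up for illustration, and the check on the entity-type column is an assumption added so that a header row (if a file has one) is skipped.

```python
import csv
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class KBLink(NamedTuple):
    """One row of a KB_links.tab file (hypothetical helper type)."""
    query_id: str
    entity_id: str                    # KB node ID ("E...") or NIL cluster ID ("NIL...")
    entity_type: str                  # GPE, ORG, or PER
    genre: str                        # WB, NW, or DF
    web_search: Optional[str] = None  # Y/N, present only in 5- and 6-field files
    wiki_text: Optional[str] = None   # Y/N, present only in 6-field files

    @property
    def is_nil(self) -> bool:
        return self.entity_id.startswith("NIL")


def read_kb_links(handle: TextIO) -> Iterator[KBLink]:
    """Yield one KBLink per tab-delimited data row."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not 4 <= len(row) <= 6:
            continue  # blank or malformed line
        if row[2] not in {"GPE", "ORG", "PER"}:
            continue  # assumption: also skips a header row, if present
        yield KBLink(*row)


# Illustrative rows only; the IDs below are invented for this sketch.
sample = io.StringIO(
    "EL_CLCMN_00001\tE0000001\tPER\tNW\n"
    "EL_CLCMN_00002\tNIL-0001\tORG\tNW\n"
)
links = list(read_kb_links(sample))
print([(link.query_id, link.is_nil) for link in links])
```

The optional fifth and sixth fields default to None, so the same reader can be pointed at the 4-field files from 2011 as well as the 5- and 6-field files from later years.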

./data/2011/training/source_documents/*

  This directory contains all of the source documents listed in the "docid" element of each query in tac_kbp_2011_chinese_entity_linking_sample_and_training_queries.xml. See section 5 for more information about source documents.

./data/2012/eval/tac_kbp_2012_chinese_entity_linking_evaluation_queries.xml

  This file contains 2122 queries. Each query entry consists of the following fields:

    - A query ID formatted as the letters "EL_CMN_" plus a five-digit zero-padded, sequentially assigned integer (e.g., "EL_CMN_00001").
    - The full namestring of the query entity.
    - An ID for a document in ./data/2012/eval/source_documents/ from which the namestring was extracted.
    - The starting offset for the namestring.
    - The ending offset for the namestring.

  The queries are distributed by language and type as follows:

    KB-Link            GPE   ORG   PER   Total
    ------------------------------------------
    CMN NW NIL:         99    89   167     355
    CMN NW Non-NIL:    164   167   148     479
    CMN WB NIL:         88    86    68     242
    CMN WB Non-NIL:    131   112   110     353
    ENG NW NIL:         90    79    68     237
    ENG NW Non-NIL:    101   107    83     291
    ENG WB NIL:          6    26    16      48
    ENG WB Non-NIL:     26    52    39     117
    ------------------------------------------
    Total:             705   718   699    2122

./data/2012/eval/tac_kbp_2012_chinese_entity_linking_evaluation_KB_links.tab

  This file contains the responses for each query as identified by human annotators at LDC. This file is tab delimited, with 5 fields total. The column descriptions are as follows:

    1. query ID - The ID for the query detailed in tac_kbp_2012_chinese_entity_linking_evaluation_queries.xml to which the subsequent information pertains
    2. entity ID - A unique entity node ID or NIL ID, corresponding to entity linking annotation and NIL-coreference (clustering) annotation respectively. If the entity node ID begins with "E", the text refers to an entity in the Knowledge Base (TAC KBP Reference Knowledge Base - LDC2014T16). If the given query is not linked to an entity in the Knowledge Base (KB), then it is given a NIL ID, which consists of "NIL" plus a three-digit zero-padded sequentially assigned integer (e.g. NIL001, NIL002). Queries with either kind of ID are assumed to be co-referenced (clustered): queries that refer to the same entity share the same "E" type ID or the same NIL ID. Each "E" type ID and each NIL ID denotes a distinct entity.
    3. entity-type - GPE, ORG, or PER type indicator for the entity
    4. genre - WB/NW/DF indicating the source genre of the document for the query (WB for web data, NW for newswire data, or DF for discussion forum data).
    5. web-search - (Y/N) indicating whether the annotator made use of web searches in order to make the linking judgment.

./data/2012/eval/source_documents/*

  This directory contains all of the source documents listed in the "docid" element of each query in tac_kbp_2012_chinese_entity_linking_evaluation_queries.xml. See section 5 for more information about source documents.

./data/2012/training/tac_kbp_2012_chinese_entity_linking_training_queries.xml

  This file contains 158 queries. Each query entry consists of the following fields:

    - A query ID formatted as the letters "EL_CMN_" plus a five-digit zero-padded, sequentially assigned integer (e.g., "EL_CMN_00001").
    - The full namestring of the query entity.
    - An ID for a document in ./data/2012/training/source_documents/ from which the namestring was extracted.
    - The starting offset for the namestring.
    - The ending offset for the namestring.

  The queries are distributed by language and type as follows:

    KB-Link            GPE   ORG   PER   Total
    ------------------------------------------
    CMN NW NIL:          2     2     2       6
    CMN NW Non-NIL:      0     2     0       2
    CMN WB NIL:         16    16    17      49
    CMN WB Non-NIL:     24    25    24      73
    ENG WB NIL:          3     4     0       7
    ENG WB Non-NIL:      7     5     9      21
    ------------------------------------------
    Total:              52    54    52     158

./data/2012/training/tac_kbp_2012_chinese_entity_linking_training_KB_links.tab

  This file contains the responses for each query as identified by human annotators at LDC. This file is tab delimited, with 5 fields total. The column descriptions are as follows:

    1. query ID - The ID for the query detailed in tac_kbp_2012_chinese_entity_linking_training_queries.xml to which the subsequent information pertains
    2. entity ID - A unique entity node ID or NIL ID, corresponding to entity linking annotation and NIL-coreference (clustering) annotation respectively. If the entity node ID begins with "E", the text refers to an entity in the Knowledge Base (TAC KBP Reference Knowledge Base - LDC2014T16). If the given query is not linked to an entity in the Knowledge Base (KB), then it is given a NIL ID, which consists of "NIL" plus a three-digit zero-padded sequentially assigned integer (e.g. NIL001, NIL002). Queries with either kind of ID are assumed to be co-referenced (clustered): queries that refer to the same entity share the same "E" type ID or the same NIL ID. Each "E" type ID and each NIL ID denotes a distinct entity.
    3. entity-type - GPE, ORG, or PER type indicator for the entity
    4. genre - WB/NW/DF indicating the source genre of the document for the query (WB for web data, NW for newswire data, or DF for discussion forum data).
    5. web-search - (Y/N) indicating whether the annotator made use of web searches in order to make the linking judgment.
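
Because queries that refer to the same entity share an entity ID, the equivalence classes can be recovered by grouping query IDs on the second KB_links column. The sketch below illustrates this under stated assumptions: build_clusters and nil_clusters are hypothetical helper names, and the sample IDs are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_clusters(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group query IDs by entity ID (KB node ID or NIL ID)."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for query_id, entity_id in rows:
        clusters[entity_id].append(query_id)
    return dict(clusters)


def nil_clusters(clusters: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the clusters whose entity has no node in the reference KB."""
    return {eid: qids for eid, qids in clusters.items() if eid.startswith("NIL")}


# (query ID, entity ID) pairs; all values are illustrative.
rows = [
    ("EL_CMN_00001", "E0000001"),
    ("EL_CMN_00002", "NIL001"),
    ("EL_CMN_00003", "NIL001"),
    ("EL_CMN_00004", "NIL002"),
]
clusters = build_clusters(rows)
print(nil_clusters(clusters))
```

NIL IDs are assigned separately within each KB_links file, so this grouping is only meaningful within a single file; merging clusters across data sets would need additional evidence.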

./data/2012/training/source_documents/*

  This directory contains all of the source documents listed in the "docid" element of each query in tac_kbp_2012_chinese_entity_linking_training_queries.xml. See section 5 for more information about source documents.

./data/2013/eval/tac_kbp_2013_chinese_entity_linking_evaluation_queries.xml

  This file contains 2155 queries. Each query entry consists of the following fields:

    - A query ID formatted as the letters "EL13_CMN_" plus a four-digit zero-padded, sequentially assigned integer (e.g., "EL13_CMN_0001").
    - The full namestring of the query entity.
    - An ID for a document in ./data/2013/eval/source_documents/ from which the namestring was extracted.
    - The starting offset for the namestring.
    - The ending offset for the namestring.

  The queries are distributed by language and type as follows:

    KB-Link            PER   ORG   GPE   Total
    ------------------------------------------
    CMN NW NIL:        123   197   125     445
    CMN NW Non-NIL:    124   119   163     406
    CMN WB NIL:        112   105    87     304
    CMN WB Non-NIL:    173   150   162     485
    ENG NW NIL:         52    16    68     136
    ENG NW Non-NIL:     83    87    64     234
    ENG WB NIL:         11    19     7      37
    ENG WB Non-NIL:     28    42    38     108
    ------------------------------------------
    Total:             706   735   714    2155

./data/2013/eval/tac_kbp_2013_chinese_entity_linking_evaluation_KB_links.tab

  This file contains the responses for each query as identified by human annotators at LDC. This file is tab delimited, with 6 fields total. The column descriptions are as follows:

    1. query ID - The ID for the query detailed in tac_kbp_2013_chinese_entity_linking_evaluation_queries.xml to which the subsequent information pertains
    2. entity ID - A unique entity node ID or NIL ID, corresponding to entity linking annotation and NIL-coreference (clustering) annotation respectively. If the entity node ID begins with "E", the text refers to an entity in the Knowledge Base (TAC KBP Reference Knowledge Base - LDC2014T16). If the given query is not linked to an entity in the Knowledge Base (KB), then it is given a NIL ID, which consists of "NIL" plus a three-digit zero-padded sequentially assigned integer (e.g. NIL001, NIL002). Queries with either kind of ID are assumed to be co-referenced (clustered): queries that refer to the same entity share the same "E" type ID or the same NIL ID. Each "E" type ID and each NIL ID denotes a distinct entity.
    3. entity-type - GPE, ORG, or PER type indicator for the entity
    4. genre - WB/NW/DF indicating the source genre of the document for the query (WB for web data, NW for newswire data, or DF for discussion forum data).
    5. web-search - (Y/N) indicating whether the annotator made use of web searches in order to make the linking judgment.
    6. wiki text - (Y/N) indicating whether the annotator made use of the wiki text in the knowledge base (as opposed to just the infobox information) in order to make the linking judgment.

./data/2013/eval/source_documents/*

  This directory contains all of the source documents listed in the "docid" element of each query in tac_kbp_2013_chinese_entity_linking_evaluation_queries.xml. See section 5 for more information about source documents.

./data/2014/eval/tac_kbp_2014_chinese_entity_linking_evaluation_queries.xml

  This file contains 2739 queries. Each query entry consists of the following fields:

    - A query ID formatted as the letters "EL14_CMN_" plus a four-digit zero-padded, sequentially assigned integer (e.g., "EL14_CMN_0001").
    - The full namestring of the query entity.
    - An ID for a document in ./data/2014/eval/source_documents/ from which the namestring was extracted.
    - The starting offset for the namestring.
    - The ending offset for the namestring.
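
The five query fields listed above can be read with a standard XML parser. The following is a minimal sketch in Python, assuming a layout in which each query is a "query" element with an "id" attribute and "name", "docid", "beg", and "end" child elements under a "kbpentlink" root. The "name", "beg", and "end" element names appear in section 6 of this README; the root element, the "query" element, the "id" attribute, and "docid" are assumptions that should be checked against the DTDs in ./dtd/. The sample document ID is invented.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional


class Query(NamedTuple):
    """One entity linking query (hypothetical helper type)."""
    query_id: str
    name: str
    docid: str
    beg: Optional[int]  # None for 2011 queries, which carry no offsets
    end: Optional[int]


def parse_queries(xml_text: str) -> List[Query]:
    """Parse a queries.xml document into Query records."""
    root = ET.fromstring(xml_text)
    queries = []
    for node in root.iter("query"):
        beg = node.findtext("beg")
        end = node.findtext("end")
        queries.append(Query(
            query_id=node.get("id", ""),
            name=node.findtext("name", default=""),
            docid=node.findtext("docid", default=""),
            beg=int(beg) if beg is not None else None,
            end=int(end) if end is not None else None,
        ))
    return queries


# Illustrative input only; the document ID and offsets are made up.
SAMPLE = """<kbpentlink>
  <query id="EL14_CMN_0001">
    <name>北京大学</name>
    <docid>hypothetical-doc-id</docid>
    <beg>120</beg>
    <end>123</end>
  </query>
</kbpentlink>"""

for query in parse_queries(SAMPLE):
    print(query)
```

Offsets are kept optional so that the same function can read the 2011 files, whose queries list only an ID, a namestring, and a document ID.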

  The queries are distributed by language and type as follows:

    KB-Link            PER   ORG   GPE   Total
    ------------------------------------------
    CMN DF NIL:        118    40    16     174
    CMN DF Non-NIL:    426    61    66     553
    CMN NW NIL:        179   413   300     892
    CMN NW Non-NIL:    349   139   184     672
    ENG DF NIL:          1     4     5      10
    ENG DF Non-NIL:      5    26    25      56
    ENG NW NIL:         10    65    32     107
    ENG NW Non-NIL:     87    66   119     272
    ENG WB Non-NIL:      1     0     2       3
    ------------------------------------------
    Total:            1176   814   749    2739

./data/2014/eval/tac_kbp_2014_chinese_entity_linking_evaluation_KB_links.tab

  This file contains the responses for each query as identified by human annotators at LDC. This file is tab delimited, with 6 fields total. The column descriptions are as follows:

    1. query ID - The ID for the query detailed in tac_kbp_2014_chinese_entity_linking_evaluation_queries.xml to which the subsequent information pertains
    2. entity ID - A unique entity node ID or NIL ID, corresponding to entity linking annotation and NIL-coreference (clustering) annotation respectively. If the entity node ID begins with "E", the text refers to an entity in the Knowledge Base (TAC KBP Reference Knowledge Base - LDC2014T16). If the given query is not linked to an entity in the Knowledge Base (KB), then it is given a NIL ID, which consists of "NIL" plus a three-digit zero-padded sequentially assigned integer (e.g. NIL001, NIL002). Queries with either kind of ID are assumed to be co-referenced (clustered): queries that refer to the same entity share the same "E" type ID or the same NIL ID. Each "E" type ID and each NIL ID denotes a distinct entity.
    3. entity-type - GPE, ORG, or PER type indicator for the entity
    4. genre - WB/NW/DF indicating the source genre of the document for the query (WB for web data, NW for newswire data, or DF for discussion forum data).
    5. web-search - (Y/N) indicating whether the annotator made use of web searches in order to make the linking judgment.
    6. wiki text - (Y/N) indicating whether the annotator made use of the wiki text in the knowledge base (as opposed to just the infobox information) in order to make the linking judgment.

./data/2014/eval/source_documents/*

  This directory contains all of the source documents listed in the "docid" element of each query in tac_kbp_2014_chinese_entity_linking_evaluation_queries.xml. See section 5 for more information about source documents.

./data/2014/training/tac_kbp_2014_chinese_entity_linking_training_queries.xml

  This file contains 514 queries. Each query entry consists of the following fields:

    - A query ID formatted as the letters "EL14_CMN_TRAINING_" plus a four-digit zero-padded, sequentially assigned integer (e.g., "EL14_CMN_TRAINING_0001").
    - The full namestring of the query entity.
    - An ID for a document in ./data/2014/training/source_documents/ from which the namestring was extracted.
    - The starting offset for the namestring.
    - The ending offset for the namestring.

  The queries are distributed by language and type as follows:

    KB-Link            PER   ORG   GPE   Total
    ------------------------------------------
    ENG DF NIL:          1     6     3      10
    ENG DF Non-NIL:     33    37    41     111
    CMN DF NIL:         28    46     6      80
    CMN DF Non-NIL:    109    83   121     313
    ------------------------------------------
    Total:             171   172   171     514

./data/2014/training/tac_kbp_2014_chinese_entity_linking_training_KB_links.tab

  This file contains the responses for each query as identified by human annotators at LDC. This file is tab delimited, with 6 fields total. The column descriptions are as follows:

    1. query ID - The ID for the query detailed in tac_kbp_2014_chinese_entity_linking_training_queries.xml to which the subsequent information pertains
    2. entity ID - A unique entity node ID or NIL ID, corresponding to entity linking annotation and NIL-coreference (clustering) annotation respectively. If the entity node ID begins with "E", the text refers to an entity in the Knowledge Base (TAC KBP Reference Knowledge Base - LDC2014T16). If the given query is not linked to an entity in the Knowledge Base (KB), then it is given a NIL ID, which consists of "NIL" plus a three-digit zero-padded sequentially assigned integer (e.g. NIL001, NIL002). Queries with either kind of ID are assumed to be co-referenced (clustered): queries that refer to the same entity share the same "E" type ID or the same NIL ID. Each "E" type ID and each NIL ID denotes a distinct entity.
    3. entity-type - GPE, ORG, or PER type indicator for the entity
    4. genre - WB/NW/DF indicating the source genre of the document for the query (all DF or discussion forum threads in these data).
    5. web-search - (Y/N) indicating whether the annotator made use of web searches in order to make the linking judgment.
    6. wiki text - (Y/N) indicating whether the annotator made use of the wiki text in the knowledge base (as opposed to just the infobox information) in order to make the linking judgment.

./data/2014/training/source_documents/*

  This directory contains all of the source documents listed in the "docid" element of each query in tac_kbp_2014_chinese_entity_linking_training_queries.xml. See section 5 for more information about source documents.

./dtd/2011_kbpentlink.dtd

  DTD for:
    tac_kbp_2011_chinese_entity_linking_evaluation_queries.xml
    tac_kbp_2011_chinese_entity_linking_sample_and_training_queries.xml

./dtd/2012_2013_2014_kbpentlink.dtd

  DTD for:
    tac_kbp_2012_chinese_entity_linking_evaluation_queries.xml
    tac_kbp_2012_chinese_entity_linking_training_queries.xml
    tac_kbp_2013_chinese_entity_linking_evaluation_queries.xml
    tac_kbp_2014_chinese_entity_linking_evaluation_queries.xml
    tac_kbp_2014_chinese_entity_linking_training_queries.xml

3. Annotation

Given a name string and using information from the query's source document, bilingual Chinese/English-speaking annotators used a specialized search engine to look in the Knowledge Base for a page in which the entity referred to by the query was the central topic. If such a page was found, a link was created between the query and the matching KB node ID. If no matching page was found, the query was marked as NIL and later coreferenced with other NIL entities. Annotators were allowed to use online searching to assist in determining the KB link/NIL status. Queries for which a human annotator could not confidently determine the KB link status were removed from the final data sets.

4. Text Normalization

Name string matches are case and punctuation sensitive. The only text normalization performed was:

  1. conversion of newlines to spaces, except where preceding characters were hyphens ("-"), in which case newlines were removed
  2. conversion of multiple spaces to a single space

5. Source Documents

All the text data in the source files have been taken directly from previous LDC corpus releases, and are being provided here essentially "as-is", with little or no additional quality control. An overall scan of character content in the source collections indicates some relatively small quantities of various problems, especially in the web and discussion forum data, including language mismatch (characters from Chinese, Korean, Japanese, Arabic, Russian, etc.), and encoding errors (some documents have apparently undergone "double encoding" into UTF-8, and others may have been "noisy" to begin with, or may have gone through an improper encoding conversion, yielding occurrences of the Unicode "replacement character" (U+FFFD) throughout the corpus); the web collection also has characters whose Unicode code points lie outside the "Basic Multilingual Plane" (BMP), i.e. above U+FFFF.

All documents that have filenames beginning with "cmn-NG" and "eng-NG" are Web Document data (WB) and some of these fail XML parsing (see below for details). All files that start with "bolt-" are Discussion Forum threads (DF) and have the XML structure described below. All other files are Newswire data (NW) and have the newswire markup pattern detailed below.
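
The filename rules above are enough to recover the genre of a source document without consulting the KB_links files. The following is a minimal sketch in Python: the function name genre_from_filename is hypothetical and the example filenames are invented, but the three prefix rules are the ones stated above.

```python
import os


def genre_from_filename(path: str) -> str:
    """Return WB, DF, or NW for a source document, based on its filename prefix."""
    name = os.path.basename(path)
    if name.startswith(("cmn-NG", "eng-NG")):
        return "WB"  # web document data; some of these fail XML parsing
    if name.startswith("bolt-"):
        return "DF"  # discussion forum thread
    return "NW"      # everything else is newswire


# Invented filenames, used only to exercise the three rules.
for example in (
    "eng-NG-hypothetical-0001.xml",
    "data/2014/eval/source_documents/bolt-cmn-DF-hypothetical-0001.xml",
    "XIN_CMN_hypothetical.0001.xml",
):
    print(example, genre_from_filename(example))
```

A result of WB is a useful signal to fall back from an XML parser to plain-text handling, since only the web documents are described as sometimes failing XML parsing.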

Note as well that some source documents are duplicated across a few of the separate source_documents directories, because some queries from different data sets originated from the same source documents. As it is acceptable for sources to be reused for Entity Linking queries, this duplication is intentional and expected.

The subsections below go into more detail regarding the markup and other properties of the three source data types:

5.1 Newswire Data

Newswire data use the following markup framework:

  <DOC id="{doc_id_string}" type="{doc_type_label}">
  <HEADLINE>
  ...
  </HEADLINE>
  <DATELINE>
  ...
  </DATELINE>
  <TEXT>
  <P>
  ...
  </P>
  ...
  </TEXT>
  </DOC>

where the HEADLINE and DATELINE tags are optional (not always present), and the TEXT content may or may not include "<P> ... </P>" tags (depending on whether or not the "doc_type_label" is "story").

All the newswire files are parseable as XML.

5.2 Discussion Forum Data

Discussion forum files use the following markup framework:

  <doc id="{doc_id_string}">
  <headline>
  ...
  </headline>
  <post ...>
  ...
  <quote ...>
  ...
  </quote>
  ...
  </post>
  ...
  </doc>

where there may be arbitrarily deep nesting of quote elements, and other elements may be present (e.g. "<a...>...</a>" anchor tags). As mentioned in section 2 above, each unit contains at least five post elements.

All the discussion forum files are parseable as XML.

5.3 Web Document Data

"Web" files use the following markup framework:

  <DOC>
  <DOCID> {doc_id_string} </DOCID>
  <DOCTYPE ...> ... </DOCTYPE>
  <DATETIME> ... </DATETIME>
  <BODY>
  <HEADLINE>
  ...
  </HEADLINE>
  <TEXT>
  <POST>
  <POSTER> ... </POSTER>
  <POSTDATE> ... </POSTDATE>
  ...
  </POST>
  </TEXT>
  </BODY>
  </DOC>

Other kinds of tags may be present ("<QUOTE ...>", "<A...>", etc). Some of the web source documents contain material that interferes with XML parsing (e.g. unescaped "&", or "<IMG>" tags that lack a corresponding "</IMG>").

6. Using the Data

6.1 Offset calculation

The values of the "beg" and "end" XML elements in the 2012-2014 queries.xml files indicate character offsets to identify text extents in the source. Offset counting starts from the initial character (character 0) of the source document and includes newlines and all markup characters - that is, the offsets are based on treating the source document file as "raw text", with all its markup included.

6.2 Proper ingesting of XML queries

While the character offsets are calculated based on treating the source document as "raw text", the "name" strings being referenced by the queries sometimes contain XML metacharacters, and these had to be "re-escaped" for proper inclusion in the queries.xml file. For example, an actual name like "AT&T" may show up in a source document file as "AT&amp;T" (because the source document was originally formatted as XML data). But since the source doc is being treated here as raw text, this name string is treated in queries.xml as having 8 characters (i.e., the character offsets, when provided, will point to a string of length 8).

However, the "name" element itself, as presented in the queries.xml file, will be even longer - "AT&amp;amp;T" - because the queries.xml file is intended to be handled by an XML parser, which will return "AT&amp;T" when this "name" element is extracted. Using the queries.xml data without XML parsing would yield a mismatch between the "name" value and the corresponding string in the source data.

7. Copyright Information

(c) 2015 Trustees of the University of Pennsylvania

8. Contact Information

For further information about this data release, contact the following project staff at LDC:

  Joseph Ellis, Project Manager
  Jeremy Getman, Lead Annotator
  Stephanie Strassel, PI

--------------------------------------------------------------------------
README created by Jeremy Getman on February 4, 2015
       updated by Joe Ellis on February 16, 2015
       updated by Jeremy Getman on February 17, 2015
       updated by Joe Ellis on March 18, 2015
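
The offset convention of section 6.1, the re-escaping issue of section 6.2, and the normalization rules of section 4 can be combined into a simple consistency check on a query. The following is a minimal sketch in Python, not an official validation script. All function names are hypothetical, the sample document is invented, and the treatment of the end offset as inclusive (pointing at the last character of the name string) is an assumption that should be verified against the data; it is exposed as a parameter for that reason.

```python
import re
import xml.etree.ElementTree as ET


def normalize_namestring(raw: str) -> str:
    """Apply the two normalization rules from section 4."""
    text = re.sub(r"(?<=-)\n", "", raw)  # drop a newline that follows a hyphen
    text = text.replace("\n", " ")       # every other newline becomes a space
    return re.sub(r" {2,}", " ", text)   # collapse runs of spaces


def extract_mention(raw_source: str, beg: int, end: int,
                    end_inclusive: bool = True) -> str:
    """Slice the raw source text (markup included) by character offsets."""
    stop = end + 1 if end_inclusive else end  # assumption: end is inclusive
    return raw_source[beg:stop]


def query_matches_source(parsed_name: str, raw_source: str,
                         beg: int, end: int) -> bool:
    """Compare the XML-parsed name with the normalized raw-text extent."""
    return normalize_namestring(extract_mention(raw_source, beg, end)) == parsed_name


# The source file must be read as text without newline translation, e.g.
# open(path, encoding="utf-8", newline="").read(), so that offsets line up.
raw_source = (
    '<DOC id="hypothetical_doc" type="story">\n'
    "<TEXT>\n<P>\nShares of AT&amp;T rose.\n</P>\n</TEXT>\n</DOC>\n"
)
beg = raw_source.index("AT&amp;T")
end = beg + len("AT&amp;T") - 1

# An XML parser un-escapes the queries.xml name once, yielding the raw-text form.
parsed_name = ET.fromstring("<name>AT&amp;amp;T</name>").text
print(query_matches_source(parsed_name, raw_source, beg, end))
```

Reading the name with string operations instead of an XML parser would leave the doubly escaped form in place, and the comparison against the source extent would fail, which is the mismatch section 6.2 warns about.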

