TY - GEN
T1 - Answering definition questions
T2 - 5th International Conference on Web Information Systems and Technologies, WEBIST 2009
AU - Figueroa, Alejandro
AU - Atkinson, John
PY - 2010
N2 - A crucial step in answering definition questions, such as "Who is Gordon Brown?", is the ranking of answer candidates. In definition Question Answering (QA), sentences are normally interpreted as potential answers, and one of the most promising ranking strategies is predicated upon Language Models (LMs). However, one factor that makes LMs less attractive is that they can suffer from data sparseness when the training material is insufficient or candidate sentences are too long. This paper analyses two methods, different in nature, for tackling data sparseness head-on: (1) combining LMs learnt from different, but overlapping, training corpora, and (2) selective substitutions grounded on part-of-speech (POS) tags. Results show that the first method improves the Mean Average Precision (MAP) of the top-ranked answers while diminishing the average F-score of the final output. In contrast, the impact of the second approach depends on the test corpus.
KW - Data sparseness
KW - Definition question answering
KW - Definition questions
KW - Definition search
KW - Lexical dependency paths
KW - Web question answering
KW - n-gram language models
UR - http://www.scopus.com/inward/record.url?scp=77952781169&partnerID=8YFLogxK
DO - 10.1007/978-3-642-12436-5_22
M3 - Conference contribution
AN - SCOPUS:77952781169
SN - 3642124356
SN - 9783642124358
T3 - Lecture Notes in Business Information Processing
SP - 297
EP - 310
BT - Web Information Systems and Technologies - 5th International Conference, WEBIST 2009, Revised Selected Papers
PB - Springer Verlag
Y2 - 23 March 2009 through 26 March 2009
ER -