COMBINING LEXICAL RESOURCES IN A BROAD-COVERAGE SEMANTIC PARSER
John Dowding and Matthew Purver
Sponsored by the Stanford Humanities Center/Mellon Foundation
Graduate Research Program
We describe an ongoing effort to produce the lexicon for a robust
broad-coverage semantic parser by combining syntactic and semantic
information from several publicly available lexical resources.
This parser is motivated by a need to extract propositional content
from human-human meetings, as part of DARPA's CALO project.
Extracting this content requires a broad-coverage lexicon, since the
meeting topics are not determined in advance. The parser is applied
to highly errorful speech recognition results (30%-40% word error
rates), so it must be robust. These speech recognition results are
represented as Word Confusion Networks (Tur et al., 2002), each of
which may encode a large number of potential utterance hypotheses, so
the parser must also be fast. For these reasons, we chose an approach
that depends heavily on the lexicon and uses a relatively impoverished
set of grammar rules, focusing on extracting basic predicate-argument
structure and paying less attention to the full variety of syntactic
forms.
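The following Python sketch illustrates why the size of a word confusion network matters for parsing speed. It assumes a simplified representation in which a network is a list of slots, each holding competing (word, posterior) arcs, with an empty string standing for an epsilon (no-word) arc. The words and scores are invented for illustration, and the sketch does not describe how the parser itself traverses the network.

```python
from math import prod

# Simplified WCN (an assumption for illustration): a sequence of slots,
# each holding competing (word, posterior) arcs. "" marks an epsilon arc,
# i.e. the slot may contribute no word to a hypothesis.
WCN = list[list[tuple[str, float]]]


def num_paths(wcn: WCN) -> int:
    """Every path picks exactly one arc per slot, so the count is the product of slot sizes."""
    return prod(len(slot) for slot in wcn)


def consensus(wcn: WCN) -> list[str]:
    """Highest-posterior arc in each slot, with epsilon arcs dropped."""
    best = [max(slot, key=lambda arc: arc[1])[0] for slot in wcn]
    return [word for word in best if word]


# Hypothetical network for a short utterance.
example: WCN = [
    [("we", 0.7), ("he", 0.2), ("", 0.1)],
    [("should", 0.6), ("could", 0.4)],
    [("assign", 0.5), ("a", 0.3), ("sign", 0.2)],
    [("the", 0.8), ("a", 0.2)],
    [("task", 0.6), ("desk", 0.3), ("tasks", 0.1)],
]
```

For this five-slot example, `num_paths(example)` is 108 and `consensus(example)` is `["we", "should", "assign", "the", "task"]`. Because the count is multiplicative, a 20-slot network with three competitors per slot already encodes 3**20 (about 3.5 billion) paths, which suggests why a parser over such input cannot afford to treat hypotheses one at a time.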
The resources we are currently using are COMLEX, VerbNet, WordNet,
and NOMLEX. These resources each provide distinct types of syntactic
and semantic information:
- COMLEX (Grishman, Macleod, and Meyers, 1994) aims to provide
detailed syntactic information for the 40,000 most common words of
English. We extract from COMLEX lexical information for 4,200
adjectives (gradability and subcategorization), 5,665 verbs
(subcategorization), 23,195 nouns (mass/count and temporality), and
3,120 adverbs (syntactic distribution), as well as most closed-class
lexical categories. COMLEX also provides morphological variants for
irregular forms.
- VerbNet (Kipper, Dang, and Palmer, 2000) provides semantic
information for 5,000 verbs. This information includes the verb
class, verb frames, thematic roles, syntax-semantic mapping, and
selectional restrictions.
- From WordNet (Miller, 1995) we obtain another 15,539 nouns, as well as
the semantic class information for all nouns. These semantic
classes are hand-aligned to the selectional classes used in VerbNet,
based on the upper ontology of EuroWordNet.
- NOMLEX (Macleod et al., 1998) and NOMLEX-PLUS (Meyers et al.,
2004) provide syntactic information for nominalizations, and
information for mapping the noun arguments to the corresponding verb
syntactic positions. When combined with VerbNet's selectional
restrictions on thematic roles, this mapping also yields selectional
restrictions for the arguments of nominalizations.
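The combination described in the last item can be sketched in Python as follows: a NOMLEX-style mapping sends each noun-phrase position of a nominalization to a verb syntactic position, the verb frame supplies the thematic role and its selectional restriction, and a WordNet-derived noun class is checked against that restriction. Everything here is hypothetical and simplified: the toy class hierarchy, the frame for "assign", the position names (such as "possessive" and "pp-of"), and the helper functions are illustrative assumptions, not the actual formats of VerbNet, NOMLEX, or WordNet.

```python
from typing import Optional

# Hypothetical toy upper ontology (stand-in for the aligned semantic classes): child -> parent.
PARENT = {
    "human": "animate",
    "animate": "concrete",
    "document": "concrete",
    "concrete": "entity",
    "abstract": "entity",
}

# Hypothetical WordNet-derived semantic class for each noun (lowercased).
NOUN_CLASS = {"manager": "human", "john": "human", "report": "document"}

# Hypothetical verb frame: verb syntactic position -> (thematic role, restriction or None).
VERB_FRAMES = {
    "assign": {
        "subject": ("Agent", "animate"),
        "object": ("Theme", None),
        "pp-to": ("Recipient", "animate"),
    }
}

# Hypothetical NOMLEX-style mapping: noun-phrase position -> verb syntactic position.
NOM_MAP = {
    "assignment": ("assign", {
        "possessive": "subject",
        "pp-by": "subject",
        "pp-of": "object",
        "pp-to": "pp-to",
    })
}


def subsumes(general: str, specific: Optional[str]) -> bool:
    """True if `specific` equals `general` or lies below it in the toy hierarchy."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False


def nominal_selection(noun: str) -> dict[str, tuple[str, Optional[str]]]:
    """Project the verb's roles and restrictions onto the nominalization's positions."""
    verb, positions = NOM_MAP[noun]
    frame = VERB_FRAMES[verb]
    return {np_pos: frame[v_pos] for np_pos, v_pos in positions.items()}


def accepts(noun: str, np_position: str, filler: str) -> bool:
    """Check a filler noun against the restriction the nominalization inherits from its verb."""
    _role, restriction = nominal_selection(noun)[np_position]
    if restriction is None:
        return True
    return subsumes(restriction, NOUN_CLASS.get(filler))
```

For a phrase like "the manager's assignment of the report to John", `accepts("assignment", "possessive", "manager")` and `accepts("assignment", "pp-to", "john")` succeed, while `accepts("assignment", "pp-to", "report")` fails because a document is not animate.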
These lexical entries and grammar rules are converted to the Prolog-based
format used in the Gemini framework (Dowding et al., 1993), which
includes a fast, robust bottom-up parser that interleaves the
application of syntactic and semantic information. The semantic rules in
this grammar produce a Minimal Recursion Semantics representation
(Copestake, Flickinger, and Sag, 1997), motivated by a desire to make
the semantic features extracted by the parser available as inputs to
further machine learning algorithms for identifying higher-level
semantic content, such as the action items that have been assigned, or
decisions that have been made.
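To illustrate how a flat MRS can feed a learner, here is a hedged Python sketch that turns an MRS into a set of string features. The MRS is a hand-built, simplified structure for "the manager assigned the task": a bag of elementary predications plus qeq handle constraints. The predicate names loosely imitate common MRS naming conventions and are not output from the parser described here; the feature scheme (one feature per predicate, plus one per predicate-role-argument link) is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EP:
    """Elementary predication: a labeled predicate with role arguments."""
    label: str             # handle labeling this EP, e.g. "h5"
    pred: str              # predicate name (illustrative)
    args: dict[str, str]   # role -> variable or handle


@dataclass
class MRS:
    top: str
    eps: list[EP]
    hcons: list[tuple[str, str]] = field(default_factory=list)  # qeq pairs (hi, lo)


def features(mrs: MRS) -> set[str]:
    """Hypothetical feature scheme: predicates plus predicate-argument links."""
    # Map each variable to the non-quantifier EP that introduces it
    # (quantifiers also carry ARG0 but are recognized here by their RSTR).
    intro = {ep.args["ARG0"]: ep.pred
             for ep in mrs.eps
             if "ARG0" in ep.args and "RSTR" not in ep.args}
    feats = set()
    for ep in mrs.eps:
        feats.add(f"pred={ep.pred}")
        for role, value in ep.args.items():
            if role != "ARG0" and value in intro:
                feats.add(f"{ep.pred}:{role}={intro[value]}")
    return feats


example = MRS(
    top="h0",
    eps=[
        EP("h1", "_the_q", {"ARG0": "x1", "RSTR": "h2", "BODY": "h3"}),
        EP("h4", "_manager_n", {"ARG0": "x1"}),
        EP("h5", "_assign_v", {"ARG0": "e1", "ARG1": "x1", "ARG2": "x2"}),
        EP("h6", "_the_q", {"ARG0": "x2", "RSTR": "h7", "BODY": "h8"}),
        EP("h9", "_task_n", {"ARG0": "x2"}),
    ],
    hcons=[("h0", "h5"), ("h2", "h4"), ("h7", "h9")],
)
```

Here `features(example)` contains the four predicate features together with `_assign_v:ARG1=_manager_n` and `_assign_v:ARG2=_task_n`. Because MRS is flat, links of this kind can be read off without traversing a tree, which is one reason such a representation is convenient as input to downstream classifiers.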
This work is similar to prior work by Shi and Mihalcea (2005),
Crouch and King (2005), and Swift (2005). It differs from that work
primarily in its inclusion of NOMLEX and its mapping of
nominalizations to verb frames.
References
Copestake, Ann, Dan Flickinger, and Ivan A. Sag. "Minimal Recursion
Semantics: An Introduction." CSLI, Stanford University, 1997.
Crouch, R. S., and T. H. King. "Unifying Lexical Resources."
Proceedings of the Interdisciplinary Workshop on the Identification and
Representation of Verb Features and Verb Classes, Saarbruecken,
Germany, February 28 - March 1, 2005, pp. 32-37.
Dowding, John, Jean Mark Gawron, Douglas Appelt, John Bear, Lynn
Cherny, Robert Moore, and Douglas Moran. "GEMINI: A Natural Language
System for Spoken-Language Understanding." Proceedings of the 31st
Annual Meeting of the Association for Computational Linguistics, 1993,
pp. 54-61.
Grishman, Ralph, Catherine Macleod, and Adam Meyers. "COMLEX Syntax:
Building a Computational Lexicon." Proceedings of COLING 1994, Kyoto,
Japan, 1994.
Kipper, Karin, Hoa Trang Dang, and Martha Palmer. "Class-Based
Construction of a Verb Lexicon." Proceedings of AAAI-2000, the
Seventeenth National Conference on Artificial Intelligence, Austin, TX,
July 30 - August 3, 2000.
Macleod, Catherine, Ralph Grishman, Adam Meyers, Leslie Barrett, and
Ruth Reeves. "NOMLEX: A Lexicon of Nominalizations." Proceedings of
EURALEX'98, Liege, Belgium, August 1998.
Meyers, Adam, Ruth Reeves, Catherine Macleod, Rachel Szekely, Veronika
Zielinska, and Brian Young. "The Cross-Breeding of Dictionaries."
Proceedings of LREC-2004, Lisbon, Portugal, 2004.
Miller, George A. "WordNet: A Lexical Database for English."
Communications of the ACM 38(11), November 1995, pp. 39-41.
Shi, Lei, and Rada Mihalcea. "Putting Pieces Together: Combining
FrameNet, VerbNet and WordNet for Robust Semantic Parsing." Proceedings
of the Sixth International Conference on Intelligent Text Processing
and Computational Linguistics, Mexico, 2005.
Swift, Mary. "Towards Automatic Verb Acquisition from VerbNet for
Spoken Dialog Processing." In Katrin Erk, Alissa Melinger, and Sabine
Schulte im Walde (eds.), Proceedings of the Interdisciplinary Workshop
on the Identification and Representation of Verb Features and Verb
Classes, Saarbruecken, Germany, February 2005, pp. 115-120.
Tur, G., J. Wright, A. Gorin, G. Riccardi, and D. Hakkani-Tur.
"Improving Spoken Language Understanding Using Word Confusion
Networks." Proceedings of ICSLP, 2002.