COMBINING LEXICAL RESOURCES IN A BROAD-COVERAGE SEMANTIC PARSER

John Dowding and Matthew Purver


Sponsored by the Stanford Humanities Center/Mellon Foundation
Graduate Research Program



We describe an on-going effort to produce the lexicon for a robust broad-coverage semantic parser by combining syntactic and semantic information from several publicly available lexical resources.

This parser is motivated by the need to extract propositional content from human-human meetings, as part of DARPA's CALO project. Extracting this content requires a broad-coverage lexicon, since the meeting topics are not determined in advance. The parser is applied to highly errorful speech recognition results (30%-40% word error rates), so it must be robust. These speech recognition results are represented as Word Confusion Networks (Tur et al., 2002), each of which may encode a large number of potential utterance hypotheses, so the parser must be fast. For these reasons, we decided on an approach that depends heavily on the lexicon, with a relatively impoverished set of grammatical rules, focusing on extracting basic predicate-argument structure and paying less attention to more varied syntactic forms.
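To illustrate why a Word Confusion Network puts pressure on parser speed, the following is a minimal sketch, not the representation used in the system described here. It assumes a simplified network in which each word position is a slot of competing words with posterior probabilities; the words and probabilities are invented, and the empty (epsilon) alternatives found in real networks are omitted.

```python
from math import prod

# Hypothetical confusion network: one slot per word position, each slot maps
# candidate words to posterior probabilities (all values invented).
wcn = [
    {"we": 0.7, "he": 0.3},
    {"need": 0.6, "knee": 0.3, "neat": 0.1},
    {"to": 0.8, "two": 0.2},
    {"meet": 0.5, "meat": 0.4, "me": 0.1},
]


def hypothesis_count(network):
    """Number of distinct word sequences encoded by the network."""
    return prod(len(slot) for slot in network)


def best_path(network):
    """Pick the highest-posterior word in each slot."""
    return [max(slot, key=slot.get) for slot in network]


print(hypothesis_count(wcn))  # 2 * 3 * 2 * 3 = 36
print(best_path(wcn))         # ['we', 'need', 'to', 'meet']
```

Even this four-slot example encodes 36 word sequences; the count grows multiplicatively with utterance length, which is why parsing each hypothesis separately quickly becomes impractical.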

The resources we are currently using are COMLEX, VerbNet, WordNet, and NOMLEX. Each of these resources provides distinct types of syntactic and semantic information:

- COMLEX (Grishman, Macleod, and Meyers, 1994) intends to provide detailed syntactic information for the 40,000 most common words of English. We extract from COMLEX lexical information for 4,200 adjectives (gradability and subcategorization), 5,665 verbs (subcategorization), 23,195 nouns (mass/count and temporality), and 3,120 adverbs (syntactic distribution), as well as most closed-class lexical categories. COMLEX also provides morphological variants for irregular forms.

- VerbNet (Kipper, Dang, and Palmer, 2000) provides semantic information for 5,000 verbs. This information includes the verb class, verb frames, thematic roles, syntax-semantic mapping, and selectional restrictions.

- From WordNet (Miller, 1995) we identify another 15,539 nouns, and the semantic class information for all nouns. These semantic classes are hand-aligned to the selectional classes used in VerbNet, based on the upper ontology of EuroWordNet.

- NOMLEX (Macleod et al., 1998) and NOMLEXPLUS (Meyers et al., 2004) provide syntactic information for nominalizations, and information for mapping the noun's arguments to the corresponding syntactic positions of the verb. When combined with VerbNet's selectional restrictions on thematic roles, this provides additional selectional restrictions on the arguments of nominalizations.
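The way these resources interact for nominalizations can be sketched as follows. This is only an illustration under stated assumptions: the dictionaries below are invented toy entries, not the actual NOMLEX, VerbNet, or WordNet formats or contents, and the function name `nominal_role` is hypothetical.

```python
# All entries are invented for illustration only.

# Hypothetical NOMLEX-style entry: the related verb, plus a map from
# noun argument positions to the verb's syntactic positions.
NOMLEX = {
    "destruction": {
        "verb": "destroy",
        "arg_map": {"pp_of": "object", "pp_by": "subject"},
    },
}

# Hypothetical VerbNet-style entry: verb position -> (thematic role,
# set of semantic classes allowed by the selectional restriction).
VERBNET = {
    "destroy": {
        "subject": ("Agent", {"animate", "force"}),
        "object": ("Patient", {"concrete"}),
    },
}

# Hypothetical WordNet-derived semantic classes for nouns.
NOUN_CLASS = {"army": "animate", "storm": "force", "city": "concrete"}


def nominal_role(nominalization, noun_position, filler):
    """Map a noun argument to the verb's thematic role and check the filler
    against that role's selectional restriction."""
    entry = NOMLEX[nominalization]
    verb_position = entry["arg_map"][noun_position]
    role, allowed = VERBNET[entry["verb"]][verb_position]
    return role, NOUN_CLASS.get(filler) in allowed


# "the destruction of the city by the army"
print(nominal_role("destruction", "pp_of", "city"))  # ('Patient', True)
print(nominal_role("destruction", "pp_by", "army"))  # ('Agent', True)
print(nominal_role("destruction", "pp_by", "city"))  # ('Agent', False)
```

The third call shows the extra filtering the combination makes possible: a filler whose semantic class violates the verb's restriction on the mapped role is rejected for the nominalization as well.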

These lexical and grammar rules are converted to the Prolog-based format used in the Gemini framework (Dowding et al., 1993), which includes a fast, robust bottom-up parser that interleaves the application of syntactic and semantic information. The semantic rules in this grammar produce a Minimal Recursion Semantics representation (Copestake, Flickinger, and Sag, 1997), motivated by a desire to make the semantic features extracted by the parser available as inputs to further machine learning algorithms for identifying higher-level semantic content, such as the action items that have been assigned or the decisions that have been made.
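As a rough illustration of why a flat semantic representation is convenient as input to later learning stages, the sketch below encodes a heavily simplified, hand-written analysis as a list of elementary predications and flattens it into feature strings. The predicate names, role labels, and feature format are assumptions made for this example; scope constraints are omitted, and this is not the output format of the parser described here.

```python
from typing import NamedTuple


class EP(NamedTuple):
    """Simplified elementary predication: a label, a predicate, and its arguments."""
    label: str
    pred: str
    args: dict


# Hand-written, invented analysis of "Kim assigned the task".
analysis = [
    EP("h1", "named", {"ARG0": "x1", "CARG": "Kim"}),
    EP("h2", "assign_v", {"ARG0": "e1", "ARG1": "x1", "ARG2": "x2"}),
    EP("h3", "task_n", {"ARG0": "x2"}),
]


def predicate_argument_features(eps):
    """Emit one flat feature string per role link between two predications."""
    # Map each variable to a readable name for the predication that introduces it.
    introduced_by = {ep.args["ARG0"]: ep.args.get("CARG", ep.pred) for ep in eps}
    features = []
    for ep in eps:
        for role, var in ep.args.items():
            if role not in ("ARG0", "CARG") and var in introduced_by:
                features.append(f"{ep.pred}:{role}={introduced_by[var]}")
    return features


print(predicate_argument_features(analysis))
# ['assign_v:ARG1=Kim', 'assign_v:ARG2=task_n']
```

Because the representation is a flat list rather than a nested tree, extracting such features takes a single pass over the predications.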

This work is similar to prior work by Shi and Mihalcea (2005), Crouch and King (2005), and Swift (2005). It differs from that work primarily in the inclusion of NOMLEX and the mapping of nominalizations to verb frames.

References


Copestake, Ann, Dan Flickinger, and Ivan A. Sag. "Minimal Recursion Semantics: An Introduction." CSLI, Stanford University. 1997.

Crouch, R. S.; King, T. H. "Unifying lexical resources." Proceedings of Interdisciplinary Workshop on the Identification and Representation of Verb Features and Verb Classes; 2005 February 28 - March 1; Saarbruecken; Germany. pp. 32-37.

Dowding, John, Jean Mark Gawron, Douglas Appelt, John Bear, Lynn Cherny, Robert Moore and Douglas Moran, "GEMINI: A Natural Language System for Spoken-Language Understanding", in Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, 1993, pp. 54-61.

Grishman, Ralph, Catherine Macleod and Adam Meyers (1994). "COMLEX Syntax: Building a Computational Lexicon", Coling 1994, Kyoto.

Kipper, Karin, Hoa Trang Dang, Martha Palmer. "Class-Based Construction of a Verb Lexicon." AAAI-2000 Seventeenth National Conference on Artificial Intelligence, Austin, TX, July 30 - August 3, 2000.

Miller, George A. "WordNet: a lexical database for English." In: Communications of the ACM 38 (11), November 1995, pp. 39-41.

Macleod, Catherine, Ralph Grishman, Adam Meyers, Leslie Barrett, Ruth Reeves. "NOMLEX: A Lexicon of Nominalizations." Proceedings of EURALEX'98, Liege, Belgium, August 1998.

Meyers, Adam, Ruth Reeves, Catherine Macleod, Rachel Szekely, Veronika Zielinska, and Brian Young, "The Cross-Breeding of Dictionaries", Proceedings of LREC-2004, Lisbon, Portugal, 2004.

Shi, Lei and Rada Mihalcea, "Putting Pieces Together: Combining FrameNet, VerbNet and WordNet for Robust Semantic Parsing", in Proceedings of the Sixth International Conference on Intelligent Text Processing and Computational Linguistics, Mexico, 2005.

Swift, Mary. "Towards automatic verb acquisition from VerbNet for spoken dialog processing." In Proceedings of Interdisciplinary Workshop on the Identification and Representation of Verb Features and Verb Classes, edited by Katrin Erk, Alissa Melinger & Sabine Schulte im Walde, pp. 115-120. Saarbruecken, Germany, February 2005.

Tur, G., J. Wright, A. Gorin, G. Riccardi, and D. Hakkani-Tur, "Improving spoken language understanding using word confusion networks", in Proceedings of the ICSLP, 2002.