Linguistics Department

Stanford University

Stanford Humanities Center
Mellon Foundation
Graduate Research Workshop Program

 Stanford Semantics and Pragmatics Workshop:

THE CONSTRUCTION OF MEANING



SemFest, March 14, CSLI:

14:45-15:15

Dominic Widdows and Scott Cederberg

Combining information to learn word-meanings

It is widely accepted that most of the nomenclature in an adult vocabulary is learnt by encountering new words in some linguistic context and inferring their meaning from those of familiar words. For example, it is very unlikely that a person will know the meaning of the word `mortgage' without first understanding the words `money', `house' and `loan'. Much of this learning takes place through reading, and there is some evidence that new words are often not successfully learned in a single encounter, or even in a few (Landauer and Dumais, 1997). It follows that much word-learning must be done by combining evidence gleaned from many different situations.
To judge whether this idea can be modelled in practice, we investigated how the `semantic class' or genus of an unknown object might be learned from a large corpus. The most widespread techniques used to find class-names for words from corpora are variants of the finite-state method developed by Hearst (1992), which relies on distinct patterns like
"x such as y" and "y and other x"
to deduce that y is a kind of x. For example, the sentence
(1) They can also develop pressure sores on the elbows and other joints. (BNC)
provides evidence that the `elbow' is a kind of `joint'.
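As an illustration only, the two patterns quoted above can be sketched with regular expressions. This is not the finite-state implementation of Hearst (1992) or the one used in this work: the function names, the restriction to single-word terms, and the crude plural stripping are all assumptions made to keep the sketch short.

```python
import re

# Two Hearst-style patterns, restricted to single-word terms (a simplification).
#   "x such as y"   -> y is a kind of x
#   "y and other x" -> y is a kind of x
SUCH_AS = re.compile(r"\b(\w+) such as (\w+)\b")
AND_OTHER = re.compile(r"\b(\w+) and other (\w+)\b")


def naive_singular(word):
    """Strip one trailing 's' (a hypothetical stand-in for real morphology)."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def extract_hyponym_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the two patterns."""
    text = sentence.lower()
    pairs = []
    for m in SUCH_AS.finditer(text):
        # In "x such as y", the class name x comes first.
        pairs.append((naive_singular(m.group(2)), naive_singular(m.group(1))))
    for m in AND_OTHER.finditer(text):
        # In "y and other x", the class name x comes last.
        pairs.append((naive_singular(m.group(1)), naive_singular(m.group(2))))
    return pairs


pairs = extract_hyponym_pairs(
    "They can also develop pressure sores on the elbows and other joints."
)
print(pairs)
```

Applied to sentence (1), the sketch yields the single pair (elbow, joint). A realistic system would match noun phrases rather than single words.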
There are (at least) two problems with this approach. The first problem is that many of the relationships extracted by such methods are out-of-context or simply wrong. To combat this, we have used the notion of "latent semantic similarity" (Landauer and Dumais, 1997), which measures whether two words or phrases share enough broad contextual features to be semantically related at all, and which can therefore be used to filter out mistakes.
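The filtering step can be sketched as a similarity threshold over word vectors. The sketch below uses plain cosine similarity on hand-made toy vectors; it does not perform the dimensionality reduction of latent semantic analysis, and the vectors, the threshold of 0.5, and the function names are assumptions chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical context vectors; real ones would be derived from a corpus.
VECTORS = {
    "elbow": [0.9, 0.1, 0.0],
    "joint": [0.8, 0.2, 0.1],
    "bank": [0.0, 0.1, 0.9],
}


def filter_pairs(pairs, vectors, threshold=0.5):
    """Keep only (hyponym, hypernym) pairs whose vectors are similar enough."""
    kept = []
    for hypo, hyper in pairs:
        if hypo in vectors and hyper in vectors:
            if cosine(vectors[hypo], vectors[hyper]) >= threshold:
                kept.append((hypo, hyper))
    return kept


candidates = [("elbow", "joint"), ("elbow", "bank")]
kept = filter_pairs(candidates, VECTORS)
print(kept)
```

With these toy vectors the pair (elbow, joint) is kept and the spurious pair (elbow, bank) is discarded.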
The second problem is data-sparseness: many significant relations of this type may not be attested in such simple phrases but are nonetheless learned by humans in the course of experience through inference. One such train of inference is as follows:
(y is a kind of x) AND (y and z are in the same class of objects) => z is also a kind of x (*)
For example, if we already know that an elbow is a kind of joint, the following sentence provides good evidence that a hip is also a kind of joint:
(2) She says she knows people who need hip and elbow replacements. (BNC)
Coordination patterns such as those in (2) occur much more frequently in corpora than patterns attesting direct object/genus relations such as (1). In previous work we collected such instances of coordination and developed a combinatoric algorithm that groups these examples into recognized semantic classes with high reliability (Widdows and Dorow, 2002), which enables the reasoning in (*) to be applied on a large scale.
We will present examples of all of these techniques and show how they can be used together to give a model for lexical learning in which both coverage and accuracy are significantly improved by combining different sources of information.
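The inference rule (*) can be sketched as a simple propagation step. The sketch assumes that the semantic classes have already been built from coordination data and are supplied as sets of words; it does not reproduce the algorithm of Widdows and Dorow (2002), and the function name and data layout are hypothetical.

```python
def propagate_hypernyms(known, classes):
    """Apply rule (*): if y is a kind of x and z is in y's class, infer z is a kind of x.

    known   -- dict mapping a word to the set of its known class-names
    classes -- list of sets of words assumed to belong to the same class
    Returns only the newly inferred relations, as a dict of sets.
    """
    inferred = {}
    for cls in classes:
        for y in cls:
            for x in known.get(y, ()):
                for z in cls:
                    # Skip y itself and relations that are already known.
                    if z != y and x not in known.get(z, ()):
                        inferred.setdefault(z, set()).add(x)
    return inferred


# Known from sentence (1): an elbow is a kind of joint.
known = {"elbow": {"joint"}}
# Assumed class, of the kind suggested by coordinations such as "hip and elbow".
classes = [{"hip", "elbow", "knee"}]

inferred = propagate_hypernyms(known, classes)
print(inferred)
```

Here the single known relation for `elbow' is extended to `hip' and `knee', which never appeared in a direct pattern such as (1).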
References
Hearst, M. (1992). Automatic Acquisition of Hyponyms from Large Text Corpora. Proceedings of the 14th International Conference on Computational Linguistics, Nantes, France.
Landauer, T. K. and Dumais, S. T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, pages 211-240.
Widdows, D. and Dorow, B. (2002). A graph model for unsupervised lexical acquisition. Proceedings of the 19th International Conference on Computational Linguistics, pages 1093-1099, Taipei, Taiwan.

Please contact one of the workshop organizers if you have suggestions for presentations or the workshop in general.




This workshop is sponsored by the Stanford Humanities Center, and funded by a grant from the Mellon Foundation.













This page is maintained by Judith Tonhauser