Linguistics Department, Stanford University | Stanford Humanities Center | Mellon Foundation Graduate Research Workshop Program
Stanford Semantics and Pragmatics Workshop: THE CONSTRUCTION OF MEANING
SemFest, March 14, CSLI: 14:45-15:15
Dominic Widdows and Scott Cederberg
Combining information to learn word-meanings
It is widely accepted that most of the vocabulary an adult knows is
learned by encountering new words in some linguistic context and
inferring their meanings from those of familiar words. For example, a
person is very unlikely to know the meaning of the word 'mortgage'
without first understanding the words 'money', 'house' and 'loan'.
Much of this learning happens through reading, and there is some
evidence that new words are often not learned successfully from a
single encounter, or even from a few (Landauer and Dumais, 1997). It
follows that much word-learning must be done by combining evidence
gleaned from many different situations.
To judge whether this idea can be modelled in practice, we
investigated how the 'semantic class' or genus of an unknown object
might be learned from a large corpus. The most widespread techniques
for finding class names for words in corpora are variants of the
finite-state method developed by Hearst (1992), which relies on
distinctive patterns such as
"x such as y" and "y and other x"
to deduce that y is a kind of x. For example, the sentence
(1) They can also develop pressure sores on the elbows and other
joints. (BNC)
provides evidence that an elbow is a kind of joint.
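The two patterns above can be approximated with regular expressions. This is a minimal sketch, not Hearst's implementation. It assumes that x and y are single words, whereas Hearst's method matches noun phrases. It also uses a crude, hypothetical helper `naive_singular` in place of real lemmatization, and it covers only the two patterns quoted here.

```python
import re

# Hypothetical regex approximations of the two patterns quoted above.
# Assumption: x and y are single words (Hearst's method matches noun phrases).
SUCH_AS = re.compile(r"(\w+) such as (\w+)")      # "x such as y"
AND_OTHER = re.compile(r"(\w+) and other (\w+)")  # "y and other x"


def naive_singular(word):
    """Crude plural stripping ("elbows" -> "elbow"); a stand-in for a lemmatizer."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def hearst_pairs(text):
    """Return (y, x) pairs, read as 'y is a kind of x'."""
    text = text.lower()
    pairs = []
    for x, y in SUCH_AS.findall(text):
        pairs.append((naive_singular(y), naive_singular(x)))
    for y, x in AND_OTHER.findall(text):
        pairs.append((naive_singular(y), naive_singular(x)))
    return pairs


print(hearst_pairs("They can also develop pressure sores on the elbows and other joints."))
# [('elbow', 'joint')]
```

Applied to sentence (1), the sketch recovers the elbow/joint pair. It would equally accept an out-of-context or wrong match, which is the first problem discussed next.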
There are (at least) two problems with this approach. The first is
that many of the relationships extracted by such methods are
out of context or simply wrong. To combat this, we have used the
notion of "latent semantic similarity" (Landauer and Dumais, 1997),
which measures whether two words or phrases share enough broad
contextual features to be semantically related at all, and so can be
used to filter out mistakes.
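The filtering step can be sketched as follows. The sketch deliberately simplifies latent semantic analysis. It compares raw word-context co-occurrence counts using cosine similarity, and it omits the dimensionality reduction (truncated SVD) that makes LSA "latent". The window size, the 0.3 threshold and the toy corpus are illustrative assumptions, not values from the work described here.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to counts of the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def filter_pairs(pairs, vectors, threshold=0.3):
    """Keep (hyponym, hypernym) pairs whose contexts are similar enough."""
    return [(y, x) for y, x in pairs if cosine(vectors[y], vectors[x]) >= threshold]


corpus = [
    "the knee joint was swollen and painful",
    "the elbow was swollen and painful",
    "a bank raised interest rates",
]
vectors = context_vectors(corpus)
print(filter_pairs([("elbow", "joint"), ("elbow", "bank")], vectors))
# [('elbow', 'joint')]
```

In this toy corpus, 'elbow' and 'joint' share most of their neighbours, so their similarity is about 0.87. 'Elbow' and 'bank' share none, so a spurious elbow/bank pair would be discarded.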
The second problem is data sparseness: many significant relations of
this type may not be attested in such simple phrases, yet humans
nonetheless learn them through inference in the course of experience.
One such train of inference is as follows:
(y is a kind of x) AND (y and z are in the same class of objects)
=> z is also a kind of x (*)
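Rule (*) can be applied mechanically once known 'kind of' pairs and same-class word groups are available. The sketch below assumes the classes are already given as sets of words, for instance from coordination evidence of the kind discussed next. The function name `apply_class_rule` is hypothetical. The function performs a single inference step rather than iterating to a fixed point.

```python
def apply_class_rule(known_isa, classes):
    """Apply rule (*) once.

    known_isa: set of (y, x) pairs meaning 'y is a kind of x'.
    classes:   list of sets of words assumed to belong to the same class.
    Returns the newly inferred (z, x) pairs.
    """
    inferred = set()
    for y, x in known_isa:
        for cls in classes:
            if y not in cls:
                continue
            for z in cls:
                # z shares a class with y, so z is inferred to be a kind of x
                # (skipping trivial or already known pairs).
                if z not in (y, x) and (z, x) not in known_isa:
                    inferred.add((z, x))
    return inferred


known = {("elbow", "joint")}
classes = [{"elbow", "hip", "knee"}, {"bank", "lender"}]
print(sorted(apply_class_rule(known, classes)))
# [('hip', 'joint'), ('knee', 'joint')]
```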
For example, if we already know that an elbow is a kind of joint, the
following sentence provides good evidence that a hip is also a kind of
joint:
(2) She says she knows people who need hip and elbow replacements. (BNC)
Coordination patterns such as the one in (2) occur much more frequently
in corpora than patterns that directly attest an object/genus relation,
such as (1). In previous work we collected such instances of
coordination and developed a combinatoric algorithm that groups them
into recognizable semantic classes with high reliability (Widdows and
Dorow, 2002). This enables the reasoning in (*) to be carried out
reliably on a large scale.
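One way to picture how coordination evidence becomes semantic classes is to treat words as nodes in a graph and link words that appear coordinated ("hip and elbow"). A class can then be grown outward from a seed word. The sketch below is a simplified illustration in the spirit of a graph model, not the published algorithm of Widdows and Dorow (2002). The regular expression, the greedy scoring rule (the share of a candidate's neighbours already in the class), the function names and the toy sentences are all assumptions made for the example.

```python
import re
from collections import defaultdict

# "w and|or v"; the lookahead leaves v unconsumed, so a chain such as
# "hip and knee and elbow" yields both (hip, knee) and (knee, elbow).
COORD = re.compile(r"(\w+) (?:and|or) (?=(\w+))")


def coordination_graph(sentences):
    """Undirected graph linking words that occur coordinated with each other."""
    graph = defaultdict(set)
    for sentence in sentences:
        for a, b in COORD.findall(sentence.lower()):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def grow_class(graph, seeds, max_size):
    """Greedily add the neighbouring word most tightly linked to the class."""
    cls = set(seeds)
    while len(cls) < max_size:
        candidates = {n for w in cls for n in graph.get(w, ())} - cls
        if not candidates:
            break
        # Score: share of the candidate's links that point into the class;
        # ties are broken by comparing the words, so the result is deterministic.
        best = max(candidates, key=lambda n: (len(graph[n] & cls) / len(graph[n]), n))
        cls.add(best)
    return cls


sentences = [
    "people who need hip and elbow replacements",
    "pain in the knee and hip",
    "the elbow and shoulder were bruised",
    "shoulder or knee injuries",
    "bread and butter",
]
graph = coordination_graph(sentences)
print(sorted(grow_class(graph, {"hip"}, max_size=10)))
# ['elbow', 'hip', 'knee', 'shoulder']
```

Seeded with 'hip', the class grows to the four body-joint words. It never reaches 'bread' or 'butter', which are linked only to each other.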
We will present examples of all of these techniques and show how they
can be used together in a model of lexical learning in which combining
different sources of information significantly improves both coverage
and accuracy.
References
Hearst, M. (1992). Automatic Acquisition of Hyponyms from Large Text
Corpora. Proceedings of the 14th International Conference on
Computational Linguistics, Nantes, France.
Landauer, T. K. and Dumais, S. T. (1997). A solution to Plato's
problem: The Latent Semantic Analysis theory of acquisition, induction
and representation of knowledge. Psychological Review, 104, pages
211-240.
Widdows, D. and Dorow, B. (2002). A graph model for unsupervised
lexical acquisition. Proceedings of the 19th International Conference
on Computational Linguistics, pages 1093-1099, Taipei, Taiwan.
This workshop is sponsored by the Stanford Humanities Center and
funded by a grant from the Mellon Foundation.