NLP Notes (a few entries!)

keywords: NLP, AI, Reasoning

Computers are very stupid machines that can be programmed to do very intelligent things. Programmers are very intelligent people who are able to do very stupid things. This is a dangerous combination.

– Bill Bryson

Classification of Subordinate Clauses in CGEL

1.  Finite

    1.  Content

        1.  Declarative (ordinary, subjunctive, and irrealis)

            1.  Expandable

                1.  Expanded

                2.  Bare

            2.  Non-expandable (Bare)

        2.  Interrogative (ordinary and subjunctive)

            1.  Open

            2.  Closed

        3.  Exclamative

    2.  Comparative

    3.  Relative

        1.  Integrated

            1.  Non-wh

                1.  That

                2.  Bare

            2.  Wh

        2.  Supplementary

            1.  Wh

            2.  Non-wh (rare)

        3.  Cleft

2.  Non-finite

    1.  Relative

        1.  Wh

        2.  Non-wh

    2.  Content

        1.  Ordinary

            1.  Infinitival

                1.  Bare

                2.  To

                    1.  Non-interrogative

                    2.  Interrogative

                        1.  Open

                        2.  Closed

            2.  Participial

        2.  Hollow

            1.  Infinitival

            2.  Gerund-participial

3.  Verbless

    1.  Comparative

    2.  Content

        1.  Interrogative

        2.  Declarative


The syntax and semantics of compounds are interesting and bear some additional discussion. The following has been adapted from Ray Jackendoff's "Compounding in the Parallel Architecture and Conceptual Semantics". These schematic forms are for binary compounds. We commonly encounter deeper compounds in indices. The top-down nature of these compounds allows us to reduce all processing to sequences of binary compound constructions.
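
As a rough sketch of that reduction, a flat noun sequence can be folded into nested binary pairs. The code below assumes a strictly left-branching bracketing, which is an illustrative simplification (real bracketing needs lexical or index-structure evidence). The function names `binarize_left` and `binary_steps` are hypothetical, and "tooth decay treatment" is a made-up three-noun example built on "tooth decay".

```python
from functools import reduce


def binarize_left(nouns):
    """Fold a flat noun sequence into nested (N1, N2) pairs, left-branching.

    Assumption for illustration only: every prefix forms a constituent,
    e.g. ['a', 'b', 'c'] becomes (('a', 'b'), 'c').
    """
    if not nouns:
        raise ValueError("need at least one noun")
    return reduce(lambda left, right: (left, right), nouns)


def binary_steps(tree):
    """List the binary compound constructions in a nested pair, innermost first."""
    if isinstance(tree, str):
        return []
    left, right = tree
    return binary_steps(left) + binary_steps(right) + [(left, right)]


tree = binarize_left(["tooth", "decay", "treatment"])
steps = binary_steps(tree)
```

Each element of `steps` is one binary [N1 N2] construction, so the schemata for binary compounds can be applied to it in turn.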

The following defines the formal N-N compound schemata (or constructions) that we encounter:

Argument schema: [$N_1$ $N_2$] = [$Y_2 (\dots, X_1, \dots)$] $\qquad$ ‘an $N_2$ by/of/... $N_1$’

Modifier schema: [$N_1$ $N_2$] = [${Y_2}^ \forall$; [F($\dots, X_1, \dots, \forall, \dots$)]] $\qquad$ ‘an $N_2$ such that F is true of $N_1$ and $N_2$’
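
One possible way to hold the two schemata in code is a pair of small record types. This is only a sketch: the class names, the field names, and the fixed argument order inside F are assumptions made for illustration, and the bound variable (written $\forall$ in the schema) is rendered as a plain `a` in the gloss strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArgumentReading:
    """[N1 N2] = [Y2(..., X1, ...)]: N1 fills an argument slot of the head N2."""
    head: str      # Y2, the meaning of N2
    argument: str  # X1, the meaning of N1

    def gloss(self):
        return f"{self.head.upper()}({self.argument.upper()})"


@dataclass(frozen=True)
class ModifierReading:
    """[N1 N2] = [Y2^a; [F(..., X1, ..., a, ...)]]: F relates N1 to the head."""
    head: str      # Y2, the meaning of N2
    modifier: str  # X1, the meaning of N1
    function: str  # F, supplied by the caller

    def gloss(self):
        # Simplification: the bound variable always comes first inside F.
        return f"{self.head.upper()}^a; [{self.function}(a, {self.modifier.upper()})]"


union_member = ArgumentReading(head="member", argument="union")
window_seat = ModifierReading(head="seat", modifier="window", function="LOC")
```

Keeping F as a free field mirrors the schema itself, where the modifier reading is underspecified until a particular function is chosen.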

Below is a list of the (most prominent) basic functions for English compounds, with examples. (This list is Jackendoff's but is similar to those that have been produced by others.) With one exception, these seem rather plausible as functions that are readily available pragmatically. (Reminder: X is the meaning of $N_1$, Y is the meaning of $N_2$, except in the last two cases.)

• CLASSIFY ($X_1$, $Y_2$), ‘$N_1$ classifies $N_2$’: beta cell, X-ray. This is the loosest possible relation, in which the meaning of $N_1$ plays only a classificatory role.

• $Y_2$($X_1$), ‘(a/the) $N_2$ of/by $N_1$’: sea level, union member, wavelength, hairstyle, helicopter attack, tooth decay. This is the argument schema. It is sometimes reversible, with the extra coercion shown in (21b) of Jackendoff's paper: $X_1$($Y_2$), ‘an $N_2$ that $N_1$’s things’: attack helicopter, curling iron, guard dog; also ‘an $N_2$ that people $N_1$’: chewing gum, drinking water.

• BOTH ($X_1$,$Y_2$), ‘both $N_1$ and $N_2$’: boy king, politician-tycoon. (“Dvandva” compounds)

• SAME/SIMILAR ($X_1$, $Y_2$), ‘$N_1$ and $N_2$ are the same/similar’: zebrafish, piggy bank, string bean, sunflower. This is not reversible, because the function is symmetric; asymmetry arises only through profiling.

• KIND ($X_1$, $Y_2$), ‘$N_1$ is a kind of $N_2$’: puppy dog, ferryboat, limestone. Reversible: seal pup, bear cub (there are other possible analyses as well, perhaps promiscuously)

• SERVES-AS ($Y_2$, $X_1$), ‘$N_2$ that serves as $N_1$’: handlebar, extension cord, farmland, retainer fee, buffer state.

• LOC ($X_1$, $Y_2$), ‘$N_2$ is located at/in/on $N_1$’: sunspot, window seat, tree house, background music, nose hair, donut hole. Reversible: ‘$N_1$ located at/in/on $N_2$’, or, reprofiled, ‘$N_2$ with $N_1$ at/in/on it’: raincloud, garlic bread, inkpad, stairwell, icewater, water bed.[i]

• LOCtemp ($X_1$, $Y_2$), ‘$N_2$ takes place at time $N_1$’: spring rain, morning swim, 3 a.m. blues. A special case of LOC ($X_1$, $Y_2$).

• CAUSE ($X_1$, $Y_2$), ‘$N_2$ caused by $N_1$’: sunburn, diaper rash, knife wound, surface drag.

• COMP ($Y_2$, $X_1$), ‘$N_2$ is composed of $N_1$’: felafel ball, rubber band, rag doll, tinfoil, brass instrument. Reversible: ‘$N_1$ is composed of $N_2$’, or, reprofiled, ‘$N_2$ that $N_1$ is composed of’: wallboard, bathwater, brick cheese, sheet metal.

• PART ($X_1$, $Y_2$), ‘$N_2$ is part of $N_1$’: apple core, doorknob, fingertip, stovetop, mold cavity. Reversible: ‘$N_2$ with $N_1$ as a part’: snare drum, lungfish, string instrument, ham sandwich, wheelchair. If $N_1$ is a mass noun, this relation paraphrases better as ‘$N_2$ is composed in part of $N_1$’: gingerbread, cinnamon bun, cheesecake, noodle soup. Reversible: ‘$N_2$ that forms part of $N_1$’: stew beef, cake flour, lunch meat.[ii]

• MAKE (X, Y, FROM Z), ‘X makes Y from Z.’ This creates two families of compounds, depending on which two arguments are mapped to $N_1$ and $N_2$.

  • ‘$N_2$ made by $N_1$’: moonbeam, anthill, footprint, horse shit. Reversible: ‘$N_2$ that makes $N_1$’: honeybee, lightbulb, musk deer, textile mill.
  • ‘$N_2$ made from $N_1$’: apple juice, olive oil, grain alcohol, cane sugar, cornstarch. Reversible: ‘$N_2$ that $N_1$ is made from’: sugar beet, rubber tree.[iii]

• PROTECT (X, Y, FROM Z), ‘X protects Y from Z.’ This is the one function in the group that does not seem especially “basic.” It too creates two families of compounds.

  • ‘$N_2$ protects $N_1$’: chastity belt, lifeboat, safety pin
  • ‘$N_2$ protects from $N_1$’: mothball, flea collar, cough drop, mosquito net, sun hat
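
The list above can be transcribed into a small lookup table that generates every candidate paraphrase for a given compound. This is a sketch, not part of Jackendoff's account: the dictionary name, the template syntax, and the split labels `MAKE-BY`, `MAKE-FROM`, `PROTECT`, and `PROTECT-FROM` are hypothetical, and the reversed readings are left out for brevity.

```python
# Paraphrase templates copied from the list of basic functions.
# "{n1}" and "{n2}" stand for the first and second noun of the compound.
BASIC_FUNCTIONS = {
    "CLASSIFY": "{n1} classifies {n2}",
    "BOTH": "both {n1} and {n2}",
    "SAME/SIMILAR": "{n1} and {n2} are the same/similar",
    "KIND": "{n1} is a kind of {n2}",
    "SERVES-AS": "{n2} that serves as {n1}",
    "LOC": "{n2} is located at/in/on {n1}",
    "LOCtemp": "{n2} takes place at time {n1}",
    "CAUSE": "{n2} caused by {n1}",
    "COMP": "{n2} is composed of {n1}",
    "PART": "{n2} is part of {n1}",
    "MAKE-BY": "{n2} made by {n1}",            # hypothetical label
    "MAKE-FROM": "{n2} made from {n1}",        # hypothetical label
    "PROTECT": "{n2} protects {n1}",
    "PROTECT-FROM": "{n2} protects from {n1}", # hypothetical label
}


def paraphrases(n1, n2):
    """Return every candidate paraphrase of the compound 'n1 n2', unranked."""
    return {name: template.format(n1=n1, n2=n2)
            for name, template in BASIC_FUNCTIONS.items()}


candidates = paraphrases("sun", "burn")
```

The table only enumerates readings; choosing among them is a separate step that needs knowledge about the two nouns.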

The basic functions and the action modalities can fill in F in the modifier schema above to build compound meanings, as in:

  • $window_1 \space seat_2 = {SEAT_2}^\forall$; [LOC ($\forall$ AT $WINDOW_1$)]

  • $felafel_1 \space ball_2 = {BALL_2}^\forall$; [COMP ($\forall$, $FELAFEL_1$)]

(Additional schemas are described in Jackendoff’s paper for more remote cases.)

We can extend this model from abstract conceptual semantics to a formal ontology schema by identifying the domains of the NP constituents. It appears that a preference rule system can express the rule type inference set with good fidelity to what a human curator would produce.
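
To make the idea of a preference rule system concrete, here is a minimal sketch in which each rule casts a weighted vote for a basic function given the domains of the two constituents. Everything specific in it is an assumption for illustration: the domain labels, the weights, and the names `RULES` and `rank_functions` do not come from the source.

```python
from collections import defaultdict

# Hypothetical preference rules: (domain of N1, domain of N2, function, weight).
# Domains and weights are invented for illustration only.
RULES = [
    ("location", "artifact", "LOC", 0.8),
    ("substance", "artifact", "COMP", 0.7),
    ("time", "event", "LOCtemp", 0.9),
    ("substance", "artifact", "PART", 0.3),
]


def rank_functions(domain1, domain2, rules=RULES):
    """Sum the weights of all matching rules and rank functions, best first."""
    scores = defaultdict(float)
    for d1, d2, function, weight in rules:
        if d1 == domain1 and d2 == domain2:
            scores[function] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_functions("substance", "artifact")
```

Because the rules are graded rather than absolute, several functions can survive for one compound, and the ranking gives a reviewer an ordered set of candidates instead of a single forced answer.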

Output is in the form of two kinds of assertions:

  • (New) terms that are probable concept names (or synonyms) and that are either common names or named entities, and
  • Inferred relationships, possibly involving an existing bootstrap ontology concept, along with zero or more types each with an indication of confidence.
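
The two kinds of assertions could be represented along the following lines. This is a sketch under stated assumptions: the class and field names are hypothetical, and confidence is assumed to be a number between 0 and 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TermAssertion:
    """A (new) term that is probably a concept name or a synonym."""
    term: str
    named_entity: bool = False        # False means a common name
    synonym_of: Optional[str] = None  # existing concept, if this is a synonym


@dataclass
class RelationAssertion:
    """An inferred relationship with zero or more candidate types."""
    subject: str
    obj: str
    # (type name, confidence) pairs; the list may be empty
    types: List[Tuple[str, float]] = field(default_factory=list)

    def best_type(self):
        """Return the highest-confidence type, or None when no type was inferred."""
        return max(self.types, key=lambda pair: pair[1], default=None)


relation = RelationAssertion("window seat", "window", [("LOC", 0.8), ("PART", 0.2)])
untyped = RelationAssertion("beta cell", "cell")
```

Keeping all candidate types with their confidences, rather than only the best one, leaves the final choice open to whoever reviews the assertions.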

Automatic incorporation of these assertions into an existing ontology is straightforward, but the result should be reviewed by a domain expert.

NPN compounding patterns may also be inferred, though they are less informative for new ontological inferences from book index material. The forms are very rich, as illustrated below.

Jackendoff compounding

Taxonomy of NPN from Jackendoff, 'Construction after Construction'

  • [i] These cases verge closely on "X with Y as a part", below. It is not clear to me whether they are distinct.
  • [ii] The difference between COMP and PART can be illustrated by the ambiguity of clarinet quartet. On the COMP reading it means "quartet of four clarinets"; on the PART reading it means "quartet of which a clarinet is a distinctive member", e.g. a clarinet and three strings.
  • [iii] This relation differs from PART in that Z is no longer identifiable as such in Y. However, the distinction is slippery.