Application of Heuristics


In the problem specifications, a heuristic example is given: NAME : "adjective_noun", TAGS : ['NN', 'VB'], LEFT : ['JJ'], RIGHT : [], ACTION : [DELETE, ['VB']]

In coding the application of this heuristic, is it meant that the left tag MUST be 'JJ', or merely that it CAN be 'JJ'? In other words, should our tagger apply this heuristic to the word 'dog' in the theoretical example [['red', 'JJ', 'NN'], ['dog', 'NN', 'VB']]?

Similarly, should the data under TAGS exactly match the tags of the token in question, or does it just need to be a subset of the tags of the token in question? (Would the heuristic apply to 'dog' if the example were [['red', 'JJ'], ['dog', 'NN', 'VB', 'DT']]?)
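To make the two readings concrete, here is a minimal sketch of both. The dict layout, the helper names `matches` and `applies`, and the rule that an empty list (like RIGHT above) places no constraint are all assumptions for illustration, not something the problem specification states.

```python
# Hypothetical representation: a token is [word, tag1, tag2, ...] and a
# heuristic is a dict holding the fields from the example in the spec.
HEURISTIC = {
    "NAME": "adjective_noun",
    "TAGS": ["NN", "VB"],
    "LEFT": ["JJ"],
    "RIGHT": [],
    "ACTION": ["DELETE", ["VB"]],
}

def matches(required, actual, strict):
    """Compare a heuristic's tag list against a token's candidate tags."""
    if not required:
        # Assumption: an empty list means "no constraint on this side".
        return True
    if strict:
        # Strict reading: the tag sets must be identical.
        return set(actual) == set(required)
    # Lenient reading: the required tags only need to be present.
    return set(required) <= set(actual)

def applies(heuristic, sentence, i, strict):
    """True if the heuristic fires on sentence[i] under the chosen reading."""
    left = sentence[i - 1][1:] if i > 0 else []
    right = sentence[i + 1][1:] if i + 1 < len(sentence) else []
    return (matches(heuristic["TAGS"], sentence[i][1:], strict)
            and matches(heuristic["LEFT"], left, strict)
            and matches(heuristic["RIGHT"], right, strict))

first = [["red", "JJ", "NN"], ["dog", "NN", "VB"]]
second = [["red", "JJ"], ["dog", "NN", "VB", "DT"]]
for strict in (True, False):
    print(strict, applies(HEURISTIC, first, 1, strict),
          applies(HEURISTIC, second, 1, strict))
```

Under the strict reading the heuristic fires on 'dog' in neither example; under the lenient reading it fires in both.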

I realize that these questions are somewhat subjective, and depend on our own personal choices for implementation (a lenient tagger that throws out wrong tags vs. a strict tagger that doesn't disambiguate enough tags), but I was just wondering if any one method is preferred over the other. Thank you.

-- Anonymous, March 11, 1999

Answers

my .25US$:

I think in the example given, the tags should _exactly_ match the case. I would say just create separate cases for each of the tags in question if you need to. Another scenario is to have an 'all' character, such as *, which can mean any other tags can be present ("left = 'NN' 'JJ' *" or something similar). But this opens us up to inaccurate tagging. Personally I'm just creating separate entries for each case.
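A rough sketch of how the '*' idea could work, assuming tag patterns are plain Python lists and '*' is a reserved marker (both the function name `matches_pattern` and the marker handling are made up for illustration):

```python
def matches_pattern(pattern, tags):
    """Hypothetical pattern check.

    Without '*' the pattern must equal the token's tag set exactly.
    With '*' the listed tags must be present and any extras are allowed.
    """
    required = set(pattern) - {"*"}
    if "*" in pattern:
        return required <= set(tags)
    return required == set(tags)

print(matches_pattern(["NN", "JJ", "*"], ["JJ", "NN", "VB"]))  # True
print(matches_pattern(["NN", "JJ"], ["JJ", "NN", "VB"]))       # False
print(matches_pattern(["NN", "JJ"], ["JJ", "NN"]))             # True
```

Exact matching stays the default this way, and a rule only becomes lenient where its author explicitly writes the '*'.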

If you look in the lexicon, you'll see many of the adjectives are tagged just as JJ. Others are tagged as JJ NN etc., but there could be two nouns next to each other. The easiest way to start disambiguating the text is to go from the words with a single tag (NN, NNP) and either delete tags from the surrounding words or reduce them to one. Perhaps you can find a heuristic that uses a single-tag neighbor to eliminate a tag... such as an NN never having a JJ following it. If this is the case you can delete the NN tag, perhaps leaving a more useful set of tags.
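As a sketch of that last idea, assuming the "NN is never followed by JJ" rule really holds for your data (it is only a guess above) and using a made-up function name and made-up example words:

```python
def prune_nn_before_jj(sentence):
    """Drop NN from an ambiguous token when the next token is only JJ.

    Tokens are [word, tag1, tag2, ...]. A token whose only tag is NN is
    left alone so that no word ends up with an empty tag list.
    """
    result = [list(tok) for tok in sentence]  # do not mutate the input
    for i in range(len(result) - 1):
        tags = result[i][1:]
        next_tags = result[i + 1][1:]
        if next_tags == ["JJ"] and "NN" in tags and len(tags) > 1:
            result[i] = [result[i][0]] + [t for t in tags if t != "NN"]
    return result

# Hypothetical input, not taken from the course lexicon.
print(prune_nn_before_jj([["light", "NN", "VB"], ["blue", "JJ"]]))
# [['light', 'VB'], ['blue', 'JJ']]
```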

Now a question of my own: can we add simple heuristics which can disambiguate on the basis of word suffixes and prefixes? Example: -ous, -ful -> JJ if not NNP.
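Something like the following is what I have in mind. It is only a sketch: the names are made up, and I'm assuming "if not NNP" means NNP is not among the word's candidate tags, and that the rule should only narrow to JJ when the lexicon already lists JJ as a candidate.

```python
ADJ_SUFFIXES = ("ous", "ful")  # suffixes from the example above

def apply_suffix_rule(token):
    """Reduce a token to JJ when its suffix suggests an adjective.

    token is [word, tag1, tag2, ...]. The token is returned unchanged if
    NNP is a candidate or if JJ is not already among the candidates.
    """
    word, tags = token[0], token[1:]
    if "NNP" in tags or "JJ" not in tags:
        return list(token)
    if word.lower().endswith(ADJ_SUFFIXES):
        return [word, "JJ"]
    return list(token)

print(apply_suffix_rule(["famous", "JJ", "NN"]))   # ['famous', 'JJ']
print(apply_suffix_rule(["handful", "NN"]))        # ['handful', 'NN']
print(apply_suffix_rule(["Rufous", "NNP", "JJ"]))  # ['Rufous', 'NNP', 'JJ']
```

Requiring JJ to be a candidate already keeps nouns like 'handful' from being retagged just because of the suffix.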

Andrew

-- Anonymous, March 11, 1999

