.../data/chunker
This directory contains files that support the chunker. Files that users can override or customize are described in more detail below.
File | Description
---|---
auxiliary.dat | A list of verbs that commonly begin longer verb phrases, e.g. "planned to go", "need to think", "gets to come"
copulas.dat | Externalization of the linking verbs considered by the chunker
describers.dat | Used by sentencetype.dat to identify sentence types
discourse.dat | Data file containing words that weight the sentiment phrases following them in situations of mixed sentiment
instructive_modal.dat | Used by sentencetype.dat to identify sentence types
intensifiers.dat | Data file of words that act as a direct multiplier of adjacent sentiment-bearing phrases
negationbreaks.dat | Externalization of patterns used by the chunker to affect the negation of chunks
negations.dat | Externalization of the negators considered by the chunker
These files may be customized within the chunker section of a user directory; however, this is not recommended.
auxiliary.dat
Certain verbs commonly occur at the start of longer verb phrases, such as "planned to go". This file lists these verbs so that the chunker keeps such a verb and the verb that follows it in the same chunk.
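As a rough illustration of the idea, and not the chunker's actual algorithm, the sketch below keeps a listed verb, "to", and the next token together in one chunk. The set `AUXILIARY_VERBS` is a hypothetical stand-in for the contents of auxiliary.dat, and the function name `chunk_verb_phrases` is invented for this example; a real chunker would rely on part-of-speech tags rather than assuming the token after "to" is a verb.

```python
from typing import List, Set

# Hypothetical stand-in for auxiliary.dat; the real file's contents may differ.
AUXILIARY_VERBS: Set[str] = {"planned", "need", "gets"}


def chunk_verb_phrases(tokens: List[str], auxiliaries: Set[str]) -> List[List[str]]:
    """Split tokens into chunks, keeping '<auxiliary> to <verb>' together.

    Simplification: the token after 'to' is assumed to be a verb, and every
    other token becomes its own chunk.
    """
    chunks: List[List[str]] = []
    i = 0
    while i < len(tokens):
        is_aux_phrase = (
            tokens[i].lower() in auxiliaries
            and i + 2 < len(tokens)
            and tokens[i + 1].lower() == "to"
        )
        if is_aux_phrase:
            # Keep the auxiliary verb, "to", and the following verb in one chunk.
            chunks.append(tokens[i : i + 3])
            i += 3
        else:
            chunks.append([tokens[i]])
            i += 1
    return chunks


if __name__ == "__main__":
    print(chunk_verb_phrases("We planned to go home".split(), AUXILIARY_VERBS))
    # [['We'], ['planned', 'to', 'go'], ['home']]
```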
copulas.dat
This file contains a list of verbs that the chunker uses as linking verbs. Note that this file does not contain every form of each verb. For example, the verb "to be" in English is conjugated as "I am, you are, he/she/it is, we are, they are"; of the forms shown, this file lists "are", along with the base form "be".
describers.dat
This file is used by sentencetype.dat in identifying different sentence types, such as imperative sentences. If sentencetype.dat is overridden in a user directory, this file may also be needed.
discourse.dat
This file is used to specify words that weight the sentiment of phrases following them. For example:
The restaurant was nice and clean, but the food was awful.
The word "but" signals a change in sentiment, and its weight in discourse.dat gives slightly more weight to the sentiment phrases that follow it at the end of the sentence. This reflects the observation that those later phrases usually convey the intended sentiment.
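A minimal sketch of this weighting, assuming that every phrase after a discourse word is simply multiplied by that word's weight. The weight 1.2 for "but", the phrase scores, and the function name `weighted_sentiment` are all hypothetical values chosen for illustration; they are not taken from discourse.dat or from the chunker itself.

```python
from typing import Dict, List, Tuple

# Hypothetical weight for "but"; the real value in discourse.dat may differ.
DISCOURSE_WEIGHTS: Dict[str, float] = {"but": 1.2}


def weighted_sentiment(
    phrases: List[Tuple[str, float]], weights: Dict[str, float]
) -> float:
    """Sum phrase scores, scaling every phrase after a discourse word.

    `phrases` holds (text, score) pairs in sentence order. Discourse words
    are included as entries; their own scores are ignored.
    """
    total = 0.0
    factor = 1.0
    for text, score in phrases:
        weight = weights.get(text.lower())
        if weight is not None:
            # Every later phrase in the sentence is multiplied by this weight.
            factor = weight
            continue
        total += score * factor
    return total


if __name__ == "__main__":
    # Hypothetical phrase scores for the restaurant example sentence.
    phrases = [("nice", 0.4), ("clean", 0.4), ("but", 0.0), ("awful", -0.7)]
    print(weighted_sentiment(phrases, DISCOURSE_WEIGHTS))  # about -0.04
    print(weighted_sentiment(phrases, {}))  # about 0.1 without weighting
```

With these made-up numbers, the extra weight on "awful" is enough to tip the overall score from slightly positive to slightly negative.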
instructive_modal.dat
This file is used by sentencetype.dat in identifying different sentence types, such as imperative sentences. If sentencetype.dat is overridden in a user directory, this file may also be needed.
intensifiers.dat
This file contains a list of words that modify the sentiment score of the next token only. The file format is:
<word> <tab> <intensifier-multiplier-amount>
Note: If a word is both an intensifier and a sentiment phrase, then it WILL NOT contribute its sentiment score to the document.
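The sketch below shows one way to read this tab-separated format and apply it, assuming a simple token-level scorer. The file contents ("very", "extremely"), the sentiment scores, and the function names `parse_intensifiers` and `score_tokens` are hypothetical; the sketch only mirrors the two rules stated above: the multiplier affects the next token only, and an intensifier contributes no sentiment of its own.

```python
from typing import Dict, List


def parse_intensifiers(text: str) -> Dict[str, float]:
    """Parse '<word><TAB><multiplier>' lines into a lookup table."""
    table: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        word, amount = line.split("\t", 1)
        table[word.strip().lower()] = float(amount)
    return table


def score_tokens(
    tokens: List[str], sentiment: Dict[str, float], intensifiers: Dict[str, float]
) -> float:
    """Sum token sentiment, letting each intensifier scale only the next token."""
    total = 0.0
    multiplier = 1.0
    for token in tokens:
        word = token.lower()
        if word in intensifiers:
            # An intensifier adds no sentiment of its own, even if it is
            # also listed as a sentiment phrase.
            multiplier = intensifiers[word]
            continue
        total += sentiment.get(word, 0.0) * multiplier
        multiplier = 1.0  # the effect applies to the next token only
    return total


if __name__ == "__main__":
    table = parse_intensifiers("very\t1.5\nextremely\t2.0\n")  # hypothetical entries
    scores = {"good": 0.5, "very": 0.2}  # hypothetical sentiment scores
    print(score_tokens("the food was very good".split(), scores, table))  # 0.75
```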
negationbreaks.dat
This file contains a list of words which will stop negation in the middle of a chunk.
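To make the idea of a negation break concrete, here is a minimal sketch under an assumed rule: negation covers the tokens after a negator until the end of the chunk, unless a break word stops it first. The negator and break-word sets and the function name `negation_scope` are hypothetical; the actual contents of negations.dat and negationbreaks.dat, and the chunker's exact scoping rules, may differ.

```python
from typing import List, Set

# Hypothetical entries; the shipped negations.dat and negationbreaks.dat differ.
NEGATORS: Set[str] = {"not", "never"}
NEGATION_BREAKS: Set[str] = {"but"}


def negation_scope(chunk: List[str], negators: Set[str], breaks: Set[str]) -> List[bool]:
    """Return, for each token in a chunk, whether it is inside a negation.

    Assumed rule for illustration: negation covers the tokens after a
    negator until the end of the chunk or the first break word.
    """
    flags: List[bool] = []
    negating = False
    for token in chunk:
        word = token.lower()
        if word in breaks:
            negating = False  # a break word stops negation mid-chunk
        flags.append(negating)
        if word in negators:
            negating = True  # negation starts with the following token
    return flags


if __name__ == "__main__":
    chunk = "not cheap but good".split()
    print(negation_scope(chunk, NEGATORS, NEGATION_BREAKS))
    # [False, True, False, False]
```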
negations.dat
This file contains a list of words that can negate (or invert) sentiment. The file format is:
<word-or-construction> <tab> <negation-multiplier-amount>
In the out-of-the-box negations.dat, every inversion is an exact mirror. For example, the sentiment of "I enjoy watching baseball." is exactly inverted in "I never enjoy watching baseball."
Note: Entries without a negation multiplier amount are automatically assigned a value of -1.0.
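A minimal parser sketch for this format, including the -1.0 default for entries that have no amount. The sample entries, the 0.6 sentiment score, and the function names `parse_negations` and `apply_negation` are hypothetical; in particular, "hardly" with -0.5 stands for a custom partial negator a user might add, not an entry from the default file.

```python
from typing import Dict

# Default applied when an entry has no multiplier amount.
DEFAULT_NEGATION_MULTIPLIER = -1.0


def parse_negations(text: str) -> Dict[str, float]:
    """Parse '<word-or-construction><TAB><multiplier>' lines.

    The multiplier column is optional; missing values default to -1.0.
    """
    table: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, amount = line.partition("\t")
        amount = amount.strip()
        table[key.strip().lower()] = (
            float(amount) if amount else DEFAULT_NEGATION_MULTIPLIER
        )
    return table


def apply_negation(score: float, negator: str, table: Dict[str, float]) -> float:
    """Scale a sentiment score by the negator's multiplier."""
    return score * table[negator.lower()]


if __name__ == "__main__":
    # Hypothetical file contents: "never" has no amount, "hardly" is a custom
    # partial negator (not an entry from the default file).
    table = parse_negations("never\nnot\t-1.0\nhardly\t-0.5\n")
    print(table)  # {'never': -1.0, 'not': -1.0, 'hardly': -0.5}
    print(apply_negation(0.6, "never", table))  # -0.6
```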