Salience recognizes several entity types, most notably People, Places, Companies, and Products. If you would like the software to recognize additional types of entities, or if certain entities are being missed or picked up incorrectly, you can expand and refine recognition by editing the data files under the Entities folder. The files for each entity type are contained in the folder of that name (e.g. Entities/Companies), and Entities/lists is used for defining new lists of entities. The following is a discussion of the data files you may wish to change.
One of the most commonly used and most important features of the Salience Engine is its support for customer-defined lists. By default, the system recognizes the following entities:
companies
people
places
products
email addresses
dates
Frequently, users need to recognize other kinds of entities, such as publishers or medical terms. In this case, the user can build a custom dictionary, or .cdl file, and place it in the /data/salience/entities/lists folder. Each line of the file holds one entry: the entry's words separated by spaces, then a tab, then the label:
word1 word2<tab>label
word1 word2 word3 word4<tab>label
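To make the line format concrete, here is a minimal sketch of a parser for one .cdl line. It is not part of Salience; the names `CdlEntry` and `parse_cdl_line` are hypothetical, and it assumes columns are separated by single tab characters, with either two columns (entry, label) or three (entry, label, and the optional normalized form described later in this article).

```python
from typing import NamedTuple, Optional


class CdlEntry(NamedTuple):
    """One row of a .cdl file (hypothetical representation for illustration)."""
    phrase: str                # the words to match, e.g. "Cessna 160"
    label: str                 # the entity type, e.g. "Plane"
    normalized: Optional[str]  # optional third column, None when absent


def parse_cdl_line(line: str) -> CdlEntry:
    """Split one tab-separated .cdl line into its columns.

    Assumes a line has either two columns (phrase, label) or
    three (phrase, label, normalized form).
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) not in (2, 3):
        raise ValueError(
            f"expected 2 or 3 tab-separated columns, got {len(fields)}: {line!r}"
        )
    phrase = " ".join(fields[0].split())  # collapse repeated spaces between words
    label = fields[1].strip()
    normalized = fields[2].strip() if len(fields) == 3 else None
    if not phrase or not label:
        raise ValueError(f"empty phrase or label: {line!r}")
    return CdlEntry(phrase, label, normalized)


print(parse_cdl_line("word1 word2\tlabel"))
```

A line that contains spaces but no tab is rejected, which is a common mistake when a .cdl file is edited in a tool that converts tabs to spaces.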
An example .cdl file, publishers.cdl, is included with the system; it lists some of the major publishers in the United States. The user may build multiple .cdl files in the lists directory, because each .cdl file is hashed into the system when a Salience Engine session is created through the API. As a result, if you build a new list, any running programs that use the Salience Engine will need to start a new session to pick it up. A .cdl entry generally contains between 1 and 4 words; the maximum length of an entry is 12 words.
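The following sketch illustrates why a new session is needed: a loader that reads every .cdl file in a directory into one hash table at startup will not see files added afterwards until it runs again. This is only an analogy, not Salience's actual loading code; `load_cdl_dir` is a hypothetical name, and keeping entries case-sensitive is an assumption. It also enforces the 1 to 12 word limit stated above.

```python
from pathlib import Path
from typing import Dict, Tuple

MAX_WORDS = 12  # maximum CDL entry length stated in this article


def load_cdl_dir(lists_dir: str) -> Dict[Tuple[str, ...], str]:
    """Read every *.cdl file in lists_dir into one hash table.

    Hypothetical sketch: keys are the entry's words as a tuple (case kept
    as written), values are labels. Salience's real hashing is not shown.
    """
    table: Dict[Tuple[str, ...], str] = {}
    for path in sorted(Path(lists_dir).glob("*.cdl")):
        with path.open(encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                line = line.rstrip("\r\n")
                if not line.strip():
                    continue  # skip blank lines
                fields = line.split("\t")
                if len(fields) < 2:
                    raise ValueError(f"{path.name}:{lineno}: missing tab-separated label")
                words = tuple(fields[0].split())
                if not 1 <= len(words) <= MAX_WORDS:
                    raise ValueError(
                        f"{path.name}:{lineno}: entry has {len(words)} words; "
                        f"limit is {MAX_WORDS}"
                    )
                table[words] = fields[1].strip()
    return table
```

Because the table is built once, a program holding it would have to call the loader again (the analogue of starting a new session) before a newly added list has any effect.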
Users may also mix and match entity types in a single .cdl file so that they have fewer files to manage. The following example .cdl file detects both cars and planes:
Subaru	Car
Corvette	Car
Cessna 160	Plane
F22 Raptor	Plane
Volvo V70	Car
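To show how one mixed file yields several entity types, here is a hypothetical longest-match tagger over the car and plane list. It is a generic dictionary-matching technique chosen for illustration, not the matching algorithm Salience uses; `build_table` and `tag` are made-up names, and the whitespace tokenizer is a deliberate simplification.

```python
from typing import Dict, List, Tuple

# The mixed car/plane list from the example above, tab-separated.
VEHICLES_CDL = """Subaru\tCar
Corvette\tCar
Cessna 160\tPlane
F22 Raptor\tPlane
Volvo V70\tCar
"""


def build_table(cdl_text: str) -> Dict[Tuple[str, ...], str]:
    """Map each entry's word tuple to its label."""
    table: Dict[Tuple[str, ...], str] = {}
    for line in cdl_text.splitlines():
        if line.strip():
            phrase, label = line.split("\t")[:2]
            table[tuple(phrase.split())] = label.strip()
    return table


def tag(text: str, table: Dict[Tuple[str, ...], str],
        max_words: int = 12) -> List[Tuple[str, str]]:
    """Greedy longest-match scan: at each token, try the longest phrase first."""
    tokens = text.split()
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                found.append((" ".join(key), table[key]))
                i += n  # skip past the matched entry
                break
        else:
            i += 1  # no entry starts here; advance one token
    return found


print(tag("We drove the Volvo V70 to see a Cessna 160 and an F22 Raptor",
          build_table(VEHICLES_CDL)))
```

Each match carries the label from its own line, so cars and planes are reported as separate entity types even though they come from one file. Trailing punctuation (for example "Subaru.") would defeat this simple tokenizer.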
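CDL files also support in-line normalization of customer-defined entities through an optional third column, which lets several entries be reported under one common form. For example, the following entries map three Ford models to the same normalized name: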
Ford F150	Car	Ford F-series truck
Ford F250	Car	Ford F-series truck
Ford F350	Car	Ford F-series truck
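A minimal sketch of what the third column does, assuming a matched entry is reported under its normalized form and, when the third column is absent, under its own text (that fallback is an assumption, not documented behavior). `build_normalizing_table` and `Match` are hypothetical names.

```python
from typing import Dict, NamedTuple

FORD_CDL = """Ford F150\tCar\tFord F-series truck
Ford F250\tCar\tFord F-series truck
Ford F350\tCar\tFord F-series truck
Volvo V70\tCar
"""


class Match(NamedTuple):
    label: str
    normalized: str  # third column if present, else the entry text itself (assumed)


def build_normalizing_table(cdl_text: str) -> Dict[str, Match]:
    """Map each entry's text to its label and normalized form."""
    table: Dict[str, Match] = {}
    for line in cdl_text.splitlines():
        if not line.strip():
            continue
        fields = [f.strip() for f in line.split("\t")]
        phrase, label = fields[0], fields[1]
        # Use the optional third column when it is present and non-empty.
        normalized = fields[2] if len(fields) > 2 and fields[2] else phrase
        table[phrase] = Match(label, normalized)
    return table


table = build_normalizing_table(FORD_CDL)
for mention in ["Ford F150", "Ford F350", "Volvo V70"]:
    print(mention, "->", table[mention].normalized)
```

With normalization, mentions of the F150 and F350 are counted as the same entity, which is usually what you want when aggregating results across documents.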