Salience

Preprocess Files

The file /data/salience/preprocess.dat lets you replace text in your document with different text before the document is tokenized. This is useful if you are trying to detect long entities (you can replace them with a different token), clean up HTML, and so on.

NOTE: When adding this file to a data directory on Linux, please ensure that the filename is preprocess.dat (all lower-case).

The format of the file is:

text-to-be-replaced<tab>replacement-text
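
Because the separator is a literal tab character, which some editors silently convert to spaces, it can be convenient to generate the file from a script. The following is a minimal sketch only: the function name `write_preprocess_file`, the example entries, and the choice of UTF-8 encoding are illustrative assumptions, not part of Salience.

```python
def write_preprocess_file(path, replacements):
    """Write (text-to-be-replaced, replacement-text) pairs, one per line, tab-separated."""
    lines = []
    for old, new in replacements:
        # A tab or newline inside an entry would break the one-pair-per-line layout.
        for field in (old, new):
            if "\t" in field or "\n" in field:
                raise ValueError(f"entry contains a tab or newline: {field!r}")
        lines.append(f"{old}\t{new}")
    # UTF-8 is an assumption here; use whatever encoding your documents use.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")


# Hypothetical entries, listed with the longer phrase first.
EXAMPLE_REPLACEMENTS = [
    ("the swimming teams", "swimming_teams"),
    ("the swimming team", "swimming_team"),
]
```

Calling `write_preprocess_file("preprocess.dat", EXAMPLE_REPLACEMENTS)` would produce a two-line file that can then be copied into the data directory.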

Replacements are carried out in the order in which they appear in the file. If you have replacements for both 'the swimming team' and 'the swimming teams', make sure that 'the swimming teams' comes before 'the swimming team'; otherwise the shorter entry will already have replaced that text and the longer entry will never match. You cannot use multiple preprocess files: all replacements must share a single file.
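
The effect of ordering can be illustrated with a small simulation. This sketch is not Salience's implementation: it assumes plain, case-sensitive, literal substitution (which the text above does not specify), and the function name `apply_replacements` and the replacement tokens are hypothetical.

```python
def apply_replacements(text, replacements):
    """Apply each (old, new) pair in order, as plain literal substitutions."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text


sentence = "We watched the swimming teams compete."

# Shorter phrase first: it consumes the text that the longer phrase needed.
wrong_order = [
    ("the swimming team", "TEAM"),
    ("the swimming teams", "TEAMS"),
]

# Longer phrase first: the longer entry gets the chance to match.
right_order = [
    ("the swimming teams", "TEAMS"),
    ("the swimming team", "TEAM"),
]

print(apply_replacements(sentence, wrong_order))  # We watched TEAMs compete.
print(apply_replacements(sentence, right_order))  # We watched TEAMS compete.
```

In the first ordering the stray trailing "s" shows that the plural phrase was split by the earlier, shorter replacement.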

Updated 5 months ago
