The Salience text analytics engine is designed to support multiple languages with a single flexible codebase. During the development of support for each new language, Lexalytics creates components that are specific to the needs of the individual language. This page links to additional information about the specifics of each language that we support in Salience.
All Salience functionality is developed first for English content and then rolled out to the other supported languages in subsequent updates. The table below describes the functionality available in each current language pack.
|Collection Query Topics||Y||Y||Y||Y||Y||Y||Y||Y||Y||Y|
|Collection Concept Topics||Y||Y||Y||Y||Y||Y||Y||Y||Y||Y|
Core NLP consists of document tokenization, part-of-speech (POS) tagging, and chunking. Document details gives access to core NLP results such as bigrams, trigrams, POS tags, and term frequencies.
All languages support phrase-based sentiment analysis, which is the recommended approach. Model-based sentiment is also supported: most languages ship with a default sentiment model, and a tool is provided so that customers can generate sentiment models from their own content.
Categorization based on Wikipedia was released in Salience 5.1.1 and is currently available only for English. Intention extraction was released in Salience 6 and is also currently available only for English.
The default threshold for entity extraction is 55. For improved recall in entity extraction from Chinese and Korean content, we recommend lowering the threshold from the default to 35.
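The threshold recommendation above can be captured in a small lookup. The following is a minimal sketch only: the names `DEFAULT_ENTITY_THRESHOLD`, `RECOMMENDED_THRESHOLDS`, and `entity_threshold_for` are hypothetical and are not part of the Salience API. It shows how a caller might choose the value to pass to whatever threshold option their Salience wrapper exposes.

```python
# Illustrative helper only; these names are assumptions, not Salience API calls.

# Default entity extraction threshold stated in this page.
DEFAULT_ENTITY_THRESHOLD = 55

# Languages for which a lower threshold is recommended to improve recall.
RECOMMENDED_THRESHOLDS = {
    "chinese": 35,
    "korean": 35,
}


def entity_threshold_for(language: str) -> int:
    """Return the recommended entity threshold for a language name.

    Falls back to the default for any language without a specific
    recommendation.
    """
    key = language.strip().lower()
    return RECOMMENDED_THRESHOLDS.get(key, DEFAULT_ENTITY_THRESHOLD)
```

For example, `entity_threshold_for("Korean")` yields 35, while `entity_threshold_for("English")` falls back to 55. How the value is then applied depends on the wrapper in use and is not shown here.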
Entity relationship extraction is a pattern-based feature that is functionally supported in each language, but the patterns have not been translated into non-English languages.
Entity opinion extraction is a pattern-based feature that is functionally supported in each language, but the patterns have not been translated into non-English languages.