Document
getDocumentDetails
Summary
Gets various bits of useful information about the current text including term frequency analysis and document chunk information.
This method provides a wrapper around the underlying C API method lxaGetDocumentDetails.
Syntax
salience6.getDocumentDetails(oSession, acConfigurationID)
Parameters
| oSession | A SalienceSession object previously created via openSession |
---|---|
| acConfigurationID | An identifier for a configuration added through addConfiguration, or empty string for default configuration |
Returns
If successful, returns a Python dictionary containing the following keys:
| A list of tokens contained within the document and their term frequencies |
---|---|
| A string containing the calculated fingerprint of the document (DEPRECATED) |
| A string providing the internal Salience representation of the document after preprocessing |
| A list of the individual sentences in the document, where each sentence item contains a structure of information about the sentence. |
| An integer giving the count of sentences in the document |
| An integer giving the count of words in the document |
Example
import salience6 as se6
session = se6.openSession('/path/to/license.v5','/path/to/data')
ret = se6.prepareTextFromFile(session,'/path/to/aFile.txt')
if ret == 0:
    details = se6.getDocumentDetails(session, "")
    print(details)
elif ret == 6:
    print(se6.getLastWarnings(session))
se6.closeSession(session)
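Every example in this reference opens a session, prepares text, checks the return code, and closes the session. A minimal sketch of that pattern as a reusable helper might look like the following. The helper names `salience_session` and `document_details_for_file` are hypothetical and not part of the wrapper, and the `se6` argument is assumed to be the imported salience6 module, passed in explicitly so the helper can be tried with a stand-in object.

```python
from contextlib import contextmanager


@contextmanager
def salience_session(se6, license_path, data_path):
    """Open a session and always close it, even if an error occurs.

    Hypothetical helper: `se6` is assumed to be the imported salience6 module.
    """
    session = se6.openSession(license_path, data_path)
    try:
        yield session
    finally:
        se6.closeSession(session)


def document_details_for_file(se6, session, path, config_id=""):
    """Return the details dictionary, or None if the text could not be prepared."""
    ret = se6.prepareTextFromFile(session, path)
    if ret == 0:
        return se6.getDocumentDetails(session, config_id)
    if ret == 6:
        # Mirrors the examples in this reference, which print warnings on code 6.
        print(se6.getLastWarnings(session))
    return None
```

With the real module this would be used as `with salience_session(se6, '/path/to/license.v5', '/path/to/data') as session:` followed by a call to `document_details_for_file(se6, session, '/path/to/aFile.txt')`.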
getSummary
Summary
Returns a structure of summary information for the current text. The structure provides a default summary and an alternate summary, as well as ranking for the sentences in the summary.
The default summary method determines the most significant fragments in the document, and extracts the first sentence from those fragments. The alternate method extracts sentences that connect the most fragments.
This method provides a wrapper around the underlying C API method lxaGetSummary.
Syntax
salience6.getSummary(oSession, nLength, acConfigurationID)
Parameters
| oSession | A SalienceSession object previously created via openSession |
---|---|
| nLength | Maximum number of sentences for the summary |
| acConfigurationID | An identifier for a configuration added through addConfiguration, or empty string for default configuration |
Returns
If successful, returns a Python dictionary containing the following keys:
| The default summary of the document |
---|---|
| Information for the individual sentences in the default summary, including ranking of importance in the summary |
| An alternate summary of the document |
| Information for the individual sentences in the alternate summary, including ranking of importance in the summary |
Example
import salience6 as se6
session = se6.openSession('/path/to/license.v5','/path/to/data')
ret = se6.prepareTextFromFile(session,'/path/to/aFile.txt')
if ret == 0:
    summary = se6.getSummary(session, 3, "")
    print(summary)
elif ret == 6:
    print(se6.getLastWarnings(session))
se6.closeSession(session)
getDocumentSentiment
Summary
Returns sentiment analysis of the document text. This consists of results for phrase-based and model-based sentiment analysis.
This method provides a wrapper around the underlying C API method lxaGetSentiment.
Syntax
salience6.getDocumentSentiment(oSession, acConfigurationID)
Parameters
| oSession | A SalienceSession object previously created via openSession |
---|---|
| acConfigurationID | An identifier for a configuration added through addConfiguration, or empty string for default configuration |
Returns
If successful, returns a Python dictionary containing the following keys:
| The phrase-based sentiment score for the document |
---|---|
| A list of phrases considered in phrase-based sentiment analysis, where each item contains a structure of information about a particular sentiment-bearing phrase |
| A list of model-based sentiment results, where each item contains a structure of information about sentiment analysis based on a specific model found in the data directory |
The DocumentSentiment object returned by GetDocumentSentiment has a getSentimentScore() function, and a getSentimentPhrases() function returning a vector of SentimentPhrases. The former is mostly an average of the scores of the latter.
Example
import salience6 as se6
session = se6.openSession('/path/to/license.v5','/path/to/data')
ret = se6.prepareTextFromFile(session,'/path/to/aFile.txt')
if ret == 0:
    sentiment = se6.getDocumentSentiment(session, "")
    print(sentiment)
elif ret == 6:
    print(se6.getLastWarnings(session))
se6.closeSession(session)
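The text above describes the document score as mostly an average of the phrase scores. As a rough illustration of that relationship, the sketch below computes a plain arithmetic mean over a list of phrase items. The key name `score` on each phrase item and the sample data are assumptions made for illustration, and the engine's actual calculation may weight phrases differently.

```python
def mean_phrase_score(phrases, score_key="score"):
    """Plain arithmetic mean of phrase scores; returns 0.0 for an empty list.

    `score_key` is an assumed key name, not confirmed by this reference.
    """
    scores = [float(p[score_key]) for p in phrases]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up sample data, not real Salience output.
sample_phrases = [
    {"phrase": "great service", "score": 0.5},
    {"phrase": "slow delivery", "score": -0.25},
    {"phrase": "friendly staff", "score": 0.5},
]

print(mean_phrase_score(sample_phrases))
```

Comparing this value with the phrase-based score in the returned dictionary is one way to see how far the engine's result departs from a simple mean.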
getDocumentThemes
Summary
Returns the themes of the text. This method provides a wrapper around the underlying C API method lxaGetThemes.
Syntax
salience6.getDocumentThemes(oSession, acConfigurationID)
Parameters
| oSession | A SalienceSession object previously created via openSession |
---|---|
| acConfigurationID | An identifier for a configuration added through addConfiguration, or empty string for default configuration |
Returns
If successful, returns a Python list consisting of items that contain the following information about a theme:
| The text of the theme |
---|---|
| The stemmed version of the theme |
| The normalized version of the theme |
| An indicator of whether this is a "meta-theme" (1) or not (0) |
| A measure of the strength of the theme within the document |
| The sentiment score for the theme |
| A measure (from 1 to 7) of the content on which the sentiment score for the theme is based |
| An indicator specifying if the theme is contained within the summary of the document |
| A summary of the document content relevant to the theme |
Example
import salience6 as se6
session = se6.openSession('/path/to/license.v5','/path/to/data')
ret = se6.prepareTextFromFile(session,'/path/to/aFile.txt')
if ret == 0:
    themes = se6.getDocumentThemes(session, "")
    for theme in themes:
        print(theme["theme"], theme["score"])
elif ret == 6:
    print(se6.getLastWarnings(session))
se6.closeSession(session)
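Because the themes come back as a plain Python list, ordinary list operations can be used to rank them. The sketch below sorts theme items by the `score` key used in the example above and keeps the highest few. The function name `top_themes` and the sample data are hypothetical, and this reference does not state which of the listed measures the `score` key holds.

```python
def top_themes(themes, limit=3):
    """Return up to `limit` theme items, highest "score" first."""
    return sorted(themes, key=lambda t: t["score"], reverse=True)[:limit]


# Made-up sample data shaped like the items printed in the example above.
sample_themes = [
    {"theme": "battery life", "score": 2.1},
    {"theme": "customer support", "score": 3.4},
    {"theme": "shipping delay", "score": 1.2},
    {"theme": "screen quality", "score": 2.8},
]

for theme in top_themes(sample_themes, limit=2):
    print(theme["theme"], theme["score"])
```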
getQueryDefinedTopics
Summary
Returns the topics determined for the text via user-defined queries. Before calling this method, you must specify the topic list using the Query Topic List option.
This method provides a wrapper around the underlying C API method lxaGetQueryDefinedTopics.
Syntax
salience6.getQueryDefinedTopics(oSession, acConfigurationID)
Parameters
| oSession | A SalienceSession object previously created via openSession |
---|---|
| acConfigurationID | An identifier for a configuration added through addConfiguration, or empty string for default configuration |
Returns
If successful, returns a Python list consisting of items that contain the following information about a topic:
| The label for the topic |
---|---|
| The number of query terms from the query definition which occur within the document |
| 0 (not used) |
| The sentiment score for document content associated with the topic |
| Summary related to query hits |
| 0 (indicates query topic result) |
Example
import salience6 as se6
session = se6.openSession('/path/to/license.v5','/path/to/data')
ret = se6.prepareTextFromFile(session,'/path/to/aFile.txt')
if ret == 0:
    se6.setOption_QueryTopicList(session, '/path/to/queries.txt')
    topics = se6.getQueryDefinedTopics(session, "")
    for topic in topics:
        print(topic["topic"], topic["hits"], topic["score"])
elif ret == 6:
    print(se6.getLastWarnings(session))
se6.closeSession(session)
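The hit count can be used to rank or filter query topics after retrieval. The following sketch keeps topics with at least a minimum number of hits and orders them by hit count, using the `topic`, `hits`, and `score` keys from the example above. The function name `topics_with_hits` and the sample data are hypothetical, and whether the wrapper ever returns topics with few or no hits is not stated in this reference.

```python
def topics_with_hits(topics, min_hits=1):
    """Keep topics with at least `min_hits` query-term hits, most hits first."""
    matched = [t for t in topics if t["hits"] >= min_hits]
    return sorted(matched, key=lambda t: t["hits"], reverse=True)


# Made-up sample data shaped like the items printed in the example above.
sample_topics = [
    {"topic": "Travel", "hits": 3, "score": 0.4},
    {"topic": "Sports", "hits": 1, "score": 0.0},
    {"topic": "Food", "hits": 5, "score": -0.2},
]

for topic in topics_with_hits(sample_topics, min_hits=2):
    print(topic["topic"], topic["hits"], topic["score"])
```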
getConceptDefinedTopics
Summary
Returns the topics determined for the text via the Salience 6 Concept Matrix. Before calling this method, you must specify a concept topic list using the Concept Topic List option.
This method provides a wrapper around the underlying C API method lxaGetConceptDefinedTopics.
Syntax
salience6.getConceptDefinedTopics(oSession, acConfigurationID)
Parameters
| oSession | A SalienceSession object previously created via openSession |
---|---|
| acConfigurationID | An identifier for a configuration added through addConfiguration, or empty string for default configuration |
Returns
If successful, returns a Python list consisting of items that contain the following information about a topic:
| The label for the topic |
---|---|
| 0 (this field is not used) |
| Strength of the concept topic match to document content |
| Sentiment for content related to the topic within the document |
| Summary of content related to topic |
| 1 (indicates concept topic result) |
Example
import salience6 as se6
session = se6.openSession('/path/to/license.v5','/path/to/data')
ret = se6.prepareTextFromFile(session,'/path/to/aFile.txt')
if ret == 0:
    se6.setOption_ConceptTopicList(session, '/path/to/queries.txt')
    topics = se6.getConceptDefinedTopics(session, "")
    for topic in topics:
        print(topic["topic"], topic["score"])
elif ret == 6:
    print(se6.getLastWarnings(session))
se6.closeSession(session)
explainConceptMatches
Summary
Returns a formatted block of text listing the concept topics determined for the text via the Salience 6 Concept Matrix, as well as individual terms that occur in the text that generate the matches. Before calling this method, you must specify a concept topic list using the Concept Topic List option.
This method takes longer to execute than getConceptDefinedTopics and should be reserved for diagnostic or research interfaces, or other application areas where the extra execution time is acceptable.
This method provides a wrapper around the underlying C API method lxaExplainConceptMatches.
Syntax
salience6.explainConceptMatches(oSession, acConfigurationID)
Parameters
| oSession | A SalienceSession object previously created via openSession |
---|---|
| acConfigurationID | An identifier for a configuration added through addConfiguration, or empty string for default configuration |
Returns
If successful, returns a string containing a formatted block of text. Each line in the text string returned contains either a topic label and overall match score or (indented) a document term contributing to the match for a certain topic and the term match score.
Example
import salience6 as se6
session = se6.openSession('/path/to/license.v5','/path/to/data')
ret = se6.prepareTextFromFile(session,'/path/to/aFile.txt')
if ret == 0:
    se6.setOption_ConceptTopicList(session, '/path/to/queries.txt')
    matches = se6.explainConceptMatches(session, "")
    print(matches)
elif ret == 6:
    print(se6.getLastWarnings(session))
se6.closeSession(session)
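Since the result is a formatted text block rather than a list, a diagnostic tool may want to turn it back into structured data. The sketch below groups indented term lines under the preceding topic line. The exact layout of the text is not specified in this reference, so the parser assumes that each line ends with a numeric score separated from the label by whitespace and that term lines begin with a space or tab. The function name and sample text are hypothetical.

```python
def parse_concept_explanation(text):
    """Group indented term lines under the topic line that precedes them.

    Assumed layout (not confirmed by the reference): "<label> <score>" per
    line, with term lines indented by spaces or tabs.
    """
    topics = []
    for line in text.splitlines():
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue  # blank line or a line without a separate score
        label, raw_score = parts
        try:
            score = float(raw_score)
        except ValueError:
            continue  # last token is not a number; skip the line
        indented = line[:1] in (" ", "\t")
        if not indented:
            topics.append({"topic": label.strip(), "score": score, "terms": []})
        elif topics:
            topics[-1]["terms"].append((label.strip(), score))
    return topics


# Made-up sample text following the assumed layout.
sample_text = (
    "Sports 0.82\n"
    "    football 0.90\n"
    "    stadium 0.70\n"
    "Finance 0.40\n"
    "    bank 0.40\n"
)

for entry in parse_concept_explanation(sample_text):
    print(entry["topic"], entry["score"], entry["terms"])
```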
getDocumentCategories
Summary
Returns the categories for a document, drawn from a predefined category set that was derived from Wikipedia's content classification and covers a wide spectrum of subjects. Customers can tune the category set through data files, excluding certain categories from consideration or refining others with additional terms. Categories are returned as a list of Salience Topic structures.
This method provides a wrapper around the underlying C API method lxaGetDocumentCategories.
Syntax
salience6.getDocumentCategories(oSession, acConfigurationID)
Parameters
| oSession | A SalienceSession object previously created via openSession |
---|---|
| acConfigurationID | An identifier for a configuration added through addConfiguration, or empty string for default configuration |
Returns
If successful, returns a Python list consisting of items that contain the following information about a document category:
| The label for the topic/category |
---|---|
| An integer indicating the type of category result: 2=category node, 3=category leaf, 4=category explain info |
| A float value indicating the match score for the category |
| A float value indicating the sentiment for the category |
Example
import salience6 as se6
session = se6.openSession('/path/to/license.v5','/path/to/data')
ret = se6.prepareTextFromFile(session,'/path/to/aFile.txt')
if ret == 0:
    categories = se6.getDocumentCategories(session, "")
    for category in categories:
        print(category["topic"])
elif ret == 6:
    print(se6.getLastWarnings(session))
se6.closeSession(session)
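The integer type codes listed in the Returns table can be mapped to readable labels when displaying results. The sketch below does this for a single category item. The codes and their meanings come from the table above, while the key name `type`, the function name, and the sample data are assumptions made for illustration.

```python
# Type codes as listed in the Returns table above.
CATEGORY_TYPES = {
    2: "category node",
    3: "category leaf",
    4: "category explain info",
}


def describe_category(category, type_key="type"):
    """Format a category item as "<label> (<type description>)".

    `type_key` is an assumed key name, not confirmed by this reference.
    """
    kind = CATEGORY_TYPES.get(category[type_key], "unknown")
    return "%s (%s)" % (category["topic"], kind)


# Made-up sample data.
sample_categories = [
    {"topic": "Technology", "type": 2},
    {"topic": "Mobile phones", "type": 3},
]

for category in sample_categories:
    print(describe_category(category))
```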
getDocumentClasses
Summary
This method retrieves the classifications for a document based on the provided classification model.
This method provides a wrapper around the underlying C API method lxaGetDocumentClasses.
Syntax
salience6.getDocumentClasses(oSession, acClassificationFile, acConfigurationID)
Parameters
| oSession | A SalienceSession object previously created via openSession |
---|---|
| acClassificationFile | A path to a Salience-compatible classification model |
| acConfigurationID | An identifier for a configuration added through addConfiguration, or empty string for default configuration |
Returns
If successful, returns a Python list consisting of items that contain the following information about a document classification:
| The label for the classification |
---|---|
| A float value indicating the score for the classification. A threshold can be set using setOption_ClassificationThreshold |
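This section has no example of its own. Following the pattern of the other methods, a call might be wrapped as in the sketch below. The function name `classify_file` is hypothetical, and `se6` is assumed to be the imported salience6 module, passed in as an argument so the function can be tried with a stand-in object. Only calls that appear in this reference are used.

```python
def classify_file(se6, session, text_path, model_path, config_id=""):
    """Prepare a text file and return its classifications, or [] on failure.

    Hypothetical helper: `se6` is assumed to be the imported salience6 module.
    """
    ret = se6.prepareTextFromFile(session, text_path)
    if ret != 0:
        if ret == 6:
            # Same warning handling as the other examples in this reference.
            print(se6.getLastWarnings(session))
        return []
    return se6.getDocumentClasses(session, model_path, config_id)
```

With the real module, `session` would come from `se6.openSession('/path/to/license.v5', '/path/to/data')` and should be released with `se6.closeSession(session)` afterwards.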
getDocumentIntentions
Summary
Retrieves the intentions expressed within the document content. Intentions are returned as a list of SalienceIntention structures.
This method provides a wrapper around the underlying C API method lxaGetIntentions.
Syntax
salience6.getDocumentIntentions(oSession, acConfigurationID)
Parameters
| oSession | A SalienceSession object previously created via openSession |
---|---|
| acConfigurationID | An identifier for a configuration added through addConfiguration, or empty string for default configuration |
Returns
If successful, returns a Python list consisting of items that contain the following information about each intention identified in the document:
| The intention type, out of the set of defined intention types, that was detected |
---|---|
| The expresser of the intention, if detected. Otherwise, this list entry will be empty |
| The object of the intention, if detected. Otherwise, this list entry will be empty |
| The phrase expressing the intention |
| A child list containing positional information about the chunk identifying the expresser of the intention. This will be an empty list if "who" has not been detected |
| A child list containing positional information about the chunk identifying the object of the intention |
| A child list containing positional information about the chunk containing the intention |
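Because the expresser entry is empty when no expresser was detected, callers may want to separate intentions that have one from those that do not. The sketch below filters a list of intention items on that basis. The key name `who` is an assumption based on the wording of the table above, and the function name and sample data are hypothetical.

```python
def with_expresser(intentions, who_key="who"):
    """Keep only intention items whose expresser entry is non-empty.

    `who_key` is an assumed key name, not confirmed by this reference.
    """
    return [item for item in intentions if item.get(who_key)]


# Made-up sample data; only the assumed "who" key is relied upon.
sample_intentions = [
    {"who": "customer"},
    {"who": ""},
    {"who": "I"},
]

print(len(with_expresser(sample_intentions)))
```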