Salience runs machine-learned models (for both core functionality and customer-specific enhancements) through a process called MLHandler. Salience manages the lifecycle of the MLHandler process: it spawns the process when needed and shuts it down when it is no longer in use.
MLHandler will spawn many worker processes to support all running sessions of Salience. Again, this is handled automatically.
Cutting-edge results in NLP rely on a technique called "Contextualized Word Embeddings", which requires a hefty 16GB of GPU RAM. Since this is not available to every customer, we plan to continue supporting simpler models indefinitely. Running contextualized word embeddings on a CPU machine is usually not recommended: the runtime slows to many seconds per document. However, if there are business cases where high-RAM GPUs are not available but the accuracy benefit is worth the slower runtime (for example, testing the impact of these models before purchasing the GPUs), the models do run on CPU machines.
The one thing to be aware of if you go down this path is that the code will attempt to use your GPU if one is available, even if it has insufficient RAM. This leads to "Insufficient GPU RAM" allocation errors. The solution is to set the following environment variable before launching Salience:
"CUDA_VISIBLE_DEVICES" = "-1"
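As an illustration, here is a minimal Python sketch of one way to apply this setting when the application that hosts Salience is started from a script. It assumes MLHandler inherits the environment of the process that launches it, as child processes normally do. The helper names `cpu_only_env` and `run_cpu_only` are hypothetical and not part of Salience, and the child process in the demo is only a stand-in for your own application.

```python
import os
import subprocess
import sys


def cpu_only_env(base=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base is None else base)
    # "-1" is not a valid device index, so CUDA-based libraries see no GPU.
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


def run_cpu_only(command):
    """Run `command` (a list of arguments) with GPUs hidden from it and its children."""
    return subprocess.run(
        command,
        env=cpu_only_env(),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in for the real application (assumption): a child Python
    # process that simply reports the value it received.
    probe = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"
    result = run_cpu_only([sys.executable, "-c", probe])
    print(result.stdout.strip())
```

If Salience is loaded directly into a Python process instead, setting `os.environ["CUDA_VISIBLE_DEVICES"] = "-1"` at the very top of the program, before Salience is initialized, should have the same effect under the same assumption.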