OCR Error Correction

What is OCR error correction?

OCR error correction attempts to fix the characteristic errors that the OCR process introduces into text. Lexalytics does not perform OCR itself. The correction works like spell checking, but it relies on statistics about common OCR errors, which differ from the kinds of errors humans make when writing or typing.
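
To make the spell-checking comparison concrete, here is a minimal Python sketch of correction driven by a table of common OCR confusions. It is an illustration only, not Salience's implementation: the confusion table, its weights, the vocabulary, and the function names are all hypothetical.

```python
# Hypothetical confusion table: (OCR output, intended text, illustrative weight).
# The weights are invented for this example; they are not measured statistics.
OCR_CONFUSIONS = [
    ("rn", "m", 0.9),
    ("1", "l", 0.8),
    ("0", "o", 0.7),
    ("vv", "w", 0.6),
]

# Tiny illustrative vocabulary; a real system would need a large lexicon.
VOCABULARY = {"modern", "hello", "world", "corn", "come"}


def candidates(token):
    """Yield (candidate, weight) pairs made by undoing one OCR confusion."""
    for seen, intended, weight in OCR_CONFUSIONS:
        start = token.find(seen)
        while start != -1:
            yield token[:start] + intended + token[start + len(seen):], weight
            start = token.find(seen, start + 1)


def correct_token(token):
    """Return the highest-weighted in-vocabulary candidate, else the token."""
    lowered = token.lower()
    if lowered in VOCABULARY:
        return token  # already a known word, so leave it alone
    known = [(w, c) for c, w in candidates(lowered) if c in VOCABULARY]
    if not known:
        return token  # nothing plausible found; do not guess
    return max(known)[1]  # note: this toy version returns lowercase


def correct_text(text):
    """Correct each whitespace-separated token independently."""
    return " ".join(correct_token(t) for t in text.split())


print(correct_text("he1lo vvorld"))
```

The sketch leaves "corn" unchanged because it is already a valid word, even though "rn" is a common misreading of "m". A human-oriented spell checker would instead look for transposed or neighboring-key letters, which is why the two kinds of correction are not interchangeable.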

Why use it?

Errors in text lower Salience's accuracy in entity and sentiment detection. The OCR error correction in Salience focuses first on correcting punctuation. Entity and sentiment detection ride on top of parts of speech (POS), and parts of speech ride on top of grammar. Extraneous or missing periods and commas make the POS tagger less accurate, which drives down sentiment and entity detection accuracy.
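
As a rough illustration of punctuation-first correction, the Python sketch below removes a period that sits between two lowercase words, a pattern that often indicates a stray mark rather than a sentence boundary. The heuristic and the function name are assumptions made for this example and do not describe how Salience works; the rule would also damage abbreviations such as "e.g." followed by a lowercase word.

```python
import re

# Assumed heuristic: a period preceded by a lowercase letter and followed by
# whitespace plus a lowercase letter is treated as an extraneous OCR mark.
STRAY_PERIOD = re.compile(r"(?<=[a-z])\.(?=\s+[a-z])")


def strip_stray_periods(text):
    """Remove periods that appear to split a sentence in the middle."""
    return STRAY_PERIOD.sub("", text)


print(strip_stray_periods("The quick. brown fox jumps."))
```

With the stray period removed, a tagger sees one sentence instead of two fragments, which is the kind of repair that helps downstream POS tagging.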

When to use it

Salience's OCR correction is optional. Use it when you are extracting entities, sentiment, or topics from text that was produced by OCR. Don't use it if the text was human-entered: humans make different kinds of errors, and the correction may make the text worse.


OCR correction currently works only for English text.