OCR error correction attempts to fix the characteristic errors that the OCR process introduces into text. Lexalytics does not perform OCR itself. OCR error correction resembles spell checking, but it relies on statistics about common OCR errors, which differ from the errors people make when writing or typing.
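To make the contrast with spell checking concrete, here is a minimal sketch of correction driven by OCR-specific confusions. It is not how Salience implements the feature: the confusion table, the vocabulary, and the function names are all hypothetical, and a real system would weight each substitution by observed error statistics rather than accept the first match.

```python
# Hypothetical confusion table: fragment as OCR misread it -> fragment that was intended.
# These pairs are illustrative assumptions, not Salience's statistics.
OCR_CONFUSIONS = {
    "rn": "m",   # "rn" is often a misread "m"
    "1": "l",    # digit one for lowercase L
    "0": "o",    # zero for lowercase o
    "vv": "w",   # double v for w
}

# Toy vocabulary standing in for a real lexicon (assumption).
VOCABULARY = {"modern", "corner", "low", "cost", "the", "is"}


def candidates(token):
    """Yield each string produced by one confusion substitution at one position."""
    for wrong, right in OCR_CONFUSIONS.items():
        start = token.find(wrong)
        while start != -1:
            yield token[:start] + right + token[start + len(wrong):]
            start = token.find(wrong, start + 1)


def correct_token(token):
    """Return a vocabulary word reachable by a single substitution, else the token unchanged."""
    lowered = token.lower()
    if lowered in VOCABULARY:
        return token  # already a known word, so leave it alone
    for candidate in candidates(lowered):
        if candidate in VOCABULARY:
            return candidate  # note: the corrected form comes back lowercased
    return token


def correct_text(text):
    """Apply correct_token to each whitespace-separated token."""
    return " ".join(correct_token(tok) for tok in text.split())
```

In this sketch a known word such as "corner" is left alone even though it contains "rn", which is the kind of guard a statistics-based corrector needs to avoid damaging clean text.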
Errors in text lower Salience's accuracy in entity and sentiment detection. The OCR error correction in Salience focuses first on correcting punctuation. Entity and sentiment detection ride on top of parts of speech (POS), and parts of speech ride on top of grammar. Extraneous or missing periods and commas make the POS tagger less accurate, which in turn drives down sentiment and entity detection accuracy.
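As a rough illustration of punctuation-first cleanup, the sketch below removes a period that sits between a lowercase letter and a following lowercase word, on the assumption that a genuine sentence boundary would be followed by a capital. This heuristic and the name `drop_stray_periods` are assumptions made for the example, not Salience's method, and the rule is deliberately naive: it would also strip the final period of an abbreviation such as "e.g." when a lowercase word follows.

```python
import re

# Assumed heuristic: a period preceded by a lowercase letter and followed by
# a space plus another lowercase letter is treated as OCR noise.
STRAY_PERIOD = re.compile(r"(?<=[a-z])\.(?= [a-z])")


def drop_stray_periods(text: str) -> str:
    """Remove periods that appear mid-sentence under the heuristic above."""
    return STRAY_PERIOD.sub("", text)
```

With the stray period gone, a downstream tagger sees one sentence instead of two fragments, which is the effect the paragraph above describes.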
Salience's OCR correction is optional. Use it when you are extracting entities, sentiment, or topics from text that was produced by OCR. Don't use it on human-entered text: humans make different kinds of errors, so the correction may make the text worse.
OCR correction currently works only for English text.