Optical character recognition (OCR) is the conversion of scanned images of printed text into an editable text file. It is widely used as a form of data entry from paper sources such as documents, sales receipts, and mail. Digitizing printed text in this way allows it to be electronically searched, edited, stored more compactly, and displayed online.
OCR scanned documents with the TIFF Viewer
- Fast and easy
Tiff Viewer OCR transforms scanned documents into editable and searchable text files, which can be conveniently viewed and shared electronically. Tiff Viewer OCR is also quick and accurate, keeping the document's content intact while saving time and money. The ability to instantly search through content is especially useful in an office that handles a high volume of incoming documents.
In Tiff Viewer, users can OCR pages, scanned documents, receipts, multipage TIFF images, page ranges or even selected areas of any supported file format easily with only one click.
Users can also configure the Output Directory and the default text editor (for example Microsoft Word or Notepad) in which the OCR output opens automatically.
For more accurate results, an OCR Language Pack is available for Tiff Viewer, which contains the character sets of every major language. The OCR results will be better if the user selects the language the document is written in.
NOTE: Optical Character Recognition runs in the background, so users can use Tiff Viewer even if a very large document was selected to OCR. When the process is finished, Tiff Viewer informs the user on the status bar at the bottom of Tiff Viewer and opens the extracted text.
Save OCR Data
In the Tiff Viewer users have the following options to handle the extracted OCR data:
Do not save OCR data - the TIFF Viewer will open the extracted OCR data in the selected application after the OCR process has finished, but the OCR data will not be saved in a separate file. If no application is configured for opening the output file, the Tiff Viewer will use the default text editor on the system.
Save OCR data to Tiff tag - the TIFF Viewer will save the OCR data to the Tiff tag of the currently opened Tiff file. No separate OCR text file will be created.
Save OCR data to separate file - the TIFF Viewer will save the OCR text files next to the saved document. Users can also select a different directory where the OCR text or HTML files should be saved.
The user can select the file format for the saved OCR data from the dropdown menu next to the ‘Save OCR data to separate file’ option.
- If Plain text is selected, the TIFF Viewer saves the OCR data as plain text, with .txt extension.
- If HTML is selected, the TIFF Viewer saves the OCR data as HTML, with .html extension. The HTML file format includes position and font information.
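The relationship between the selected format, the optional output directory, and the resulting file can be sketched in a few lines of Python. This is only an illustration: the function name `ocr_output_path` is hypothetical, and the sketch assumes the OCR file reuses the document's base name, which is not stated above.

```python
from pathlib import PureWindowsPath
from typing import Optional

# Extensions named in the text: Plain text -> .txt, HTML -> .html.
EXTENSIONS = {"plain text": ".txt", "html": ".html"}


def ocr_output_path(document: str, fmt: str,
                    output_dir: Optional[str] = None) -> PureWindowsPath:
    """Return where a separate OCR file would be saved (hypothetical naming)."""
    doc = PureWindowsPath(document)
    ext = EXTENSIONS[fmt.lower()]
    # Default: next to the saved document; otherwise the selected directory.
    folder = PureWindowsPath(output_dir) if output_dir else doc.parent
    # Assumption: the OCR file keeps the document's base name.
    return folder / (doc.stem + ext)


print(ocr_output_path(r"C:\Scans\invoice.tif", "HTML"))
print(ocr_output_path(r"C:\Scans\invoice.tif", "Plain text", r"D:\OCR"))
```

The example paths are made up. The first call shows the default of saving next to the document, and the second shows a user-selected directory.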
Black Ice TIFF Viewer OCR has built-in whitelist and blacklist features. With a whitelist, the user specifies a set of characters, and every character in the OCR result is forced to be one of them. With a blacklist, the user specifies characters that must never appear in the OCR result.
NOTE: There must not be more than one newline character at the end of the file.
To configure the OCR whitelist, open (or create if the file does not exist) the whitelist.txt file with any text editor from the following location:
<TIFFVIEWER INSTALL DIR>\BiOCR\whitelist.txt
Edit the content of the file by entering the whitelist characters, for example:
Save the file to apply the configuration.
From now on, the TIFF Viewer will force the OCR result to become one of the characters in the whitelist.
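The whitelist file can also be written by a script instead of a text editor. The following is a minimal sketch, not an official tool: the helper `write_char_list` is hypothetical, the digits-only whitelist is an illustrative choice, UTF-8 encoding is an assumption, and a temporary directory stands in for <TIFFVIEWER INSTALL DIR> so that the example runs anywhere.

```python
import tempfile
from pathlib import Path


def write_char_list(path: Path, chars: str) -> None:
    """Write a character list file that ends with exactly one newline."""
    # Drop any line breaks passed in at either end, so the file can never
    # end with more than one newline character.
    body = chars.strip("\r\n")
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="\n" prevents line-ending translation on Windows.
    # Assumption: UTF-8 is acceptable; the expected encoding is not documented here.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(body + "\n")


# Stand-in for <TIFFVIEWER INSTALL DIR>; replace with the real install path.
install_dir = Path(tempfile.mkdtemp())
whitelist = install_dir / "BiOCR" / "whitelist.txt"

# Hypothetical whitelist: restrict OCR results to digits.
write_char_list(whitelist, "0123456789\n\n")
print(whitelist.read_bytes())
```

The extra newlines in the input are stripped, so the file respects the trailing-newline rule from the NOTE above.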
To configure the OCR blacklist, open (or create if the file does not exist) the blacklist.txt file with any text editor from the following location:
<TIFFVIEWER INSTALL DIR>\BiOCR\blacklist.txt
Edit the content of the file by entering the blacklist characters, for example:
Save the file to apply the configuration.
From now on, the TIFF Viewer will not display any character from the blacklist in the OCR result.
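Because a list file must not end with more than one newline character, it can be worth checking a file after editing it. The sketch below assumes the rule applies to both whitelist.txt and blacklist.txt, and that a Windows "\r\n" pair counts as one newline. The function names are hypothetical.

```python
from pathlib import Path


def trailing_newlines(data: bytes) -> int:
    """Count the line breaks at the very end of a file's contents."""
    count = 0
    while data.endswith(b"\n"):
        # Treat a Windows "\r\n" pair as a single line break.
        data = data[:-2] if data.endswith(b"\r\n") else data[:-1]
        count += 1
    return count


def list_file_ok(path: Path) -> bool:
    """Return True if the file ends with at most one newline."""
    return trailing_newlines(path.read_bytes()) <= 1


# Illustrative contents only; the characters are made up.
print(trailing_newlines(b"!@#"))          # 0
print(trailing_newlines(b"!@#\r\n"))      # 1
print(trailing_newlines(b"!@#\n\n"))      # 2
```

To check a real file, call `list_file_ok` with the path to whitelist.txt or blacklist.txt in the BiOCR folder.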