General tab

Use this tab to set general OCR attributes.

Option Description
Activate Use this combo box to activate the component according to a condition (see Conditional Activation).
Pass through Set this option to "Yes" to pass the original document to subsequent components in the workflow. You can use conditions in this field (see Conditional Activation).
Note: If you enter an invalid condition in the Pass through box, the setting defaults to "Yes".
Input files Defines the file types that the component will process.

Enter a wildcard character and extension (such as *.pdf) to define a file type. Separate entries with a comma (,) or semicolon (;). By default, this box lists the following file types: *.pdf;*.tif;*.tiff;*.jpg;*.jpeg;*.jfif;*.bmp;*.pcx;*.dcx;*.jp2;*.jpc;*.j2c;*.gif;*.png;*.jb2

You can use the following wildcard characters to specify file types:

  • * — Any string of characters.
  • ? — Any single character.
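The matching rules above can be illustrated with a short Python sketch. It is not the component's own implementation: the helper names `parse_patterns` and `is_accepted` are hypothetical, and case-insensitive matching is an assumption, since this page does not say how the component treats letter case. Note also that Python's `fnmatch` module supports `[...]` character sets, which this page does not list as a supported wildcard.

```python
import re
from fnmatch import fnmatchcase


def parse_patterns(field: str) -> list[str]:
    """Split an Input files value on commas or semicolons (hypothetical helper)."""
    return [p.strip() for p in re.split(r"[;,]", field) if p.strip()]


def is_accepted(filename: str, patterns: list[str]) -> bool:
    """Return True if the file name matches at least one wildcard pattern."""
    # Assumption: matching ignores case, so both sides are lowercased.
    name = filename.lower()
    return any(fnmatchcase(name, p.lower()) for p in patterns)


patterns = parse_patterns("*.pdf;*.tif, *.jp?")
print(patterns)
print(is_accepted("scan001.PDF", patterns))  # "*" matches any string
print(is_accepted("photo.jp2", patterns))    # "?" matches exactly one character
print(is_accepted("photo.jpeg", patterns))   # "?" cannot match two characters
```

In this sketch, `*.jp?` accepts `photo.jp2` and `photo.jpg` but not `photo.jpeg`, which is why the default list names `*.jpg` and `*.jpeg` separately.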
Languages Select the language of the text to be recognized from the list. To recognize text in multiple languages, enter the language names separated by commas. You can use RRTs in this field to define the recognition languages at run time.
Note: RRTs used in this text box must resolve to internal language names. To view internal language names, expand a language category node in the Select language dialog box and select a language. The internal name appears at the bottom of the dialog box.
Recognition mode Select the recognition mode, that is, the desired balance between speed and error rate. Three recognition modes are available:
  • Full mode - recognition is the slowest, but the error rate is the lowest possible.
  • Balanced mode - an intermediate mode between Full and Fast.
  • Fast mode - select this mode for recognition that is 2-2.5 times faster at the cost of a moderately higher error rate (1.5-2 times more errors). On texts with good print quality and simple layouts, the OCR component makes an average of 1-2 errors per page, so this moderate increase can be tolerated in many cases, such as full-text indexing with "fuzzy" searches.
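To make the Fast mode trade-off concrete, the following Python sketch applies the ratios quoted above to a baseline. It assumes that the ratios are measured against Full mode, which this page does not state explicitly. The function name `fast_mode_estimate` and the 10-seconds-per-page baseline are hypothetical values chosen for illustration, not measurements.

```python
def fast_mode_estimate(baseline_seconds: float, baseline_errors: float):
    """Estimate Fast mode time and error ranges per page from baseline figures."""
    speedup_low, speedup_high = 2.0, 2.5  # "2-2.5 times faster"
    errors_low, errors_high = 1.5, 2.0    # "1.5-2 times more errors"
    time_range = (baseline_seconds / speedup_high, baseline_seconds / speedup_low)
    error_range = (baseline_errors * errors_low, baseline_errors * errors_high)
    return time_range, error_range


# Hypothetical baseline: 10 seconds and 2 errors per page.
time_range, error_range = fast_mode_estimate(10.0, 2.0)
print(time_range)   # estimated seconds per page in Fast mode
print(error_range)  # estimated errors per page in Fast mode
```

Under these assumptions, a page that averages 1-2 errors would average roughly 1.5-4 errors in Fast mode.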
Output OCR text as This group allows you to specify how to output the recognized text.
File Select this check box if you want to save recognized text as a file. The file is passed to the subsequent components.

Enter the file format for saving the recognition results manually, or select it from the drop-down list. Possible formats are TXT, CSV, HTML, PDF, PDF/A, RTF, DOCX, XLS, and XLSX. To specify multiple file formats, separate them with a comma (,). You can use RRTs from another component in this box. Specify the parameters of the output file in the Format Settings dialog box (see Format Settings).
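As an illustration of the comma-separated format list, the Python sketch below checks a value against the formats named above. It is only a sketch: the helper `parse_formats` is hypothetical, and treating format names as case-insensitive is an assumption. It also assumes that any entry wrapped in tildes is an RRT, based on the `~FRO::OCRText~` notation used on this page, and passes such entries through unchanged because they are resolved only at run time.

```python
SUPPORTED_FORMATS = {"TXT", "CSV", "HTML", "PDF", "PDF/A", "RTF", "DOCX", "XLS", "XLSX"}


def parse_formats(field: str) -> list[str]:
    """Split a comma-separated format list and validate each entry."""
    formats = []
    for token in field.split(","):
        name = token.strip()
        if not name:
            continue
        # Assumption: an entry wrapped in tildes is an RRT; keep it unchanged.
        if len(name) > 2 and name.startswith("~") and name.endswith("~"):
            formats.append(name)
            continue
        upper = name.upper()  # Assumption: format names are case-insensitive.
        if upper not in SUPPORTED_FORMATS:
            raise ValueError(f"Unsupported output format: {name}")
        formats.append(upper)
    return formats


print(parse_formats("pdf, DOCX,txt"))
```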

Set up output file Click this button to open the Format Settings dialog box.
Run-time replacement ~FRO::OCRText~ Select this check box to save recognized text as the ~FRO::OCRText~ Runtime Replacement Tag.
Zoned OCR Select this check box to use zoned OCR. Recognized fields are output as RRTs, as CSV files, or both.
Set up zoned OCR Click this button to open the Setup Zoned OCR dialog box, where you configure the zoned OCR settings. This button is enabled only if the Zoned OCR check box is selected.
Note: You must select at least one of the check boxes in the Output OCR text as group.