Restrictions and limitations

  • The OCR component supports image files in the following formats:
    Format Description
    • Files in PDF format (Version 1.7 or earlier)
    • 2-bit — Uncompressed black and white
    • 4- and 8-bit — Uncompressed Palette, RLE compressed Palette
    • 16-bit — Uncompressed, Uncompressed Mask
    • 24-bit — Uncompressed
    • 32-bit — Uncompressed, Uncompressed Mask
    PCX, DCX
    • Black and white
    • 2-, 4- and 8-bit Palette
    • 24-bit color
    • Gray, color
    JPEG 2000
    • Gray, color
    • Black and white
    • Black and white — Uncompressed, CCITT3, CCITT3FAX, CCITT4, Packbits, ZIP, LZW
    • Gray — Uncompressed, Packbits, JPEG, ZIP, LZW
    • 24-bit color — Uncompressed, JPEG, ZIP, LZW
    • 1-, 4-, 8-bit palette — Uncompressed, Packbits, ZIP, LZW
    • Multi image TIFF
    • Black and white - LZW-compressed
    • 2-, 3-, 4-, 5-, 6-, 7-, 8-bit palette — LZW-compressed
    • Black and white, gray, color
  • The resolution of the input image should be at least 300 dpi for correct recognition.
  • When multipage text is inserted into one of the XLS fields, all fields in the output XLS file become empty. This does not occur with single-page text.
  • This component extracts only text and graphic information from the input document and does not extract metadata. Therefore, if the original PDF document has bookmarks, hyperlinks, interactive elements, annotations, and so on, the component will not export these elements to the output document.
  • Sometimes, when you select the JPEG color picture format for PDF and PDF/A, the component saves the image with JPEG and ZIP compression.
  • If the original document contains both color (or gray) and black-and-white pages and the output PDF or PDF/A file's mode is Text under the page image, the JPEG quality value is ignored for the entire document. In such cases the component saves color pages with 100% JPEG quality. A corresponding warning message appears at runtime, unless the specified picture format is CCITT4, which does not use JPEG quality. If the format is CCITT4, color pages are converted to black-and-white.
  • The component does not convert color PDF images to grayscale or black-and-white when it is used on the workflow server 6.0.
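
Two of the restrictions above (the 300 dpi minimum and the JPEG quality rule for mixed documents) can be checked before a job is submitted. The following is a minimal sketch of such a pre-flight check, not part of the component itself. The Page class, the preflight function, and the "text_under_image" mode label are hypothetical names chosen for illustration; only the rules themselves come from the list above.

```python
from dataclasses import dataclass
from typing import List

MIN_DPI = 300  # minimum input resolution stated in the restrictions


@dataclass
class Page:
    """Hypothetical page descriptor; not part of the component's API."""
    dpi: int
    is_color_or_gray: bool


def preflight(pages: List[Page], output_mode: str,
              picture_format: str, jpeg_quality: int) -> List[str]:
    """Return notes describing which restrictions apply to this job."""
    notes = []

    # Restriction: input resolution should be at least 300 dpi.
    for number, page in enumerate(pages, start=1):
        if page.dpi < MIN_DPI:
            notes.append(f"page {number}: {page.dpi} dpi is below {MIN_DPI} dpi")

    # Restriction: mixed color/black-and-white input with the
    # "Text under the page image" mode ("text_under_image" is an assumed label).
    has_color = any(p.is_color_or_gray for p in pages)
    has_bw = any(not p.is_color_or_gray for p in pages)
    if has_color and has_bw and output_mode == "text_under_image":
        if picture_format == "CCITT4":
            # CCITT4 does not use JPEG quality; color pages lose their color.
            notes.append("color pages will be converted to black-and-white")
        else:
            notes.append(
                f"JPEG quality {jpeg_quality} ignored; color pages saved at 100%"
            )
    return notes


if __name__ == "__main__":
    job = [Page(dpi=300, is_color_or_gray=True),
           Page(dpi=200, is_color_or_gray=False)]
    for note in preflight(job, "text_under_image", "JPEG", 75):
        print(note)
```

A check like this only predicts the documented behavior; the component's own runtime warnings remain the authoritative signal.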