Optical Character Recognition (OCR)

OCR (Optical Character Recognition) is a technology that extracts the text from an image or scanned document so that it can be stored and indexed automatically in a database. In this system, data is recognized by applying regular expressions that match patterns within the extracted text.
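
To illustrate regular-expression extraction in general terms, here is a minimal Python sketch. It is not Axional/OCR code: the field names, the patterns, and the sample text are all assumptions chosen for illustration, and a real setup would define its expressions per supplier template.

```python
import re

# Hypothetical field patterns (assumptions for illustration only).
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+No\.?\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "invoice_date": re.compile(r"Date\s*:?\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*([\d.,]+)", re.IGNORECASE),
}


def extract_fields(text):
    """Return the first match of each pattern, or None when a field is not found."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# Made-up text standing in for the output of the recognition step.
sample_text = "Invoice No: INV-2024-001\nDate: 15/03/2024\nTotal: 1.234,56"
print(extract_fields(sample_text))
```

Fields that no pattern matches come back as `None`, which is the kind of gap a later validation step would need to flag.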

Among other characteristics, Axional/OCR facilitates data entry for various types of documents which serve as the starting point for company-specific business processes. Likewise, it simplifies information storage and filing, removing the need for access to the physical document in order to examine it in detail. OCR thus provides an efficient information entry system for company databases, making the integration of any structured physical document possible. This Axional model aims, more specifically, to automatically integrate digitized supplier invoices into the ERP system database.

As complementary advantages, the automation inherent in digitization avoids the errors associated with manual data entry and provides greater data reliability and efficiency in general. Automation also saves time by reducing tasks that do not add value to your company.

Axional/OCR's standout characteristics include:

  1. Batch processing of documents.
  2. Option to store files in document units. An uploaded file is subdivided into individual document units through document splitting parameters.
  3. Tool to manually subdivide a file into document units.
  4. Templates automatically applied to documents via multiple selection criteria.
  5. Application of multiple templates according to document model.
  6. Iterative location of field values.
  7. Multiple criteria to define the region and pattern of values to be extracted.
  8. Standardization of numerical and date fields according to language, country, or region.
  9. Validation of required information, with correspondences between master tables and key data extracted from the document.
  10. Validations of document items grouped together, according to inherent criteria of equivalence and relevance.
  11. Calculation of information for preset items.
  12. Adjustment of values after they have been initially obtained.
  13. Feedback in the validation loop, facilitating machine learning in the matching of the extracted data.
  14. Control of batch status and modifications.
  15. Control of document status and follow-up on document modifications.
  16. Drill-across link to access the resulting documents.
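
As an illustration of item 8 (standardizing numerical and date fields by language, country, or region), the following Python sketch converts extracted strings into canonical values. The locale table, its keys, and the function names are hypothetical and do not describe the product's actual configuration.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical per-locale conventions (assumptions for illustration only).
LOCALE_FORMATS = {
    "es_ES": {"thousands": ".", "decimal": ",", "date": "%d/%m/%Y"},
    "en_US": {"thousands": ",", "decimal": ".", "date": "%m/%d/%Y"},
}


def normalize_amount(raw, locale):
    """Strip the thousands separator and convert the decimal mark to a dot."""
    fmt = LOCALE_FORMATS[locale]
    cleaned = raw.strip().replace(fmt["thousands"], "").replace(fmt["decimal"], ".")
    return Decimal(cleaned)


def normalize_date(raw, locale):
    """Parse a date string using the locale's day/month order."""
    return datetime.strptime(raw.strip(), LOCALE_FORMATS[locale]["date"]).date()


print(normalize_amount("1.234,56", "es_ES"))   # Spanish-style amount
print(normalize_date("15/03/2024", "es_ES"))   # day/month/year
```

The same raw string can mean different values depending on the region, so the locale has to be known (for example, from the supplier's template) before the value is stored.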

All the processes that make up Axional/OCR are detailed below, from gathering files to extracting information and generating the final invoice.

  • In the first step, the process that collects scanned invoice files embeds a layer of metacharacters in the PDF file for later recognition of invoice data. From this point onwards, the PDF always contains the information available for later extraction.
    Following the overlay of metacharacters, the moment the document enters the database, a document splitting filter is applied to differentiate each individual invoice from the whole file. The result is subdivided invoices, each stored in a separate file.
  • Secondly, using the supplier invoice template, a metadata search and extraction is performed on the text of the document. As such, prior setup of invoice templates and formats is required.
    A set of templates must be defined to specify the metadata search expressions which will extract relevant information from the scanned document.
    Users should keep in mind that assigning a specific template to a supplier invoice is an automatic process. Axional/OCR uses recognition to match the text of the digitized document with the most suitable supplier template.
    Axional/OCR can create multiple templates per supplier. One noteworthy feature is that the system evaluates which template in the available set offers the maximum number of matches, with the goal of optimizing the effectiveness and relevance of data extraction.
  • Once the text from the document has been extracted by applying the template’s data recognition expressions, that data is indexed in the database for validation. This step is followed by generation of an electronic invoice or other destination document in Axional/ERP.
    The data validation process confirms that all required information is available to generate a definitive invoice in the Axional system. At this intermediate point, the various information fields are editable to allow for adjustments, if invoice scanning was not effective enough to extract all information correctly.
    It is important to recognize that current text recognition procedures have their limitations, especially if the source of information is defective in some way.
    Another important characteristic of this module is that the system factors in user modifications. This feedback process allows the system to learn from digitization, so that the next processing of an invoice or document contains fewer information errors.
    In other words, for each template the database stores specific modifications to be used by default in future handling of invoices with that template.
  • Once the data validation process has verified all information is correct, the system automatically generates the invoice or destination document. This process will also reconcile delivery notes pending billing, if the applied invoice template contains information on the supplier’s purchase order or delivery note.
    If the invoice does not reconcile any delivery notes, the system will propose a direct purchase invoice. The process uses relevant information to identify whether the document corresponds to an invoice or a corrective invoice (credit note). Simultaneously, the system identifies if the purchase should be considered an investment or an expenditure.
  • Finally, invoices created from the OCR module are managed through the standard procedures: once an invoice is generated, amounts are validated via the document workflow and authorization procedures apply.
    As a final functionality, the system prevents duplicates both when uploading files and creating electronic invoices.
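
The template selection described in the steps above (choosing the template with the maximum number of matches) can be sketched in Python as follows. The template names, the supplier name, and the patterns are invented for illustration, and the scoring rule is an assumption about how "maximum number of matches" might be counted.

```python
import re

# Hypothetical templates: each one is a list of search expressions.
TEMPLATES = {
    "supplier_a_standard": [r"ACME Supplies", r"Invoice\s+No", r"VAT\s+ID", r"Total"],
    "supplier_a_credit_note": [r"ACME Supplies", r"Credit\s+Note", r"Total"],
}


def count_matches(patterns, text):
    """Count how many of a template's expressions are found in the text."""
    return sum(1 for pattern in patterns if re.search(pattern, text, re.IGNORECASE))


def select_template(text, templates=TEMPLATES):
    """Return (template name, score) for the best match, or (None, 0) if nothing matches."""
    scores = {name: count_matches(patterns, text) for name, patterns in templates.items()}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else (None, 0)


invoice_text = "ACME Supplies\nInvoice No: 42\nVAT ID: X123\nTotal: 10,00"
print(select_template(invoice_text))
```

In this sketch a tie is resolved in favor of the template defined first; a production system would likely need an explicit tie-breaking rule or a minimum score threshold.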

 

Axional/OCR provides a complete environment for the digitization, integration, management, and storage of your company’s physical documents.
