As mentioned in our previous article Microsoft 365: Microsoft Syntex Updates Part 1, Microsoft continues to leverage AI and machine learning to develop and enhance an end-to-end Intelligent Document Processing (IDP) solution for its customers.
Previously, we reviewed Syntex updates for Content Assembly, Translation, Summarization, Optical Character Recognition (OCR), Taxonomy Tagging, Image Tagging, and the new preview features for Pay-as-You-Go customers. In this article, we continue our journey of discovering the latest updates to Syntex with Intelligent Document Processing.
Intelligent Document Processing
As part of the IDP solution, Syntex integrates AI from Microsoft AI Builder, Microsoft Azure, and other Microsoft sources to bring structure to content. Content can be classified, tagged with extracted data, and secured by applying sensitivity and retention labels. As the taxonomy tags and labels are integrated into Microsoft Viva Topics, content is transformed into knowledge.
By understanding, categorizing, and tagging content, Document Processing makes it possible to analyze, navigate, and secure data through its data processing services. Microsoft Syntex has two model types: Custom Models and Prebuilt Models. Custom Models consist of Structured, Freeform, and Unstructured Document Processing. Prebuilt Models include Contract, Invoice, and Receipt processing.
There are three document processing services for custom models. Which custom model to use is determined by the types of files involved, the structure and format of those files, and where the model is to be applied. The Custom Models can be visualized in the image below:
The Custom Models include:
Structured Document Processing: The Layout Method – Train a Model by Marking the Location of the Content to Extract
Structured Document Processing models are created and trained within Syntex using Microsoft Power Apps AI Builder document processing. This model type supports the widest range of languages. It is trained on example documents to understand the layout of your form, then learns to look for data in similar locations and extracts it.
This type of model is best suited to structured or semi-structured documents such as invoices and forms, as it is trained to automatically identify fields and table values.
Use the Layout Method option when creating a structured document processing model.
Unstructured Document Processing: The Teaching Method – Teach a Model How to Understand Text to Classify and Extract
The unstructured document processing model automatically classifies documents and extracts information from them. This type of model is best suited to unstructured documents such as letters or contracts, but these documents must contain identifiable text based on phrases or patterns. The identified text designates both the type of file it is (its classification) and what is to be extracted (its extractors). The identified text, or text string, determines the classification, while the text following the text string is the data to be extracted.
This model supports only files that use the Latin alphabet (English characters), but it supports the widest range of file types.
Use the Teaching Method when creating an Unstructured Document Processing model.
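To make the phrase-and-pattern idea concrete, here is a minimal Python sketch. It is not how Syntex is implemented, and it does not call any Syntex API. It only illustrates the concept of one identifying phrase driving classification while the text after another phrase is extracted. The phrase "Contract Renewal Notice", the label "Renewal Date:", and both function names are hypothetical, chosen purely for illustration.

```python
import re
from typing import Optional

# Hypothetical identifying phrase: its presence classifies the document.
CLASSIFIER_PHRASE = "Contract Renewal Notice"

# Hypothetical extractor: capture the text that follows "Renewal Date:"
# up to the end of that line.
RENEWAL_DATE = re.compile(r"Renewal Date:\s*(?P<value>[^\n]+)")


def is_renewal_notice(text: str) -> bool:
    """Classify: True when the identifying phrase appears (case-insensitive)."""
    return CLASSIFIER_PHRASE.lower() in text.lower()


def extract_renewal_date(text: str) -> Optional[str]:
    """Extract: return the text following the label, or None if it is absent."""
    match = RENEWAL_DATE.search(text)
    return match.group("value").strip() if match else None


sample = "Contract Renewal Notice\nRenewal Date: 1 March 2024\nThank you."

if is_renewal_notice(sample):
    print(extract_renewal_date(sample))
```

A trained model generalizes from many example files rather than relying on a single hard-coded phrase, so treat this only as a picture of what "classification" and "extractor" mean here.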
Freeform Document Processing: The Freeform Selection Method – Train a Model by Selecting Content Anywhere in a File
Freeform Document Processing models are created and trained within Syntex using Microsoft Power Apps AI Builder. This model automates the process of extracting information and text from unstructured and freeform documents, including emails, letters, contracts, and so on. In these documents, the information that needs to be extracted can appear anywhere.
This model supports English documents in PDF or image file formats and is intended for cases that do not require automatic classification of the document type.
Use the Freeform Selection Method when creating a Freeform Document Processing model.
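The guidance above for choosing among the three custom models can be summarized as a small decision helper. The Python sketch below is a simplification based only on the criteria described in this article (document layout, need for automatic classification, language, and file type). The `DocumentProfile` class, its field names, and `suggest_custom_model` are hypothetical and are not part of Syntex.

```python
from dataclasses import dataclass

STRUCTURED = "Structured Document Processing (Layout Method)"
UNSTRUCTURED = "Unstructured Document Processing (Teaching Method)"
FREEFORM = "Freeform Document Processing (Freeform Selection Method)"


@dataclass
class DocumentProfile:
    """Hypothetical description of the files a model will be applied to."""
    layout: str                 # "structured", "semi-structured", or "unstructured"
    needs_classification: bool  # must the model classify the document type?
    is_english: bool            # are the documents written in English?
    file_type: str              # for example "pdf", "image", or "docx"


def suggest_custom_model(profile: DocumentProfile) -> str:
    """Suggest a custom model using the simplified criteria from the article."""
    # Forms and invoices with a consistent layout suit the Layout Method.
    if profile.layout in {"structured", "semi-structured"}:
        return STRUCTURED
    # Freeform fits English PDF or image files without automatic classification.
    freeform_fits = (
        not profile.needs_classification
        and profile.is_english
        and profile.file_type in {"pdf", "image"}
    )
    return FREEFORM if freeform_fits else UNSTRUCTURED


print(suggest_custom_model(DocumentProfile("unstructured", True, True, "docx")))
```

Real selection also depends on where the model is to be applied, which this sketch leaves out.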
What is a Prebuilt Model?
A prebuilt document processing model has already been trained for specific structured documents and can be used when a Custom Model is not required. The Prebuilt Models can be visualized in the image below:
The Prebuilt Models include:
Contract Processing: Trained to Extract Contract Information
Key information from contract documents can be analyzed and extracted with the contract processing model. The API analyzes contracts in various formats and extracts key contract information including party or client name, jurisdiction, billing address, and date of expiration.
Invoice Processing: Trained to Extract Invoice Information
Key information from sales invoices is analyzed and extracted with the invoice processing model. The API analyzes invoices in various formats and extracts key invoice information such as customer name, billing address, due date, and amount due.
Receipt Processing: Trained to Extract Receipt Information
Key information from sales receipts is analyzed and extracted with the receipt processing model. The API analyzes printed and handwritten receipts and extracts key receipt information including transaction date, merchant name, tax, merchant phone number, and transaction total.
Content from these processes, as well as the ones discussed in Microsoft 365: Microsoft Syntex Updates Part 1, can be electronically reviewed and signed with Syntex eSignature alongside Adobe Acrobat Sign or DocuSign. Content remains within Microsoft 365, keeping it safe and secure.
Designed to connect your content and powered by AI, Syntex helps users discover and reuse content through search, streamline business processes through its integration into workflows, manage content with taxonomies and document processing, and prevent data loss through compliance and the capabilities of eSignature. Microsoft Syntex innovates and raises the standard of Content AI technology.