Extract Data with Azure Form Recognizer

Step 1. Clone the repository
https://github.com/MicrosoftLearning/AI-102-AIEngineer
The files for this exercise are in the 21-custom-form folder.

Step 2. Create a Form Recognizer resource in the Azure portal

Step 3. Set up the Python environment
Edit C:\Hans\AI-102-AIEngineer\21-custom-form\setup.cmd with your values:

rem Set variable values 
set subscription_id=YOUR_SUBSCRIPTION_ID
set resource_group=YOUR_RESOURCE_GROUP
set location=YOUR_LOCATION_NAME

Then sign in with az login and run setup.cmd, which creates a storage account, uploads the sample forms, and prints a SAS URI:
(base) C:\Users\Student\miniconda3\AI-102-AIEngineer\21-custom-form>az login
(base) C:\Users\Student\miniconda3\AI-102-AIEngineer\21-custom-form>setup.cmd
Creating storage...
Uploading files...
Finished[#############################################################]  100.0000%
-------------------------------------
SAS URI: https://ai102form7685119.blob.core.windows.net/sampleforms?se=2022-01-01T00%3A00%3A00Z&sp=rwl&sv=2018-11-09&sr=c&sig=Wopn1A5klioFouoyYKV57hrFIO7SbkGJmjZV%2BIe7R6I%3D
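
Before pasting the SAS URI into the training script, it can help to check what it actually grants. The sketch below uses only the Python standard library to read the query parameters of a URI shaped like the one above: se is the expiry time, sp the permissions (rwl is read, write, list), and sr=c marks a container-level token. The function names and the example URI with its placeholder signature are illustrative, not part of the lab files.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs


def describe_sas_uri(sas_uri):
    """Return the container URL, permissions, and expiry encoded in a SAS URI."""
    parts = urlsplit(sas_uri)
    # parse_qs percent-decodes the values, so "%3A" becomes ":".
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    expiry = datetime.strptime(query["se"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "container_url": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "permissions": query.get("sp", ""),
        "expires": expiry.replace(tzinfo=timezone.utc),
    }


def is_expired(sas_uri, now=None):
    """Return True if the SAS token's expiry time has passed."""
    now = now or datetime.now(timezone.utc)
    return describe_sas_uri(sas_uri)["expires"] <= now


# Hypothetical URI: same shape as the setup.cmd output, placeholder signature.
example_uri = (
    "https://example.blob.core.windows.net/sampleforms"
    "?se=2022-01-01T00%3A00%3A00Z&sp=rwl&sv=2018-11-09&sr=c&sig=PLACEHOLDER"
)
info = describe_sas_uri(example_uri)
print(info["container_url"], info["permissions"], info["expires"])
```

If is_expired returns True for your URI, rerun setup.cmd to generate a fresh one before training.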

Step 4. Train a model
Install the SDK:
pip install azure-ai-formrecognizer==3.0.0
Then edit train-model.py with your endpoint, key, and SAS URI:

import os 

from azure.core.exceptions import ResourceNotFoundError
from azure.ai.formrecognizer import FormRecognizerClient
from azure.ai.formrecognizer import FormTrainingClient
from azure.core.credentials import AzureKeyCredential

def main(): 
     
    try: 
    
        # Get configuration settings 
        ENDPOINT='https://hansformrecognizer.cognitiveservices.azure.com/'
        KEY='f20ca70a5497484c9f239d3431df2757'
        trainingDataUrl = 'https://ai102form2397530048.blob.core.windows.net/sampleforms?se=2022-01-01T00%3A00%3A00Z&sp=rwl&sv=2018-11-09&sr=c&sig=3LQtq9KfelRXPSf6aqVN/Z3UcIN7KE1Net76W6alTGg%3D'

        # Authenticate Form Training Client
        form_recognizer_client = FormRecognizerClient(ENDPOINT, AzureKeyCredential(KEY))
        form_training_client = FormTrainingClient(ENDPOINT, AzureKeyCredential(KEY))

        # Train model 
        poller = form_training_client.begin_training(trainingDataUrl, use_training_labels=False)
        model = poller.result()

        print("Model ID: {}".format(model.model_id))
        print("Status: {}".format(model.status))
        print("Training started on: {}".format(model.training_started_on))
        print("Training completed on: {}".format(model.training_completed_on))

    except Exception as ex:
        print(ex)

if __name__ == '__main__': 
    main()

Run the script:

PS C:\Hans\AI-102-AIEngineer\21-custom-form\Python\train-model> python .\train-model.py
Model ID: 37951e13-645e-4364-a93e-96bb1bccdb78
Status: ready
Training started on: 2021-05-06 15:48:40+00:00
Training completed on: 2021-05-06 15:48:51+00:00
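
The training script keeps the endpoint and key inline, which is easy to follow but risky if the file is ever shared. One possible alternative is to read them from environment variables. This is a minimal sketch: the variable names FORM_ENDPOINT and FORM_KEY and the load_settings helper are assumptions made for illustration, not something the lab files define.

```python
import os


def load_settings(environ=os.environ):
    """Read the Form Recognizer endpoint and key from environment variables."""
    # FORM_ENDPOINT and FORM_KEY are hypothetical names chosen for this sketch.
    required = ("FORM_ENDPOINT", "FORM_KEY")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return environ["FORM_ENDPOINT"], environ["FORM_KEY"]


# Demonstration with an in-memory mapping instead of the real environment.
endpoint, key = load_settings({
    "FORM_ENDPOINT": "https://example.cognitiveservices.azure.com/",
    "FORM_KEY": "not-a-real-key",
})
print(endpoint)
```

In the scripts, ENDPOINT, KEY = load_settings() could then replace the two hardcoded assignments, with the values passed to AzureKeyCredential exactly as before.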

Step 5. Test the model
Edit test-model.py with the Model ID generated in the previous step:

import os 

from azure.core.exceptions import ResourceNotFoundError
from azure.ai.formrecognizer import FormRecognizerClient
from azure.ai.formrecognizer import FormTrainingClient
from azure.core.credentials import AzureKeyCredential

def main(): 
       
    try: 
    
        # Get configuration settings 
        ENDPOINT='https://hansformrecognizer.cognitiveservices.azure.com/'
        KEY='f20ca70a5497484c9f239d3431df2757'
         
        # Create client using endpoint and key
        form_recognizer_client = FormRecognizerClient(ENDPOINT, AzureKeyCredential(KEY))
        form_training_client = FormTrainingClient(ENDPOINT, AzureKeyCredential(KEY))

        # Model ID from when you trained your model.
        model_id = '37951e13-645e-4364-a93e-96bb1bccdb78'

        # Test trained model with a new form 
        with open('test1.jpg', "rb") as f: 
            poller = form_recognizer_client.begin_recognize_custom_forms(
                model_id=model_id, form=f)

        result = poller.result()

        for recognized_form in result:
            print("Form type: {}".format(recognized_form.form_type))
            for name, field in recognized_form.fields.items():
                print("Field '{}' has label '{}' with value '{}' and a confidence score of {}".format(
                    name,
                    field.label_data.text if field.label_data else name,
                    field.value,
                    field.confidence
                ))

    except Exception as ex:
        print(ex)

if __name__ == '__main__': 
    main()

Run the script to verify the model:

C:\Hans\AI-102-AIEngineer\21-custom-form\Python\test-model> python .\test-model.py
Form type: form-0
Field 'field-0' has label 'Hero Limited' with value 'accounts@herolimited.com' and a confidence score of 0.53
Field 'field-1' has label 'Company Phone:' with value '555-348-6512' and a confidence score of 1.0
Field 'field-2' has label 'Website:' with value 'www.herolimited.com' and a confidence score of 1.0
Field 'field-3' has label 'Email:' with value '49823 Major Ave Cheer, MS, 38601' and a confidence score of 0.53
Field 'field-4' has label 'Dated As:' with value '04/04/2020' and a confidence score of 1.0
Field 'field-5' has label 'Purchase Order #:' with value '3929423' and a confidence score of 1.0
Field 'field-6' has label 'Vendor Name:' with value 'Seth Stanley' and a confidence score of 0.53
Field 'field-7' has label 'Company Name:' with value 'Yoga for You' and a confidence score of 1.0
Field 'field-8' has label 'Address:' with value '343 E Winter Road' and a confidence score of 1.0
Field 'field-9' has label 'Seattle, WA 93849 Phone:' with value '234-986-6454' and a confidence score of 0.53
Field 'field-10' has label 'Name:' with value 'Josh Granger' and a confidence score of 0.86
Field 'field-11' has label 'Company Name:' with value 'Granger Supply' and a confidence score of 0.53
Field 'field-12' has label 'Address:' with value '922 N Ebby Lane' and a confidence score of 0.53
Field 'field-13' has label 'Phone:' with value '932-294-2958' and a confidence score of 1.0
Field 'field-14' has label 'SUBTOTAL' with value '$6750.00' and a confidence score of 1.0
Field 'field-15' has label 'TAX' with value '$600.00' and a confidence score of 1.0
Field 'field-16' has label 'TOTAL' with value '$7350.00' and a confidence score of 1.0
Field 'field-17' has label 'Additional Notes:' with value 'Enjoy. Namaste. If you have any issues with your Yoga supplies please contact us directly via email or at 250-209-1294 during business hours.' and a confidence score of 0.53
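
Several fields in this output come back with a confidence of 0.53, and a couple of them (field-0 and field-3) pair a label with a value that clearly belongs to a different label. A simple way to surface such cases is to flag everything below a threshold for manual review. The sketch below works on plain (name, value, confidence) tuples copied from the output above; the low_confidence_fields helper and the 0.8 threshold are assumptions for illustration.

```python
def low_confidence_fields(fields, threshold=0.8):
    """Return the (name, value, confidence) tuples scoring below the threshold."""
    return [
        (name, value, confidence)
        for name, value, confidence in fields
        if confidence < threshold
    ]


# A few rows taken from the test-model.py output above.
sample = [
    ("field-0", "accounts@herolimited.com", 0.53),
    ("field-1", "555-348-6512", 1.0),
    ("field-3", "49823 Major Ave Cheer, MS, 38601", 0.53),
    ("field-10", "Josh Granger", 0.86),
]

flagged = low_confidence_fields(sample)
for name, value, confidence in flagged:
    print("Review {}: '{}' (confidence {})".format(name, value, confidence))
```

Inside test-model.py, the same list could be built from recognized_form.fields.items() using field.value and field.confidence, the attributes the script already prints.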
