Design Goals 🎨
We built CardScan.ai with three guiding principles:
Security
Accuracy
Performance
Security
Our infrastructure, APIs, and UI Components have been built from the ground up with security and privacy in mind. Keeping our clients' data and their users' data secure is our number one priority.
To do this we built a best-in-class security architecture, similar to what Google and Apple* use to secure data for millions.
We use defense in depth, starting with a robust authentication model that can be deployed on end users' devices with minimal security exposure. Our authorization layer then limits access to the host, API, and resources. We are also tightly integrated with AWS's Identity and Access Management (IAM) framework, which allows the use of roles and permissions boundaries to add further restrictions on a per-user basis.
Accuracy
The long tail in insurance card scanning can cause problems, especially with DIY solutions. We've solved this problem by building a large database of labeled card images to use in training our models.
There are over 950 health insurers in the US market and 8,000+ unique insurance card formats and layouts in use every day.
Cards arrive with bad lighting, damage, new layouts, and ambiguous fields, so it is essential that CardScan.ai provides accurate scanning results in all of these conditions.
We start by making sure the source image is as clear, complete, and usable as possible.
After upload to our servers, card images are processed by multiple OCR, NLP, and DNN models. At the end of the process, we classify all elements on the card but return only the ones above our detection threshold. All results are returned with a list of confidence scores, one from each stage.
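The thresholding step can be pictured with a short sketch. Everything named here is an assumption made for illustration: the `Element` type, the `0.80` cutoff, the sample values, and the rule of comparing the weakest stage score against the threshold are not taken from the CardScan.ai implementation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff; the real detection threshold is not documented here.
DETECTION_THRESHOLD = 0.80


@dataclass
class Element:
    label: str           # e.g. "member_number"
    value: str           # text read from the card
    scores: List[float]  # one confidence score per pipeline stage


def above_threshold(elements: List[Element],
                    threshold: float = DETECTION_THRESHOLD) -> List[Element]:
    """Keep only elements whose weakest stage score meets the threshold.

    Using the minimum score is an assumption of this sketch.
    """
    return [e for e in elements if min(e.scores) >= threshold]


# Every element is classified, but only the confident ones are returned.
classified = [
    Element("member_number", "128845682", [0.99, 0.9943]),
    Element("group_number", "G1234", [0.97, 0.91]),   # illustrative value
    Element("rx_bin", "0O4336", [0.95, 0.42]),        # illustrative value
]
returned = above_threshold(classified)
print([e.label for e in returned])  # ['member_number', 'group_number']
```

Each returned element keeps its full list of scores, so a caller can still apply a stricter cutoff of its own.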
Read more about our machine learning pipeline below.
Performance
All of our production infrastructure runs on a multi-region architecture with auto-scaling resources and provisioned concurrency. For our models, we deploy on auto-scaling instances with modern GPUs that provide capacity for thousands of executions per second.
Our API allows the end user to securely upload directly to S3 Edge locations, providing high performance in low-bandwidth settings.
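As a rough picture of a direct upload from the client, the sketch below prepares an HTTP PUT of the image bytes to a pre-authorized upload URL. The PUT method, the placeholder URL, and the `build_upload_request` helper are assumptions made for illustration; the actual upload contract is defined by the API and may differ (for example, a form-based POST).

```python
import urllib.request


def build_upload_request(upload_url: str, image_bytes: bytes,
                         content_type: str = "image/jpeg") -> urllib.request.Request:
    """Prepare (but do not send) a PUT of the card image to an upload URL.

    `upload_url` stands in for a short-lived, pre-authorized URL that the
    API would hand to the client; its real shape is not shown in this text.
    """
    return urllib.request.Request(
        upload_url,
        data=image_bytes,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Placeholder URL and a three-byte stand-in for image data.
request = build_upload_request(
    "https://uploads.example.com/cards/front.jpg?signature=placeholder",
    b"\xff\xd8\xff",
)
print(request.get_method(), request.full_url)

# Sending it would look like this (left commented out so the sketch
# runs without network access):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

Because the image goes straight to storage, the application server never has to relay the file, which is what keeps the upload fast on slow connections.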
Individual API endpoints respond in milliseconds and the end-to-end processing should take 2-5 seconds depending on system load and card complexity.
We have built our UI Components to provide progress indications and visual feedback to the end user while the card is being processed.
Machine Learning
Our machine learning pipeline consists of four parts, each with custom models: card detection, text extraction, information extraction, and error correction.
The output of this pipeline is the label of each element (e.g. member_id), the corresponding value (e.g. 128845682), and the probability. We represent the probability as a list with entries for each section of the pipeline.
The last two elements in the probability list are only included when a low score is detected in the first two.
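A minimal sketch of that output shape follows. The key names (`label`, `value`, `probability`), the `0.90` cutoff for a "low score", and the sample scores are assumptions made for illustration, not the documented response schema.

```python
from typing import List, Optional

# Hypothetical cutoff for what counts as a "low score".
LOW_SCORE = 0.90


def build_probability(first_two: List[float],
                      last_two: Optional[List[float]] = None) -> List[float]:
    """Return the probability list for one element.

    The first two entries are always present. The last two are appended
    only when one of the first two falls below LOW_SCORE.
    """
    if len(first_two) != 2:
        raise ValueError("expected exactly two leading scores")
    probability = list(first_two)
    if last_two is not None and min(first_two) < LOW_SCORE:
        probability.extend(last_two)
    return probability


confident = {
    "label": "member_id",
    "value": "128845682",
    "probability": build_probability([0.98, 0.9943], [0.97, 0.99]),
}
uncertain = {
    "label": "member_id",
    "value": "128845682",
    "probability": build_probability([0.98, 0.71], [0.93, 0.88]),
}
print(confident["probability"])  # [0.98, 0.9943]
print(uncertain["probability"])  # [0.98, 0.71, 0.93, 0.88]
```

A consumer can therefore treat a four-entry list as a signal that the element needed the later stages.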
Card Detection
All of our UI Components run a custom ML model to detect insurance cards and make sure they are in-frame and in focus. We then apply image pre-processing to reduce noise and correct bad lighting.
The detection model runs on-device and is trained on over 8000 images of cards in a variety of lighting conditions.
If you would like to use the card detection model in your own application without using our UI Components, please let us know.
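To make the "in-frame" idea concrete, here is a small geometric sketch. It assumes a detector that reports a bounding box for the card in pixel coordinates; the `Box` type, the `is_in_frame` helper, and the 2% margin are hypothetical and are not part of the UI Components.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def is_in_frame(card: Box, frame_width: float, frame_height: float,
                margin: float = 0.02) -> bool:
    """True when the card box lies fully inside the frame with a small margin."""
    pad_x = frame_width * margin
    pad_y = frame_height * margin
    return (
        card.left >= pad_x
        and card.top >= pad_y
        and card.right <= frame_width - pad_x
        and card.bottom <= frame_height - pad_y
    )


centered = Box(100, 80, 1180, 640)  # card fully visible
clipped = Box(-20, 80, 1060, 640)   # left edge falls outside the frame
print(is_in_frame(centered, 1280, 720))  # True
print(is_in_frame(clipped, 1280, 720))   # False
```

A capture flow built on a check like this would wait for a frame that passes before taking the image used for upload.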
Text Extraction
Once we have a high-quality card image uploaded to AWS S3, the image is run through an optical character recognition (OCR) model. This model extracts text from the image and returns it in a machine-readable format.
Information Extraction
After the text has been extracted, it is run through a natural language processing (NLP) model which uses named-entity recognition (NER) to extract and label the results. This model connects the value 128845682 with the label member_number and provides a probability for the match (0.9943).
This model is fine-tuned with over 6000 labeled text extraction results.
Error Correction
The error correction process involves 5 custom models, which are run on select elements or on an as-needed basis.
The highest-importance elements on the card, group_number and member_number, are checked for accuracy with a custom LSTM model. This model is trained on over 600,000 cards.
Cards from one payer that represent around 5% of our volume do not perform well in our standard OCR pipeline. We've built a custom OCR-DNN to address this weakness.
The remaining 3 models are run on a small percentage of problem cards and help to fix errors in the Information Extraction process.
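The routing described in this section (some elements always checked, others corrected only when needed) can be sketched as a small dispatcher. The `0.90` cutoff, the `correct` function, the sample elements, and the character-swap "model" are stand-ins for illustration; the real correctors are trained models, not lookup tables.

```python
from typing import Callable, Dict, Tuple

LOW_SCORE = 0.90  # hypothetical cutoff for an "as-needed" correction
Corrector = Callable[[str], str]


def fix_common_ocr_confusions(value: str) -> str:
    """Toy stand-in for a correction model: swap letters often misread for digits."""
    return value.translate(str.maketrans({"O": "0", "I": "1", "S": "5"}))


def correct(elements: Dict[str, Tuple[str, float]],
            always_check: Dict[str, Corrector],
            as_needed: Dict[str, Corrector]) -> Dict[str, str]:
    """Run correctors on high-importance labels always, on others only when the score is low."""
    corrected = {}
    for label, (value, score) in elements.items():
        if label in always_check:
            value = always_check[label](value)
        elif score < LOW_SCORE and label in as_needed:
            value = as_needed[label](value)
        corrected[label] = value
    return corrected


elements = {
    "member_number": ("12884S682", 0.99),  # always checked, even with a high score
    "payer_name": ("ACME HEALTH", 0.95),   # no corrector registered, left untouched
    "rx_bin": ("O04336", 0.42),            # low score, corrected as needed
}
result = correct(
    elements,
    always_check={"group_number": fix_common_ocr_confusions,
                  "member_number": fix_common_ocr_confusions},
    as_needed={"rx_bin": fix_common_ocr_confusions},
)
print(result)
```

Keeping the as-needed correctors behind a score check means most cards skip them entirely, which matters when end-to-end processing time is measured in seconds.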
----
* - Members of our team have worked on security at Apple.