The objective of this service is to let a user snap a photo of a paper invoice and have the service pay the invoice on their behalf. For this to work, the service must capture the relevant data from the paper invoice so that it can create a payment for the user. The scope of this project is _not_ to deal with the payment itself, but to capture, analyze, process and store the data for further use by other services.
There are 4 main items we need to capture:
1. The amount due to be paid
2. The account number the amount should be paid to
3. The reference number for the payment
4. The due date
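The four captured items can be modelled as a small record on the server. This is a minimal sketch, assuming Python on the server side; the class and field names are hypothetical, and every field is optional because, as described below, the OCR pass may fail to find some of them.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional

# Hypothetical record for the 4 data points; any field may be None
# when the OCR pass fails to locate it on the invoice.
@dataclass
class InvoiceData:
    amount_due: Optional[Decimal] = None       # the amount due to be paid
    account_number: Optional[str] = None       # the account to pay to
    payment_reference: Optional[str] = None    # payment reference / invoice number
    due_date: Optional[date] = None            # the due date

    def missing_fields(self) -> list[str]:
        """Names of the fields the OCR pass failed to capture."""
        return [name for name, value in vars(self).items() if value is None]
```

The `missing_fields` helper is what would drive the partial-data flow: the app shows exactly those fields as open for the user to fill in, skip, or locate in the image.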
With this data captured the service has the data needed to initiate a payment (the payment part is out of scope for this project).
The objective for this project is to create the components needed to identify the invoice data, capture them and prepare the data for further use.
The delivery needs to consist of: a simple mobile application, written in React Native or Firebase, that works as the user interface; a server-side application that exposes the APIs the app needs to send data to and receive data from the server; and a data store or BigQuery setup for data storage.
This is the proposed functional flow of the service:
1. The user opens the app on their phone, which launches the camera to scan the invoice
2. Once the whole paper is inside the frame the camera automatically captures the document
3. The first step is to look for language indicators: identifying the language used helps the OCR locate the 4 elements on the invoice (this refers to a model where such indicators are stored, using ML to optimize how accurately we can find the 4 elements). Once the language is identified, we can assume all 4 elements are labelled in a predictable way and look for variations of their names: amount, account, due date, and payment reference or invoice number.
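The label-variant lookup in step 3 can be sketched as follows. The keyword tables here are illustrative only; in the real service the per-language label variants would come from the trained model, not be hard-coded, and the value extraction would use OCR bounding boxes rather than plain text lines.

```python
import re

# Illustrative label variants per language; the real service would load
# these from the ML model's store rather than hard-coding them.
FIELD_LABELS = {
    "en": {
        "amount": ["amount due", "total amount", "amount"],
        "account": ["account number", "account no", "account"],
        "reference": ["payment reference", "invoice number", "reference"],
        "due_date": ["due date", "payment due"],
    },
    "sv": {
        "amount": ["att betala", "belopp"],
        "account": ["bankgiro", "plusgiro", "kontonummer"],
        "reference": ["ocr", "fakturanummer", "referens"],
        "due_date": ["förfallodatum", "betalas senast"],
    },
}


def extract_fields(ocr_text: str, language: str) -> dict:
    """Find a value following each known label variant in raw OCR text."""
    results = {}
    for field, labels in FIELD_LABELS[language].items():
        results[field] = None
        for label in labels:
            # Take the first token after the label, e.g. "Amount due: 123.45"
            match = re.search(
                rf"{re.escape(label)}\s*[:\s]\s*(\S+)", ocr_text, re.IGNORECASE
            )
            if match:
                results[field] = match.group(1)
                break
    return results
```

Any field left as `None` feeds the partial-data outcomes described next, where the user fills the gap manually.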
All such processing can and should happen on the server while the app waits for the server to finalize the analysis. There are 3 possible outcomes:
A. All data is found. In this case we serve it back to the app for the user to validate. The user taps Approve (or can edit the captured data manually) and the data is posted to the server as a payment ready to be made.
B. Some data is found. In this case we serve up the data found but leave the missing fields open. The user is prompted to tap the empty fields and can then do one of 3 things:
- Enter the data manually
- Skip the field (the payment will be posted to the server without this field)
- Scroll through the scanned image to find the data needed and select it (effectively training the model through human intervention)
C. No data is found. In this case we allow the user to scan again, or to scroll around in the captured image and show the server where the data is located (one by one: "Where is the account number?", "Where is the amount?", "Where is the due date?", "Where is the invoice number?").
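The three outcomes above converge into one step: merge the OCR results with whatever the user entered, skipped, or pointed out, and keep every manual correction as a labelled example for retraining. This is a minimal sketch with hypothetical names; the real payload and training-store formats are open design decisions.

```python
# Hypothetical merge of OCR output and user input into the final payment
# payload, collecting each user correction as ground truth for the model.
def finalize_payment(ocr_fields: dict, user_edits: dict, skipped: set) -> tuple:
    payment = {}
    training_examples = []
    for field in ("amount", "account", "reference", "due_date"):
        if field in skipped:
            continue  # payment is posted to the server without this field
        if field in user_edits:
            payment[field] = user_edits[field]
            # Human intervention counts as a labelled training example.
            training_examples.append({"field": field, "value": user_edits[field]})
        elif ocr_fields.get(field) is not None:
            payment[field] = ocr_fields[field]
    return payment, training_examples
```

The `training_examples` list is what would be appended to the model's training data, so the service improves with every invoice the users validate.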
The intention is that this simple scan-and-validate invoice app can be used to train the model to find the 4 data points in any paper invoice, in any language, and that we enrich the model with new data as we capture it.
The project needs to be written in Python (server side) and deployed on Google Cloud. I will provide you with the development environment and a series of invoices for training.