I have experience with OpenCV and OCR (Tesseract) in Python, including building the latest OpenCV 3 from source, which may be needed for Python 3.6 depending on what you are doing. I've designed OpenCV programs that read a document from a PDF (or a document image from a scan, fax, etc.), apply template-oriented pattern recognition to identify the type or category of document (title, bill, government license document, etc.), and then OCR the defined fields. The system has two parts. The first is an OCR/OpenCV back-end recognition library that is loaded with document templates for the categories to be recognized. The second is the UI (Windows, C# WinForms) used to create document recognition templates and test them. That was a few years ago and mostly C++/C#. More recently I've done other projects with OpenCV and Python, where I had to build OpenCV 3.3 from source to get the Python 3.6 cv support I needed.
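
To give a rough idea of how the template-driven recognition described above fits together, here is a minimal Python sketch. It is an illustration only, not the original C++/C# code: every name in it (`Word`, `FieldRegion`, `DocumentTemplate`, `classify`, `extract_fields`, `recognize`) is hypothetical, keyword matching stands in for the actual pattern recognition, and it assumes the OCR step (for example Tesseract) has already produced words with positions normalised to the page size.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Word:
    """One OCR'd word; x and y are the word centre as fractions of the page (0..1)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class FieldRegion:
    """A named rectangle on the page where a field value is expected."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


@dataclass(frozen=True)
class DocumentTemplate:
    """Hypothetical template: anchor keywords identify the category, regions locate fields."""
    category: str
    anchors: Tuple[str, ...]
    fields: Tuple[FieldRegion, ...]


def score(template: DocumentTemplate, words: Iterable[Word]) -> float:
    """Fraction of the template's anchor keywords found anywhere on the page."""
    present = {w.text.lower() for w in words}
    if not template.anchors:
        return 0.0
    hits = sum(1 for anchor in template.anchors if anchor.lower() in present)
    return hits / len(template.anchors)


def classify(templates: Sequence[DocumentTemplate], words: Sequence[Word],
             min_score: float = 0.5) -> Optional[DocumentTemplate]:
    """Return the best-scoring template, or None if nothing scores at least min_score."""
    best = max(templates, key=lambda t: score(t, words), default=None)
    if best is None or score(best, words) < min_score:
        return None
    return best


def extract_fields(template: DocumentTemplate, words: Sequence[Word]) -> Dict[str, str]:
    """Join the words inside each field region, ordered top-to-bottom then left-to-right."""
    result = {}
    for region in template.fields:
        inside = sorted((w for w in words if region.contains(w)), key=lambda w: (w.y, w.x))
        result[region.name] = " ".join(w.text for w in inside)
    return result


def recognize(words: Sequence[Word], templates: Sequence[DocumentTemplate]):
    """Classify the page, then read the fields defined for the matched category."""
    template = classify(templates, words)
    if template is None:
        return None, {}
    return template.category, extract_fields(template, words)


# Made-up templates and OCR output, purely for illustration.
TEMPLATES = (
    DocumentTemplate("bill", ("invoice", "total", "due"),
                     (FieldRegion("total", 0.6, 0.8, 1.0, 0.9),)),
    DocumentTemplate("title", ("certificate", "title", "vehicle"),
                     (FieldRegion("vin", 0.1, 0.3, 0.9, 0.4),)),
)

SAMPLE_WORDS = (
    Word("Invoice", 0.20, 0.05),
    Word("Total", 0.50, 0.85),
    Word("Due", 0.55, 0.85),
    Word("$42.00", 0.80, 0.85),
)

if __name__ == "__main__":
    print(recognize(SAMPLE_WORDS, TEMPLATES))
```

Keeping the templates as plain data is what lets a separate UI create and test them without touching the recognition code, which is the split between the back-end library and the template editor mentioned above.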