Create a script for data extraction from images

€250-750 EUR

Closed

Posted:

about 5 years ago

Paid on completion

Expected behavior:
1) User opens a web page with a file upload form
2) User uploads a scanned document
3) Data is extracted in the backend via a script
4) Data is saved in a MySQL database via a script

Please see the link below for scanned document examples: [login to view URL]

Data from example document #2 that should be extracted:
Invoice number - Pavadzīme Nr. ABT 2930
Date
Supplier - Piegādātājs AB Transystems SIA
Supplier address - [login to view URL] Maskavas iela 227, Rīga, LV-1019
Supplier registration number - Reģ. Nr. 40003741261
VAT number - PVN Nr - LV40003741261
Bank account number - Konts - LV91HABA0551009805564
Sum without VAT - Summa bez PVN: 6.66
Sum with VAT - Summa ar PVN (EUR) 8.06
Product fields - Preču nosaukums, Mērv, Daudz, Cena, Summa (JTH 48B M10x30 regulējošās kājiņas gabals 4 0,590 2,39)

There will be multiple scanned document templates with different designs/looks.

If you're up for this job, please provide us with the following information:
a) Which programming language/libraries will you be using?
b) For multiple templates, will the same data extraction logic/pattern be applied, or will it need to be customized for each template?
c) What would be the minimum requirements for the scanned document in terms of quality and dimensions (px) for the script to work?
d) When can you start work on this project?

For this project, it would be best to use an already available solution. I would suggest using Apache Tika ([login to view URL])
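Steps 3 and 4 of the expected behavior could be sketched roughly as below. This is only an illustration under several assumptions: the OCR step is taken as already done, `SAMPLE_TEXT` is a hypothetical OCR result assembled from the field values quoted in the posting (the real page layout is not known), the regular expressions and the `invoices` table are invented for the example, and Python's built-in `sqlite3` stands in for MySQL so the sketch runs without a database server.

```python
import re
import sqlite3
from decimal import Decimal

# Hypothetical OCR output built from the values quoted in the posting.
SAMPLE_TEXT = """\
Pavadzīme Nr. ABT 2930
Piegādātājs AB Transystems SIA
Reģ. Nr. 40003741261
PVN Nr LV40003741261
Konts LV91HABA0551009805564
Summa bez PVN: 6.66
Summa ar PVN (EUR) 8.06
"""

# Assumed patterns for this one template; another layout would need its own set.
FIELD_PATTERNS = {
    "invoice_number": r"Pavadzīme\s+Nr\.?\s*([A-Z]+\s*\d+)",
    "supplier": r"Piegādātājs\s+(.+)",
    "supplier_reg_number": r"Reģ\.?\s*Nr\.?\s*(\d{11})",
    "vat_number": r"PVN\s+Nr\.?\s*[-:]?\s*(LV\d{11})",
    "bank_account": r"Konts\s*[-:]?\s*(LV\d{2}[A-Z]{4}[0-9A-Z]{13})",
    "sum_without_vat": r"Summa\s+bez\s+PVN\s*:?\s*(\d+[.,]\d{2})",
    "sum_with_vat": r"Summa\s+ar\s+PVN\s*(?:\(EUR\))?\s*:?\s*(\d+[.,]\d{2})",
}

# Product row: name, unit, quantity, unit price, line total (comma decimals).
PRODUCT_ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<unit>\S+)\s+(?P<qty>\d+)"
    r"\s+(?P<price>\d+,\d+)\s+(?P<total>\d+,\d+)$"
)


def extract_fields(text):
    """Return a dict of header fields; a field that is not found maps to None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1).strip() if match else None
    return fields


def parse_product_row(line):
    """Split one product line into typed columns, or return None if it does not fit."""
    match = PRODUCT_ROW.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    row["qty"] = int(row["qty"])
    row["price"] = Decimal(row["price"].replace(",", "."))
    row["total"] = Decimal(row["total"].replace(",", "."))
    return row


def save_invoice(conn, fields):
    """Insert the header fields with a parameterized query (amounts kept as text)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_number TEXT, supplier TEXT, supplier_reg_number TEXT, "
        "vat_number TEXT, bank_account TEXT, "
        "sum_without_vat TEXT, sum_with_vat TEXT)"
    )
    conn.execute(
        "INSERT INTO invoices VALUES (?, ?, ?, ?, ?, ?, ?)",
        tuple(fields[name] for name in FIELD_PATTERNS),
    )
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in for the MySQL database
    save_invoice(connection, extract_fields(SAMPLE_TEXT))
    print(connection.execute("SELECT * FROM invoices").fetchall())
```

With a real MySQL driver the same parameterized `INSERT` would apply, although the placeholder style depends on the driver. Real OCR output is usually noisier than `SAMPLE_TEXT`, so the patterns would likely need loosening and validation.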

Project ID: 19019769

About the project

7 proposals

Remote project

Active 5 years ago

7 freelancers are bidding on this job, at an average of €552 EUR

@schoudhary1553

Hello, I have gone through your job posting and am very interested in working with you. I am an expert in this field and have already completed several projects like this. For evidence, you can see my profile. Please visit: https://www.freelancer.com/u/schoudhary1553 I have an excellent command of English. I am hard-working and productive, and I hope I would be the right candidate for this job. Awaiting an affirmative response from you. Kind regards, Sandeep

€500 EUR in 6 days

4.9

(271 reviews)

7.8

@zekovicm

Hi there, I am a data extraction automation expert from Bosnia & Herzegovina, Europe. I have carefully gone through your requirements and I would like to help you with this project! I can start immediately and finish within the agreed deadline. Check out my profile, portfolio and former clients' feedback; that will tell you everything about me. Please feel free to contact me so that we can discuss further details. Thank you for taking the time to read my proposal. I am looking forward to hearing from you. Best regards, Miljan

€338 EUR in 5 days

4.9

(134 reviews)

7.5

@hjr122413

Hiya. How is your day? I just checked your project "Create a script for data extraction from images" and I am interested in it. I am an experienced developer in PHP, Angular, React and Node, and I can handle your design well. I am ready to provide full service from design to maintenance. I would like to discuss more details via chat, and I hope we will build a good working relationship on this project.

€500 EUR in 10 days

4.6

(82 reviews)

7.3

@novepi

Hi there,

The requirements look quite clear and straightforward to implement. However, Tika is certainly not the tool you're looking for: it targets metadata and structured text. Unless I'm missing something, what you need is optical character recognition (OCR) to parse the data from scanned images, and those two are very different things. Building a solution from scratch is way out of scope for this project and overkill in the first place, so I suggest using tesseract-ocr, a renowned open-source engine for this type of work. I have used it several times with pretty good results. I should note, though, that as with any OCR implementation the success rate won't be 100%, meaning there will be files it won't be able to parse, e.g. a badly scanned image.

About your questions:
1. I'm planning to use Python with Tesseract.
2. It won't work on different templates. The effort needed to make it work for another template is directly related to the difference between the templates. For instance, if it's a completely different template with a new layout, font, color, etc., a brand new parser has to be created for it from scratch.
3. It's impossible to give any decent figure for that. Layout, font, coloring and clarity affect everything. For instance, the last page of the example document is very hard, if not impossible, to parse.
4. I can start next Wednesday, the 27th of March, and expect this to take 15 days. I'll need lots of these files to train the engine.
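Points 1 and 2 of this bid (Python with Tesseract, and a separate parser per template) might look roughly like the sketch below. It is not the bidder's code. It assumes the `pytesseract` and Pillow packages plus Tesseract's Latvian language data (`lav`) are installed for the OCR step, and the template names, marker strings and the second "Rēķins Nr" template are hypothetical, made up to show the dispatch idea.

```python
import re
from typing import Callable, Dict, Optional


def ocr_image(path: str, lang: str = "lav") -> str:
    """Run Tesseract on one scanned page and return the recognized text.

    Assumes pytesseract, Pillow and the Tesseract 'lav' data are installed;
    the imports are local so the dispatch logic below runs without them.
    """
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(path), lang=lang)


def parse_ab_transystems(text: str) -> Dict[str, Optional[str]]:
    """Parser for the layout of the posting's example (invoice label 'Pavadzīme Nr.')."""
    match = re.search(r"Pavadzīme\s+Nr\.?\s*([A-Z]+\s*\d+)", text)
    return {"invoice_number": match.group(1) if match else None}


def parse_hypothetical_b(text: str) -> Dict[str, Optional[str]]:
    """Parser for an invented second layout that labels the number 'Rēķins Nr'."""
    match = re.search(r"Rēķins\s+Nr\.?\s*(\d+)", text)
    return {"invoice_number": match.group(1) if match else None}


# Marker string that identifies each template (assumed), and its parser.
TEMPLATE_MARKERS: Dict[str, str] = {
    "ab_transystems": "AB Transystems",
    "hypothetical_b": "Rēķins Nr",
}
PARSERS: Dict[str, Callable[[str], Dict[str, Optional[str]]]] = {
    "ab_transystems": parse_ab_transystems,
    "hypothetical_b": parse_hypothetical_b,
}


def detect_template(text: str) -> Optional[str]:
    """Return the first template whose marker appears in the text, else None."""
    for name, marker in TEMPLATE_MARKERS.items():
        if marker in text:
            return name
    return None


def parse_document(text: str) -> Dict[str, Optional[str]]:
    """Dispatch OCR text to the parser registered for its template."""
    template = detect_template(text)
    if template is None:
        raise ValueError("unknown template: a new parser is needed")
    result = PARSERS[template](text)
    result["template"] = template
    return result
```

In use, `parse_document(ocr_image("scan.png"))` would chain the two steps, where `scan.png` is a placeholder file name. Unknown layouts fail loudly instead of being parsed with the wrong rules, which matches the bid's warning that each new template needs its own parser.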

€1,000 EUR in 15 days

5.0

(46 reviews)

6.0

@goldsea808

Hello, how are you? I read your posting carefully. I am an OCR expert with 10 years of experience. C/C++ and OpenCV are my top skills, and I can build your project in full. I can provide high quality and high speed. If you want to succeed, please contact me and I will deliver a good result. Hire me.

€500 EUR in 10 days

5.0

(10 reviews)

5.9

@Valuesolutions

Hello, I have read the details provided and I am confident I can provide quality work. Please contact me to discuss the project deadline and a few other details.

€250 EUR in 10 days