Java expert - Improve webpage scraping solution

$250-750 USD

Closed
Posted: almost 4 years ago

Paid on completion
Request details

I developed a Java program to scrape information from a website. The architecture of the solution involves:
1) using Java Selenium to send requests to the webpage via Chrome WebDriver to trigger authentication and authenticated requests;
2) routing the requests from Chrome (headless) through Java BrowserMobProxy to capture three HTTP headers (Authorization, X-CSRF-TOKEN, and Cookie) and one query string (without these, the server starts responding with 512 after some requests); and
3) using these 4 elements in HTTPS requests sent from Java directly to the webpage (i.e. without Selenium, Chrome, and BrowserMobProxy involved) to retrieve the desired information.

This program provides the basic functionality of extracting the information but has a few problems:
- It depends on an external non-Java component: Chrome WebDriver.
- It depends on Java Selenium and Java BrowserMobProxy, two dependencies that I would like to remove.
- It is not optimized (too many refreshes and overly long sleep periods) relative to the limit at which the webpage (Cloudflare) starts responding with 429 errors. Thus, the retrieval of the information takes much more time than needed.

Deliverables

You will get the current program's Java code and you will need to solve the problems above. To do so, you will need to:
A. Find out how to authenticate and refresh the 3 headers and the query string without depending on Selenium, Chrome WebDriver, and BrowserMobProxy. As most of this data is likely generated in JavaScript, you will need knowledge of JavaScript and of how to execute JavaScript from within Java or convert the JavaScript code to Java (the preferable solution).
B. Identify the limit at which the webpage (behind Cloudflare) starts responding with 429 errors, and tune the refresh frequency of the headers and the sleep periods to the limit identified.
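
For illustration only, step 3 of the architecture (sending requests from Java directly with the three captured headers and the query string) could look roughly like the sketch below. It uses the JDK's built-in java.net.http API (Java 11+), so no Selenium, Chrome, or BrowserMobProxy is involved once the values are known. The class name, method name, URL, query parameter name, and placeholder values are all assumptions; the posting does not name the target site or the query parameter.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: attach the four captured values to a direct request. */
public class DirectRequestSketch {

    /**
     * Builds a GET request carrying the three headers and one query parameter.
     * All arguments are placeholders for values captured during authentication.
     */
    public static HttpRequest build(String baseUrl, String queryName, String queryValue,
                                    String authorization, String csrfToken, String cookie) {
        // Encode the query value so special characters do not break the URI.
        String encoded = URLEncoder.encode(queryValue, StandardCharsets.UTF_8);
        URI uri = URI.create(baseUrl + "?" + queryName + "=" + encoded);
        return HttpRequest.newBuilder(uri)
                .header("Authorization", authorization)
                .header("X-CSRF-TOKEN", csrfToken)
                .header("Cookie", cookie)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // example.com and every value below are stand-ins, not the real site.
        HttpRequest request = build("https://example.com/api/items", "token", "a b",
                "Bearer placeholder", "csrf-placeholder", "session=placeholder");
        System.out.println(request.uri());
        System.out.println(request.headers().map().keySet());
        // Sending would use HttpClient.newHttpClient().send(request, BodyHandlers.ofString()).
    }
}
```

This covers only the reuse of the values. How they are generated and refreshed (deliverable A) depends on the site's JavaScript and is not shown here.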
You will need to demonstrate the benefits of your changes by extracting the same information the program currently extracts and measuring how long it takes. Note: you will need to create your own login/password on the website. Registration has no additional requirements.
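
As a rough sketch of how the sleep periods might be tuned toward the 429 limit described above, one option is an adaptive delay: double the wait when a 429 arrives and shrink it slowly while requests succeed. This is a generic technique, not the method used by the existing program. The class name, the 10% decrease step, and the assumption that the server sends a numeric Retry-After header on 429 responses are all illustrative.

```java
/** Hypothetical sketch: back off on 429, speed up slowly on success. */
public class AdaptivePacer {
    private final long minDelayMs;
    private final long maxDelayMs;
    private long delayMs;

    public AdaptivePacer(long initialDelayMs, long minDelayMs, long maxDelayMs) {
        this.delayMs = initialDelayMs;
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Call after each response; returns the delay (ms) to sleep before the next request. */
    public long onResponse(int statusCode, String retryAfterHeader) {
        if (statusCode == 429) {
            // Double the delay, or honor the server's hint if it asks for longer.
            long hintedMs = parseRetryAfterSeconds(retryAfterHeader) * 1000L;
            delayMs = Math.min(maxDelayMs, Math.max(delayMs * 2, hintedMs));
        } else {
            // Shrink by 10% per success, never dropping below the floor.
            delayMs = Math.max(minDelayMs, delayMs - delayMs / 10);
        }
        return delayMs;
    }

    /** Parses the delay-seconds form of Retry-After; the HTTP-date form is ignored here. */
    public static long parseRetryAfterSeconds(String value) {
        if (value == null) {
            return 0;
        }
        try {
            return Math.max(0, Long.parseLong(value.trim()));
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        // Simulated status codes stand in for real responses.
        AdaptivePacer pacer = new AdaptivePacer(1000, 200, 60_000);
        int[] statuses = {200, 200, 429, 200, 200};
        for (int status : statuses) {
            System.out.println(status + " -> next delay " + pacer.onResponse(status, null) + " ms");
        }
    }
}
```

To demonstrate the benefit, the total extraction time of the old and new versions could be compared by wrapping each run in System.nanoTime() calls.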
Project ID: 26819050

About the project

7 proposals
Remote project
Active 4 years ago

7 freelancers are bidding on this job at an average of $515 USD
Hi, sir. I have carefully checked your requirements, and I'm glad to say I have already done this kind of project before. I'd love to share more details with you over chat, and I'm sure you'll be interested in them. I also have extensive experience in all the skills you want for this project (Java, PHP, Web Scraping, JavaScript, Software Architecture), so I think I can be the best candidate. Please contact me so that we can talk in detail. Thank you in advance!
$555 USD in 6 days
5.0 (14 reviews)
6.8
I have experience scraping various website pages, including AJAX pages, without using Selenium. Fine-tuning the refresh frequency can minimize the errors, but there is no guarantee of avoiding the target website's robot detection. I can do this, but if you have additional budget, I would like to offer you the option of proxy farming (you will only bear the cost charged by the proxy farming provider; I will not charge for the coding portion).
$350 USD in 5 days
5.0 (3 reviews)
1.3
Thanks for posting the project; I respect it. I recently worked on a project like yours and can provide you with demo work as well. Do you want a free demo? Ping me on the Freelancer message board. Thanks and regards,
$500 USD in 7 days
0.0 (0 reviews)
0.0
Hi, I have 11 years of experience in Java, J2EE technologies, and Selenium. I am an expert in Spring Boot and microservices. I have worked on REST services, SOAP services, JSP, HTML, CSS, Bootstrap, JavaScript, AngularJS, AWS, and Docker. I can deliver your project within the timeframe.
$550 USD in 7 days
0.0 (0 reviews)
0.6

About the client

Băilești, Romania
5.0
1
Member since: March 8, 2020

Client verification
