
Google Scholar data scrape

$30-250 USD

In progress
Posted: over 6 years ago

Paid on completion
I am looking for web-scraping code written in Python (Windows 10 PC) that pulls results from Google Scholar, scrapes data from those results, and writes the data to a .csv file readable by Excel. The data should also be written to a local database running on Windows 10 (MySQL or MongoDB).

I need the following:

1. A simple GUI in which a "topic" can be entered for searching Google Scholar Case Law, along with the start and end year. The GUI should have a "submit" button.
2. The ability to automatically (internally) create the correct search URL based on Federal or State case, the topic keywords and the first year. The program then "steps" through each year one by one until it hits the end year. The URLs always follow a standard format, so this should not be difficult to implement. For example, if the topic is "Trademark" and the year range is 2012 - 2015, the program creates a URL for a Trademark search for 2012-2012 (a single-year sub-range is a more manageable data set) and performs the steps below. Once this is done, it steps to Trademark for 2013-2013 and performs the steps below again. It steps up by 1 year at a time until it hits 2015-2015. This prevents far too many search results from showing up at once and becoming unmanageable. In the above example, 4 separate URLs would be created (one for each single-year range), each run separately.
3. Navigate to each URL (no visual display is required for this).
4. Grab the entire list of sub-URLs from the search results, navigate to each one individually, and scrape and save the data listed below in its own CSV file (auto-named). Each sub-URL would then have its own filename and data.
5. The ability to turn off saving .CSV files (this is mainly needed for testing/debugging, to make it easier to see that the program is working properly).

SAVE:

a) Exact URL of the case.
b) Header info: contains the name of the case, court name, district and year.
c) *** If the page at the URL contains the phrase "NOT TO BE PUBLISHED IN THE OFFICIAL REPORTS", this must be reflected in the CSV naming convention by adding _NOT at the end of the filename.
d) Every sub-URL will have a section called "DISCUSSION." Inside this section we need to search for and save the sub-titles inside the DISCUSSION section (write each sub-title to the file and save the text associated with it), including:
   aa) Sentences within the sub-title section that are followed by a citation (text inside parentheses): save the preceding sentence and the citation.
   bb) Any text within the sub-title section that is inside double quotes, including the citation afterwards: save the entire text inside the double quotes and the citation.

Continue the above until the end of the URL, then get the next URL, complete it and repeat. Then step forward 1 year (if the date range allows) and repeat again until all results are parsed.

I have attached a scraped HTML image of a Google Scholar article for reference.
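For illustration only, the year-stepping, file-naming and quote-extraction requirements above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the delivered solution: the posting does not give the search URL format, so `BASE_URL` and the `as_ylo` / `as_yhi` year parameters are assumptions, the Federal/State selector is left to a caller-supplied `extra_params` dict, and the helper names (`yearly_search_urls`, `csv_filename`, `quoted_passages`) are hypothetical. The regular expression only handles straight double quotes and citations without nested parentheses.

```python
import re
from urllib.parse import urlencode

# Assumption: the posting does not state the search URL format.
BASE_URL = "https://scholar.google.com/scholar"
NOT_PUBLISHED = "NOT TO BE PUBLISHED IN THE OFFICIAL REPORTS"


def yearly_search_urls(topic, start_year, end_year, extra_params=None):
    """Return one search URL per single-year sub-range (inclusive)."""
    urls = []
    for year in range(start_year, end_year + 1):
        # Assumed parameter names for the lower/upper year bounds.
        params = {"q": topic, "as_ylo": year, "as_yhi": year}
        # Any Federal/State selector would be passed in by the caller.
        params.update(extra_params or {})
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


def csv_filename(case_id, page_text):
    """Build the CSV name, appending _NOT when the page carries the phrase."""
    suffix = "_NOT" if NOT_PUBLISHED in page_text else ""
    return f"{case_id}{suffix}.csv"


# A double-quoted passage followed by a parenthetical citation.
QUOTE_WITH_CITATION = re.compile(r'"([^"]+)"\s*\(([^()]+)\)')


def quoted_passages(section_text):
    """Return (quote, citation) pairs found in a sub-title section."""
    return QUOTE_WITH_CITATION.findall(section_text)
```

With the posting's own example, `yearly_search_urls("Trademark", 2012, 2015)` yields four URLs, one per year. Real opinions often nest parentheses inside citations, so a production parser would likely need something sturdier than a single regular expression.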
Project ID: 14706912

About the project

9 proposals
Remote project
Active 7 years ago

9 freelancers are bidding on this job at an average of $193 USD

Hi sir, I am a scraping expert and have done 350+ scraping projects; please check my feedback to see for yourself. Can we discuss this project in more detail? I can then provide example data or a script for you. Thanks, Kimi
$152 USD in 5 days
5.0 (147 reviews)
7.0

699 876 606
$155 USD in 3 days
5.0 (17 reviews)
6.3

I wrote a Google Scholar bot more than a year ago using C#. It should be easy for me to rewrite that code in Python.
Relevant Skills and Experience: Already wrote a Google Scholar bot
Proposed Milestones: $250 USD - Milestone
$250 USD in 7 days
5.0 (56 reviews)
6.1

Hello sir! I've just seen your job offer, and because of my skills in web design/development I'm pretty sure that I can get it done with the best quality and price!
Proposed Milestones: $111 USD - default
$111 USD in 10 days
4.8 (8 reviews)
4.2

A proposal has not yet been provided
$249 USD in 6 days
4.8 (5 reviews)
3.5

I am highly interested in working on your project. I have excellent experience in web scraping, research, data mining, and extracting email addresses and other contact information for any business.
Relevant Skills and Experience: Data scraping
Proposed Milestones: $138 USD - Google Scholar data scrape
$138 USD in 5 days
0.0 (0 reviews)
0.0

About the client

Los Angeles, United States
5.0
24
Payment method verified
Member since: June 11, 2007

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)