HTML page scraper

$30-100 USD

In progress
Posted: almost 15 years ago

Paid on delivery
HTML files residing on a local drive need to be scraped for data, and the data placed into either a MySQL or an SQLite table, based upon a definition table.

## Deliverables

I need a Delphi 7 application that will scrape data from HTML files on a local hard drive and place the data into either a MySQL table or an SQLite table.

The application will have a string constant that points to the location of the HTML files. The application needs to search that location for `*.htm` files, including in any subfolders that might exist, e.g.:

`const SourcePath : String = 'c:\data\';`

If there are any subfolders under `c:\data`, they also need to be searched for `*.htm` files.

The data to be scraped will be defined by a table that has three fields: BeginTag, EndTag and DBField. Each definition (record) in this table needs to be applied to each HTM file found in SourcePath. Here is an example of this definition table:

1) BeginTag: **Date:** EndTag: * DBField: THEDATE
2) BeginTag: **ID:** EndTag: * DBField: IDNUMBER
3) BeginTag: Rank: # EndTag: in DBField: RANKING

Note: the actual definition table will hold more than just three definitions. The app needs to handle all of the definition entries/records it finds.

Here is how the definition table works. Using the three definitions above as an example, we would start with the "**Date:**" BeginTag. The app searches the HTML code in the first file for the first instance of "**Date:**". It then stores the data it finds, beginning with the character position immediately after this BeginTag, into a temporary string until it reaches the EndTag, which in this case is " * ". The temporary string found between the BeginTag and the EndTag will be written to a different table (we'll call it the RESULTS DATA table), but only AFTER all of the definitions have been iterated through.
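The recursive file search and the BeginTag/EndTag extraction described above could look roughly like the console sketch below. It targets Delphi 7 and should also compile with Free Pascal in Delphi mode. It is a minimal sketch, not the required application. The `TDefinition` record, the `Definitions` array and its tag strings (`<b>Date:</b>`, `</td>` and so on) are hypothetical stand-ins for the real definition table. Trimming the captured text, and returning an empty string when a tag is missing, are assumptions that the specification does not settle.

```pascal
program HtmScrapeSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

type
  // One row of the definition table.
  TDefinition = record
    BeginTag, EndTag, DBField: string;
  end;

const
  SourcePath: string = 'c:\data\';
  // Hypothetical sample definitions; the real app loads them from the definition table.
  Definitions: array[0..2] of TDefinition = (
    (BeginTag: '<b>Date:</b>'; EndTag: '</td>'; DBField: 'THEDATE'),
    (BeginTag: '<b>ID:</b>'; EndTag: '</td>'; DBField: 'IDNUMBER'),
    (BeginTag: 'Rank: #'; EndTag: ' in'; DBField: 'RANKING'));

// Adds every *.htm file in Dir and in all of its subfolders to Files.
procedure FindHtmFiles(const Dir: string; Files: TStrings);
var
  SR: TSearchRec;
  Base: string;
begin
  Base := IncludeTrailingPathDelimiter(Dir);
  if FindFirst(Base + '*.htm', faAnyFile, SR) = 0 then
  try
    repeat
      if (SR.Attr and faDirectory) = 0 then
        Files.Add(Base + SR.Name);
    until FindNext(SR) <> 0;
  finally
    FindClose(SR);
  end;
  // Second pass: recurse into each subfolder, skipping '.' and '..'.
  if FindFirst(Base + '*', faAnyFile, SR) = 0 then
  try
    repeat
      if ((SR.Attr and faDirectory) <> 0) and (SR.Name <> '.') and (SR.Name <> '..') then
        FindHtmFiles(Base + SR.Name, Files);
    until FindNext(SR) <> 0;
  finally
    FindClose(SR);
  end;
end;

// Text between the first BeginTag and the next EndTag, trimmed;
// '' when either tag is missing (an assumption, not in the spec).
function ExtractBetween(const Html, BeginTag, EndTag: string): string;
var
  P: Integer;
  Rest: string;
begin
  Result := '';
  P := Pos(BeginTag, Html);
  if (BeginTag = '') or (P = 0) then Exit;
  Rest := Copy(Html, P + Length(BeginTag), MaxInt);
  P := Pos(EndTag, Rest);
  if (EndTag = '') or (P = 0) then Exit;
  Result := Trim(Copy(Rest, 1, P - 1));
end;

// Applies every definition to one file's HTML, adding DBField=Value lines.
procedure ScrapeHtml(const Html: string; Values: TStrings);
var
  I: Integer;
begin
  for I := Low(Definitions) to High(Definitions) do
    Values.Add(Definitions[I].DBField + '=' +
      ExtractBetween(Html, Definitions[I].BeginTag, Definitions[I].EndTag));
end;

var
  Files, Html, Values: TStringList;
  I, J: Integer;
begin
  Files := TStringList.Create;
  Html := TStringList.Create;
  Values := TStringList.Create;
  try
    FindHtmFiles(SourcePath, Files);
    for I := 0 to Files.Count - 1 do
    begin
      Html.LoadFromFile(Files[I]);
      Values.Clear;
      ScrapeHtml(Html.Text, Values);
      // Same layout as the TMemo log: full path, then "FIELD: value" lines.
      WriteLn(Files[I]);
      for J := 0 to Values.Count - 1 do
        WriteLn(Values.Names[J] + ': ' + Values.Values[Values.Names[J]]);
    end;
  finally
    Values.Free;
    Html.Free;
    Files.Free;
  end;
end.
```

In the real VCL application, the `WriteLn` calls would become `Lines.Add` calls on the TMemo. The collected `Values` would be written to the RESULTS DATA table once per file, after all definitions have been applied.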
The app then moves on to the next definition record (BeginTag: **ID:** EndTag: *) and likewise scrapes the data into a temporary string, then moves on to the next definition, and so on. Once all the definitions have been iterated through, the scraped data is written to a record in the RESULTS DATA table. In the example above, three strings of data would be written to the fields THEDATE, IDNUMBER and RANKING. The app then moves on to the next HTM file it finds, repeats the scraping based upon the definitions, and saves the scraped data to another record in the RESULTS DATA table, and so on.

Before writing the scraped data to a record in the RESULTS DATA table, the app needs to check whether a matching record already exists there. We don't want duplicate records! The app only needs to check a single field to determine whether a record already exists in the RESULTS DATA table. That field is defined by a string constant, IndexField, e.g.:

`const IndexField : String = 'IDNUMBER';`

If a record already exists, it will be replaced. If no record exists, a new one will be added to the RESULTS DATA table.

Before moving on to the next HTM file, the app will rename the original HTM file by appending the extension ".processed" to its file name.

A progress bar is required, showing the current completion status based upon how many HTM files still need to be processed.

A TMemo will be placed on the main form and used for output/logging/debugging. For each processed HTM file, the following will be logged to the TMemo:

1) The full path to the file
2) The scraped data found within that file

e.g.:
c:\data\[login to view URL]
THEDATE: July 8, 2008
IDNUMBER: A9023
RANKING: 23

c:\data\[login to view URL]
THEDATE: June 18, 2000
IDNUMBER: B1234
RANKING: 567

Before the program exits, the contents of the TMemo need to be written to disk using the following file name format: [login to view URL], in a \LOGS folder placed under the application folder.

1) Complete and fully functional working program(s) in executable form, as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment: deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others, including desktop software or software the Buyer intends to distribute: a software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. copyright law. The Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, third-party components, etc. unless all copyright ramifications are explained AND AGREED TO by the Buyer on the site per the coder's Seller Legal Agreement.)

## Platform

Windows 32-bit
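One way to handle the duplicate check, the ".processed" rename and the log file is sketched below. This is an assumption-laden sketch rather than the prescribed method. It swaps the explicit look-up on IndexField for a single `REPLACE INTO` statement, which both MySQL and SQLite accept. When the new row has the same value as an existing row in a PRIMARY KEY or UNIQUE column, the old row is deleted before the new one is inserted; otherwise the statement is a plain insert. It therefore only works if IndexField carries a unique index.

The following are all hypothetical:
- the table name `RESULTS`
- the index name `UX_IDNUMBER`
- every function name in the block

The `:FIELD` parameter style assumes a Delphi query component that supports named parameters.

```pascal
program ResultsSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

const
  IndexField: string = 'IDNUMBER';  // from the specification
  ResultsTable = 'RESULTS';         // hypothetical name for the RESULTS DATA table

// One-time setup: REPLACE can only find the old row through a UNIQUE
// (or PRIMARY KEY) constraint, so IndexField needs a unique index.
function BuildIndexSql: string;
begin
  Result := 'CREATE UNIQUE INDEX UX_' + IndexField + ' ON ' + ResultsTable +
    ' (' + IndexField + ')';
end;

// Builds one REPLACE statement over the given DBField names. The scraped
// strings are bound through named parameters (:FIELD), not pasted into the SQL.
function BuildReplaceSql(const Fields: array of string): string;
var
  I: Integer;
  Cols, Params: string;
begin
  Cols := '';
  Params := '';
  for I := Low(Fields) to High(Fields) do
  begin
    if I > Low(Fields) then
    begin
      Cols := Cols + ', ';
      Params := Params + ', ';
    end;
    Cols := Cols + Fields[I];
    Params := Params + ':' + Fields[I];
  end;
  Result := 'REPLACE INTO ' + ResultsTable + ' (' + Cols + ') VALUES (' + Params + ')';
end;

// New name for an HTM file once its record has been written.
function ProcessedName(const FileName: string): string;
begin
  Result := FileName + '.processed';
end;

// Saves the log lines (the TMemo's Lines in the real app) to <app folder>\LOGS\LogName.
procedure SaveLog(Lines: TStrings; const LogName: string);
var
  LogDir: string;
begin
  LogDir := ExtractFilePath(ParamStr(0)) + 'LOGS' + PathDelim;
  ForceDirectories(LogDir);
  Lines.SaveToFile(LogDir + LogName);
end;

begin
  WriteLn(BuildIndexSql);
  WriteLn(BuildReplaceSql(['THEDATE', 'IDNUMBER', 'RANKING']));
  WriteLn(ProcessedName('c:\data\example.htm'));
end.
```

For each file, the app would:
1) prepare the statement from `BuildReplaceSql` with the DBField names;
2) assign the scraped strings to the matching parameters and execute it;
3) call `RenameFile(FileName, ProcessedName(FileName))`.

On exit it would call `SaveLog` with the TMemo's lines and a file name in the format the specification gives. Because REPLACE deletes and re-inserts, any column not listed in the statement is reset to its default. A SELECT on IndexField followed by an UPDATE or INSERT avoids that, at the cost of a second statement.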
Project ID: 2806574

About the project

13 proposals
Remote project
Active 15 years ago

Awarded to:
$25.50 USD within 10 days, 5.0 (71 reviews), 6.2 (See private message.)

13 freelancers are bidding on average $64 USD for this job.
Other bids (details for each: See private message.):
$102 USD within 10 days, 5.0 (29 reviews), 5.6
$59.50 USD within 10 days, 5.0 (63 reviews), 5.1
$50.15 USD within 10 days, 5.0 (46 reviews), 5.0
$51 USD within 10 days, 5.0 (8 reviews), 3.6
$85 USD within 10 days, 5.0 (24 reviews), 3.6
$29.75 USD within 10 days, 4.9 (19 reviews), 3.5
$51 USD within 10 days, 5.0 (4 reviews), 2.4
$21.25 USD within 10 days, 5.0 (2 reviews), 1.3
$212.50 USD within 10 days, 0.0 (0 reviews), 0.0
$80.75 USD within 10 days, 0.0 (0 reviews), 0.0
$21.25 USD within 10 days, 0.0 (0 reviews), 0.0
$42.50 USD within 10 days, 0.0 (1 review), 0.0

About the client

Fredericksburg, United States
4.9
29
Payment method verified
Member since: March 7, 2009