Crawl for insurance agent & company information (json output)

Cancelled · Posted May 28, 2011 · Paid on delivery

Scrape all Active insurance agents & companies from Washington State's Insurance Commissioner website:

[url removed, login to view]

It seems most expedient to get a list of all agents by first getting all companies, then getting the list of agents from each company.

1) [url removed, login to view]

Click "agree" to continue.

2) [url removed, login to view]

Go to Company Search tab

3) Iterate through "Coverage Type" with all other search fields blank.

This will provide the list of all active companies to crawl.

4) For each company found, there is usually a "view agents" button. Not all companies have agents listed, but for those that do, each agent should be crawled for as much information as possible.
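The steps above can be sketched as a two-pass loop: collect companies across every coverage type, then collect agents per company. This is only a rough outline, not the expected implementation. The callables `search_companies` and `list_agents` are hypothetical stand-ins for the page-fetching and parsing code, and using `waoic` as a unique key for de-duplication is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def crawl(
    coverage_types: Iterable[str],
    search_companies: Callable[[str], List[dict]],  # hypothetical: one search per coverage type
    list_agents: Callable[[dict], List[dict]],      # hypothetical: follows "view agents" if present
) -> Tuple[List[dict], List[dict]]:
    """Collect every company once, then the agents listed for each company."""
    companies: Dict[str, dict] = {}
    for coverage in coverage_types:
        for company in search_companies(coverage):
            # A company can show up under several coverage types; keep the first copy.
            # Assumption: 'waoic' uniquely identifies a company.
            companies.setdefault(company["waoic"], company)

    agents: Dict[str, dict] = {}
    for company in companies.values():
        # Not every company lists agents, so an empty list is a normal result.
        for agent in list_agents(company):
            # An agent may represent several companies; keep the first copy.
            agents.setdefault(agent["waoic"], agent)

    return list(companies.values()), list(agents.values())
```

Passing the fetchers in as arguments keeps the loop testable without touching the live site, and should make it easier to swap in a different state's site later.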

There are about 4,500 companies and many thousands of agents to be found.

Return the agent and company data as JSON. The basic format is given in the Detailed Requirements, which show exactly what data is expected.

This exercise will very likely be repeated for each of the 50 states, so a job well done could lead to much more business.

Deliverables:

1) Source code to be delivered at project completion

2) json output of agent and company data

## Deliverables

Basic json output format

======================

Insurance Company Information

One record per line. Expanded here for visibility:

{

'company_name' => '',

'corporate_family_group' => '',

'organization_type' => '',

'waoic' => '',

'naic' => '',

'status' => '',

'admitted_date' => '',

'ownership_status' => '',

'reg_address_street' => '',

'reg_address_city' => '',

'reg_address_state' => '',

'reg_address_zip' => '',

'reg_address_zip_4' => '', # optional four addl zip digits

'reg_address_phone' => '', # 10 digit phone

'mail_address_street' => '',

'mail_address_city' => '',

'mail_address_state' => '',

'mail_address_zip' => '',

'mail_address_zip_4' => '', # optional four addl zip digits

'mail_address_phone' => '', # 10 digit phone

'insurance_types' => [ 'insurance type 1', 'insurance type 2', ... ],

'complaint_history' => [ { 'year' => '', 'category' => '', 'number_of_complaints' => '' }, ... ],

'disciplinary_orders' => [ { 'year' => '', 'order_number' => '', 'order_url' => '' }, ... ],

# any data that is interesting but not specifically called out above.

# as long as it's consistent, additional values can be data structures themselves for

# nested / complex values.

'attributes' => { 'addl_attribute1' => 'addl_value1', 'addl_attribute2' => 'addl_value2', ... },

}
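The format above is written with `=>` and `#` comments for readability; in the actual output each record would be one JSON object on its own line. A minimal sketch of that serialization follows, assuming Python's standard `json` module. The function names `to_json_line` and `write_records` are made up for illustration.

```python
import json
from typing import Iterable, TextIO


def to_json_line(record: dict) -> str:
    """Serialize one record as a single line of JSON."""
    # With no indent argument, json.dumps emits no raw newlines;
    # newlines inside string values are escaped as \n.
    return json.dumps(record, ensure_ascii=False)


def write_records(records: Iterable[dict], out: TextIO) -> int:
    """Write one record per line and return how many were written."""
    count = 0
    for record in records:
        out.write(to_json_line(record) + "\n")
        count += 1
    return count
```

Nested values such as `insurance_types`, `complaint_history`, and `attributes` go through unchanged as JSON arrays and objects, so a reader can load the file back one line at a time with `json.loads`.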

Insurance Agent Information

{

'first_name' => '',

'middle_name' => '',

'last_name' => '',

'address_street' => '',

'city' => '',

'state' => '',

'zip' => '',

'zip_4' => '', # optional four addl zip digits

'phone' => '', # 10 digit phone

'email' => '',

'status' => '',

'waoic' => '',

'npn' => '',

'doing_business_as' => '',

'licenses' => [ { 'license' => '', 'state' => '', 'lines' => '', 'effective' => '', 'expiration' => '' }, ... ],

'companies_rep' => [ { 'name' => '', 'license' => '', 'effective' => '', 'expiration' => '' }, ... ],

'agencies_rep' => [ { 'name' => '', 'license' => '', 'effective' => '', 'expiration' => '' }, ... ],

'investigations' => [ { 'year' => '', 'number' => '', 'order_url' => '' }, ... ],

'disciplinary_orders' => [ { 'year' => '', 'number' => '', 'order_url' => '' }, ... ],

# any data that is interesting but not specifically called out above.

# as long as it's consistent, additional values can be data structures themselves for

# nested / complex values.

'attributes' => { 'addl_attribute1' => 'addl_value1', 'addl_attribute2' => 'addl_value2', ... },

}
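Both record formats ask for a 10-digit phone and an optional four-digit ZIP extension, so scraped values will probably need light normalization. The sketch below is one possible approach. The helper names are hypothetical, and both the handling of a leading "1" country code and the choice to return empty strings for unparseable input are assumptions.

```python
import re
from typing import Tuple


def normalize_phone(raw: str) -> str:
    """Return a 10-digit phone number, or '' if the input does not contain one."""
    digits = re.sub(r"\D", "", raw)
    # Assumption: an 11-digit number starting with 1 carries a US country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else ""


def split_zip(raw: str) -> Tuple[str, str]:
    """Split a ZIP such as '98501-1234' into (zip, zip_4); zip_4 may be ''."""
    match = re.fullmatch(r"\s*(\d{5})(?:[-\s]?(\d{4}))?\s*", raw)
    if not match:
        return "", ""
    return match.group(1), match.group(2) or ""
```

For example, `split_zip("98501")` fills `zip` and leaves `zip_4` empty, which matches the "optional" note in the format.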

* * *

This broadcast message was sent to all bidders on Tuesday May 24, 2011 4:25:20 PM:

Hello. Based on feedback from multiple bidders, I have included much more detail about the information required for this project. This changes the scope of the project, so please review the requirements section, specifically the detailed requirements, to determine whether you are still interested in bidding on this work.

Engineering, Linux, Project Management, Script Install, Shell Script, Software Architecture, Software Testing

Project ID: #3343899

About the project

4 proposals · Remote project · Active Jun 13, 2011

4 freelancers are bidding on this job, at an average of $62.

MuktoSoftware

See private message.

$63.75 USD in 14 days
(413 reviews)
7.3

arbitbet

See private message.

$63.75 USD in 14 days
(39 reviews)
5.4

efrshuvo

See private message.

$55.25 USD in 14 days
(3 reviews)
3.0

aleemkhanvw

See private message.

$63.75 USD in 14 days
(1 review)
0.0