Crawl for insurance agent & company information (json output)
$30-75 USD
Paid on delivery
Scrape all Active insurance agents & companies from Washington State's Insurance Commissioner website:
[url removed, login to view]
It seems most expedient to get a list of all agents by first getting all companies, then getting the list of agents from each company.
1) [url removed, login to view]
Click "agree".
2) [url removed, login to view]
Go to Company Search tab
3) Iterate through "Coverage Type" with all other search fields blank.
This will provide the list of all active companies to crawl.
4) For each company found, there is usually a "view agents" button. Not all companies have agents listed, but for those that do, each agent should be crawled for as much information as possible.
There are about 4,500 companies and many thousands of agents to be found.
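The steps above can be sketched as a two-pass loop: collect companies per coverage type, then collect agents per company. This is only a minimal outline under stated assumptions. `search_companies` and `list_agents` are hypothetical callables standing in for the actual page fetching and parsing, which depend on the site's markup, and using `waoic` as a unique key for de-duplication is an assumption rather than something the site is known to guarantee.

```python
from typing import Callable, Dict, Iterable, List


def crawl(
    coverage_types: Iterable[str],
    search_companies: Callable[[str], List[dict]],  # hypothetical: one search per coverage type
    list_agents: Callable[[dict], List[dict]],      # hypothetical: follows "view agents"
) -> Dict[str, List[dict]]:
    """Collect companies per coverage type, then agents per company."""
    companies: Dict[str, dict] = {}
    agents: Dict[str, dict] = {}

    for coverage in coverage_types:
        for company in search_companies(coverage):
            # A company may appear under several coverage types; keep one copy
            # (assumes 'waoic' identifies a company uniquely).
            record = companies.setdefault(
                company["waoic"], dict(company, insurance_types=[])
            )
            if coverage not in record["insurance_types"]:
                record["insurance_types"].append(coverage)

    for company in companies.values():
        # list_agents is expected to return [] when no agents are listed.
        for agent in list_agents(company):
            # An agent may represent several companies; keep the first copy seen.
            agents.setdefault(agent["waoic"], agent)

    return {"companies": list(companies.values()), "agents": list(agents.values())}
```

Passing the fetchers in as arguments keeps the loop testable without network access, and would let the same loop be reused if the exercise is repeated for other states.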
Return the agent and company data as json; the basic format is provided in the Detailed Requirements, which show exactly what data is expected.
This exercise will very likely be repeated for each of the 50 states, so a job well done could lead to much more business.
Deliverables:
1) Source code to be delivered at project completion
2) json output of agent and company data
## Deliverables
Basic json output format
======================
Insurance Company Information
One record per line. Expanded here for visibility:
{
'company_name' => '',
'corporate_family_group' => '',
'organization_type' => '',
'waoic' => '',
'naic' => '',
'status' => '',
'admitted_date' => '',
'ownership_status' => '',
'reg_address_street' => '',
'reg_address_city' => '',
'reg_address_state' => '',
'reg_address_zip' => '',
'reg_address_zip_4' => '', # optional four addl zip digits
'reg_address_phone' => '', # 10 digit phone
'mail_address_street' => '',
'mail_address_city' => '',
'mail_address_state' => '',
'mail_address_zip' => '',
'mail_address_zip_4' => '', # optional four addl zip digits
'mail_address_phone' => '', # 10 digit phone
'insurance_types' => [ 'insurance type 1', 'insurance type 2', ... ],
'complaint_history' => [ { 'year' => '', 'category' => '', 'number_of_complaints' => '' }, ... ],
'disciplinary_orders' => [ { 'year' => '', 'order_number' => '', 'order_url' => '' }, ... ],
# any data that is interesting but not specifically called out above.
# as long as it's consistent, additional values can be data structures themselves for
# nested / complex values.
'attributes' => { 'addl_attribute1' => 'addl_value1', 'addl_attribute2' => 'addl_value2', ... },
}
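The `=>` notation above is not itself valid json, so the delivered file would presumably use `"key": value` pairs. As a rough sketch of the "one record per line" requirement, the snippet below serializes each record as a single compact json object. The function name `to_json_lines` and the placeholder values are illustrative assumptions only, and the example record is abbreviated.

```python
import json
from typing import Iterable


def to_json_lines(records: Iterable[dict]) -> str:
    """Serialize records as one compact JSON object per line."""
    # json.dumps without indent never emits raw newlines, so each
    # record stays on exactly one line.
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Abbreviated company record; the values are placeholders, not real data.
example_company = {
    "company_name": "Example Mutual",
    "waoic": "",
    "naic": "",
    "status": "Active",
    "insurance_types": ["insurance type 1", "insurance type 2"],
    "complaint_history": [{"year": "", "category": "", "number_of_complaints": ""}],
    "attributes": {},
}

print(to_json_lines([example_company]))
```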
Insurance Agent Information
{
'first_name' => '',
'middle_name' => '',
'last_name' => '',
'address_street' => '',
'city' => '',
'state' => '',
'zip' => '',
'zip_4' => '', # optional four addl zip digits
'phone' => '', # 10 digit phone
'email' => '',
'status' => '',
'waoic' => '',
'npn' => '',
'doing_business_as' => '',
'licenses' => [ { 'license' => '', 'state' => '', 'lines' => '', 'effective' => '', 'expiration' => '' }, ... ],
'companies_rep' => [ { 'name' => '', 'license' => '', 'effective' => '', 'expiration' => '' }, ... ],
'agencies_rep' => [ { 'name' => '', 'license' => '', 'effective' => '', 'expiration' => '' }, ... ],
'investigations' => [ { 'year' => '', 'number' => '', 'order_url' => '' }, ... ],
'disciplinary_orders' => [ { 'year' => '', 'number' => '', 'order_url' => '' }, ... ],
# any data that is interesting but not specifically called out above.
# as long as it's consistent, additional values can be data structures themselves for
# nested / complex values.
'attributes' => { 'addl_attribute1' => 'addl_value1', 'addl_attribute2' => 'addl_value2', ... },
}
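The "10 digit phone" and "optional four addl zip digits" comments imply some normalization of scraped values. A minimal sketch of that step follows, assuming US-style numbers. How the site actually formats phones and zip codes is not known here, so the helper names and the choice to return an empty string for anything unparseable are assumptions.

```python
import re
from typing import Tuple


def normalize_phone(raw: str) -> str:
    """Return the 10 digits of a US phone number, or '' if that is not possible."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    return digits if len(digits) == 10 else ""


def split_zip(raw: str) -> Tuple[str, str]:
    """Split '98101-1234' or '981011234' into (zip, zip_4); zip_4 may be ''."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 9:
        return digits[:5], digits[5:]
    if len(digits) == 5:
        return digits, ""
    return "", ""
```

Values that do not fit (for example a phone number with an extension) come back empty under this sketch; the raw string could instead be kept under `attributes` so nothing is lost.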
* * *
This broadcast message was sent to all bidders on Tuesday May 24, 2011 4:25:20 PM:
Hello, based on feedback from multiple bidders, I have included much more detail about the information required for this project. This changes the scope of the project, so please review the requirements section, specifically the detailed requirements, to determine whether you are still interested in bidding on this work.
Project ID: #3343899