Scraping and developing an IP database
We are looking for a programmer to do the following:
1. Scrape 4 regional Internet registries (RIRs)
2. Add additional information from [login to view URL]
3. Develop a function for who-is lookups
Each step is described in detail below.
Step 1
Worldwide there are 5 regional Internet registries (RIRs); please read more here: [login to view URL]
At this link you can find country IP ranges by continent: [login to view URL] The problem is that these ranges do not contain sub-ranges. Today we have data from RIPE. The job is to scrape AfriNIC, ARIN, APNIC and LACNIC for all sub-ranges; in total we estimate there are approximately 3-5 million sub-ranges.
a) Take the first IP in the range and loop: after each sub-range is found, increment its last IP by one IP number to get the start of the next sub-range, for example [login to view URL] +1 = [login to view URL]
b) Store the following data: inetnum, ip_from, ip_to, organization, isp, country
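The loop in a) and the fields in b) could be sketched roughly as follows. This is only an illustration under assumptions: `SubRange`, `walk_range` and `fake_lookup` are hypothetical names, and `fake_lookup` is a stand-in for the real RIR query, which is not specified here.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Callable, Iterator


@dataclass
class SubRange:
    # Fields listed in step 1 b)
    inetnum: str
    ip_from: IPv4Address
    ip_to: IPv4Address
    organization: str
    isp: str
    country: str


def walk_range(first: str, last: str,
               lookup: Callable[[IPv4Address], SubRange]) -> Iterator[SubRange]:
    """Yield each sub-range found between first and last (inclusive)."""
    current = IPv4Address(first)
    end = IPv4Address(last)
    while current <= end:
        record = lookup(current)
        if record.ip_to < current:
            # A result that ends before the queried IP would loop forever.
            raise ValueError(f"lookup for {current} returned {record.inetnum}")
        yield record
        if record.ip_to >= end:
            break
        # Last IP of the sub-range + 1 = first IP of the next sub-range.
        current = record.ip_to + 1


def fake_lookup(ip: IPv4Address) -> SubRange:
    # Assumption: stand-in for a real RIR query; pretends that every
    # sub-range is an aligned block of 64 addresses.
    start = IPv4Address(int(ip) & ~63)
    stop = start + 63
    return SubRange(f"{start} - {stop}", start, stop,
                    "Example Org", "Example ISP", "ZZ")
```

Calling `list(walk_range("192.0.2.0", "192.0.2.255", fake_lookup))` would visit one address per sub-range rather than every address in the range, which matters at an estimated 3-5 million sub-ranges.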
Step 2
Take the first IP number in each sub-range from Step 1 (and also from our RIPE database) and add additional information from ippages: [login to view URL]
We will purchase a subscription for high-performance lookup. Lookup subscribers have access to a separate server, [login to view URL], to provide consistent, fast lookup results.
The data to store is:
area_code, org, org2, city, city2, country, country2, country_name, country_name2, eu, eu2, ip, isp, latitude, latitude2, longitude, longitude2, ipv6, ip_long, registrant, postal_code, zip_code, zip, host, host_name, domain_name
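One possible way to store these fields is a single table keyed on ip. The sketch below uses SQLite from the Python standard library purely for illustration; the table name `ip_details`, the choice of TEXT for every column, and the helper names are assumptions, not requirements.

```python
import sqlite3

# Field names copied from the list above.
FIELDS = [
    "area_code", "org", "org2", "city", "city2", "country", "country2",
    "country_name", "country_name2", "eu", "eu2", "ip", "isp",
    "latitude", "latitude2", "longitude", "longitude2", "ipv6", "ip_long",
    "registrant", "postal_code", "zip_code", "zip", "host", "host_name",
    "domain_name",
]


def create_table(conn: sqlite3.Connection) -> None:
    # Assumption: every column is TEXT and ip is the primary key.
    columns = ", ".join(f"{name} TEXT" for name in FIELDS)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS ip_details ({columns}, PRIMARY KEY (ip))"
    )


def store(conn: sqlite3.Connection, record: dict) -> None:
    # Fields missing from the lookup result are stored as NULL.
    placeholders = ", ".join("?" for _ in FIELDS)
    conn.execute(
        f"INSERT OR REPLACE INTO ip_details ({', '.join(FIELDS)}) "
        f"VALUES ({placeholders})",
        [record.get(name) for name in FIELDS],
    )
```

Using `INSERT OR REPLACE` means a repeated lookup of the same ip overwrites the earlier row instead of creating a duplicate.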
Step 3
We do not expect the web scraping in Step 1 to find all sub-ranges in the world. As we have a web analytics solution today, we will collect, ad hoc, IP numbers that do not exist in the database. For these “new” IP numbers we need a function that does the following:
a) Do a who-is lookup in ippages and store the same data as in Step 2
b) Do a who-is lookup in [login to view URL] or one of the RIRs and store the same data as in Step 1. The link to domaintools is: [login to view URL]
c) Store the “new” data in our IP database. The above should run as a batch job every night.
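As a rough sketch of the nightly job, the first task is to decide which collected IP numbers are not yet covered by any stored sub-range. The names below (`find_new_ips`, `nightly_batch`, and the three callables passed in) are hypothetical; the actual lookups for a) and b) are left as parameters because their interfaces are not specified here.

```python
from bisect import bisect_right
from ipaddress import IPv4Address
from typing import Callable, Iterable, List, Tuple


def find_new_ips(seen_ips: Iterable[str],
                 known_ranges: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the seen IPs not covered by any (ip_from, ip_to) pair."""
    ranges = sorted((int(IPv4Address(a)), int(IPv4Address(b)))
                    for a, b in known_ranges)
    starts = [start for start, _ in ranges]
    # max_end[i] is the highest ip_to among ranges[0..i], so that
    # overlapping or nested sub-ranges are handled correctly.
    max_end: List[int] = []
    for _, stop in ranges:
        max_end.append(max(stop, max_end[-1]) if max_end else stop)

    new_ips = []
    for ip in dict.fromkeys(seen_ips):  # drop duplicates, keep order
        value = int(IPv4Address(ip))
        i = bisect_right(starts, value) - 1
        if i < 0 or value > max_end[i]:
            new_ips.append(ip)
    return new_ips


def nightly_batch(seen_ips: Iterable[str],
                  known_ranges: Iterable[Tuple[str, str]],
                  lookup_details: Callable[[str], dict],
                  lookup_range: Callable[[str], dict],
                  store: Callable[[dict, dict], None]) -> int:
    """Run a), b) and c) for every new IP; return how many were stored."""
    new_ips = find_new_ips(seen_ips, known_ranges)
    for ip in new_ips:
        details = lookup_details(ip)   # a) same data as in Step 2
        sub_range = lookup_range(ip)   # b) same data as in Step 1
        store(details, sub_range)      # c) write to the IP database
    return len(new_ips)
```

The function could then be triggered once per night by whatever scheduler is already in use. Loading all ranges into memory is an assumption that should hold for a few million sub-ranges, but a database-side range query would be an alternative.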
We look forward to your proposal with solutions for how we can solve this. Please feel free to ask questions. There will be 3 milestone payments, one after each completed step.
Please note: If we find a solution to our problem the commitment will be long-term.