- Configure the given websites for crawling (there will be three websites; they will be provided).
- Configure the fields for each website/page, covering the two categories of data to be indexed: jobs and companies. If a website requires authentication to reach the desired pages, that authentication will also need to be configured.
- Only pages that contain the mandatory fields are indexed/saved (if a page also contains the optional fields, those are indexed/saved as well). The list of fields will be provided.
- Crawl the desired data from a given XML URL instead of crawling the website (this functionality already exists in OSS). The XML files will be provided.
- Duplicate jobs should be removed or not indexed; the search results must not contain duplicates.
- Prioritise the order of the websites. Each website will have a custom integer field called order (999 by default). Results will be ranked using this field (e.g. results from the website with order=1 are shown first and those from the website with order=999 are shown last). Please add documentation on how this field can be set for all indexed data belonging to a website. Sorting will not be based on this field alone; the relevance score is also taken into account.
"Boosting subqueries" may be used, but must be documented.
- Each indexed document should have a unique, friendly identifier based on the job title, company name and index id (e.g. "Php Senior Developer at X Ltd." could have the identifier php-senior-developer-x-ltd-1231231233864). This will be used for generating SEO-friendly URLs.
- Provide documentation on how an already indexed document can be updated or removed, and how a document can be added manually for indexing.
- Provide documentation on how a special field can be set on all documents or on only a subset of them.
- All the steps must be documented so that they can be reproduced on another server without any further help.
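To illustrate the mandatory/optional field rule above, here is a minimal sketch of a pre-indexing filter. The field names used are placeholders, since the actual list of fields will be provided later:

```python
MANDATORY = {"title", "company"}    # placeholder names; the real list will be provided
OPTIONAL = {"salary", "location"}   # placeholder names

def should_index(page_fields: dict) -> bool:
    """Index the page only if every mandatory field is present and non-empty."""
    return all(page_fields.get(f) for f in MANDATORY)

def indexed_document(page_fields: dict) -> dict:
    """Keep the mandatory fields plus any optional field the page happens to have."""
    keep = MANDATORY | OPTIONAL
    return {k: v for k, v in page_fields.items() if k in keep and v}

page = {"title": "PHP Senior Developer", "company": "X Ltd.", "salary": ""}
print(should_index(page))  # → True (empty optional fields do not block indexing)
```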
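One way to honour the no-duplicates requirement is to key each job on a fingerprint of its normalised fields and skip documents whose fingerprint has already been indexed. Which fields make two jobs "the same" across the three websites is an assumption here and would need to be agreed on:

```python
import hashlib

def job_fingerprint(title: str, company: str, location: str = "") -> str:
    """Hash of the normalised fields assumed to identify a job uniquely."""
    key = "|".join(part.strip().lower() for part in (title, company, location))
    return hashlib.sha256(key.encode()).hexdigest()

def deduplicate(jobs):
    """Keep only the first occurrence of each fingerprint."""
    seen, unique = set(), []
    for job in jobs:
        fp = job_fingerprint(job["title"], job["company"], job.get("location", ""))
        if fp not in seen:
            seen.add(fp)
            unique.append(job)
    return unique

jobs = [
    {"title": "PHP Senior Developer", "company": "X Ltd."},
    {"title": "php senior developer ", "company": "x ltd."},  # same job, different casing
]
print(len(deduplicate(jobs)))  # → 1
```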
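The combined order/score ranking could be approximated at query time as below. The blending formula (boosting the score by the inverse of the order value) is purely illustrative; the actual mechanism in OSS would be the boosting subqueries mentioned above:

```python
def ranking_key(doc, order_weight: float = 0.5):
    """Blend the relevance score with the per-website 'order' field.
    A lower 'order' boosts the document; the exact blend is an assumption."""
    order = doc.get("order", 999)  # 999 is the default per the requirements
    return doc["score"] * (1.0 + order_weight / order)

results = [
    {"id": "a", "score": 1.0, "order": 999},
    {"id": "b", "score": 1.0, "order": 1},
]
ranked = sorted(results, key=ranking_key, reverse=True)
print([d["id"] for d in ranked])  # → ['b', 'a']
```

Because the score still dominates, a highly relevant result from a low-priority website can outrank a weak result from a high-priority one, which matches the requirement that sorting not rely on the order field alone.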
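The unique-identifier requirement could be sketched as follows; the function name and slug format are assumptions derived from the example above, not an OSS API:

```python
import re
import unicodedata

def make_identifier(job_title: str, company: str, index_id: str) -> str:
    """Build an SEO-friendly slug from job title, company name and index id."""
    raw = f"{job_title} {company}"
    # Strip accents, lowercase, and collapse runs of non-alphanumerics into '-'
    ascii_text = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode()
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return f"{slug}-{index_id}"

print(make_identifier("Php Senior Developer", "X Ltd.", "1231231233864"))
# → php-senior-developer-x-ltd-1231231233864
```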
Please answer the following question when applying:
From which tab of the OSS admin interface can the crawling User-Agent be overridden?
Hi, since we have already been chatting, I would like to place my bid for this project here. Thank you for your consideration. I hope we can work together.
Thanks,
Dat