Our goal is to test the efficacy of Mahout and its algorithms on three data mining problems that we have:

1. Text Clustering - Creating groups of related products by analyzing the text of our product names and descriptions. (k-Means/Fuzzy k-Means)
2. Frequent Pattern Mining - For each product, we want to recommend the Top X most related products. For each user, we want to identify the Top X most similar users. (ParallelFrequentPatternMining)
3. Taste - In addition to knowing who bought which product, we also have a rating for each product. So, we’d like to recommend the Top X most similar products based on rating, using item-based recommendations.

## Nature of this Project

This project is an experimental prototype. Expect a few revisions of the requirements as we learn what the technology is capable of and what the results look like.

## Infrastructure Requirements

The Frequent Pattern Mining and Taste problems will be run against a dataset with no fewer than 500 million transactions. A user buys one or many products and gives each of them a rating; each purchase of a single product constitutes one transaction. The size of the dataset makes it necessary to run at cloud scale, which is why Mahout has been chosen. Everything needs to be configured to run on Amazon Elastic MapReduce. Inputs and outputs should be text files stored in Amazon S3. Your bid should include the costs associated with development and testing in the Amazon cloud.

The Text Clustering problem operates on smaller, but still large, datasets numbering in the single-digit millions of rows.
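To make the Taste requirement concrete, here is a minimal sketch of what "Top X most similar products based on rating" could look like on a tiny in-memory dataset. It is plain Python, not Mahout code. The sample rows, the function name `top_x_similar_items`, and the choice of cosine similarity are assumptions made for illustration only; the actual similarity measure and the job configuration are open to discussion.

```python
from collections import defaultdict
from math import sqrt


def top_x_similar_items(ratings, x):
    """Return {item: [(other_item, similarity), ...]} with at most x entries each.

    ratings is an iterable of (user, item, rating) tuples, one per transaction.
    Similarity is cosine similarity between item rating vectors (an assumption).
    """
    # Group ratings by item: item -> {user: rating}
    by_item = defaultdict(dict)
    for user, item, rating in ratings:
        by_item[item][user] = float(rating)

    # Precompute the length of each item's rating vector.
    norms = {
        item: sqrt(sum(r * r for r in users.values()))
        for item, users in by_item.items()
    }

    result = {}
    for item, users in by_item.items():
        scored = []
        for other, other_users in by_item.items():
            if other == item or norms[item] == 0 or norms[other] == 0:
                continue
            shared = users.keys() & other_users.keys()
            if not shared:
                continue  # no user rated both items
            dot = sum(users[u] * other_users[u] for u in shared)
            scored.append((other, dot / (norms[item] * norms[other])))
        # Highest similarity first; ties broken by item id for stable output.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        result[item] = scored[:x]
    return result


# Hypothetical sample data: (user, product, rating)
sample = [
    ("u1", "A", 5), ("u1", "B", 4),
    ("u2", "A", 4), ("u2", "B", 5), ("u2", "C", 1),
    ("u3", "C", 5),
]
top = top_x_similar_items(sample, x=1)
```

This sketch compares every pair of products in memory, so it only serves to pin down the expected shape of the output (one list of Top X products per product). It would not scale to 500 million transactions, which is why the real job has to run as a distributed job on Elastic MapReduce.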
## Deliverables
Please read the full requirements in the attached .zip file.