I have about 1.4 TB of ngram data in S3 and am running a simple Hive job on it. I need guidance on why it is so slow: with 5 machines, it would take several days to complete. Apply only if you have significant Hadoop and Hive experience.
Hello, I have more than 2 years of experience developing Hadoop applications. I ran into the same problem in my Bizalyticks project and resolved it, so it now runs faster. Please ping me and give me a chance to look at your server configuration so that I can resolve your problem. I am open to meeting your timeline and budget.
Hi. I have explored Hadoop and done POCs on it. I have enough knowledge of HDFS, Pig, Hive, MapReduce, Flume, and Apache Spark. Please describe your problem in detail, and I will help in whichever way I can.
Hello
1.4 TB does not seem like a huge amount of data, so I would like to look at the query you are using and the amount of data you are getting as output.
Edit: I would like to add that my experience is with the Cloudera platform.
Thank you
Amr