
data project

$30-250 USD

Closed
Posted: almost 2 years ago

Paid on completion
In this project, you will develop an Oozie workflow to process and analyze a large volume of flight data.

• Instructions:
1. Form a project team of four students (including yourself).
2. Install Hadoop/Oozie on your AWS VMs.
3. Download the Airline On-time Performance data set (flight data set) from the period of October 1987 to April 2008 on the following website: [login to view URL]:10.7910/DVN/HG7NV7
4. Design, implement and run an Oozie workflow to find out
   a. the 3 airlines with the highest and lowest probability, respectively, of being on schedule;
   b. the 3 airports with the longest and shortest average taxi time per flight (both in and out), respectively; and
   c. the most common reason for flight cancellations.

• Requirements:
1. Your workflow must contain at least three MapReduce jobs that run in fully distributed mode.
2. Run your workflow to analyze the entire data set (total 22 years from 1987 to 2008) at one time on two VMs first and then gradually increase the system scale to the maximum allowed number of VMs for at least 5 increment steps, and measure each corresponding workflow execution time.
3. Run your workflow to analyze the data in a progressive manner with an increment of 1 year, i.e. the first year (1987), the first 2 years (1987-1988), the first 3 years (1987-1989), …, and the total of 22 years (1987-2008), on the maximum allowed number of VMs, and measure each corresponding workflow execution time.

• Submission (all in a zipped file: [login to view URL]):
1. A [login to view URL] text file that lists all the commands you used to run your code and produce the required results in a fully distributed mode
2. An [login to view URL] text file that stores the final results from all the runs
3. The source code of your MapReduce programs (including the JAR files) and any other programs you might have developed and included in the workflow
4. The Oozie workflow XML file
5. A project report in PDF that includes:
   a. A diagram that shows the structure of your Oozie workflow
   b. A detailed description of the algorithm you designed to solve each of the problems
   c. A performance measurement plot that compares the workflow execution time in response to an increasing number of VMs used for processing the entire data set (22 years) and an in-depth discussion on the observed performance comparison results
   d. A performance measurement plot that compares the workflow execution time in response to an increasing data size (from 1 year to 22 years) and an in-depth discussion on the observed performance comparison results
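The three analyses requested in the brief (on-schedule probability per airline, average taxi time per airport, most common cancellation reason) can each be expressed as a map step that emits (key, (numerator, count)) pairs and a reduce step that sums them per key. The following is a minimal local sketch in Python, not the Hadoop/Oozie implementation the posting asks for. Everything specific in it is an assumption, since the posting does not state them: the column names (`UniqueCarrier`, `ArrDelay`, `TaxiIn`, and so on), the definition of "on schedule" as an arrival delay of at most 0 minutes, the attribution of taxi-in time to the destination airport and taxi-out time to the origin, and the tiny synthetic sample rows, which are made up for illustration only.

```python
import csv
import io
from collections import defaultdict

# Assumed column names (not given in the posting); adjust to the real CSV header.
CARRIER, ARR_DELAY = "UniqueCarrier", "ArrDelay"
ORIGIN, DEST, TAXI_IN, TAXI_OUT = "Origin", "Dest", "TaxiIn", "TaxiOut"
CANCELLED, CANCEL_CODE = "Cancelled", "CancellationCode"
ON_TIME_THRESHOLD_MIN = 0  # assumption: "on schedule" means arrival delay <= 0 minutes


def to_int(value):
    """Return an int, or None for missing values such as '' or 'NA'."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def map_on_time(row):
    """Emit (carrier, (on_time_flag, 1)) for flights with a known arrival delay."""
    delay = to_int(row.get(ARR_DELAY))
    if delay is not None:
        yield row[CARRIER], (1 if delay <= ON_TIME_THRESHOLD_MIN else 0, 1)


def map_taxi(row):
    """Emit (airport, (taxi_minutes, 1)); taxi-in goes to Dest, taxi-out to Origin."""
    for airport_col, taxi_col in ((DEST, TAXI_IN), (ORIGIN, TAXI_OUT)):
        minutes = to_int(row.get(taxi_col))
        if minutes is not None:
            yield row[airport_col], (minutes, 1)


def map_cancellation(row):
    """Emit (cancellation_code, (1, 1)) for cancelled flights with a known code."""
    code = (row.get(CANCEL_CODE) or "").strip()
    if row.get(CANCELLED) == "1" and code and code != "NA":
        yield code, (1, 1)


def run_job(rows, mapper):
    """Simulate shuffle and reduce locally: sum (numerator, count) pairs per key."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        for key, (num, cnt) in mapper(row):
            totals[key][0] += num
            totals[key][1] += cnt
    return {key: (num, cnt) for key, (num, cnt) in totals.items()}


def ranked_ratios(totals):
    """Return (numerator / count, key) pairs sorted from highest to lowest ratio."""
    return sorted(((num / cnt, key) for key, (num, cnt) in totals.items()), reverse=True)


# Synthetic rows for illustration only; these are not real flight records.
SAMPLE = """
UniqueCarrier,ArrDelay,Origin,Dest,TaxiIn,TaxiOut,Cancelled,CancellationCode
AA,-5,JFK,LAX,10,20,0,
AA,15,LAX,JFK,8,30,0,
UA,0,ORD,JFK,NA,NA,0,
UA,NA,ORD,LAX,NA,NA,1,B
DL,NA,ATL,ORD,NA,NA,1,B
DL,3,ATL,JFK,6,12,0,
DL,NA,ATL,LAX,NA,NA,1,A
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE.strip())))
on_time = ranked_ratios(run_job(rows, map_on_time))
taxi = ranked_ratios(run_job(rows, map_taxi))
cancel_totals = run_job(rows, map_cancellation)
top_reason = max(cancel_totals, key=lambda code: cancel_totals[code][1])

print("Highest on-time probability:", on_time[:3])
print("Lowest on-time probability:", on_time[-3:])
print("Longest average taxi time:", taxi[:3])
print("Shortest average taxi time:", taxi[-3:])
print("Most common cancellation code:", top_reason)
```

Emitting (numerator, count) pairs rather than precomputed averages keeps the reduce step a plain sum, so the same summing logic could also serve as a combiner. In a real workflow each mapper would presumably become its own MapReduce job, which would also satisfy the "at least three MapReduce jobs" requirement.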
Project ID: 33638624

About the project

6 proposals
Remote project
Active 2 years ago

6 freelancers are bidding on this job at an average of $144 USD

Yo! I am an expert PHP, Laravel, and CodeIgniter programmer with skills including Database Administration, Data Processing, Data Analysis, Data Science and Data Analytics. Please send a message to discuss more about this project. Thanks
$155 USD in 7 days
4.5 (6 reviews)
4.1

Hi, I have 5+ years of experience with machine learning algorithms and have worked on multiple projects in this field. I can absolutely do your project as you like. Please contact me to discuss more. Have a nice day
$140 USD in 7 days
5.0 (1 review)
0.2

I am a big data specialist. Please contact me. Regards, Jinu
$140 USD in 7 days
0.0 (0 reviews)
0.0

Hi! My name is Mani, I am a student in my 3rd year at the University of St Andrews (ranked #1 in the UK this year) studying Computer Science. I have ample experience with Python, and especially with pandas and numpy, having done numerous data science projects in the past. I am familiar with Oozie, and would be more than happy to help you with this task!
$150 USD in 7 days
0.0 (0 reviews)
0.0

About the client

Cairo, Egypt
4.9
39
Payment method verified
Member since: October 25, 2018

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)