
Data Preparation Preprocessing (Data Mining) -- 2

$30-250 AUD

Closed
Posted: about 4 years ago

Paid on delivery
This project is about preprocessing and preparing data for future analysis. Some steps will be done using Weka and the others using Java.

The project will use a modified (red) wine dataset from UCI (search for it on Google to see the data), although the developed program should work on any dataset with numerical attributes plus a class attribute, which is the last attribute in the file. The modification maps tuples whose quality is <= 5 to class 0 and all other tuples to class 1. All non-class attributes are numerical. The input file is a CSV file. We will call the attributes A1, A2, …, A11, in the left-to-right order given in the input file.

There are two tasks:

Task 1. Discretize all attributes as follows: use the entropy-based method to split each attribute into 2 intervals; if Weka does not produce a split for a given attribute, use the equal-density method to split that attribute into 4 intervals. The resulting bins will be stored in a CSV file called Bins.csv. For each attribute Ai with 2 bins, this file contains a row of the form:

Ai, splitValue, countC0Bin1, countC1Bin1, countC0Bin2, countC1Bin2

For each attribute with 4 bins, the file contains a row with 12 fields (3 split values + 8 counts, after the attribute name).

Task 2. Map the dataset into itemized form using the bins produced in Task 1. This step will produce two files, namely [login to view URL] and ItemizedData.csv. The contents of these two files should follow the format given in the attached [login to view URL] file.

There are two possible ways to do the project:
1) Using Weka and Java, providing a Java program that produces all results specified above.
2) Using Java only, providing a Java program that produces all results specified above.

For option 1, submit your Java program and a jar file called P1.jar. The Java program should work assuming that Weka is installed on the machine. When run, [login to view URL] should take the name of the dataset file as a parameter. Assume that the input dataset is in the same folder where the jar file is run.

For option 2, submit your Java program and a jar file called P1.jar. When run, [login to view URL] should take the name of the dataset file and a minIG value as parameters. The program will use information gain (IG) to split an attribute if the IG is >= minIG, and use equal-frequency

Only the requested results are required.
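
The splitting rule described for option 2 could be sketched roughly as below. This is an illustration under assumptions, not the required solution: the class and method names (DiscretizeSketch, bestSplit, equalFrequencyCuts, cutsFor, binsRow) are hypothetical, the fallback is assumed to be 4 equal-frequency bins as in Task 1, candidate cut points are assumed to be midpoints between adjacent distinct values, and each bin is assumed to include its upper cut point. The sketch does not read CSV files and does not use Weka.

```java
import java.util.Arrays;

/** Illustrative sketch only: entropy-based binary split with an equal-frequency fallback. */
final class DiscretizeSketch {

    /** Shannon entropy (base 2) of a two-class count pair. */
    static double entropy(int c0, int c1) {
        int n = c0 + c1;
        double h = 0.0;
        for (int c : new int[] {c0, c1}) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** Returns {splitValue, informationGain} for the best cut, or null if no cut exists. */
    static double[] bestSplit(double[] values, int[] labels) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));

        int total1 = 0;
        for (int y : labels) total1 += y;          // labels are assumed to be 0 or 1
        int total0 = n - total1;
        double baseEntropy = entropy(total0, total1);

        int left0 = 0, left1 = 0;
        double bestGain = -1.0, bestCut = Double.NaN;
        for (int k = 0; k < n - 1; k++) {
            int i = order[k];
            if (labels[i] == 0) left0++; else left1++;
            double v = values[i], next = values[order[k + 1]];
            if (v == next) continue;               // a cut is only possible between distinct values
            int nl = k + 1, nr = n - nl;
            double weighted = (nl * entropy(left0, left1)
                    + nr * entropy(total0 - left0, total1 - left1)) / n;
            double gain = baseEntropy - weighted;
            if (gain > bestGain) { bestGain = gain; bestCut = (v + next) / 2.0; }
        }
        return Double.isNaN(bestCut) ? null : new double[] {bestCut, bestGain};
    }

    /** Cut points that put roughly the same number of rows into each bin. */
    static double[] equalFrequencyCuts(double[] values, int bins) {
        if (values.length < bins) throw new IllegalArgumentException("need at least one row per bin");
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] cuts = new double[bins - 1];
        for (int b = 1; b < bins; b++) {
            int idx = (int) ((long) b * sorted.length / bins); // first sorted index of bin b
            cuts[b - 1] = (sorted[idx - 1] + sorted[idx]) / 2.0;
        }
        return cuts;
    }

    /** One cut if IG >= minIG; otherwise three equal-frequency cuts (assumed 4 bins). */
    static double[] cutsFor(double[] values, int[] labels, double minIG) {
        double[] best = bestSplit(values, labels);
        if (best != null && best[1] >= minIG) return new double[] {best[0]};
        return equalFrequencyCuts(values, 4);
    }

    /** Builds one Bins.csv-style row: name, cut points, then class-0/class-1 counts per bin. */
    static String binsRow(String name, double[] values, int[] labels, double[] cuts) {
        int[][] counts = new int[cuts.length + 1][2];
        for (int i = 0; i < values.length; i++) {
            int bin = 0;
            while (bin < cuts.length && values[i] > cuts[bin]) bin++;
            counts[bin][labels[i]]++;
        }
        StringBuilder row = new StringBuilder(name);
        for (double c : cuts) row.append(',').append(c);
        for (int[] c : counts) row.append(',').append(c[0]).append(',').append(c[1]);
        return row.toString();
    }

    public static void main(String[] args) {
        double[] a1 = {1, 2, 3, 4, 5, 6, 7, 8};   // made-up values, not the wine data
        int[] cls = {0, 0, 0, 0, 1, 1, 1, 1};
        System.out.println(binsRow("A1", a1, cls, cutsFor(a1, cls, 0.5)));
    }
}
```

With a perfectly separable toy attribute like the one in main, the gain is high and a single cut is chosen; raising minIG above the achievable gain switches the same attribute to the equal-frequency fallback.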
Project ID: 23962307

About the project

4 proposals
Remote project
Active 4 years ago

4 freelancers are bidding on this job at an average of $145 AUD
Hi, I have 4+ years of experience in Hadoop technologies such as HDFS, MapReduce, Spark, and Hive, as well as data mining techniques. I can complete your project. Please contact me...
$200 AUD in 7 days
4.7 (11 reviews)
4.3
4.3
I have 8.7+ years of extensive Software Development Life Cycle experience.
• Worked on views, procedures, functions, triggers, indexes, and partitioning.
• Good at SQL queries.
• Experience in data extraction, transmission, and loading.
• Experience with SSIS packages.
• Good exposure to data warehousing concepts and data modeling.
• Good knowledge of the dimensional data model, star schema, and snowflake schema.
• Working experience in retail.
• Experience in Python.
• Good knowledge of AWS Redshift.
• Experience in Hive and Sqoop.
• Good knowledge of cloud platforms (AWS/Azure).
• Expertise in debugging code.
• Good knowledge of UNIX commands.
$140 AUD in 7 days
0.0 (0 reviews)
0.0
0.0
HI! I AM DEV AND I AM INTERESTED IN YOUR WORK BECAUSE I'VE ALREADY DONE SOMETHING LIKE THAT BEFORE. SO, I THINK I CAN DO THIS JOB EASILY...
$100 AUD in 7 days
0.0 (0 reviews)
0.0
0.0

About the client

Faridabad, India
4.9
37
Payment method verified
Member since: March 9, 2017

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)