Data Mining using R

$30-250 USD

Cancelled
Posted: about 8 years ago

Paid on delivery
Use the Microsoft R Open program to answer the following questions. These are beginner-level questions, so they should be easy for anyone with R data mining experience.

1. Load the [login to view URL] data set into R. It lists the outcomes of 850 loans. The variables include loan status, credit grade (from excellent to poor), loan amount, loan age (in months), the borrower's interest rate, and the debt-to-income ratio. Code loan status as a binary outcome (0 for current loans, 1 for late or defaulted loans). Display the column names of the loan data set. Fit the loan data set using the random forest function. Copy the trained random forest model and the confusion matrix from R and paste them below.
2. Randomly select 750 of the 850 loans as your training sample. Use the remaining 100 loans as your test set. Train a second random forest model on the training set. Apply the second model to the test set to predict loan status. Compare your predictions to the true loan statuses (using the table function). Display the confusion matrix below. Based on this confusion matrix, what is the overall misclassification rate?
3. Fit the loan data set using an artificial neural network (ANN). Use six neurons in the hidden layer of the ANN. Set maxit to 1000. Use the table function to compare the in-sample predictions to the true loan statuses. Display the confusion matrix below.
4. Use the training sample (750 randomly selected loans) to build a second artificial neural network. Use six neurons in the hidden layer of the ANN. Set maxit to 1000. Use the table function to compare the out-of-sample predictions to the true loan statuses (use the remaining 100 loans as your test set). Display the confusion matrix below. Use the training sample (750 randomly selected loans) to build a support vector machine (SVM) model. Use the table function to compare the SVM's out-of-sample predictions to the true loan statuses (use the remaining 100 loans as your test set). Display the confusion matrix below.
5. Randomly shuffle the loan data set. Run 10-fold cross-validation to evaluate the out-of-sample performance of random forest, ANN, and SVM. Based on your cross-validation results, which model has the best out-of-sample performance? Please briefly explain why.
6. Run leave-one-out cross-validation to evaluate the performance of the random forest algorithm in predicting loan status. Why does it take much longer to run leave-one-out cross-validation than ten-fold cross-validation? Based on the result of your leave-one-out cross-validation, how many loans are misclassified by the random forest model?
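The sampling and evaluation arithmetic behind these questions can be sketched without fitting any model. The following is a minimal illustration written in Python rather than R, using only the standard library. The function names, the seed, and the confusion-matrix counts are hypothetical and are not taken from the loan data set. It shows the 750/100 split, the misclassification rate computed as off-diagonal counts divided by the total, and the number of folds in ten-fold versus leave-one-out cross-validation on 850 loans.

```python
import random

N_LOANS = 850   # data set size stated in the task
N_TRAIN = 750   # training sample size stated in the task


def train_test_split_indices(n, n_train, seed=0):
    """Shuffle row indices 0..n-1 and split them into train/test lists."""
    rng = random.Random(seed)  # the seed is an arbitrary choice for reproducibility
    indices = list(range(n))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def misclassification_rate(confusion):
    """confusion[i][j] = number of loans with true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


def k_fold_indices(n, k, seed=0):
    """Shuffle indices and deal them into k folds of near-equal size."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


train_idx, test_idx = train_test_split_indices(N_LOANS, N_TRAIN)

# Hypothetical counts for a 100-loan test set, for illustration only.
example_confusion = [[70, 5],
                     [10, 15]]
example_rate = misclassification_rate(example_confusion)  # (5 + 10) / 100

ten_fold = k_fold_indices(N_LOANS, 10)            # 10 folds of 85 loans each
leave_one_out = k_fold_indices(N_LOANS, N_LOANS)  # 850 folds of 1 loan each
```

Each fold requires one model to be trained on the loans outside that fold, so the number of folds equals the number of model fits: 10 for ten-fold cross-validation and 850 for leave-one-out.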
Project ID: 10289237

About the project

Remote project
Active 8 years ago

About the client

United States
Member since: April 21, 2016

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)