DATA ANALYTICS

Select one of the datasets provided (or one designated by your instructor). Provide a brief description of the dataset that includes the number of cases, a description of the inputs, a description of the outcomes, and any other relevant details.
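For illustration only, the following R sketch uses the built-in `mtcars` dataset as a stand-in for whichever dataset you select (an assumption; substitute your own data frame). It reports the number of cases and variables and the type of each variable, and it separates an assumed outcome (`mpg`) from the candidate inputs.

```r
# Stand-in data (assumption): the built-in mtcars data frame.
# Replace with your own data, e.g. read.csv("your_file.csv").
data_df <- mtcars

# Number of cases (rows) and variables (columns)
n_cases <- nrow(data_df)
n_vars  <- ncol(data_df)
cat("Cases:", n_cases, " Variables:", n_vars, "\n")

# Name and type of every variable, with the first few values
str(data_df)

# Assumed outcome for this illustration: fuel economy (mpg);
# the remaining columns are treated as candidate inputs.
outcome <- "mpg"
inputs  <- setdiff(names(data_df), outcome)
cat("Outcome:", outcome, "\nInputs:", paste(inputs, collapse = ", "), "\n")
```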

Examine the dataset and eliminate mistakes, bad records, data entry errors, and outliers.
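A minimal cleaning sketch in R, assuming the built-in `airquality` dataset as stand-in data because it contains missing values. It drops incomplete records, removes exact duplicate rows, and filters outliers in one column with the common 1.5 * IQR rule. The choice of column (`Ozone`) and of the IQR rule are assumptions; whether an extreme value is an error or a genuine observation is a judgment you should justify in your write-up.

```r
# Stand-in data (assumption): built-in airquality, which has missing values.
raw <- airquality

# 1. Drop records with missing values (incomplete or bad records)
clean <- na.omit(raw)

# 2. Drop exact duplicate rows (e.g., the same record entered twice)
clean <- clean[!duplicated(clean), ]

# 3. Flag values more than 1.5 * IQR beyond the quartiles as outliers
is_iqr_outlier <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75)))
  spread <- q[2] - q[1]
  x < q[1] - 1.5 * spread | x > q[2] + 1.5 * spread
}
# Assumption: outliers are screened on the Ozone column only
clean <- clean[!is_iqr_outlier(clean$Ozone), ]

cat("Rows before:", nrow(raw), " after:", nrow(clean), "\n")
```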

Using the R programming language:

Develop a multiple linear regression model from the dataset. Describe and explain the model you are estimating. More specifically,

Set the working directory
Load the appropriate libraries
Load the data
Transform the variables if necessary
Perform exploratory data analysis (EDA)
Provide summary statistics
Split the data into training and test sets
Build and evaluate the model(s)
Develop the multiple linear regression model on the training dataset
Tweak the tuning parameters of the model
Evaluate model performance on the test dataset
Write down the linear regression model in the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + e$.
Describe your dependent (y) and independent (xk) variables.
Describe the accuracy of your model and what the accuracy measure means.
Explain why you think these independent variables are related to (or impact) the dependent variable of the model, e.g., why you think variable x4 affects the dependent variable.
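The steps above can be sketched end to end in base R as follows. This is a minimal illustration, not a required solution: `mtcars` stands in for your dataset, `mpg` is the assumed outcome, the predictors, the 70/30 split, and the seed are arbitrary choices, and `tempdir()` is a placeholder for your own working directory. Because `lm()` has no tuning parameters of its own, backward stepwise selection with `step()` (which compares candidate models by AIC) is used here as one simple way to tweak the model. Test-set accuracy is reported as RMSE (roughly the typical size of a prediction error, in the units of y) and as R-squared (the share of the test-set variance in y that the model explains).

```r
# Set the working directory (placeholder: replace tempdir() with your own path)
setwd(tempdir())

# Load libraries: only base R and the default 'stats' package are used here
library(stats)

# Load the data (for a CSV file you would use read.csv("your_file.csv"))
df <- mtcars

# Transform variables if necessary: transmission type becomes a factor
df$am <- factor(df$am, levels = c(0, 1), labels = c("automatic", "manual"))

# EDA and summary statistics
print(summary(df))
print(round(cor(df[, c("mpg", "wt", "hp", "disp")]), 2))

# Split into training (70%) and test (30%) sets
set.seed(42)
train_idx <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Develop the multiple linear regression model on the training data
full_fit <- lm(mpg ~ wt + hp + disp + qsec + am, data = train)

# Tweak the model: backward stepwise selection by AIC
step_fit <- step(full_fit, direction = "backward", trace = 0)
print(summary(step_fit))

# Evaluate performance on the test data
pred <- predict(step_fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
r2_test <- 1 - sum((test$mpg - pred)^2) / sum((test$mpg - mean(test$mpg))^2)
cat("Test RMSE:", round(rmse, 3), " Test R-squared:", round(r2_test, 3), "\n")
```

Stepwise selection is only one option; comparing a few hand-chosen specifications on the test set is an equally valid way to tune.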

Discuss the results.

Include and explain the appropriate results from the model. More specifically,

What is the goodness-of-fit of the model?
Which variables are statistically significant at the 5% significance level?
Discuss the direction of the impact (i.e., the sign) of the independent variables on the dependent (outcome) variable and the magnitude of the impact (the size of the estimated β coefficient).
Discuss specific ideas for improving the accuracy of the model.
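One way to pull these quantities out of a fitted model in R is sketched below. Assumption: the model `mpg ~ wt + hp + qsec` on `mtcars` is fit here only so the block runs on its own; use your own model instead. R-squared and adjusted R-squared summarize goodness of fit, the `Pr(>|t|)` column of the coefficient table gives each coefficient's p-value for the 5% test, and the sign and size of each estimate give the direction and magnitude of its effect, holding the other variables constant.

```r
# Stand-in model (assumption): refit here so the block is self-contained
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
s <- summary(fit)

# Goodness of fit
cat("R-squared:", round(s$r.squared, 3),
    " Adjusted R-squared:", round(s$adj.r.squared, 3), "\n")

# Coefficient table: estimate, std. error, t value, p-value
coefs <- s$coefficients
significant <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]
cat("Significant at 5%:", paste(significant, collapse = ", "), "\n")

# Direction (sign) and magnitude of each slope estimate
slopes <- coefs[-1, "Estimate"]
print(data.frame(term = names(slopes),
                 estimate = round(slopes, 3),
                 direction = ifelse(slopes > 0, "positive", "negative"),
                 row.names = NULL))

# Fitted equation in the form y = b0 + b1*x1 + ... + bk*xk
b <- round(coef(fit), 3)
cat("mpg =", b[1], paste0("+ (", b[-1], ")*", names(b)[-1], collapse = " "), "\n")
```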

Alternate models: Develop at least one additional model using R and discuss it. For each alternate model, this section should follow the steps and activities outlined above. You may use the same dataset and develop a different type of model (e.g., neural network, decision tree), or use a different dataset and develop (and explain) another regression model.
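As one example of an alternate model, the sketch below fits a regression tree with `rpart` (a recommended package distributed with standard R installations) and compares its test RMSE with a linear model on the same split. Assumptions: `mtcars` as stand-in data, a 70/30 split with an arbitrary seed, and `minsplit = 5` because the dataset is very small. The complexity parameter `cp` is the tree's main tuning parameter, and `printcp()` shows how the cross-validated error (`xerror`) changes with it.

```r
library(rpart)

# Stand-in data and split (assumptions)
df <- mtcars
set.seed(42)
train_idx <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Regression tree; cp (complexity parameter) controls how far the tree grows
tree_fit <- rpart(mpg ~ wt + hp + disp + qsec, data = train, method = "anova",
                  control = rpart.control(minsplit = 5, cp = 0.01))

# Linear model with the same inputs, for comparison
lm_fit <- lm(mpg ~ wt + hp + disp + qsec, data = train)

# Compare both models on the test set using RMSE
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
tree_rmse <- rmse(test$mpg, predict(tree_fit, newdata = test))
lm_rmse   <- rmse(test$mpg, predict(lm_fit, newdata = test))
cat("Test RMSE  tree:", round(tree_rmse, 3), "  linear:", round(lm_rmse, 3), "\n")

# Complexity table, useful for choosing cp (pruning)
printcp(tree_fit)
```

With only a few dozen rows, a tree can easily underperform the linear model; on a larger dataset the comparison is more informative.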

Data Visualization: The R programming language can be used to develop visual displays of data and results. Include some key, insightful displays developed with R in your submission.
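A few base-R displays that often work well in a regression write-up are sketched below: the distribution of the outcome, the outcome against a key input, predicted versus actual values, and residuals versus fitted values. Assumptions: `mtcars` as stand-in data, a simple `mpg ~ wt + hp` model, and output written to a PDF file in `tempdir()` so the script also runs without a screen.

```r
# Stand-in model (assumption)
fit <- lm(mpg ~ wt + hp, data = mtcars)
out_file <- file.path(tempdir(), "regression_plots.pdf")

pdf(out_file, width = 10, height = 10)
par(mfrow = c(2, 2))

# 1. Distribution of the outcome
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "mpg")

# 2. Outcome against a key input, with a simple fitted line
plot(mpg ~ wt, data = mtcars, main = "mpg vs. weight")
abline(lm(mpg ~ wt, data = mtcars), col = "red")

# 3. Predicted vs. actual values (points near the dashed line = good fit)
plot(fitted(fit), mtcars$mpg, xlab = "Predicted mpg", ylab = "Actual mpg",
     main = "Predicted vs. actual")
abline(0, 1, lty = 2)

# 4. Residuals vs. fitted values (look for patterns or non-constant spread)
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)

invisible(dev.off())
cat("Saved plots to", out_file, "\n")
```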

 

 
