Abstract:
This research measures the benefit of a complex model against that of meaningful information, using credit risk prediction for mortgage loans as the application. A neural network represents the complex model and a regression model represents the simple model. Two types of data are used in the analysis: simple data and complex data. The complex data are derived from the simple dataset using information extraction techniques and data transformation. The two variables constructed for the complex data are the loan-to-value ratio and the housing expense ratio. The experiment uses the monthly Freddie Mac Single-Family Loan-Level Dataset from 2010 to 2018. The confusion matrices and accuracy metrics indicate that the complex data constructed in this study improve model accuracy, but only modestly: the added benefit is quite small for both the complex model and the simple model. The results also indicate that the complex model is more valuable than the complex data.
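
As a rough illustration of the quantities named above, the following minimal sketch computes the two constructed ratios and an accuracy score from a confusion matrix. It assumes the conventional definitions (loan-to-value as loan amount over property value, housing expense ratio as monthly housing expense over gross monthly income); the function and parameter names are hypothetical, and the study's own construction from the Freddie Mac fields may differ.

```python
# Illustrative sketch only: names and definitions are assumptions,
# not the feature construction used in the study.

def loan_to_value(loan_amount, property_value):
    """Loan-to-value ratio in percent (conventional definition)."""
    return 100.0 * loan_amount / property_value


def housing_expense_ratio(monthly_housing_expense, monthly_gross_income):
    """Housing expense (front-end) ratio in percent (conventional definition)."""
    return 100.0 * monthly_housing_expense / monthly_gross_income


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 marks the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Share of all predictions that match the true label."""
    return (tp + tn) / (tn + fp + fn + tp)


if __name__ == "__main__":
    # Made-up numbers, purely to show the calls.
    print(loan_to_value(160_000, 200_000))
    print(housing_expense_ratio(1_500, 6_000))
    print(accuracy(*confusion_matrix([1, 0, 1, 0, 0], [1, 0, 0, 0, 1])))
```

Comparing the accuracy of each model with and without the two ratio columns is one way to frame the "complex data versus complex model" question posed above.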