Abstract:
This study applies reinforcement learning to credit scoring using the logistic bandit framework. Credit scoring and credit underwriting are modeled as a single sequential decision problem in which the credit underwriter takes a sequence of actions over an indefinite number of time steps. The traditional credit scoring approach treats model construction separately from the underwriting process. In the reinforcement learning literature, this approach corresponds to a greedy algorithm, which is commonly believed to be inferior to an efficient exploration method such as Thompson sampling. This belief holds in the simple setting where credit is granted to a single borrower per action and the pool of borrowers is fixed. Under the more realistic scenario in which both conditions are relaxed, however, the greedy approach can outperform Thompson sampling, because it no longer commits prematurely to an inferior action as it does in the simple setting. Still, the efficient exploration of Thompson sampling remains beneficial: when borrower characteristics are captured by a large number of features, the exploration mechanism enables Thompson sampling to outperform the greedy algorithm. The results of the simulation study provide a deeper understanding of reinforcement learning approaches to logistic bandits, particularly in the setting of credit scoring and credit underwriting.
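As an illustrative sketch, not taken from the paper itself, the snippet below contrasts the two policies compared in the abstract on a simulated contextual logistic bandit: the greedy policy scores borrowers with a MAP point estimate, while Thompson sampling draws parameters from a Laplace-approximation posterior. The simulated borrower pool, the ridge prior, and all dimensions and constants are assumptions chosen for illustration.

```python
# Illustrative sketch (not from the paper): greedy vs. Thompson sampling on a
# simulated contextual logistic bandit. The borrower model, Laplace posterior,
# and every constant below are assumptions chosen for illustration only.
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, horizon = 5, 20, 500
theta_true = rng.normal(size=d)          # unknown repayment-propensity weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_map(X, y, lam=1.0, iters=15):
    """MAP estimate and Hessian of a ridge-regularized logistic model (Newton)."""
    w = np.zeros(X.shape[1])
    H = lam * np.eye(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)
        g = X.T @ (p - y) + lam * w
        H = X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(X.shape[1])
        w = w - np.linalg.solve(H, g)
    return w, H

def run(policy):
    X_hist, y_hist, repaid = [], [], 0
    for t in range(horizon):
        arms = rng.normal(size=(n_arms, d))   # fresh pool of borrowers each round
        if t < 10:                            # short random warm-up phase
            a = rng.integers(n_arms)
        else:
            w, H = fit_map(np.array(X_hist), np.array(y_hist))
            if policy == "thompson":          # sample from the Laplace posterior
                w = rng.multivariate_normal(w, np.linalg.inv(H))
            a = int(np.argmax(arms @ w))      # greedy scores with the point estimate
        x = arms[a]
        y = rng.binomial(1, sigmoid(x @ theta_true))   # 1 = loan repaid
        X_hist.append(x); y_hist.append(y); repaid += y
    return repaid

for policy in ("greedy", "thompson"):
    print(policy, run(policy), "loans repaid out of", horizon)
```

Drawing a fresh pool of borrowers each round mirrors the relaxed scenario described above, where the pool is not fixed; holding the pool constant instead would recover the simple setting in which greedy commitment to an inferior action becomes costly.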