Peer-to-peer lending loan default prediction: machine learning classification algorithms applied to Lending Club data, investors’ perspective

Syzdykov, M.

Maqsut Narikbayev University, International School of Economics, Theses and Dissertations

ISSN: 2616-731X

URI: http://repository.kazguu.kz/handle/123456789/1004

Date: 2020-05

Abstract:

Recent years have witnessed the emergence of the online social lending market, also known as peer-to-peer (P2P) lending. P2P lending platforms allow borrowers and lenders to interact online without a strong intermediary such as a conventional bank. Although P2P platforms promote wider financial inclusion, the market is also characterized by higher levels of information asymmetry than traditional banks face. For this reason, this thesis studies how well individual investors can deal with information asymmetry by means of machine learning default prediction models built on data provided by the Lending Club P2P platform. To that end, we first examine the findings of the related literature. We then choose the Random Forest and XGBoost machine learning classification algorithms for the experimental part of our study, with a Logistic Regression classifier as the performance benchmark. Our study emphasizes the use of appropriate performance metrics in the presence of class imbalance, as well as fair and transparent interpretation of the classification results. Next, we conduct thorough and transparent data preparation. In the experimental results, we compare the performance of the chosen classifiers against one another and find no significant difference that would justify ranking them. Additionally, we showcase the results of the best-performing classifiers from six related works, which are generally similar to our own. However, unlike the related literature, our study further introduces a thresholding technique for the prediction results, which is shown to be capable of reducing the number of misclassified loan defaults, providing the opportunity for higher and more stable portfolio returns for individual investors.
Although we demonstrate how machine learning classification algorithms combined with a thresholding technique can provide reasonable results for investors, the observable consistency of prediction results across the field suggests that the type of data provided by Lending Club may be insufficient to build machine learning models of high predictive power. Thus, we underline the need for wider use of alternative data in the P2P lending market. However, this notion raises a number of questions for further research regarding alternative data regulation, privacy, and ethics in P2P lending.
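
The thresholding idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the threshold is chosen, so this assumes that thresholding means moving the probability cut-off at which a loan is flagged as a likely default. The labels and scores below are hypothetical, not Lending Club data or results from the thesis, and `confusion_counts` is an illustrative helper rather than the author's code.

```python
# Illustrative sketch only: synthetic scores, not Lending Club data
# and not the thesis's results.

def confusion_counts(y_true, p_default, threshold):
    """Count outcomes when a loan is flagged as a default if p_default >= threshold."""
    tp = fp = tn = fn = 0
    for label, p in zip(y_true, p_default):
        predicted_default = p >= threshold
        if label == 1 and predicted_default:
            tp += 1  # default correctly flagged
        elif label == 1:
            fn += 1  # missed default: the costly error for an investor
        elif predicted_default:
            fp += 1  # good loan rejected: a forgone return
        else:
            tn += 1  # good loan correctly accepted
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

# Hypothetical labels (1 = default) and predicted default probabilities.
# Defaults are the minority class, as in an imbalanced loan data set.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
p_default = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.25, 0.40, 0.60]

default_cut = confusion_counts(y_true, p_default, threshold=0.50)
lowered_cut = confusion_counts(y_true, p_default, threshold=0.30)

print("threshold 0.50:", default_cut)
print("threshold 0.30:", lowered_cut)
```

In this toy data, lowering the cut-off from 0.50 to 0.30 reduces missed defaults from two to one, at the price of rejecting three good loans. Where an investor would set the cut-off depends on how the cost of a missed default compares with the cost of a forgone return.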
