Abstract:
Photoelectric (PE) logging data are important in petroleum exploration because of their petrophysical implications: they can be used to infer reservoir composition directly. For example, the PE value of calcite is ~5 b/e, which can be used to identify carbonates in the reservoir. However, well logging requires significant financial resources and intensive labor, and missing data over depth intervals is a common problem in well logging surveys. This study therefore uses three machine learning models, extreme gradient boosting (XGBoost), support vector regression (SVR), and an artificial neural network (ANN), to synthesize the PE log in the Anadarko Basin, Kansas, USA. Over 50,000 well logging data points covering six log types (gamma ray, deep resistivity, spontaneous potential, density porosity, bulk density, and photoelectric) from 12 wells are split 70:20:10 to train, validate, and test the models. XGBoost achieves the lowest mean squared error (MSE) of 0.139 and an R-squared of 0.75, outperforming SVR and ANN. This is because XGBoost can handle imbalanced data, prioritize important features, and mimic human decision-making. ANN performs worst, with the highest MSE of 0.197, owing to its sensitivity to imbalanced data. The three most important features for synthesizing the PE log are depth, the gamma ray log, and the spontaneous potential log.
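
As an illustration of the evaluation setup described above, the following minimal Python sketch shows a 70:20:10 split together with the two reported metrics, MSE and R-squared. It is not the study's code. The function names (`split_indices`, `mse`, `r_squared`), the random seed, and the choice to shuffle individual data points rather than hold out whole wells are all assumptions made for illustration, since the abstract does not say how the split was performed.

```python
import random


def split_indices(n, percents=(70, 20, 10), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Assumption: rows are shuffled individually; the abstract does not state
    whether the split was made per data point or per well.
    """
    assert sum(percents) == 100
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * percents[0] // 100  # integer arithmetic avoids float rounding
    n_val = n * percents[1] // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def mse(y_true, y_pred):
    """Mean squared error between measured and synthesized PE values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

With 50,000 points, `split_indices(50000)` yields 35,000 training, 10,000 validation, and 5,000 test indices. A model that always predicts the mean PE value gives an R-squared of 0, which is the baseline against which the reported 0.75 can be read.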