In this study, we examined the effectiveness of integrating satellite-based crop biophysical parameters, meteorological conditions, and soil properties for mid-season and end-season cotton yield prediction in the continental United States (CONUS). We employed six machine learning algorithms: decision tree (DT), random forest (RF), adaptive boosting (AdaBoost), gradient boosting (GB), light gradient boosting machine (LightGBM), and extreme gradient boosting machine (XGBoost). With hyperparameters tuned by Bayesian optimization, XGBoost was found to be the best method for both mid-season and end-season cotton yield prediction. Furthermore, we investigated the global importance of temporal and static features using the Shapley Additive Global importancE (SAGE) method to understand the driving factors of cotton yield prediction. The global feature importance analysis identified precipitation (P), enhanced vegetation index (EVI), and leaf area index (LAI) as the most important temporal features, and silt and pH as the most important soil properties.
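To make the idea of a global, Shapley-based feature importance more concrete, the following toy sketch computes SAGE-style values for a three-feature problem using only the Python standard library. It is an illustration under stated assumptions, not the pipeline used in this study and not the API of any SAGE package: the linear `model`, the synthetic data from `make_data`, and the helper names are all hypothetical, and held-out features are marginalized by averaging predictions over background rows, which assumes the features are independent.

```python
import itertools
import random
import statistics


def model(x):
    # Hypothetical stand-in for a fitted yield predictor (assumption):
    # feature 0 matters most, feature 1 less, feature 2 not at all.
    return 3.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


def make_data(n, seed=0):
    # Synthetic, independent Gaussian features with a noisy target.
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [model(x) + rng.gauss(0, 0.1) for x in X]
    return X, y


def restricted_loss(X, y, known, background):
    # Mean squared error when only the features in `known` are observed.
    # Unobserved features are replaced by values from background rows and
    # the resulting predictions are averaged (marginal sampling).
    total = 0.0
    for x, target in zip(X, y):
        preds = []
        for b in background:
            mixed = [x[j] if j in known else b[j] for j in range(len(x))]
            preds.append(model(mixed))
        total += (statistics.fmean(preds) - target) ** 2
    return total / len(X)


def sage_like_importance(X, y, background):
    # Exact Shapley values of the loss reduction, averaged over every
    # feature ordering. Feasible here only because there are 3 features.
    d = len(X[0])
    perms = list(itertools.permutations(range(d)))
    cache = {}

    def loss(known):
        key = frozenset(known)
        if key not in cache:
            cache[key] = restricted_loss(X, y, key, background)
        return cache[key]

    values = [0.0] * d
    for perm in perms:
        known = set()
        for j in perm:
            before = loss(known)
            known.add(j)
            after = loss(known)
            # Credit feature j with the loss reduction it brings.
            values[j] += (before - after) / len(perms)
    return values, loss(set()), loss(set(range(d)))


X, y = make_data(200)
importance, loss_none, loss_all = sage_like_importance(X, y, X[:50])
for j, v in enumerate(importance):
    print(f"feature {j}: importance {v:.3f}")
```

The values sum to the total loss reduction obtained by revealing all features, which is the property that lets such scores be read as each feature's share of predictive performance. With many features, exact enumeration over orderings is infeasible and the values are typically estimated by sampling orderings instead.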