A machine learning framework for missing and imbalanced data in marketing analytics
Published in Journal of Marketing Analytics, 2025
While machine learning offers powerful tools for predicting consumer behavior, the utility of marketing datasets is often undermined by two pervasive issues: substantial missing data and imbalanced class distributions across customer groups. These challenges are especially acute in binary classification problems common in marketing, such as campaign response prediction (accept/reject) and purchase decision modeling (buy/not buy), where missing values and class imbalance can markedly degrade model performance. We evaluated our framework using five machine learning models: K-Nearest Neighbors, Decision Trees, Random Forest, Multi-Layer Perceptron, and AdaBoost classifiers. The findings advance marketing analytics by demonstrating how tailored data preparation strengthens the reliability of AI-driven consumer insights, particularly for binary classification tasks with imbalanced class distributions (e.g., campaign responses and purchase conversions). In doing so, the proposed approach helps bridge the gap between theoretical machine learning methods and their practical application in marketing contexts.
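As a rough illustration of the two data problems named above, the sketch below fills missing values and rebalances a binary target on a toy dataset. The abstract does not specify the framework's actual preparation steps, so median imputation and random oversampling are stand-in assumptions here, and the helper names `impute_median` and `oversample_minority`, the columns, and the data are all hypothetical.

```python
import random
from statistics import median


def impute_median(rows):
    """Replace None in each numeric column with the median of that column's observed values."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed))
    return [[medians[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal size.

    Assumes binary labels 0/1 and at least one row in each class.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for r, y in zip(rows, labels):
        by_class[y].append(r)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    return rows + extra, labels + [minority] * deficit


# Hypothetical campaign-response data: [age, income]; label 1 = accepted the offer.
X = [[25, 40.0], [None, 52.0], [41, None], [33, 48.0], [58, 75.0], [47, None]]
y = [0, 0, 0, 0, 1, 0]

X_filled = impute_median(X)                      # no None values remain
X_bal, y_bal = oversample_minority(X_filled, y)  # classes now equal in size
```

In practice, steps like these would normally be fitted on the training split only, so that information from held-out customers does not leak into model evaluation.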
Recommended citation: Zhang, C., et al. "A machine learning framework for missing and imbalanced data in marketing analytics." Journal of Marketing Analytics. 2025. https://doi.org/10.1057/s41270-025-00432-4