This project explores the use of deep learning for image-based ethnicity classification using the publicly available UTKFace dataset. The dataset contains over 20,000 facial images labeled with age, gender, and ethnicity, captured in natural conditions with a wide variety of lighting, poses, and facial expressions. For this project, only the ethnicity labels were used.
A custom Convolutional Neural Network (CNN) was implemented in PyTorch to classify faces into five ethnicity groups: White, Black, Asian, Indian, and Other. The images were resized and preprocessed, and ethnicity labels were parsed directly from the filenames.
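UTKFace filenames encode the labels directly as underscore-separated fields (`age_gender_race_datetime.jpg`, with race coded 0–4). A minimal sketch of the parsing step, with a hypothetical helper name (`parse_ethnicity` is not from the original project):

```python
# Order matches the UTKFace race codes 0-4.
ETHNICITY_NAMES = ["White", "Black", "Asian", "Indian", "Other"]

def parse_ethnicity(filename: str) -> int:
    """Return the integer ethnicity label encoded in a UTKFace-style filename."""
    stem = filename.rsplit("/", 1)[-1]   # drop any directory prefix
    return int(stem.split("_")[2])       # third underscore-separated field

label = parse_ethnicity("25_0_2_20170116174525125.jpg")
print(ETHNICITY_NAMES[label])  # → Asian
```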
To reduce the effect of class imbalance during training, the dataset was balanced in two ways: each class was capped at a maximum of 1000 samples, and PyTorch's WeightedRandomSampler was used to draw underrepresented classes more often. A manual training loop was used to provide full control over the optimization process and learning behavior.
During training:
- The model used CrossEntropyLoss for multi-class classification.
- The Adam optimizer updated the weights with a learning rate of 0.0001.
- Training ran for 30 epochs, and loss values were recorded to monitor learning.
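The manual loop described above can be sketched like this. The actual CNN architecture and dataset are assumed, so a tiny stand-in model and random tensors take their place; the loss, optimizer, learning rate, and loss-recording pattern match the description:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Stand-in for the project's custom CNN (architecture is an assumption)."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(8 * 16 * 16, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = TinyCNN()
criterion = nn.CrossEntropyLoss()                          # multi-class loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr = 0.0001

losses = []
for epoch in range(3):                       # 30 epochs in the project
    images = torch.randn(4, 3, 32, 32)       # stand-in batch of face images
    targets = torch.randint(0, 5, (4,))      # stand-in ethnicity labels
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())               # record loss to monitor learning
```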
The model was evaluated using:
- Accuracy and loss plots across epochs
- A confusion matrix with class-specific performance
- Visual comparison of correct vs. incorrect predictions for each ethnicity
- A full classification report showing precision, recall, and F1-score
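The confusion matrix and classification report can be produced as below, assuming scikit-learn is used for the metrics; the label arrays here are illustrative stand-ins for the model's real test-set predictions:

```python
from sklearn.metrics import confusion_matrix, classification_report

CLASSES = ["White", "Black", "Asian", "Indian", "Other"]

# Illustrative predictions; in the project these come from the trained model.
y_true = [0, 0, 1, 2, 3, 4, 1, 2]
y_pred = [0, 1, 1, 2, 3, 4, 1, 0]

cm = confusion_matrix(y_true, y_pred)   # rows: true class, cols: predicted
report = classification_report(y_true, y_pred, target_names=CLASSES)
print(report)                           # per-class precision, recall, F1
```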
The project also includes preview plots of sample images alongside their predicted labels, supporting transparency and interpretability.
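A minimal sketch of such a preview grid, using random arrays in place of the real face images and a headless matplotlib backend so it runs in scripts:

```python
import matplotlib
matplotlib.use("Agg")                    # headless backend for scripted runs
import matplotlib.pyplot as plt
import numpy as np

CLASSES = ["White", "Black", "Asian", "Indian", "Other"]

# Random stand-in "images"; the project plots real samples with predictions.
rng = np.random.default_rng(0)
fig, axes = plt.subplots(1, 4, figsize=(8, 2))
for i, ax in enumerate(axes):
    ax.imshow(rng.random((32, 32, 3)))   # one sample image per panel
    ax.set_title(CLASSES[i % 5], fontsize=8)  # predicted label as title
    ax.axis("off")
fig.savefig("preview.png")
```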
This work demonstrates how convolutional architectures can effectively learn visual patterns for classification tasks while also addressing challenges like class imbalance and model fairness. Though the model performs well on test data, it is important to note that real-world deployment of such systems requires careful ethical consideration and bias mitigation strategies.