Azi Almasi

Data Scientist

Data Analyst


Attention-Based Multimodal Deep Learning for Subject-Independent Stress Detection from Wearable Signals


Project Overview

This project investigates stress detection from wearable physiological signals using deep learning and attention-based multimodal fusion. The goal is to develop a robust, subject-independent model capable of generalising across unseen individuals—a key requirement for real-world health and wellbeing applications.

The work focuses on learning how different physiological modalities contribute to stress, rather than treating all signals equally. To achieve this, I designed a multi-branch neural architecture with attention-based fusion, allowing the model to dynamically weight chest- and wrist-based signals depending on their relevance.


Data & Problem Setup

  • Dataset: WESAD (Wearable Stress and Affect Detection)

  • Signals used:

    • Chest: ECG, EDA, Respiration

    • Wrist: EDA, BVP, Skin Temperature, Accelerometer

  • Tasks:

    • Binary classification: Stress vs Non-Stress

    • Tri-class classification: Baseline vs Stress vs Amusement

  • Evaluation protocol:
    Leave-One-Subject-Out Cross-Validation (LOSO-CV) to ensure strict subject independence
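The LOSO-CV protocol above can be sketched with scikit-learn's LeaveOneGroupOut, which guarantees that every window from the held-out subject is excluded from training. The feature array, subject count, and classifier below are illustrative placeholders, not the project's actual pipeline:

```python
# Sketch of Leave-One-Subject-Out cross-validation with synthetic data.
# Feature dimensions, subject count, and the classifier are placeholders.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))            # windowed physiological features
y = rng.integers(0, 2, size=300)         # stress / non-stress labels
subjects = np.repeat(np.arange(15), 20)  # 15 subjects, 20 windows each

logo = LeaveOneGroupOut()
scores = []
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    # All windows of the held-out subject stay out of the training fold,
    # so each fold measures generalisation to a genuinely unseen person.
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))

print(f"Folds: {logo.get_n_splits(groups=subjects)}, mean F1: {np.mean(scores):.3f}")
```

One fold per subject means per-fold scores vary widely across individuals, which is why aggregate LOSO metrics are the honest measure of subject independence.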


Model Architecture

The proposed model consists of:

  • Separate modality-specific encoders for chest and wrist signals

  • Multi-Head Attention layers for:

    • Intra-modality feature weighting

    • Cross-modality fusion

  • Modality Dropout, forcing the network to remain robust when one modality is noisy or missing

  • End-to-end training with calibrated probability outputs

This design allows the model to adaptively focus on the most informative physiological sources, rather than relying on fixed fusion rules.
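A minimal PyTorch sketch of this design is shown below. The layer sizes, per-modality feature dimensions, and the exact modality-dropout scheme are illustrative assumptions, not the project's actual architecture:

```python
# Sketch: two modality encoders, multi-head attention fusion, and modality
# dropout. Dimensions and hyperparameters are assumed for illustration.
import torch
import torch.nn as nn

class AttentionFusionNet(nn.Module):
    def __init__(self, chest_dim=3, wrist_dim=4, d_model=32, n_classes=2, p_drop=0.2):
        super().__init__()
        # Modality-specific encoders project each sensor set into a shared space.
        self.chest_enc = nn.Sequential(nn.Linear(chest_dim, d_model), nn.ReLU())
        self.wrist_enc = nn.Sequential(nn.Linear(wrist_dim, d_model), nn.ReLU())
        # Multi-head attention lets each modality token attend to the other.
        self.fusion = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)
        self.p_drop = p_drop

    def forward(self, chest, wrist):
        # Stack the two modality embeddings as a length-2 token sequence.
        tokens = torch.stack([self.chest_enc(chest), self.wrist_enc(wrist)], dim=1)
        if self.training:
            # Modality dropout: randomly zero a modality token so the network
            # stays usable when one sensor stream is noisy or missing.
            keep = (torch.rand(tokens.shape[:2], device=tokens.device) > self.p_drop)
            tokens = tokens * keep.unsqueeze(-1).float()
        fused, attn = self.fusion(tokens, tokens, tokens)
        logits = self.head(fused.mean(dim=1))
        return logits, attn  # attn exposes the learned modality weighting

model = AttentionFusionNet()
logits, attn = model(torch.randn(8, 3), torch.randn(8, 4))
print(logits.shape, attn.shape)
```

Returning the attention weights alongside the logits is what makes the fusion inspectable: averaged over a test set, they show how much the model leaned on chest versus wrist signals.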


Key Results

Binary Stress Classification (LOSO-CV)

Modality Setup           Accuracy   F1-Score   ROC-AUC
Chest + Wrist (Fusion)   85.8%      85.1%      0.94
Chest Only               78.3%      75.4%      0.99
Wrist Only               56.1%      58.1%      0.64

Insights:

  • Chest signals carry the strongest stress-related information.

  • Wrist signals alone are insufficient for reliable stress detection.

  • Attention-based multimodal fusion significantly improves robustness and overall performance.


Tri-Class Classification (Baseline / Stress / Amusement)

Modality Setup           Accuracy   Macro F1
Chest + Wrist (Fusion)   62.4%      50.2%

Tri-class classification is substantially more challenging due to overlapping physiological responses, yet the fusion model consistently outperformed unimodal alternatives.


Why This Matters

  • Demonstrates realistic, subject-independent evaluation

  • Shows how attention mechanisms improve interpretability and robustness

  • Highlights the limitations of wrist-only wearables for stress detection

  • Provides a scalable foundation for mental-health monitoring, wellbeing analytics, and digital therapeutics


Technical Stack

  • Python, PyTorch

  • NumPy, SciPy, scikit-learn

  • Attention-based deep learning

  • Advanced cross-validation & calibration strategies
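One common way to calibrate a classifier's probability outputs, as mentioned in the architecture above, is temperature scaling: fitting a single scalar on held-out logits to minimise negative log-likelihood. Whether the project uses exactly this method is an assumption; the logits below are synthetic:

```python
# Sketch of temperature scaling on synthetic, overconfident validation logits.
# The data and the use of this particular calibration method are assumptions.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def nll(T, logits, labels):
    # Negative log-likelihood of softmax(logits / T), computed stably.
    z = logits / T
    log_probs = z - logsumexp(z, axis=1, keepdims=True)
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
logits = rng.normal(scale=4.0, size=(200, 2))  # large-scale logits: overconfident
labels = (logits[:, 1] + rng.normal(size=200) > logits[:, 0]).astype(int)

res = minimize_scalar(nll, bounds=(0.05, 10.0), args=(logits, labels),
                      method="bounded")
T = res.x
print(f"Fitted temperature: {T:.2f}")
```

Because the temperature is a single parameter fitted after training, it softens (or sharpens) confidence without changing which class the model predicts.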


Future Directions

  • Temporal attention over longer physiological contexts

  • Domain adaptation for real-world wearable noise

  • Extension to anxiety, cognitive load, and affective state detection

  • Deployment-oriented lightweight models for mobile and edge devices