About CURATOR

Research Objectives

Current clinical protocols are typically standardized rather than personalized, which means they fail to account for individual differences in brain dynamics, cognitive goals, and lifestyle factors. This lack of personalization makes treatment optimization slow, inefficient, and reliant on trial-and-error approaches.

The key scientific contribution of CURATOR lies in validating biomarkers as predictors of treatment response, systematically mapping modality-outcome relationships, and assessing whether multimodal AI models can outperform current heuristic approaches.

Research Hypotheses

  1. Specific EEG biomarkers (e.g., alpha peak frequency, connectivity measures) can predict treatment response with accuracy significantly above chance level.
  2. Feedback modality (visual, auditory, interactive) significantly moderates treatment outcomes, and matching modality to the individual improves outcomes relative to a one-size-fits-all assignment.
  3. Multimodal integration via modern LLMs yields more accurate and interpretable treatment recommendations than current clinician-only heuristics.

Methodology

The project combines rigorous analysis and multimodal AI integration to (1) test our research hypotheses and (2) deliver a clinically useful tool. Each methodological step is directly tied to one or more hypotheses and is designed to be feasible within the fellowship timeframe while producing generalizable scientific insights.

Data preprocessing. EEG signals will be preprocessed with bandpass filtering, notch filtering, bad-channel detection, ICA or automated artifact subspace separation for ocular/muscle artifacts, and robust normalization across sessions. This approach follows best practice recommendations for reproducibility in EEG pipelines [Bigdely-Shamlo et al., 2015; Gabard-Durnam et al., 2018]. Preprocessing will include pseudonymization of raw files and secure storage, ensuring GDPR compliance from the earliest stage of the pipeline.
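The filtering and normalization steps above can be sketched as follows; this is a minimal illustration, and all cutoffs, filter orders, and the notch frequency are placeholder values, not CURATOR's actual pipeline settings (which will follow the cited best-practice recommendations):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, iirnotch, filtfilt

def preprocess(eeg, fs, band=(1.0, 40.0), notch_hz=50.0):
    """Bandpass + notch filter a (channels x samples) EEG array,
    then apply robust per-channel normalization.
    All parameters here are illustrative defaults."""
    # Zero-phase bandpass (second-order sections for numerical stability).
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    eeg = sosfiltfilt(sos, eeg, axis=-1)
    # Notch out power-line interference.
    bn, an = iirnotch(notch_hz, Q=30.0, fs=fs)
    eeg = filtfilt(bn, an, eeg, axis=-1)
    # Robust normalization: median/IQR scaling per channel, which is less
    # sensitive to residual artifacts than mean/std scaling.
    med = np.median(eeg, axis=-1, keepdims=True)
    iqr = np.subtract(*np.percentile(eeg, [75, 25], axis=-1)).reshape(-1, 1)
    return (eeg - med) / np.where(iqr > 0, iqr, 1.0)
```

Artifact removal (ICA or artifact subspace reconstruction), bad-channel detection, and pseudonymization would sit around this core in the full pipeline.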

Feature extraction. We will extract conventional spectral biomarkers (absolute and relative power in canonical EEG bands), peak alpha frequency and band ratios (e.g., theta/alpha), and metrics of spectral variability (e.g., coefficient of variation across epochs). For task-based protocols (e.g., oddball and attention tasks) we will compute ERP amplitudes/latencies, including P300 [Arvaneh et al., 2019]. Connectivity measures (e.g., coherence, phase-locking value) and graph-theoretic summaries (clustering, modularity, dynamic reconfiguration) will also be derived.
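For the spectral biomarkers, a minimal sketch of relative band power, peak alpha frequency, and the theta/alpha ratio from a Welch power spectrum might look like this (band boundaries are the canonical conventions, not project-specific choices):

```python
import numpy as np
from scipy.signal import welch

# Canonical band definitions in Hz (illustrative; conventions vary by lab).
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def spectral_features(eeg, fs):
    """Relative band power, peak alpha frequency, and theta/alpha ratio
    per channel, from a (channels x samples) array."""
    freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs)
    total = psd.sum(axis=-1)
    feats = {}
    for name, (lo, hi) in BANDS.items():
        m = (freqs >= lo) & (freqs < hi)
        feats[f"rel_{name}"] = psd[..., m].sum(axis=-1) / total
    # Peak alpha frequency: location of the PSD maximum within 8-13 Hz.
    alpha = (freqs >= 8) & (freqs < 13)
    feats["peak_alpha_hz"] = freqs[alpha][np.argmax(psd[..., alpha], axis=-1)]
    feats["theta_alpha_ratio"] = feats["rel_theta"] / feats["rel_alpha"]
    return feats
```

ERP and connectivity features follow the same pattern: each extractor maps a preprocessed recording to a fixed-length feature vector for the downstream models.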

Candidate modalities and paradigms. Prior studies show modality effects vary by task and population [Sigrist et al., 2013; Proulx et al., 2022]. We will assess visual (abstract gauges, dynamic scenes, game visuals), auditory (sonification, tonal reinforcement), and interactive (simple games where EEG control modifies gameplay) feedback modalities. Evaluation will proceed in two phases: (1) single-session laboratory learning to capture immediate engagement and (2) multi-session pilots to assess sustained efficacy.

Engagement and learning metrics. For each modality we will measure objective neurofeedback learning (change in target biomarker per unit time), behavioral indices (task performance, if applicable), physiological proxies (heart rate variability and pupillometry, where available), and subjective usability (standardized questionnaires). These metrics form the basis for per-individual modality matching rules.
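One plausible operationalization of "change in target biomarker per unit time" is the least-squares slope of the biomarker over session time; this is an illustrative choice, not necessarily the estimator CURATOR will adopt:

```python
import numpy as np

def learning_rate(biomarker_values, times_min):
    """Objective neurofeedback learning as the least-squares slope of the
    target biomarker over session time (biomarker units per minute).
    Positive slope = the participant is up-regulating the target."""
    slope, _intercept = np.polyfit(times_min, biomarker_values, 1)
    return slope
```

Per-individual modality matching could then compare these slopes (alongside the behavioral, physiological, and usability metrics) across the candidate modalities tested in the single-session phase.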

Machine Learning models for biomarker selection. We will benchmark compact convolutional models such as EEGNet [Lawhern et al., 2018] and Deep4Net [Schirrmeister et al., 2017] for EEG decoding. EEGNet is particularly suited for low-latency inference [Bian et al., 2024]. Candidate models will be compared against simpler baselines (e.g., linear/logistic regression on spectral features) to ensure gains are scientifically meaningful.
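The simpler baseline mentioned above can be sketched as a standardized logistic regression on the extracted spectral features, evaluated with cross-validated AUC; deep models such as EEGNet would need to beat this number to justify their complexity (a sketch, not the project's final evaluation protocol):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def baseline_auc(X, y, folds=5):
    """Cross-validated ROC AUC of a logistic-regression baseline on a
    (subjects x features) matrix of spectral biomarkers."""
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(clf, X, y, cv=folds, scoring="roc_auc").mean()
```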

Multimodal fusion. Structured inputs (EEG biomarkers, questionnaire scores, cognitive test scores) will be fused either via late fusion (per-modality model outputs combined by a meta-learner) or via early fusion with deep models (concatenated embeddings), depending on the available sample sizes. Unstructured clinical texts (patient history) will be encoded with pretrained LLMs, avoiding extensive manual feature engineering, and further processed via retrieval-augmented generation (RAG) to ground outputs in verifiable evidence.
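The late-fusion route can be sketched as stacking: each modality gets its own model, out-of-fold probabilities prevent the meta-learner from seeing leaked training predictions, and a logistic meta-learner combines them. Modality names and model choices below are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def late_fusion(modality_features, y):
    """Late fusion via stacking. `modality_features` maps modality names
    (e.g. 'eeg', 'questionnaire') to (subjects x features) arrays.
    Returns the fitted meta-learner and the out-of-fold base predictions."""
    oof = np.column_stack([
        # Out-of-fold class-1 probabilities from one base model per modality.
        cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=5, method="predict_proba")[:, 1]
        for X in modality_features.values()
    ])
    meta = LogisticRegression(max_iter=1000).fit(oof, y)
    return meta, oof
```

With larger samples, the early-fusion alternative replaces the per-modality models with a single network over concatenated embeddings.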

Model evaluation. Performance will be evaluated via nested cross-validation and temporal holdouts. Metrics will include ROC/AUC for binary classification, mean absolute error for continuous outcomes, and calibration curves for clinical interpretability. We estimate that, in our setting, a sample of 20 patients (10 sessions each) is sufficient to demonstrate feasibility and provide proof-of-concept evidence for larger-scale follow-ups.
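Nested cross-validation separates hyperparameter tuning (inner loop) from performance estimation (outer loop), so the reported AUC is not inflated by tuning on the test folds. A minimal sketch with illustrative grids and fold counts:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

def nested_cv_auc(X, y):
    """Nested CV: the inner GridSearchCV tunes regularization strength C;
    the outer cross_val_score reports an unbiased ROC AUC estimate."""
    inner = GridSearchCV(LogisticRegression(max_iter=1000),
                         {"C": [0.01, 0.1, 1.0, 10.0]},
                         cv=3, scoring="roc_auc")
    return cross_val_score(inner, X, y, cv=5, scoring="roc_auc").mean()
```

Temporal holdouts (training on early sessions, testing on later ones) would complement this by checking that predictions generalize forward in time.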

Explainability and clinical audit. All deployed models will include explainability outputs (feature-level attributions, confidence intervals, and out-of-distribution alarms). SHAP [Lundberg & Lee, 2017] or integrated gradients [Sundararajan et al., 2017] will be used for auditing our trained models.
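As a lightweight stand-in for SHAP or integrated gradients, permutation importance gives a model-agnostic feature attribution that is easy to audit: shuffle one feature at a time and measure how much ROC AUC degrades. This sketch is illustrative and not a substitute for the cited attribution methods:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

def audit_attributions(model, X, y, feature_names, n_repeats=20, seed=0):
    """Feature-level attributions via permutation importance: the mean drop
    in ROC AUC when each feature is shuffled. Larger = more relied upon."""
    result = permutation_importance(model, X, y, scoring="roc_auc",
                                    n_repeats=n_repeats, random_state=seed)
    return dict(zip(feature_names, result.importances_mean))
```

Feature names here would be the extracted biomarkers (e.g., relative alpha power), so clinicians can check that the model leans on physiologically plausible signals.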

LLM choices. For production-grade summarization we will test proprietary (e.g., OpenAI's GPT-4o, Anthropic's Claude) and open-source (Meta's LLaMA-4 variants) LLMs. All outputs will be verified by clinicians (human-in-the-loop), with a fallback to rule-based templates if LLM outputs fail audit checks.

Collaboration

CURATOR is a collaboration between the University of Luxembourg and Neurofeedback Luxembourg (Servicium SA), combining advanced research capabilities in HCI and Machine Learning with direct clinical expertise and access to patient populations.