With a mental health crisis magnified by COVID-19 and a growing elderly population (10.7% of people aged over 65 are diagnosed with Alzheimer's disease and 18% with mild cognitive impairment (MCI)), there is an immediate need to develop systems that can better understand and characterize cognitive and mental health (CMH) by tracking biomarkers from functional magnetic resonance imaging (fMRI), electroencephalogram (EEG), speech, electronic health records (EHR), movement, cognitive surveys, wearable devices, and structured, genomic, and epigenomic data. One of the core technical opportunities for accelerating the computational analysis of CMH lies in multimodal (MM) ML: learning representations that model the heterogeneity of, and interconnections between, diverse input signals. MM learning is particularly important in CMH because of the noisy labels and subjectivity inherent in surveys; drawing on multiple signals and modalities offers a potential way to overcome these challenges.
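As a concrete illustration of what such multimodal representation learning can look like in practice, the following is a minimal sketch (not a method from this work) of late fusion over per-modality encoders, assuming hypothetical EEG, speech, and survey feature vectors with illustrative dimensions; a real CMH system would use far richer encoders and fusion strategies.

```python
# Minimal sketch: late-fusion multimodal classifier.
# Modality names, feature dimensions, and the two-class output are
# illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Maps one modality's feature vector to a shared-size embedding."""

    def __init__(self, in_dim: int, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, embed_dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)


class LateFusionClassifier(nn.Module):
    """Concatenates per-modality embeddings and predicts a clinical label."""

    def __init__(self, modality_dims: dict, num_classes: int = 2, embed_dim: int = 64):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: ModalityEncoder(dim, embed_dim) for name, dim in modality_dims.items()}
        )
        self.head = nn.Linear(embed_dim * len(modality_dims), num_classes)

    def forward(self, inputs: dict):
        # Encode each modality separately, then fuse by concatenation.
        embeddings = [self.encoders[name](x) for name, x in inputs.items()]
        return self.head(torch.cat(embeddings, dim=-1))


# Toy usage with random tensors standing in for real patient data.
model = LateFusionClassifier({"eeg": 128, "speech": 256, "survey": 16})
batch = {
    "eeg": torch.randn(4, 128),
    "speech": torch.randn(4, 256),
    "survey": torch.randn(4, 16),
}
logits = model(batch)  # shape: (4, 2)
```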
Recently, major progress has been made in pre-trained deep and MM learning from text, speech, images, video, signals, and structured data, and there has also been initial success in using deep learning and MM streams to improve prediction of patient status or response to treatment in CMH applications. However, there remain computational and theoretical challenges to be solved in machine learning for CMH, spanning