TY - JOUR
T1 - Automated Speech Markers of Alzheimer Dementia
T2 - Test of Cross-Linguistic Generalizability
AU - Pérez-Toro, Paula Andrea
AU - Ferrante, Franco J.
AU - Pérez, Gonzalo
AU - Tee, Boon Lead
AU - de Leon, Jessica
AU - Nöth, Elmar
AU - Schuster, Maria
AU - Maier, Andreas
AU - Slachevsky, Andrea
AU - Gorno-Tempini, Maria Luisa
AU - Ibáñez, Agustín
AU - Orozco-Arroyave, Juan Rafael
AU - García, Adolfo
N1 - Publisher Copyright: © Paula Andrea Pérez-Toro, Franco J Ferrante, Gonzalo Pérez, Boon Lead Tee, Jessica de Leon, Elmar Nöth, Maria Schuster, Andreas Maier, Andrea Slachevsky, Maria Luisa Gorno-Tempini, Agustín Ibáñez, Juan Rafael Orozco-Arroyave, Adolfo García.
PY - 2025
Y1 - 2025
AB - Background: Automated speech and language analysis (ASLA) is gaining momentum as a noninvasive, affordable, and scalable approach for the early detection of Alzheimer disease (AD). Nevertheless, the literature presents 2 notable limitations. First, many studies use computationally derived features that lack clinical interpretability. Second, a significant proportion of ASLA studies have been conducted exclusively in English speakers. These shortcomings reduce the utility and generalizability of existing findings. Objective: To address these gaps, we investigated whether interpretable linguistic features can reliably identify AD both within and across language boundaries, focusing on English- and Spanish-speaking patients and healthy controls (HCs). Methods: We analyzed speech recordings from 211 participants, encompassing 117 English speakers (58 patients with AD and 59 HCs) and 94 Spanish speakers (47 patients with AD and 47 HCs). Participants completed a validated picture description task from the Boston Diagnostic Aphasia Examination, eliciting natural speech under controlled conditions. Recordings were preprocessed and transcribed before extracting (1) speech timing features (eg, pause duration, speech segment ratios, and voice rate) and (2) lexico-semantic features (lexical category ratios, semantic granularity, and semantic variability). Machine learning classifiers were trained with data from English-speaking patients and HCs, and then tested (1) in a within-language setting (with English-speaking patients and HCs) and (2) in a between-language setting (with Spanish-speaking patients and HCs). Additionally, the features were used to predict cognitive functioning as measured by the Mini-Mental State Examination (MMSE). Results: In the within-language condition, combined speech timing and lexico-semantic features yielded maximal classification (area under the receiver operating characteristic curve [AUC]=0.88), outperforming single-feature models (AUC=0.79 for timing features; AUC=0.80 for lexico-semantic features). Timing features showed the strongest MMSE prediction (R=0.43, P<.001). In the between-language condition, speech timing features generalized well to Spanish speakers (AUC=0.75) and predicted Spanish-speaking patients’ MMSE scores (R=0.39, P<.001). Lexico-semantic features showed lower performance (AUC=0.64) and no significant MMSE prediction (R=-0.31, P=.05). The combined model did not improve results (AUC=0.65; R=0.04, P=.79). Conclusions: These results suggest that while both timing and lexico-semantic features are informative within the same language, only speech timing features demonstrate consistent performance across languages. By focusing on clinically interpretable features, this approach supports the development of clinically usable ASLA tools.
UR - https://www.scopus.com/pages/publications/105018661128
U2 - 10.2196/74200
DO - 10.2196/74200
M3 - Article
C2 - 41091545
AN - SCOPUS:105018661128
SN - 1438-8871
VL - 27
JO - Journal of Medical Internet Research
JF - Journal of Medical Internet Research
M1 - e74200
ER -