Aims: The aim of this study was to combine nuclear magnetic resonance-based metabolomics and machine learning to find a glucose-independent molecular signature associated with future type 2 diabetes mellitus development in a subgroup of individuals from the Di@bet.es study.

Methods: The study group included 145 individuals who developed type 2 diabetes mellitus during the 8-year follow-up, 145 individuals matched by age, sex and BMI who did not develop diabetes during the follow-up but had glucose concentrations equal to those of the individuals who did, and 145 controls matched by age and sex. A metabolomic analysis of serum was performed to obtain the lipoprotein and glycoprotein profiles and 15 low-molecular-weight metabolites. Several machine learning-based models were trained.

Results: Logistic regression gave the best classification between individuals who developed type 2 diabetes during the follow-up and glucose-matched individuals, with an area under the curve of 0.628 (95% confidence interval 0.510-0.746). Glycoprotein-related variables, creatinine, creatine, small HDL particles and the Johnson-Neyman intervals of the interaction between Glyc A and Glyc B were statistically significant.
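
As an illustration of how an area under the curve and its 95% confidence interval can be obtained from classifier scores, the following minimal sketch computes the AUC as a pairwise ranking probability and derives a percentile bootstrap interval. It runs on synthetic scores for two groups of 145, not on the study data. The abstract does not state how the interval was computed, so the bootstrap is an assumption, and the function names `auc` and `bootstrap_ci` are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        # Skip resamples that contain only one class: AUC is undefined there.
        if 0 < sum(lab) < n:
            stats.append(auc(lab, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic stand-in scores: 145 "cases" and 145 "glucose-matched" individuals,
# with a small mean shift so the classes are only weakly separable.
data_rng = random.Random(42)
labels = [1] * 145 + [0] * 145
scores = [data_rng.gauss(0.4 if l == 1 else 0.0, 1.0) for l in labels]

point = auc(labels, scores)
lo, hi = bootstrap_ci(labels, scores)
print(f"AUC = {point:.3f}, 95% CI = {lo:.3f}-{hi:.3f}")
```

With groups of this size the interval is typically wide, which is consistent with the width of the interval reported above.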

Conclusions: The model highlighted a relevant contribution of inflammation (glycosylation pattern and HDL) and muscle (creatinine and creatine) to the development of type 2 diabetes, as factors independent of hyperglycemia.