The National Center for Forensic Science (NCFS) is a State of Florida Type II Research Center at the University of Central Florida. Our mission is to provide relevant and responsive forensic science research, training and operational support to communities that rely on science to achieve justice. Our team of chemists, biochemists, physicists and statisticians works individually and in synergistic teams to perform basic and applied forensic science research. The Center also develops and curates databases and provides continuing education in support of the forensic communities.
New Publication
Exploring Scientific Literature Using Topic Modeling: A Practical Framework for Discovery and Classification
Informatics 13(2) 24 (2026)
Amir Alipour Yengejeh, Larry Tang, Candice M. Bridge and Chandra Kundu
The increasing volume and diversity of scientific publications pose challenges for scalable and interpretable topic discovery and automated document categorization. This study proposes an integrated framework that combines probabilistic topic modeling with supervised classification to support large-scale scientific literature analysis. Using 3689 abstracts from the Journal of Forensic Sciences (2009–2022), Latent Dirichlet Allocation (LDA) is applied to uncover latent thematic structures, assess topic diagnosticity across forensic disciplines, and analyze temporal research trends. Bayesian model selection with repeated resampling identifies a stable topic resolution, with the number of topics T lying in the range 83–88, yielding semantically coherent and discipline-aligned topics. The resulting document–topic representations are then used for supervised abstract classification. Across multiple models and resampling scenarios, the strongest and most stable performance is achieved under a Grouped Category configuration. In particular, XGBoost attains an Accuracy of 0.754 and a Macro-averaged F1 score of 0.737 at T = 88, with comparable results at neighboring topic counts, indicating robustness to topic granularity. Overall, the proposed framework provides a reproducible, interpretable, and computationally efficient pipeline for literature organization, trend analysis, and metadata enhancement in scientific domains.
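
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how Accuracy and Macro-averaged F1 are commonly computed for a multi-class classifier. It is not the authors' code: the function names `accuracy` and `macro_f1`, the category labels, and the toy predictions are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted but wrong
            fn[t] += 1      # class t was missed
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical category labels and predictions (not data from the study)
y_true = ["toxicology", "toxicology", "dna", "dna", "anthropology", "anthropology"]
y_pred = ["toxicology", "dna", "dna", "dna", "anthropology", "toxicology"]

print(f"Accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"Macro F1: {macro_f1(y_true, y_pred):.3f}")
```

Because Macro-averaged F1 weights every category equally regardless of how many abstracts it contains, it can sit below Accuracy when smaller categories are classified less reliably than larger ones.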
