Research Article
Mathematical Modeling of an Intelligent Document Management System Based on Microservice Architecture and BERT Models
Farhod Rahimi*, Fayzali Saduiioevich Komiliyon, Manuchehr Farhodovich Rahimov, Mehrdod Rahmatullovich Yorov
Issue: Volume 15, Issue 3, June 2026
Pages: 68-74
Received: 26 March 2026
Accepted: 25 April 2026
Published: 11 May 2026
DOI: 10.11648/j.acm.20261503.11
Abstract: In the context of the ongoing digital transformation of governmental and corporate information systems, the development of intelligent document management solutions capable of efficient processing, structuring, and analysis of textual data has become increasingly important. Particular challenges arise in the processing of multilingual data and low-resource languages, such as Tajik, due to the limited availability of annotated corpora. The aim of this study is to develop and formalize a mathematical model of an intelligent document management system based on microservice architecture and transformer-based natural language processing techniques. The proposed approach integrates a distributed microservice architecture using gRPC with a named entity recognition (NER) model based on multilingual BERT. To address data scarcity, a synthetic data generation mechanism is introduced to augment the training corpus. The NER task is formulated as a probabilistic sequence labeling problem, and the training procedure includes fine-tuning of the transformer model and comparison with baseline approaches, including rule-based methods, Conditional Random Fields (CRF), and BiLSTM-CRF models. Experimental evaluation is conducted on a curated corpus of Tajik-language documents, divided into training, validation, and test subsets. The results demonstrate that the proposed model achieves an F1-score of 0.93, outperforming all baseline methods. In addition, the system exhibits near-linear scalability under horizontal scaling conditions and ensures fault tolerance through a hybrid mechanism that switches to a rule-based extractor in case of service unavailability. The proposed model provides a scalable and robust framework for intelligent document processing systems and can be effectively applied in governmental and corporate environments undergoing digital transformation.
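The hybrid fault-tolerance mechanism mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the system's actual gRPC interface, rule set, or error handling, so every name below (`extract_with_fallback`, `rule_based_extract`, `unavailable_service`, the `DATE` pattern) is hypothetical, and the use of `ConnectionError` to signal service unavailability is an assumption made only for illustration.

```python
import re
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (surface text, label)

# Hypothetical rule: a date such as "26 March 2026" is tagged as DATE.
DATE_PATTERN = re.compile(r"\b\d{1,2} [A-Z][a-z]+ \d{4}\b")


def rule_based_extract(text: str) -> List[Entity]:
    """Fallback extractor: regex rules only, no model required."""
    return [(m.group(0), "DATE") for m in DATE_PATTERN.finditer(text)]


def extract_with_fallback(
    text: str,
    model_extract: Callable[[str], List[Entity]],
    fallback_extract: Callable[[str], List[Entity]] = rule_based_extract,
) -> Tuple[List[Entity], str]:
    """Try the model-backed extractor first; switch to rules if it is unreachable."""
    try:
        return model_extract(text), "model"
    except ConnectionError:
        # Assumption: the service client raises ConnectionError when the
        # NER service is down; a real gRPC client would raise its own error type.
        return fallback_extract(text), "rules"


def unavailable_service(text: str) -> List[Entity]:
    """Stand-in for an NER service that is currently down."""
    raise ConnectionError("NER service unavailable")
```

Calling `extract_with_fallback("Received: 26 March 2026", unavailable_service)` takes the rule-based path and reports `"rules"` as the source, so a caller can log when degraded extraction was used. Returning the source alongside the entities is a design choice of this sketch, not something the abstract specifies.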