As governmental and corporate information systems undergo digital transformation, intelligent document management solutions that can efficiently process, structure, and analyze textual data have become increasingly important. Multilingual data and low-resource languages such as Tajik pose particular challenges because annotated corpora are scarce. This study develops and formalizes a mathematical model of an intelligent document management system based on a microservice architecture and transformer-based natural language processing techniques. The proposed approach integrates a distributed microservice architecture using gRPC with a named entity recognition (NER) model based on multilingual BERT. To address data scarcity, a synthetic data generation mechanism is introduced to augment the training corpus. The NER task is formulated as a probabilistic sequence labeling problem; the transformer model is fine-tuned and compared with baseline approaches, namely rule-based methods, Conditional Random Fields (CRF), and BiLSTM-CRF models. Experimental evaluation is conducted on a curated corpus of Tajik-language documents divided into training, validation, and test subsets. The proposed model achieves an F1-score of 0.93, outperforming all baseline methods. In addition, the system exhibits near-linear scalability under horizontal scaling and ensures fault tolerance through a hybrid mechanism that switches to a rule-based extractor when the service is unavailable. The proposed model provides a scalable and robust framework for intelligent document processing systems and can be applied in governmental and corporate environments undergoing digital transformation.
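The hybrid fault-tolerance mechanism mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ServiceUnavailable`, `rule_based_extract`, `extract_entities`, and `unavailable_service` are hypothetical, the model call is a plain Python callable standing in for a gRPC client, and the single date rule is a toy placeholder for a real rule-based extractor.

```python
import re
from typing import Callable, List, Tuple

# An entity is (surface text, label); this representation is an assumption.
Entity = Tuple[str, str]


class ServiceUnavailable(Exception):
    """Hypothetical error raised when the NER model service cannot be reached."""


def rule_based_extract(text: str) -> List[Entity]:
    """Toy fallback extractor: tags dates written as dd.mm.yyyy."""
    pattern = r"\b\d{2}\.\d{2}\.\d{4}\b"
    return [(m.group(0), "DATE") for m in re.finditer(pattern, text)]


def extract_entities(
    text: str,
    model_extract: Callable[[str], List[Entity]],
    fallback: Callable[[str], List[Entity]] = rule_based_extract,
) -> Tuple[List[Entity], str]:
    """Try the model service first; switch to the rule-based extractor on failure.

    Returns the entities together with the name of the path that produced them.
    """
    try:
        return model_extract(text), "model"
    except ServiceUnavailable:
        return fallback(text), "rules"


def unavailable_service(text: str) -> List[Entity]:
    """Stand-in for a model client whose backing service is down."""
    raise ServiceUnavailable("NER service did not respond")


if __name__ == "__main__":
    entities, source = extract_entities(
        "Signed on 15.03.2026 in Dushanbe", unavailable_service
    )
    print(source, entities)
```

Returning the source label alongside the entities lets a caller record which path served each request, which is one way to monitor how often the fallback is used.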
| Published in | Applied and Computational Mathematics (Volume 15, Issue 3) |
| DOI | 10.11648/j.acm.20261503.11 |
| Page(s) | 68-74 |
| Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
| Copyright | Copyright © The Author(s), 2026. Published by Science Publishing Group |
Microservice Architecture, Intelligent Document Management, Mathematical Modeling, BERT, Named Entity Recognition, Distributed Systems
APA Style
Rahimi, F., Komiliyon, F. S., Rahimov, M. F., & Yorov, M. R. (2026). Mathematical Modeling of an Intelligent Document Management System Based on Microservice Architecture and BERT Models. Applied and Computational Mathematics, 15(3), 68–74. https://doi.org/10.11648/j.acm.20261503.11
ACS Style
Rahimi, F.; Komiliyon, F. S.; Rahimov, M. F.; Yorov, M. R. Mathematical Modeling of an Intelligent Document Management System Based on Microservice Architecture and BERT Models. Appl. Comput. Math. 2026, 15(3), 68-74. doi: 10.11648/j.acm.20261503.11
@article{10.11648/j.acm.20261503.11,
author = {Farhod Rahimi and Fayzali Saduiioevich Komiliyon and Manuchehr Farhodovich Rahimov and Mehrdod Rahmatullovich Yorov},
title = {Mathematical Modeling of an Intelligent Document Management System Based on Microservice Architecture and BERT Models},
journal = {Applied and Computational Mathematics},
volume = {15},
number = {3},
pages = {68-74},
doi = {10.11648/j.acm.20261503.11},
url = {https://doi.org/10.11648/j.acm.20261503.11},
eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.acm.20261503.11},
year = {2026}
}
TY  - JOUR
T1  - Mathematical Modeling of an Intelligent Document Management System Based on Microservice Architecture and BERT Models
AU  - Farhod Rahimi
AU  - Fayzali Saduiioevich Komiliyon
AU  - Manuchehr Farhodovich Rahimov
AU  - Mehrdod Rahmatullovich Yorov
Y1  - 2026/05/11
PY  - 2026
N1  - https://doi.org/10.11648/j.acm.20261503.11
DO  - 10.11648/j.acm.20261503.11
T2  - Applied and Computational Mathematics
JF  - Applied and Computational Mathematics
JO  - Applied and Computational Mathematics
SP  - 68
EP  - 74
PB  - Science Publishing Group
SN  - 2328-5613
UR  - https://doi.org/10.11648/j.acm.20261503.11
VL  - 15
IS  - 3
ER  - 