Research Article | Peer-Reviewed

Mathematical Modeling of an Intelligent Document Management System Based on Microservice Architecture and BERT Models

Received: 26 March 2026     Accepted: 25 April 2026     Published: 11 May 2026
Abstract

As governmental and corporate information systems undergo digital transformation, intelligent document management solutions that can efficiently process, structure, and analyze textual data have become increasingly important. Processing multilingual data and low-resource languages such as Tajik is particularly challenging because annotated corpora are scarce. This study develops and formalizes a mathematical model of an intelligent document management system based on microservice architecture and transformer-based natural language processing. The proposed approach integrates a distributed microservice architecture using gRPC with a named entity recognition (NER) model based on multilingual BERT. To address data scarcity, a synthetic data generation mechanism augments the training corpus. The NER task is formulated as a probabilistic sequence labeling problem; the transformer model is fine-tuned and compared with baseline approaches: rule-based methods, Conditional Random Fields (CRF), and BiLSTM-CRF models. Experimental evaluation is conducted on a curated corpus of Tajik-language documents divided into training, validation, and test subsets. The proposed model achieves an F1-score of 0.93, outperforming all baseline methods. In addition, the system exhibits near-linear scalability under horizontal scaling and maintains fault tolerance through a hybrid mechanism that switches to a rule-based extractor when the service is unavailable. The proposed model provides a scalable and robust framework for intelligent document processing and can be applied in governmental and corporate environments undergoing digital transformation.
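
The hybrid fault-tolerance mechanism mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names ServiceUnavailable, rule_based_extract, extract_entities, and unavailable_model are hypothetical, the date pattern is an illustrative stand-in for a real rule set, and the model call is passed in as a plain function rather than a gRPC client.

```python
import re
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (surface text, label)


class ServiceUnavailable(Exception):
    """Hypothetical error raised when the NER model service cannot be reached."""


# Illustrative rule only (assumption): dates written as DD.MM.YYYY.
DATE_PATTERN = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")


def rule_based_extract(text: str) -> List[Entity]:
    """Fallback extractor: tag every DD.MM.YYYY match as a DATE entity."""
    return [(m.group(0), "DATE") for m in DATE_PATTERN.finditer(text)]


def extract_entities(
    text: str, model_extract: Callable[[str], List[Entity]]
) -> Tuple[List[Entity], str]:
    """Try the model-based extractor first; switch to rules if it is unavailable.

    Returns the entities together with the name of the path that produced them.
    """
    try:
        return model_extract(text), "model"
    except ServiceUnavailable:
        return rule_based_extract(text), "rules"


def unavailable_model(text: str) -> List[Entity]:
    """Stand-in for a model service that is currently down."""
    raise ServiceUnavailable("NER service is down")


entities, source = extract_entities("Order dated 26.03.2026", unavailable_model)
```

Returning the source label alongside the entities lets downstream services distinguish model output from lower-coverage rule output, which may be useful when assessing results produced during an outage.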

Published in Applied and Computational Mathematics (Volume 15, Issue 3)
DOI 10.11648/j.acm.20261503.11
Page(s) 68-74
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2026. Published by Science Publishing Group

Keywords

Microservice Architecture, Intelligent Document Management, Mathematical Modeling, BERT, Named Entity Recognition, Distributed Systems

Cite This Article
  • APA Style

    Rahimi, F., Komiliyon, F. S., Rahimov, M. F., Yorov, M. R. (2026). Mathematical Modeling of an Intelligent Document Management System Based on Microservice Architecture and BERT Models. Applied and Computational Mathematics, 15(3), 68-74. https://doi.org/10.11648/j.acm.20261503.11

    ACS Style

    Rahimi, F.; Komiliyon, F. S.; Rahimov, M. F.; Yorov, M. R. Mathematical Modeling of an Intelligent Document Management System Based on Microservice Architecture and BERT Models. Appl. Comput. Math. 2026, 15(3), 68-74. doi: 10.11648/j.acm.20261503.11

    AMA Style

    Rahimi F, Komiliyon FS, Rahimov MF, Yorov MR. Mathematical Modeling of an Intelligent Document Management System Based on Microservice Architecture and BERT Models. Appl Comput Math. 2026;15(3):68-74. doi: 10.11648/j.acm.20261503.11

  • @article{10.11648/j.acm.20261503.11,
      author = {Farhod Rahimi and Fayzali Saduiioevich Komiliyon and Manuchehr Farhodovich Rahimov and Mehrdod Rahmatullovich Yorov},
      title = {Mathematical Modeling of an Intelligent Document Management System Based on Microservice Architecture and BERT Models},
      journal = {Applied and Computational Mathematics},
      volume = {15},
      number = {3},
      pages = {68-74},
      doi = {10.11648/j.acm.20261503.11},
      url = {https://doi.org/10.11648/j.acm.20261503.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.acm.20261503.11},
     year = {2026}
    }
    

  • TY  - JOUR
    T1  - Mathematical Modeling of an Intelligent Document Management System Based on Microservice Architecture and BERT Models
    AU  - Farhod Rahimi
    AU  - Fayzali Saduiioevich Komiliyon
    AU  - Manuchehr Farhodovich Rahimov
    AU  - Mehrdod Rahmatullovich Yorov
    Y1  - 2026/05/11
    PY  - 2026
    N1  - https://doi.org/10.11648/j.acm.20261503.11
    DO  - 10.11648/j.acm.20261503.11
    T2  - Applied and Computational Mathematics
    JF  - Applied and Computational Mathematics
    JO  - Applied and Computational Mathematics
    SP  - 68
    EP  - 74
    PB  - Science Publishing Group
    SN  - 2328-5613
    UR  - https://doi.org/10.11648/j.acm.20261503.11
    VL  - 15
    IS  - 3
    ER  - 

Author Information
  • Physical-Technical Institute, National Academy of Sciences of Tajikistan, Dushanbe, Tajikistan

  • Faculty of Mathematics, Tajik National University, Dushanbe, Tajikistan

  • Institute of Mathematics, National Academy of Sciences of Tajikistan, Dushanbe, Tajikistan

  • Faculty of Mathematics, Tajik National University, Dushanbe, Tajikistan
