Sedentary work culture in corporate environments often results in musculoskeletal discomfort, eye strain, and repetitive stress injuries. To address these challenges, we present a novel Human-Computer Interaction (HCI) system that replaces the conventional mouse with eye blinks, head movements, and hand gestures using MediaPipe and OpenCV. Our system enables employees to perform micro-exercises during regular computer use without interrupting productivity—eye blinks are mapped to clicks, head nods to scrolling, and hand gestures to cursor control, drawing, and erasing. The proposed solution was benchmarked against five widely used pose detection frameworks—MediaPipe, OpenPose (CMU), MoveNet (TensorFlow.js), DeepLabCut, and AlphaPose—on parameters including tracking accuracy, processing speed, ease of deployment, and computational load. Results show that MediaPipe achieved the most balanced performance with 92% tracking accuracy, ~25 fps real-time responsiveness, and minimal system overhead on standard webcams, making it ideal for office deployment. OpenPose provided higher accuracy (~95%) but required GPU acceleration, resulting in markedly lower frame rates (~8-10 fps). MoveNet delivered lightweight, browser-compatible performance (20-22 fps) but with limited keypoint coverage. DeepLabCut offered high customizability and ~93% accuracy but required dataset-specific training, making it less suitable for generic office applications. AlphaPose performed competitively (~94% accuracy, ~18 fps) but required higher-end hardware for stable operation. To validate ergonomic benefits, a post-usage ergonomic comfort chart (System Usability Scale + self-reported musculoskeletal feedback) was collected from 10 IT professionals over a two-week trial. Findings indicated a 40% reduction in reported neck stiffness, a 35% reduction in hand fatigue, and a 25% improvement in eye engagement compared to baseline mouse usage.
Users rated the system at 78/100 on usability, supporting its dual role in productivity and workplace health. This work demonstrates that webcam-based multimodal mouse replacement can be a low-cost, scalable, and health-oriented alternative to conventional input devices. Beyond corporate ergonomics, the approach has strong potential for accessibility in assistive technologies and as a CSR initiative for inclusive digital workplaces.
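The blink-to-click mapping described above is typically built on the Eye Aspect Ratio (EAR) computed from eye landmarks such as those supplied by MediaPipe Face Mesh. The following is a minimal sketch of that idea, not the authors' implementation: the 0.21 threshold and 2-frame debounce are illustrative defaults that would need per-user tuning, and the landmark ordering is an assumption stated in the docstring.

```python
import math

def ear(p1, p2, p3, p4, p5, p6):
    """Eye Aspect Ratio: (|p2-p6| + |p3-p5|) / (2 * |p1-p4|).

    p1/p4 are the horizontal eye corners; (p2, p6) and (p3, p5) are
    top/bottom landmark pairs. EAR drops sharply when the eye closes,
    which is what makes it a robust blink signal.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

class BlinkClicker:
    """Turns a per-frame EAR stream into click events.

    A blink counts only if EAR stays below the threshold for at least
    min_frames consecutive frames, and the click fires on the frame the
    eye reopens; this debounces sensor noise and partial closures.
    """

    def __init__(self, threshold=0.21, min_frames=2):
        self.threshold = threshold
        self.min_frames = min_frames
        self.closed_frames = 0

    def update(self, ear_value):
        """Feed one frame's EAR; returns True when a blink completes."""
        if ear_value < self.threshold:
            self.closed_frames += 1
            return False
        blinked = self.closed_frames >= self.min_frames
        self.closed_frames = 0
        return blinked
```

In a live pipeline, `update()` would be called once per webcam frame with the EAR computed from the current Face Mesh landmarks, and a `True` return would trigger the synthetic mouse click.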
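Likewise, hand-gesture cursor control reduces to mapping a normalized fingertip landmark (MediaPipe Hands reports landmarks in [0, 1] image coordinates) into screen pixels, with smoothing to tame webcam jitter. This sketch is illustrative rather than the paper's code: the screen size, smoothing factor, and mirrored-x convention are assumptions, and a real deployment would pass the output to an OS mouse API.

```python
class CursorMapper:
    """Maps normalized hand-landmark coordinates to smoothed screen pixels."""

    def __init__(self, screen_w=1920, screen_h=1080, alpha=0.35):
        self.screen_w, self.screen_h = screen_w, screen_h
        self.alpha = alpha   # exponential-smoothing weight; lower = smoother but laggier
        self.pos = None      # last smoothed (x, y) in pixels

    def update(self, nx, ny):
        """nx, ny: normalized landmark coords in [0, 1].

        The x axis is mirrored so the cursor follows the hand naturally
        when the webcam feed is the usual mirror view of the user.
        """
        target = ((1.0 - nx) * self.screen_w, ny * self.screen_h)
        if self.pos is None:
            self.pos = target            # first sample: jump straight to target
        else:
            # exponential moving average suppresses frame-to-frame jitter
            self.pos = tuple(self.alpha * t + (1 - self.alpha) * p
                             for t, p in zip(target, self.pos))
        return (round(self.pos[0]), round(self.pos[1]))
```

The smoothing factor trades responsiveness against stability, which matters at the ~25 fps webcam rates the abstract reports: raw landmark positions wobble by a few pixels even for a still hand, and an unsmoothed cursor would be unusable for fine targets.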
| Published in | American Journal of Artificial Intelligence (Volume 9, Issue 2) |
| DOI | 10.11648/j.ajai.20250902.24 |
| Page(s) | 242-257 |
| Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
| Copyright | Copyright © The Author(s), 2025. Published by Science Publishing Group |
Human-Computer Interaction (HCI), MediaPipe, OpenPose, MoveNet, Ergonomics, Corporate Health
APA Style
Darji, B. P., & Parikh, P. (2025). Work + Micro-Exercise: A Multi-Modal Human-Computer Interaction System Using MediaPipe for Mouse Replacement and Ergonomic Well-Being. American Journal of Artificial Intelligence, 9(2), 242-257. https://doi.org/10.11648/j.ajai.20250902.24
ACS Style
Darji, B. P.; Parikh, P. Work + Micro-Exercise: A Multi-Modal Human-Computer Interaction System Using MediaPipe for Mouse Replacement and Ergonomic Well-Being. Am. J. Artif. Intell. 2025, 9(2), 242-257. doi: 10.11648/j.ajai.20250902.24
@article{10.11648/j.ajai.20250902.24,
author = {Bhavya Pinakin Darji and Priyam Parikh},
title = {Work + Micro-Exercise: A Multi-Modal Human-Computer Interaction System Using MediaPipe for Mouse Replacement and Ergonomic Well-Being},
journal = {American Journal of Artificial Intelligence},
volume = {9},
number = {2},
pages = {242-257},
doi = {10.11648/j.ajai.20250902.24},
url = {https://doi.org/10.11648/j.ajai.20250902.24},
eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajai.20250902.24},
year = {2025}
}
TY  - JOUR
T1  - Work + Micro-Exercise: A Multi-Modal Human-Computer Interaction System Using MediaPipe for Mouse Replacement and Ergonomic Well-Being
AU  - Bhavya Pinakin Darji
AU  - Priyam Parikh
Y1  - 2025/11/12
PY  - 2025
DO  - 10.11648/j.ajai.20250902.24
T2  - American Journal of Artificial Intelligence
JF  - American Journal of Artificial Intelligence
SP  - 242
EP  - 257
VL  - 9
IS  - 2
PB  - Science Publishing Group
SN  - 2639-9733
UR  - https://doi.org/10.11648/j.ajai.20250902.24
ER  -