Research Article | Peer-Reviewed

Structure-guided Genome-wide Association Analysis of ALK Variants with GWAS Data Using R

Received: 5 November 2025     Accepted: 18 November 2025     Published: 11 December 2025
Abstract

Anaplastic lymphoma kinase (ALK) has been linked to several hematological malignancies; however, its genetic variability and potential disease associations are not fully understood. In this study, a structure-guided analysis of ALK variants was performed on genome-wide association study (GWAS) data, using publicly available summary statistics and R-based analytical pipelines. The GWAS datasets were acquired, filtered, and ranked by sample size to ensure sufficient statistical power. The analysis then focused on two datasets selected for their sample size and phenotypic diversity: one representing lymphoma-related genetic traits from the UK Biobank, and another capturing ALK-associated proteomic variation. Quality control and data visualization were performed using a set of diagnostic and analytical plots, including volcano plots, QQ plots, histograms, effect size plots, and a correlation matrix heatmap of numerical variables. Regional Manhattan plots highlighted distinct, highly significant associations at the ALK locus in both datasets, enabling the identification of independent lead variants. The QQ plots and histograms confirmed adequate control for population stratification and minimal inflation of test statistics. The effect size distribution and the standard error (SE) versus beta plots were used together to assess the precision and reliability of the estimated genetic effects. By mapping genetic variants onto the ALK protein structure, single-nucleotide polymorphisms (SNPs) with potential functional relevance were prioritized, and their associations with disease phenotypes across populations were evaluated. This strategy facilitates the identification of variants likely to influence protein structure and function, thereby enhancing the interpretability of GWAS findings in a protein-centric context.
This approach demonstrates the power of integrating structural bioinformatics with statistical genetics to reveal novel genotype-phenotype relationships, offering valuable insights for precision medicine and ALK-directed therapies. Overall, this integrative methodology establishes a reproducible framework for detailed regional GWAS analyses: it pinpoints strong associations at the ALK locus, identifies candidate variants for subsequent functional validation, and supports assessment of their potential role in therapeutic investigation for hematological malignancies.
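
As an illustration of two checks named in the abstract (inflation of test statistics and selection of a lead variant at a locus), the following is a minimal sketch and not the authors' R pipeline. It is written in Python using only the standard library. The function names, the record keys (`chr`, `pos`, `p`), the window coordinates, and the use of the conventional 5e-8 genome-wide significance threshold are assumptions introduced here for illustration.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Return lambda_GC: median observed 1-df chi-square over its expected median.

    Each two-sided p-value is converted back to a 1-df chi-square statistic
    via the squared normal quantile of p/2. Values near 1.0 suggest little
    inflation. P-values must lie in (0, 1].
    """
    norm = NormalDist()
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


def lead_variant(rows, chrom, start, end, threshold=5e-8):
    """Return the most significant variant inside a window, or None.

    `rows` is a list of dicts with hypothetical keys 'chr', 'pos', and 'p'.
    Only variants with p below `threshold` are considered.
    """
    hits = [
        r for r in rows
        if r["chr"] == chrom and start <= r["pos"] <= end and r["p"] < threshold
    ]
    return min(hits, key=lambda r: r["p"]) if hits else None


if __name__ == "__main__":
    # Toy records with made-up identifiers and coordinates (not real ALK data).
    toy = [
        {"id": "var1", "chr": 2, "pos": 1_000, "p": 3e-9},
        {"id": "var2", "chr": 2, "pos": 1_500, "p": 1e-12},
        {"id": "var3", "chr": 2, "pos": 9_000, "p": 1e-20},  # outside the window
        {"id": "var4", "chr": 2, "pos": 1_200, "p": 0.04},   # not significant
    ]
    print(lead_variant(toy, chrom=2, start=500, end=2_000))
    print(genomic_inflation([r["p"] for r in toy]))
```

Selecting the single smallest p-value per window is the simplest possible definition of a lead variant; identifying independent lead variants, as described above, would additionally require linkage disequilibrium information that this sketch does not model.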

Published in Computational Biology and Bioinformatics (Volume 13, Issue 2)
DOI 10.11648/j.cbb.20251302.13
Page(s) 72-85
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

GWAS, ALK Gene Variants, Multiomics Data Integration, BioR Database, Bioinformatics Integration, Variant Prioritization

Cite This Article
  • APA Style

    Bandbe, T., Johri, V., Kumari, U. (2025). Structure-guided Genome-wide Association Analysis of ALK Variants with GWAS Data Using R. Computational Biology and Bioinformatics, 13(2), 72-85. https://doi.org/10.11648/j.cbb.20251302.13


    ACS Style

    Bandbe, T.; Johri, V.; Kumari, U. Structure-guided Genome-wide Association Analysis of ALK Variants with GWAS Data Using R. Comput. Biol. Bioinform. 2025, 13(2), 72-85. doi: 10.11648/j.cbb.20251302.13


    AMA Style

    Bandbe T, Johri V, Kumari U. Structure-guided Genome-wide Association Analysis of ALK Variants with GWAS Data Using R. Comput Biol Bioinform. 2025;13(2):72-85. doi: 10.11648/j.cbb.20251302.13


  • @article{10.11648/j.cbb.20251302.13,
      author = {Tanmay Bandbe and Vineeta Johri and Uma Kumari},
      title = {Structure-guided Genome-wide Association Analysis of ALK Variants with GWAS Data Using R},
      journal = {Computational Biology and Bioinformatics},
      volume = {13},
      number = {2},
      pages = {72-85},
      doi = {10.11648/j.cbb.20251302.13},
      url = {https://doi.org/10.11648/j.cbb.20251302.13},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.cbb.20251302.13},
     year = {2025}
    }
    


  • TY  - JOUR
    T1  - Structure-guided Genome-wide Association Analysis of ALK Variants with GWAS Data Using R
    AU  - Tanmay Bandbe
    AU  - Vineeta Johri
    AU  - Uma Kumari
    Y1  - 2025/12/11
    PY  - 2025
    N1  - https://doi.org/10.11648/j.cbb.20251302.13
    DO  - 10.11648/j.cbb.20251302.13
    T2  - Computational Biology and Bioinformatics
    JF  - Computational Biology and Bioinformatics
    JO  - Computational Biology and Bioinformatics
    SP  - 72
    EP  - 85
    PB  - Science Publishing Group
    SN  - 2330-8281
    UR  - https://doi.org/10.11648/j.cbb.20251302.13
    VL  - 13
    IS  - 2
    ER  - 

