Case study on data simulation for academic research using Generative Artificial Intelligence and Google Colab
DOI:
https://doi.org/10.62697/rmiie.v4iS1.143
Keywords:
Simulated data, generative artificial intelligence, data validation, Python, ChatGPT, prompts, Student's t-test
Abstract
This case study presents a data simulation workflow for academic research in which Generative Artificial Intelligence (GenAI), specifically the ChatGPT model, produces a simulated data set from suitable prompts. The simulated data were validated in Python on Google Colab. The methodology compares two data sets: a real one and a simulated one constructed from parameters of the real (primary) data, namely the mean, the standard deviation, and the number of observations. The analysis used the Python library numpy and the ttest_ind function for Student's t-test, along with matplotlib and seaborn for density plots. The results show that the simulated data closely matched the real data: the statistical differences between the two sets were not significant, which supports the usefulness of the methodology.
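The validation step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' notebook: the "real" sample is placeholder data, the simulated sample is drawn with numpy from a normal distribution as a stand-in for the ChatGPT-generated values, and the helper names (`simulate_from_summary`, `compare_samples`, `plot_densities`), the seed, and the 0.05 significance level are hypothetical choices. The abstract does not say which variant of the t-test was used, so Welch's version is assumed here.

```python
import numpy as np
from scipy.stats import ttest_ind


def simulate_from_summary(real, rng):
    """Draw a normal sample with the mean, SD, and size of `real`.

    Assumption: a normal draw stands in for the ChatGPT-generated data.
    """
    real = np.asarray(real, dtype=float)
    mean = real.mean()
    sd = real.std(ddof=1)  # sample standard deviation
    return rng.normal(loc=mean, scale=sd, size=real.size)


def compare_samples(real, simulated, alpha=0.05):
    """Welch two-sample t-test; returns (t, p, differs_significantly)."""
    result = ttest_ind(real, simulated, equal_var=False)
    p_value = float(result.pvalue)
    return float(result.statistic), p_value, bool(p_value < alpha)


def plot_densities(real, simulated, path="densities.png"):
    """Overlay kernel density estimates of both samples and save the figure."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots()
    sns.kdeplot(x=real, label="real", fill=True, ax=ax)
    sns.kdeplot(x=simulated, label="simulated", fill=True, ax=ax)
    ax.set_xlabel("value")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Placeholder "real" data; in practice this would be the primary data set.
rng = np.random.default_rng(42)
real = rng.normal(loc=70.0, scale=10.0, size=200)

simulated = simulate_from_summary(real, rng)
t_stat, p_value, differs = compare_samples(real, simulated)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, significant difference: {differs}")
```

Calling `plot_densities(real, simulated)` in a Colab cell would save the overlaid density curves to `densities.png`. A p-value above the chosen significance level means the test found no evidence that the two means differ, which is weaker than showing that the two distributions are the same.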
License
Copyright (c) 2025 Miguel Ángel Fernández-Marín, Jordy Rafael Montero-Murillo, Débora González-Tolmo

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors who publish in Revista Mexicana de Investigación e Intervención Educativa (RMIIE) of Universidad Pablo Latapí Sarre agree to the following terms:
1. Copyright
Authors retain unrestricted copyright to their work. Authors grant the journal the right of first publication. To this end, they assign the journal non-exclusive exploitation rights (reproduction, distribution, public communication, and transformation). Authors may enter into additional agreements for the non-exclusive distribution of the version of the work published in the journal, provided that acknowledgment of its initial publication in this journal is given.
© The authors.
2. License
The articles are published in the journal under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). The terms can be found at: https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en
This license allows:
- Sharing: Copying and redistributing the material in any medium or format.
- Adapting: Remixing, transforming, and building upon the material.
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if any changes were made. You may do this in any reasonable manner, but not in any way that suggests the licensor endorses or sponsors your use.
- NonCommercial: You may not use the material for commercial purposes.
- ShareAlike: If you remix, transform, or build upon the material, you must distribute your creation under the same license as the original work.
There are no additional restrictions. You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.