Using classification and prediction algorithms to process surveys in the Altair® RapidMiner data mining software
DOI: https://doi.org/10.47606/ACVEN/PH0319
Keywords: Data, productivity, big data, marketing, population behavior
Abstract
Analyzing and processing surveys to find patterns has become essential for devising new product distribution strategies. For this reason, this study analyzed surveys using classification and prediction algorithms. The free software Altair® RapidMiner Studio Version 2024 was applied to data extracted from surveys conducted through Google Forms and distributed online throughout the country. The surveys consisted of 30 questions, most of them multiple choice. AdaBoost, Naive Bayes, and deep learning algorithms were used to classify the responses and to find patterns among the questions. The results showed that the vaccines used varied according to age group and the media in which the advertisements were shown. In conclusion, the tool is considered easy to use, and it offers algorithms that allow accurate classification and prediction of survey responses, as well as the search for and visualization of patterns.
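As a rough illustration of the kind of classification the abstract describes, the sketch below implements a small categorical Naive Bayes classifier in plain Python. It is not the authors' workflow, which was built in Altair® RapidMiner Studio. The two questions (age group and preferred media), the target answer, and every row of data are hypothetical, invented only to show the mechanics on multiple-choice answers.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy survey (NOT the study's data): each row pairs the answers to
# two multiple-choice questions (age group, preferred media) with a target answer.
ROWS = [
    (("18-29", "social"), "yes"),
    (("18-29", "social"), "yes"),
    (("18-29", "tv"), "no"),
    (("30-49", "tv"), "no"),
    (("30-49", "social"), "yes"),
    (("50+", "tv"), "no"),
    (("50+", "radio"), "no"),
    (("30-49", "radio"), "no"),
]


def train(rows):
    """Count class frequencies and per-question answer frequencies."""
    class_counts = Counter(label for _, label in rows)
    # answer_counts[(question_index, label)][answer] -> number of occurrences
    answer_counts = defaultdict(Counter)
    # values[question_index] -> set of distinct answers seen for that question
    values = defaultdict(set)
    for answers, label in rows:
        for i, answer in enumerate(answers):
            answer_counts[(i, label)][answer] += 1
            values[i].add(answer)
    return class_counts, answer_counts, values


def predict(model, answers):
    """Return the label with the highest log-posterior under Naive Bayes."""
    class_counts, answer_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        # Log prior of the class.
        score = math.log(n / total)
        for i, answer in enumerate(answers):
            count = answer_counts[(i, label)][answer]
            # Laplace (add-one) smoothing avoids zero probabilities
            # for answer/label pairs that never occurred together.
            score += math.log((count + 1) / (n + len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(ROWS)
print(predict(model, ("18-29", "social")))
print(predict(model, ("50+", "tv")))
```

Working in log space keeps the product of many small probabilities numerically stable, which matters more as the number of questions grows toward the 30 mentioned in the abstract.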
License
Copyright 2025 Gabriela Pazmiño Moreira, Olga Lilian Mendoza Talledo, Homero Mendoza Rodríguez, María José Pazmiño Moreira, Jonathan Josué Proaño Morales

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.












