Using a local LLM with RAG to support the data extraction phase of a systematic literature review

Souza Neto, Nelson Henrique de

dc.creator	Souza Neto, Nelson Henrique de
dc.date.accessioned	2026-05-29T00:52:26Z
dc.date.available	2026-05-29T00:52:26Z
dc.date.issued	2026-03-17
dc.identifier.citation	SOUZA NETO, Nelson Henrique de. Utilização de um LLM local com RAG para auxiliar a fase de extração de dados de uma revisão sistemática da literatura. 2026. 39 f. TCC (Curso de Licenciatura em Geografia), Instituto Federal de Educação, Ciência e Tecnologia de Pernambuco, Recife, 2026.	pt_BR
dc.identifier.uri	https://repositorio.ifpe.edu.br/xmlui/handle/123456789/2201
dc.description.abstract	Systematic Literature Review (SLR) is a research methodology that follows specific protocols and is widely used in academic work to summarize and synthesize evidence on a given topic, and its use in Software Engineering is growing. However, conducting an SLR is laborious, requiring significant time and human resources. With recent advances in Artificial Intelligence, tools such as Large Language Models (LLMs), for example Generative Pre-trained Transformers (GPT), and Retrieval-Augmented Generation (RAG) offer opportunities to reduce the manual effort of conducting these reviews. This study investigates whether a local LLM augmented with RAG can assist the data extraction phase of a systematic review. To this end, the Llama 3.2 model was used to extract data from a systematic mapping study covering 22 SLR articles, whose contents were provided to the LLM through RAG; the model's responses were then compared with the data already extracted by the authors of the mapping. The local LLM with RAG achieved approximately 42% correct answers, indicating that it offers only limited assistance to the researcher in the data extraction phase of an SLR. However, most of the correct answers concerned bibliographic data of the articles, which suggests the model can be used to obtain this data more easily.	pt_BR
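The abstract combines two mechanisms: retrieving the relevant passages of each article so the LLM answers from them (RAG), and scoring the model's answers against the extractions made by the mapping's authors. The Python sketch below is a minimal, hypothetical illustration of both, not the author's implementation, which this record does not include. Assumptions: a bag-of-words term-frequency vector stands in for a learned embedding model; the call to Llama 3.2 is omitted, so only the augmented prompt is built; an answer counts as correct when it matches the reference after simple normalization (the record does not say how correctness was judged); and all function names, field names and sample data are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase term frequencies."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split an article into overlapping windows of `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """RAG retrieval step: return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Augment the question with the retrieved excerpts before sending it to the LLM."""
    excerpts = "\n---\n".join(context)
    return ("Answer using only the excerpts below. If the answer is not there, say so.\n\n"
            f"Excerpts:\n{excerpts}\n\nQuestion: {question}\nAnswer:")


def normalize(answer: str) -> str:
    """Lowercase and drop punctuation so '2020.' and '2020' compare equal."""
    return " ".join(re.findall(r"\w+", answer.lower()))


def accuracy(predicted: dict[str, str], expected: dict[str, str]) -> float:
    """Share of fields whose model answer matches the reference extraction."""
    hits = sum(normalize(predicted.get(f, "")) == normalize(ref) for f, ref in expected.items())
    return hits / len(expected)


def accuracy_by_category(predicted: dict[str, str], expected: dict[str, str],
                         category: dict[str, str]) -> dict[str, float]:
    """Accuracy per field category, e.g. bibliographic vs. content fields."""
    totals, hits = Counter(), Counter()
    for field, ref in expected.items():
        totals[category[field]] += 1
        hits[category[field]] += normalize(predicted.get(field, "")) == normalize(ref)
    return {c: hits[c] / totals[c] for c in totals}


if __name__ == "__main__":
    # Hypothetical article text and extraction form, for illustration only.
    article = ("Title: A hypothetical review of test automation. Year: 2020. "
               "The review searched four digital libraries and selected primary studies.")
    question = "In what year was the review published?"
    print(build_prompt(question, retrieve(question, chunk(article, size=8, overlap=2), k=2)))

    expected = {"year": "2020", "sources": "four digital libraries"}
    predicted = {"year": "2020.", "sources": "IEEE Xplore"}  # hypothetical model output
    print(accuracy(predicted, expected))  # 0.5
    print(accuracy_by_category(predicted, expected,
                               {"year": "bibliographic", "sources": "content"}))
```

Replacing `embed` with a real embedding model and sending the output of `build_prompt` to a locally served LLM would turn this into an end-to-end pipeline. The per-category breakdown is included because the abstract reports that most correct answers were bibliographic fields, a pattern that a single overall accuracy figure would hide.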
dc.format.extent	39 f.	pt_BR
dc.language	pt_BR	pt_BR
dc.rights	Open Access	pt_BR
dc.subject	Computer systems	pt_BR
dc.subject	LLM	pt_BR
dc.subject	RAG	pt_BR
dc.subject	Systematic literature review	pt_BR
dc.subject	Data extraction	pt_BR
dc.title	Utilização de um LLM local com RAG para auxiliar a fase de extração de dados de uma revisão sistemática da literatura	pt_BR
dc.type	TCC	pt_BR
dc.creator.Lattes	http://lattes.cnpq.br/4528738765793118	pt_BR
dc.contributor.advisor1	Nepomuceno, Vilmar Santos
dc.contributor.advisor1Lattes	http://lattes.cnpq.br/1493013358874325	pt_BR
dc.contributor.referee1	Nepomuceno, Vilmar Santos
dc.contributor.referee2	Neves, Renata Freire de Paiva
dc.contributor.referee3	Azevedo, Ivanildo Monteiro de
dc.contributor.referee1Lattes	http://lattes.cnpq.br/1493013358874325	pt_BR
dc.contributor.referee2Lattes	http://lattes.cnpq.br/9029559122700209	pt_BR
dc.contributor.referee3Lattes	http://lattes.cnpq.br/6070296879669887	pt_BR
dc.publisher.department	Recife	pt_BR
dc.publisher.country	Brazil	pt_BR
dc.subject.cnpq	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO	pt_BR

Files in this item

Name: Utilização de um LLM local com ...
Size: 775.5 KB
Format: PDF
Description: Trabalho de Conclusão de Curso (undergraduate final thesis)

This item appears in the following collection(s)

Tecnólogo em Análise e Desenvolvimento de Sistemas (Technologist degree in Systems Analysis and Development)
