Avaliação do uso de grandes modelos de linguagem para detecção de alucinações.

José, Gracielle do Nascimento

dc.creator	José, Gracielle do Nascimento
dc.date.accessioned	2026-03-03T14:13:02Z
dc.date.available	2026-03-03T14:13:02Z
dc.date.issued	2025-12-18
dc.identifier.citation	JOSÉ, Gracielle do Nascimento. Avaliação do uso de grandes modelos de linguagem para detecção de alucinações. 2025. 18 f. Trabalho de Conclusão de Curso (Tecnologia em Análise e Desenvolvimento de Sistemas) – Instituto Federal de Educação, Ciência e Tecnologia de Pernambuco, Paulista, 2025.	pt_BR
dc.identifier.uri	https://repositorio.ifpe.edu.br/xmlui/handle/123456789/2017
dc.description.abstract	Generative artificial intelligence has advanced rapidly in recent years. However, despite heavy investment in improving the technology, the output of these models still contains a high level of hallucinations, which compromises the reliability of the content. Seeking ways to mitigate this problem, this work compares the Llama and GPT models when used to detect input, context, and factual hallucinations. Both LLMs performed similarly: each reached an accuracy of 84% for input hallucinations and 68% for context hallucinations. For factual hallucinations, the results differed by 16%, in favor of GPT. Finally, the results suggest that using GenAI-generated content without subsequent human review is not recommended for complex activities.	pt_BR
dc.format.extent	18 p.	pt_BR
dc.language	pt_BR	pt_BR
dc.relation	ALAMMAR, Jay; GROOTENDORST, Maarten. Hands-on large language models. Sebastopol: O'Reilly Media, 2024. BANG, Yejin; JI, Ziwei; SCHELTEN, Alan; et al. HalluLens: LLM hallucination benchmark. arXiv, 2025. GARTNER. Gartner forecasts worldwide GenAI spending to reach $644 billion in 2025. Stamford, 31 mar. 2025. KADAVATH, Saurav; et al. Language models (mostly) know what they know. arXiv, 2022. KALAI, Adam Tauman; et al. Why language models hallucinate. arXiv, 2025. LANGCHAIN. Vector stores. [S.l.]: LangChain, [s.d.]. MAYNEZ, Joshua; NARAYAN, Shashi; BOHNET, Bernd; McDONALD, Ryan. On faithfulness and factuality in abstractive summarization. In: ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 58., 2020. Proceedings [...]. [S.l.]: ACL, 2020. p. 1906-1919. MINAEE, Shervin; et al. Large language models: a survey. arXiv, 2024. OLLAMA. nomic-embed-text. [S.l.]: Ollama, [s.d.]. SENGAR, Sandeep Singh; HASAN, Affan Bin; KUMAR, Sanjay; CARROLL, Fiona. Generative artificial intelligence: a systematic review and applications. arXiv preprint arXiv:2405.11029v1, 2024. TONMOY, S. M. Towhidul Islam; et al. A comprehensive survey of hallucination mitigation techniques in large language models. arXiv, 2024. WALKER, Shelley; LUNDGREN, Amy. Integrating generative AI into legal education: from casebooks to code, opportunities and challenges. Law, Technology and Humans, v. 5, n. 1, p. 27-40, 2023. XU, Ziwei; JAIN, Sanjay; KANKANHALLI, Mohan. Hallucination is inevitable: an innate limitation of large language models. arXiv, 2025. ZHANG, Yue; et al. Siren's song in the AI ocean: a survey on hallucination in large language models. arXiv, 2023.	pt_BR
dc.rights	Open Access	pt_BR
dc.subject	Generative artificial intelligence	pt_BR
dc.subject	Hallucination detection	pt_BR
dc.subject	Brazilian Penal Code	pt_BR
dc.title	Avaliação do uso de grandes modelos de linguagem para detecção de alucinações.	pt_BR
dc.title.alternative	Evaluation of the use of large language models for hallucination detection.	pt_BR
dc.type	Article	pt_BR
dc.creator.Lattes	http://lattes.cnpq.br/9103837698019996	pt_BR
dc.contributor.advisor1	Oliveira, Flávio Rosendo da Silva
dc.contributor.advisor1Lattes	http://lattes.cnpq.br/6828380394080049	pt_BR
dc.contributor.referee1	Silva, Rodrigo Cesar Lira da
dc.contributor.referee2	Cordeiro, Paulo Roger Gomes
dc.contributor.referee1Lattes	http://lattes.cnpq.br/2442224050349612	pt_BR
dc.contributor.referee2Lattes	http://lattes.cnpq.br/7671177677866299	pt_BR
dc.publisher.department	Paulista	pt_BR
dc.publisher.country	Brasil	pt_BR
dc.subject.cnpq	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO	pt_BR
dc.description.resumo	Generative Artificial Intelligence has been advancing rapidly in recent times. However, despite recurring investment in improving the technology, the outputs generated by these models still contain a high level of hallucinations, compromising the reliability of the content produced. Seeking ways to mitigate this problem, this work presents a comparison between the Llama and GPT models as tools for detecting input, context, or factual hallucinations. In the accuracy calculations, both models reached 84% and 68% for input and context hallucinations, respectively. For factual hallucinations, there was a 16% difference between the models' results, in favor of GPT. Finally, the results suggest that using Generative Artificial Intelligence without human evaluation after the content is generated is not recommended for complex activities.	pt_BR
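
The accuracy figures quoted in the abstract are the share of samples on which a detector's verdict matches the reference label. The sketch below illustrates only that calculation. The function name `accuracy` and the toy labels are hypothetical and are not taken from the study's data or code.

```python
from typing import Sequence


def accuracy(predicted: Sequence[bool], expected: Sequence[bool]) -> float:
    """Fraction of samples where the detector's verdict matches the reference label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one sample is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Hypothetical toy labels (True = "this answer is a hallucination").
# These values are made up for illustration only.
expected = [True, False, True, True, False]
predicted = [True, False, False, True, False]

print(f"accuracy = {accuracy(predicted, expected):.0%}")
```

With these toy labels the detector agrees with the reference on four of five samples. Comparing two detectors on the same labelled set then amounts to comparing their two accuracy values.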

Files in this item

Name: TCC_GRACIELLE_2.pdf
Size: 1.360 MB
Format: PDF
Description: Main article

This item appears in the following collection(s)

Tecnólogo em Análise e Desenvolvimento de Sistemas