Study of the effect of state variables and reward functions on the performance of swarm-based algorithms combined with reinforcement learning.

Pimentel, Kamila Rocha

dc.creator	Pimentel, Kamila Rocha
dc.date.accessioned	2025-12-03T13:35:02Z
dc.date.available	2025-12-03T13:35:02Z
dc.date.issued	2025-11-18
dc.identifier.citation	PIMENTEL, Kamila Rocha; SILVA, Yasminn Patricia Souza; SILVA, Rodrigo Cesar Lira da. Estudo do efeito de variáveis de estado e funções de recompensa no desempenho de algoritmos de enxames combinados com aprendizagem por reforço. 2025. Artigo (Tecnologia em Análise e Desenvolvimento de Sistemas) – Instituto Federal de Educação, Ciência e Tecnologia de Pernambuco, Campus Paulista, Paulista, 2025.	pt_BR
dc.identifier.uri	https://repositorio.ifpe.edu.br/xmlui/handle/123456789/1928
dc.description.abstract	This work investigates the integration of Reinforcement Learning and Swarm Intelligence for optimization problems, focusing on how state variables and reward functions affect the performance of the combined approach. Swarm Intelligence, inspired by the collective behavior of animals, searches for solutions through decentralized cooperation among agents, while Reinforcement Learning trains an agent to make decisions by trial and error, maximizing the cumulative reward obtained from its interaction with the environment. In the adopted approach, a Proximal Policy Optimization (PPO) agent dynamically selects among three swarm metaheuristics: Global Particle Swarm Optimization, Local Particle Swarm Optimization, and Grey Wolf Optimizer. The experimental environment incorporates variables describing swarm behavior and two reward functions: Reward 1, an existing function based on incremental fitness improvement, and Reward 2, proposed in this study to penalize stagnation. An ablation study, in which groups of state variables are removed, was used to evaluate the relevance of each group to learning. The experiments were conducted on two benchmark functions, F1 and F2, in 10, 30, and 50 dimensions, to identify how the choice of observables and rewards influences the agent's adaptation and convergence. The results showed that Reward 1 stood out for its stability and consistent performance, while removing the fitness variables reduced the computational cost without compromising convergence.	pt_BR
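The abstract names two reward functions and a group-wise ablation of state variables but does not give their formulas. The sketch below illustrates one plausible reading under loudly stated assumptions, and it is not the authors' implementation: minimization; Reward 1 taken as the non-negative decrease of the best fitness between consecutive steps; Reward 2 taken as the same improvement term, replaced by a penalty proportional to the number of consecutive non-improving steps when there is no improvement; and hypothetical state-variable group names and a hypothetical penalty weight of 0.1. The PPO agent that would consume these observations and rewards, and pick an index into the action list, is not shown.

```python
# Hypothetical sketch: reward formulas, penalty weight and group names are assumptions.
from typing import Dict, List, Optional

# Discrete action space of the selector agent; a (not shown) PPO policy would output an index.
ACTIONS = ["GlobalPSO", "LocalPSO", "GWO"]

# Hypothetical grouping of observation variables, used to ablate one group at a time.
STATE_GROUPS: Dict[str, List[str]] = {
    "fitness": ["best_fitness", "mean_fitness"],
    "diversity": ["swarm_diversity"],
    "improvement": ["improvement_frequency"],
}


def reward_1(prev_best: float, new_best: float) -> float:
    """Assumed incremental-improvement reward (minimization): decrease of the
    best fitness since the previous step, or 0.0 if it did not improve."""
    return max(prev_best - new_best, 0.0)


def reward_2(prev_best: float, new_best: float, stagnant_steps: int,
             penalty: float = 0.1) -> float:
    """Assumed stagnation-penalizing reward: the same improvement term, or a
    negative value proportional to the count of consecutive non-improving steps."""
    improvement = max(prev_best - new_best, 0.0)
    if improvement > 0.0:
        return improvement
    return -penalty * stagnant_steps


def episode_rewards(best_history: List[float], kind: str = "r1") -> List[float]:
    """Reward for each transition of a best-fitness trajectory."""
    rewards: List[float] = []
    stagnant = 0  # consecutive steps without improvement, including the current one
    for prev, new in zip(best_history, best_history[1:]):
        stagnant = 0 if new < prev else stagnant + 1
        if kind == "r1":
            rewards.append(reward_1(prev, new))
        else:
            rewards.append(reward_2(prev, new, stagnant))
    return rewards


def ablate(state: Dict[str, float], removed_group: Optional[str] = None) -> List[float]:
    """Build the observation vector in a fixed order, dropping one variable group."""
    return [
        state[name]
        for group, names in STATE_GROUPS.items()
        if group != removed_group
        for name in names
    ]


if __name__ == "__main__":
    history = [10.0, 8.0, 8.0, 8.0, 7.5]
    print(episode_rewards(history, "r1"))  # [2.0, 0.0, 0.0, 0.5]
    print(episode_rewards(history, "r2"))  # [2.0, -0.1, -0.2, 0.5]
    state = {"best_fitness": 1.0, "mean_fitness": 2.0,
             "swarm_diversity": 0.5, "improvement_frequency": 0.3}
    print(ablate(state, "fitness"))  # [0.5, 0.3]
```

Under these assumptions, Reward 1 is silent during a plateau while Reward 2 grows increasingly negative, which is one way a reward can "penalize stagnation"; removing the "fitness" group simply shortens the observation vector the agent receives.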
dc.format.extent	40 p.	pt_BR
dc.language	pt_BR	pt_BR
dc.rights	Open Access	pt_BR
dc.subject	Reinforcement Learning	pt_BR
dc.subject	Benchmark Functions	pt_BR
dc.subject	Swarm Intelligence	pt_BR
dc.title	Estudo do efeito de variáveis de estado e funções de recompensa no desempenho de algoritmos de enxames combinados com aprendizagem por reforço.	pt_BR
dc.title.alternative	Study of the effect of state variables and reward functions on the performance of swarm-based algorithms combined with reinforcement learning.	pt_BR
dc.type	Article	pt_BR
dc.creator.Lattes	http://lattes.cnpq.br/9153626529135319	pt_BR
dc.contributor.advisor1	Silva, Rodrigo Cesar Lira da
dc.contributor.advisor1Lattes	http://lattes.cnpq.br/2442224050349612	pt_BR
dc.contributor.referee1	Silva, João Gabriel Rocha
dc.contributor.referee2	Oliveira, Flávio Rosendo da Silva
dc.contributor.referee1Lattes	http://lattes.cnpq.br/4555578101519491	pt_BR
dc.contributor.referee2Lattes	http://lattes.cnpq.br/6828380394080049	pt_BR
dc.publisher.department	Paulista	pt_BR
dc.publisher.country	Brazil	pt_BR
dc.subject.cnpq	EXACT AND EARTH SCIENCES::COMPUTER SCIENCE::COMPUTER SYSTEMS	pt_BR
dc.creator.name2	Silva, Yasminn Patricia Souza
dc.creator.Lattes2	http://lattes.cnpq.br/9945118727257991	pt_BR

Files in this item

Name: TCC_Artigo_IFPE-Campus Paulist ...
Size: 1.973Mb
Format: PDF
Description: Main article

Name: license_rdf
Size: 0 bytes
Format: application/rdf+xml

This item appears in the following collection(s)

Tecnólogo em Análise e Desenvolvimento de Sistemas
