This article is part of the author's study notes on reinforcement learning (RL), recorded for personal learning, research, and reference based on the author's understanding of the field. If anything here is inappropriate or infringing, it will be corrected immediately once pointed out; your understanding is appreciated. The article is filed under the Reinforcement Learning column: 【强化学习】(10)---《A* 算法在多智能体强化学习中的应用》
The A* algorithm is a heuristic search algorithm widely used for path planning and state-space search. Its core idea is to combine a cost function with a heuristic function during the search, so that shortest paths can be found efficiently. In multi-agent reinforcement learning (MARL, Multi-Agent Reinforcement Learning) settings, the A* algorithm is also commonly used for path planning and action selection among agents, helping them find optimal policies and paths.
The basic goal of the A* algorithm is to find the shortest path from a start node to a goal node on a graph or grid. It combines two pieces of information: the actual cost accumulated from the start, g(n), and a heuristic estimate of the remaining cost to the goal, h(n).
The core of the A* algorithm is the evaluation function

    f(n) = g(n) + h(n)

which determines the next node to expand. Here g(n) is the cost of the path so far, h(n) is the heuristic estimate, and f(n) is the total estimated cost. At each step the algorithm expands the node in the open list with the smallest f(n), until the goal node is found.

The goal of the A* algorithm is to find the shortest path from start to goal in a graph (or grid). Its core idea is to combine the actual cost (path length so far) with a heuristic estimate (estimated cost to the goal), accelerating the search by always expanding the most promising node first.
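Concretely, with a Manhattan-distance heuristic on a grid (the frontier nodes and costs below are made up purely for illustration), the next node to expand is the one minimizing f(n):

```python
def manhattan(node, goal):
    return abs(node[0] - goal[0]) + abs(node[1] - goal[1])

goal = (2, 0)
g = {(1, 0): 1, (0, 1): 1}   # g(n): cost accumulated so far for two frontier nodes
f = {n: gn + manhattan(n, goal) for n, gn in g.items()}
# f[(1, 0)] = 1 + 1 = 2,  f[(0, 1)] = 1 + 3 = 4
best = min(f, key=f.get)     # (1, 0) is expanded next
```

Even though both frontier nodes cost the same to reach, the heuristic steers the search toward the node that looks closer to the goal.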
The key formula:

    f(n) = g(n) + h(n)

where g(n) is the actual cost from the start node to node n, and h(n) is the heuristic estimate from n to the goal. For A* to guarantee an optimal path, h(n) must be admissible, i.e. it must never overestimate the true remaining cost.
In multi-agent reinforcement learning, the A* algorithm is mainly applied to scenarios such as path planning and action selection among agents, where each agent uses A* to plan a route toward its own goal.
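One classic way to coordinate paths between agents is prioritized planning: agents are planned one at a time, and each later agent treats the cells used by earlier agents as obstacles. The sketch below is illustrative only; the grid size and agent tasks are made up, and it uses a deliberate simplification in which an earlier agent's path is reserved as static obstacles, whereas real multi-agent path finding planners typically reserve space-time (cell, timestep) pairs:

```python
import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star_grid(start, goal, size, blocked):
    """A* on a size x size grid with unit step costs; cells in `blocked` cannot be entered."""
    open_heap = [(manhattan(start, goal), 0, start)]  # entries are (f, g, node)
    g = {start: 0}
    came_from = {}
    while open_heap:
        _, g_cur, cur = heapq.heappop(open_heap)
        if cur == goal:                     # backtrack through parents
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        x, y = cur
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nb[0] < size and 0 <= nb[1] < size) or nb in blocked:
                continue
            ng = g_cur + 1
            if ng < g.get(nb, float("inf")):
                g[nb] = ng
                came_from[nb] = cur
                heapq.heappush(open_heap, (ng + manhattan(nb, goal), ng, nb))
    return None  # no path avoiding the blocked cells

# Plan two agents in priority order on a 5x5 grid; the second agent must
# detour around every cell the first agent's path occupies.
tasks = [((1, 0), (1, 3)), ((0, 2), (4, 2))]
reserved, paths = set(), []
for start, goal in tasks:
    path = a_star_grid(start, goal, 5, reserved)
    paths.append(path)
    reserved |= set(path)  # later agents treat this path as obstacles
p1, p2 = paths
```

The second agent's straight-line route is cut off by the first agent's reserved column, so its A* search returns a longer path that goes around it.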
To improve learning efficiency in multi-agent systems, the A* algorithm can be combined with policy learning in multi-agent reinforcement learning, with A*'s search results used to guide or accelerate the learned policy.
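For instance, an A*-derived distance-to-goal can be turned into a potential-based shaping reward (the Ng et al. form, which is known to leave optimal policies unchanged). The sketch below assumes an obstacle-free grid, where the A* shortest-path cost to the goal is exactly the Manhattan distance, so the potential can be evaluated without running the search; GAMMA and the goal cell are illustrative choices:

```python
GAMMA = 0.99          # discount factor (illustrative)
GOAL = (4, 4)         # goal cell (illustrative)

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def potential(state):
    # On an obstacle-free grid the A* shortest-path cost from `state` to the
    # goal equals the Manhattan distance, so no search is needed here.
    return -manhattan(state, GOAL)

def shaped_reward(r, s, s_next):
    # Potential-based shaping: r' = r + gamma * phi(s') - phi(s)
    return r + GAMMA * potential(s_next) - potential(s)

# A step toward the goal earns a bonus, a step away from it a penalty:
toward = shaped_reward(0.0, (2, 2), (3, 2))  # positive
away = shaped_reward(0.0, (2, 2), (1, 2))    # negative
```

The dense shaping signal rewards every step of progress along an A*-optimal route, instead of only paying out when the goal is finally reached.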
In multi-agent reinforcement learning settings, the A* algorithm is a powerful tool, especially for path planning and short-horizon decisions. However, its limitations become apparent in the face of complex multi-agent interactions and dynamic environments. Combining A* with reinforcement learning exploits A*'s efficient search while letting reinforcement learning improve the agents' long-term decision-making, yielding better performance on complex tasks.
```python
# A* Algorithm
import heapq

def a_star(start, goal, neighbors, cost, heuristic):
    """Search for a shortest path from start to goal; return it as a list of nodes."""
    g = {start: 0}                                   # g(n): cost from start to node n
    came_from = {}                                   # parent map for path reconstruction
    closed = set()                                   # already-evaluated nodes
    open_heap = [(heuristic(start, goal), start)]    # open list keyed by f(n) = g(n) + h(n)

    while open_heap:
        _, current = heapq.heappop(open_heap)        # node with the lowest f(n)
        if current == goal:
            return reconstruct_path(came_from, current)
        if current in closed:
            continue                                 # skip stale heap entries
        closed.add(current)

        for neighbor in neighbors(current):
            if neighbor in closed:
                continue                             # ignore already-evaluated neighbors
            tentative_g = g[current] + cost(current, neighbor)
            if tentative_g >= g.get(neighbor, float("inf")):
                continue                             # not a better path
            # Best path to this neighbor so far: record it
            came_from[neighbor] = current
            g[neighbor] = tentative_g
            heapq.heappush(open_heap, (tentative_g + heuristic(neighbor, goal), neighbor))

    return None                                      # the goal was never reached

def reconstruct_path(came_from, current):
    """Walk the parent map backwards and return the path from start to goal."""
    total_path = [current]
    while current in came_from:
        current = came_from[current]
        total_path.append(current)
    return total_path[::-1]

def heuristic(node, goal):
    """Example heuristic: Manhattan distance on a grid of (x, y) tuples."""
    return abs(node[0] - goal[0]) + abs(node[1] - goal[1])
```

How the implementation proceeds:

- Initialization: the open list initially contains only the start node and the closed list is empty; each node's g value records the path cost from the start, and its f value records the total estimated cost.
- Expansion: on every iteration the node with the smallest f value is expanded.
- Reaching the goal: once the goal node is reached, reconstruct_path backtracks through the parent map and returns the path.
- Updating neighbors: a tentative g and f value are computed for each neighbor; if a better path is found, the neighbor's parent is updated.
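The backtracking step can be seen in isolation with a hand-built parent map (the map below is purely illustrative):

```python
def reconstruct_path(came_from, current):
    path = [current]
    while current in came_from:       # follow parents back to the start
        current = came_from[current]
        path.append(current)
    return path[::-1]                 # reverse so the path runs start -> goal

came_from = {(1, 0): (0, 0), (1, 1): (1, 0), (2, 1): (1, 1)}
print(reconstruct_path(came_from, (2, 1)))
# [(0, 0), (1, 0), (1, 1), (2, 1)]
```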
If anything in this article is inaccurate or inappropriate, your understanding and corrections are appreciated. Some text and images come from the internet and their original sources cannot be verified; if any dispute arises, please contact the author for removal. Comments pointing out errors, questions, or infringement are welcome.