A* algorithm

#programming #algorithms #graphs

A* is an extension of Dijkstra's algorithm for goal-directed search.

Abstract

Dijkstra wastes time by exploring in all directions uniformly. If the target is a single fixed node, most of that exploration is unnecessary.

A* improves this by adding something of a "directional intelligence".

Where Dijkstra's priority function is solely the cost from the source, A* improves on it by adding a heuristic estimate of the cost to the destination.
So the priority function looks like $$f(n) = g(n) + h(n)$$ where g(n) is the cost from the start, and h(n) is a heuristic estimate of the cost to the destination.

By minimizing their sum rather than g(n) alone, we encode directional intelligence, pruning from our search space the unnecessary nodes that lie far from the destination.

Therefore the main loop becomes something like

node = extract_min(queue)
for neighbour of node:
	g_new = g[node] + cost(node, neighbour)
	f_new = g_new + h(neighbour)
	
	if neighbour not visited OR g_new < g[neighbour]:
		g[neighbour] = g_new           # update / relax
		parent[neighbour] = node
		push(queue, (f_new, neighbour))
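
The pseudocode above can be fleshed out into a runnable sketch. The graph representation (a dict mapping each node to a list of (neighbour, cost) pairs) and the tiny example graph with its heuristic values are my own illustration, not part of the original note:

```python
import heapq

def a_star(graph, source, goal, h):
    """A* search.  graph: dict mapping node -> list of (neighbour, cost).
    h: heuristic, node -> estimated cost from node to goal.
    Returns (cost, path), or (inf, []) if goal is unreachable."""
    g = {source: 0}                    # best known cost from source
    parent = {source: None}
    queue = [(h(source), source)]      # priority = f(n) = g(n) + h(n)
    done = set()                       # finalized nodes
    while queue:
        f, node = heapq.heappop(queue)
        if node in done:               # stale queue entry, skip
            continue
        done.add(node)
        if node == goal:
            path = []
            while node is not None:    # walk parents back to source
                path.append(node)
                node = parent[node]
            return g[goal], list(reversed(path))
        for neighbour, cost in graph[node]:
            g_new = g[node] + cost
            if neighbour not in g or g_new < g[neighbour]:
                g[neighbour] = g_new             # relax on g
                parent[neighbour] = node         # update
                heapq.heappush(queue, (g_new + h(neighbour), neighbour))
    return float("inf"), []

# Toy example (hypothetical): an admissible heuristic on a 4-node graph.
graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('D', 5)],
         'C': [('D', 1)], 'D': []}
h = {'A': 3, 'B': 2, 'C': 1, 'D': 0}.get
```

Note that instead of a "visited" check, this sketch allows duplicate queue entries and discards stale ones when popped, which is the usual way to fake a decrease-key operation with a binary heap.
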
Note

The relaxation logic (the decision on whether to relax or not) is still based on g(n). Only the order in which nodes are processed is decided by f(n).

Critical properties of the heuristic h(n)

Admissibility $$h(n) \leq \text{true cost}(n \to \text{goal})$$

This means that the heuristic never overestimates. You will rarely know the true cost, but your heuristic must never exceed it.

This is needed because admissibility is what guarantees that A* finds an optimal path.

Why does admissibility guarantee optimality? Suppose A* pops the goal with a suboptimal cost C greater than the optimal cost C*. Some node n on an optimal path is still in the queue, and since h never overestimates, f(n) = g(n) + h(n) <= C* < C, so n would have been popped before the goal, a contradiction.
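
As a quick sanity check (my own illustration, not from the note), admissibility can be verified empirically on a small unit-cost grid: compute the true distances to the goal with BFS and confirm that Manhattan distance never exceeds them.

```python
from collections import deque

def true_costs_to(goal, size):
    """BFS distances from goal to every cell of an open size x size
    4-connected grid with unit edge costs.  The grid is undirected,
    so these are also the true costs *to* the goal."""
    dist = {goal: 0}
    q = deque([goal])
    while q:
        x, y = q.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nb[0] < size and 0 <= nb[1] < size and nb not in dist:
                dist[nb] = dist[(x, y)] + 1
                q.append(nb)
    return dist

goal = (4, 4)
truth = true_costs_to(goal, 5)
manhattan = lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1])
# Admissible: the estimate never exceeds the true remaining cost.
admissible = all(manhattan(n) <= truth[n] for n in truth)
```
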

Consistency $$h(n) \leq \text{cost}(n \to m) + h(m)$$ for every edge from n to a neighbour m.

This is equivalent to the triangle inequality.

Why consistency?

Consistency guarantees that nodes never have to be reprocessed. Without consistency, this criterion would fail: we could not finalize a node the first time we pop it, and a node might require several visits before its cost is final.
Why does consistency guarantee that nodes do not have to be reprocessed? Consistency makes f(n) non-decreasing along any path, so A* pops nodes in non-decreasing order of f. The first time a node is popped, its g value is already optimal — the same invariant Dijkstra relies on — so it never needs revisiting.
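
The edge-by-edge condition is easy to check mechanically. The helper below and the toy graph with its two candidate heuristics are my own illustration:

```python
def is_consistent(graph, h):
    """Check h(n) <= cost(n, m) + h(m) for every edge (n, m).
    graph: dict mapping node -> list of (neighbour, cost)."""
    return all(h(n) <= cost + h(m)
               for n in graph
               for m, cost in graph[n])

graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('D', 5)],
         'C': [('D', 1)], 'D': []}
good_h = {'A': 3, 'B': 2, 'C': 1, 'D': 0}.get   # satisfies the inequality
bad_h = {'A': 3, 'B': 0, 'C': 2, 'D': 0}.get    # h drops by 3 across an edge of cost 1
```
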

Example intuition

Think of it like this:
Dijkstra: explores outwards in a circle.
A* with Manhattan distance: explores towards the goal in a cone.
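
The difference can be made concrete by counting how many nodes each search finalizes on an open grid (the grid size, start, and goal below are my own example): with h = 0 the search floods out a quarter-circle of the grid, while Manhattan distance keeps it to a narrow corridor towards the goal.

```python
import heapq

def nodes_expanded(size, source, goal, h):
    """Count nodes finalized (popped) by A* on an open size x size
    4-connected grid with unit edge costs."""
    g = {source: 0}
    queue = [(h(source), source)]
    popped = set()
    while queue:
        f, node = heapq.heappop(queue)
        if node in popped:             # stale entry
            continue
        popped.add(node)
        if node == goal:
            return len(popped)
        x, y = node
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nb[0] < size and 0 <= nb[1] < size:
                g_new = g[node] + 1
                if nb not in g or g_new < g[nb]:
                    g[nb] = g_new
                    heapq.heappush(queue, (g_new + h(nb), nb))
    return len(popped)

goal = (0, 9)
manhattan = lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1])
dijkstra_count = nodes_expanded(10, (0, 0), goal, lambda n: 0)  # h = 0: plain Dijkstra
astar_count = nodes_expanded(10, (0, 0), goal, manhattan)
```

Here A* walks straight along the edge of the grid to the goal, while Dijkstra expands every cell within distance 9 of the start first.
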

Time complexity analysis

The time complexity of A* cannot be simply expressed in terms of E and V, since the heuristic essentially "prunes" the graph, and some nodes and edges may never be visited at all.

Zero heuristic $$h(n) = 0$$

This is painfully obvious: in this case A* collapses to Dijkstra, with a runtime complexity of $$O(E\log V)$$

Ideal heuristic

The ideal heuristic h(n) is the exact remaining cost from n to the goal.
In the face of this heuristic, the time complexity becomes $$O(d)$$ where d is the depth of the optimal solution.
This is because every node on an optimal path then has f equal to the optimal cost, and no other node can have a smaller f, so the optimal-path nodes are always at the front of the queue.
The search therefore functions as an effective linked-list traversal of the optimal path.
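
To see the "linked list" behaviour, record the order in which A* finalizes nodes when h is the exact remaining cost. The graph and heuristic values below are my own toy example; only the nodes on the optimal path ever get popped.

```python
import heapq

def popped_order(graph, source, goal, h):
    """Run A* and record the order in which nodes are finalized (popped)."""
    g = {source: 0}
    queue = [(h(source), source)]
    order = []
    while queue:
        f, node = heapq.heappop(queue)
        if node in order:              # stale entry
            continue
        order.append(node)
        if node == goal:
            return order
        for nb, cost in graph[node]:
            g_new = g[node] + cost
            if nb not in g or g_new < g[nb]:
                g[nb] = g_new
                heapq.heappush(queue, (g_new + h(nb), nb))
    return order

graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('D', 5)],
         'C': [('D', 1)], 'D': []}
exact = {'A': 4, 'B': 3, 'C': 1, 'D': 0}.get   # true remaining cost to D
```
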

Consistent heuristic

In this case, we can think of it as a Dijkstra search on a modified graph: each edge cost is reweighted to cost(n, m) + h(m) - h(n), which consistency guarantees is non-negative.

In this case, the worst-case time complexity is still O(E log V), but that bound is often not a useful metric.

Instead we describe the algorithm by the size of its "search-space".
In this case we talk about the effective branching factor, b*.
If d is the depth of the solution, then the number of nodes expanded is given by $$N = O((b^*)^d)$$
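
b* is conventionally defined as the branching factor a uniform tree of depth d would need in order to contain the N nodes actually expanded, i.e. the root of N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d. A small bisection solver for it (my own sketch, assuming that definition) looks like:

```python
def effective_branching_factor(n_expanded, depth, tol=1e-9):
    """Solve N + 1 = 1 + b + b^2 + ... + b^depth for b by bisection.
    The left side grows monotonically in b, so bisection suffices."""
    def tree_size(b):
        return sum(b ** i for i in range(depth + 1))
    lo, hi = 1.0, float(max(n_expanded, 2))   # b* lies between 1 and N
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tree_size(mid) < n_expanded + 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, expanding N = 6 nodes to reach a solution at depth 2 gives b* = 2, since 1 + 2 + 4 = 7 = N + 1.
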

