introduction to search algorithms

27.08.2022 — ai, notes, omscs — 5 min read

Intro
Definition
Abstraction
State Space
Best-First
Frontier
Redundant Paths
Measuring performance

Introduction

Why is search important?

Regular programming helps us solve problems where we know what to do and AI helps us solve problems where we don’t know exactly what to do.

When the correct action to take is not immediately obvious, an agent may need to plan ahead: to consider a sequence of actions that form a path to a goal state. Such an agent is called a problem-solving agent, and the computational process it undertakes is called search.

We are interested in problem-solving. The complexity of different problems can arise from different reasons and examine the two following scenarios in the context of navigation:

Using street signs: requires a sequence of decisions ultimately leading to the destination. Complexity arises from the many possible states and the decisions required.
Fog in the forest: complexity comes from the lack of visiblity, rendering the actions we can take not very clear.

A route-finding problem:

the agent is tasked to find a path from Arad city A to Bucharest city B
actions of the agent: drive

Can the agent find a solution?

Formal Definition of a Search Problem

Definition of a problem:

a set of possible states that the environment can be in ⇒ the state space
the initial state that the agent starts in
a set of one or more goal states: {sg1,sg2..}
the actions available to the agent. Given a state s, action(s) returns a finite set of action that can be executed in s : actions(s) => {a1,a2,a3}
a transition model, which describes what each action does: result(s,a) => s'
an action cost function, denoted by action-cost(s,a,s’), that gives the numeric cost of applying action a in state s to reach state s’.
goaltest(s) -> T|F: a function that takes a state as an input and determines if the goal state has been reached
pathcost(s-a->s-a->s) -> n: function that takes a sequence a of state transitions through actions and produces the cost 'n'
- stepcost(s,a,s1) -> n

A sequence of actions forms a path, and a solution is a path from the initial state to a goal state. Assuming the costs are additive, the cost of a path is the sum of the individual action costs. The path with the lowest lost is the optimal solution. Search problems can be modelled as a graph, with the vertices representing states and the directed edges as actions.

The right level of abstraction

When we are interested in solving real-world problems, it’s important for us to use the right level of abstraction to develop our algorithms. Modeling all of the world is not entirely useful, and often impossible.

For example, in a search problem for finding the best route between two geographical locations, we need to decide on the level of abstraction for the action of movement. If the goal state is multiple cities away from the initial state, would driving 1km be a reasonable level of abstraction for the action? Most likely not.

We want our abstract models to have the following properties:

valid - possible to elaborate the abstract solution into a more detailed world
useful - carrying out each of the actions in the solution is easier than the original problem

Searching the State Space

A search algorithm takes a search problem as an input and returns the optimal solution or an indication of failure.

We’ll learn about algorithms which does this by superimposing a search tree over the state space graph. Each node in the search tree corresponds to the state in the state-space graph and the edges correspond to the actions. The root of the tree correspond to the initial state of the problem.

As the algorithm seeks the optimal solution, it explores the nodes in the search tree. A node can be expanded by considering all the available actions for that state, using the Result function to see where those actions lead to and generating a new node for each of the resulting children. This exploration requires a decision of which nodes to expand next, and which ones to disregard if any, the very essence of a search algorithm.

This exploration divides the state space into three regions:

Explored/reached: where every state has been expanded
Frontier: the very last states explored in a possible path. This separated the explored from the unexplored
Unexplored: no nodes have been generated for these states yet

Best-first search

A generic algorithm for deciding which node to expand next is to choose a node n with a minimum value of some evaluation function. The algorithm is as follows:

On each iteration:

choose a node on the frontier with a minimum f(n) value
return if its state is a goal state: goaltest(s)
apply expand to generate child nodes
add each child node to frontier list if not reached before, or has now lower cost

By using different f(n) for determining the minimum cost, we can develop different algorithms.

Pseudocode:

1function best-first-search(problem,f) return  {solution node, failure}
2    node <- Node(State=problem.Initial) 
3    frontier <- priority queue ordered by f
4    reached <- visited nodes, initialized with initial state
5
6    while frontier is not empty:
7        node <- Pop(frontier)
8        if goaltest(node.State) return node
9        
10        for each child in expand(problem, node):
11            s <- child.State
12            if s is not in reached or s.PathCost < reached[s].PathCost:
13                 reached[s] <-child
14                 add child to frontier
15    
16    return failure
17
18function expand(problem, node) yields nodes
19    s <- node.State
20    for each action in problem.Actions(s):
21            s' <- result(s,action)
22            cost <- node.PathCost + stepcost(s,action,s')
23            yield Node(State=s',Parent=node,Action=action,PathCost=cost)

Data structure for the frontier

The frontiers are best stored in a queue due to the nature of operations required to perform on them eg. pop, add etc.

Kinds of queues:

a priority queue- pops node with the lowest cost first based on a heuristic function, used in best-first search
a FIFO queue- pops node added first, used in breadth-first search
a LIFO queue (stack) - last node in gets popped first, used in depth-first search

Dealing with redundant paths

As we expand and explore our search tree, we could run into redundant paths that do not contribute to reaching our goal state. There are three strategies to deal with them:

Remember all previously reached states: as done in a best-first search algorithm, the visited states can be stored in memory and accessed using a lookup such as a hash table. This ensures we do not explore them again. This is suitable when there are many redundant paths and the reached states will fit in memory
Do not care: for some search problems, redundant paths are not possible. For example, in an assembly line problem, going backwards is not a valid action. In such scenarios, we use tree-like search algorithms, ones which do not check for redundant paths. Algorithms which do check, are defined as graph search algorithms.
A compromise: only check for cycles but not for redundant paths in general. As each node has pointer to its ancestors, it makes it possible to traverse up the links to determine if a certain state appeared before. Certain implementations detect short cycles this way and rely on other mechanism for longer cycles.

Measuring performance

We can evaluate the algorithm’s performance in four ways:

Completeness: is it guaranteed to find an solution when one exists and fail when none does?
Cost optimality: is it able to find the lowest path cost solution?
Time complexity: speed of finding optimal solution
Space complexity: memory cost of running the algorithm

Measuring completeness: to guarantee a search algorithm will find a solution in an infinite space, it must be systematic in the way it explores an infinite state space, making sure it can eventually reach any state that is connected to the initial state.

References:

Russel, S., & Norvig, P. (2018). ARTIFICIAL INTELLIGENCE : a modern approach. Prentice Hall.

Georgia Tech Lectures:

Informed search algorithms
Problem solving and search