# Introduction to search algorithms

## Introduction

## Formal Definition of a Search Problem

## The right level of abstraction

## Searching the State Space

## Best-first search

## Data structure for the frontier

## Dealing with redundant paths

## Measuring performance

— ai, notes, omscs — 5 min read

Why is search important?

Regular programming helps us solve problems where we know what to do, and AI helps us solve problems where we **don’t** know exactly what to do.

When the correct action to take is not immediately obvious, an agent may need to plan ahead: to consider a sequence of actions that form a path to a goal state. Such an agent is called a problem-solving agent, and the computational process it undertakes is called search.

We are interested in problem-solving. Different problems are complex for different reasons. Consider the following two scenarios in the context of navigation:

- Using street signs: requires a sequence of decisions ultimately leading to the destination. Complexity arises from the many possible states and the decisions required.
- Fog in the forest: complexity comes from the lack of visibility, which makes it unclear which actions we can take.

A route-finding problem:

- the agent is tasked to find a path from Arad city A to Bucharest city B
- actions of the agent: drive

Can the agent find a solution?

Definition of a problem:

- a set of possible states that the environment can be in: **the state space**
- the **initial state** that the agent starts in
- a set of one or more **goal states**: `{sg1,sg2..}`
- the **actions** available to the agent. Given a state `s`, `actions(s)` returns a finite set of actions that can be executed in `s`: `actions(s) => {a1,a2,a3}`
- a **transition model**, which describes what each action does: `result(s,a) => s'`
- an **action cost function**, denoted by `action-cost(s,a,s')`, that gives the numeric cost of applying action `a` in state `s` to reach state `s'`
- `goaltest(s) -> T|F`: a function that takes a state as input and determines whether a goal state has been reached
- `pathcost(s-a->s-a->s) -> n`: a function that takes a sequence of state transitions through actions and produces the total cost `n`
- `stepcost(s,a,s1) -> n`: the cost `n` of the single step from `s` to `s1` via action `a`, that is, the action cost function above

A sequence of actions forms a path, and a solution is a path from the initial state to a goal state. Assuming the costs are additive, the cost of a path is the sum of the individual action costs. The path with the lowest cost is the optimal solution. Search problems can be modelled as a graph, with the vertices representing states and the directed edges representing actions.
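As a minimal sketch of the definition above, the route-finding problem could be written in Python roughly as follows. The class name `RouteProblem`, its method names, and the `ROADS` table are assumptions made for illustration. The distances are a small excerpt of the textbook Romania map, included only to have concrete numbers to add up.

```python
from dataclasses import dataclass

# Illustrative road distances (km) between a few cities. This is an assumed
# excerpt of the textbook Romania map, used here only as sample data.
ROADS = {
    "Arad": {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
    "Sibiu": {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Timisoara": {"Arad": 118},
    "Zerind": {"Arad": 75},
    "Bucharest": {"Fagaras": 211, "Pitesti": 101},
}


@dataclass(frozen=True)
class RouteProblem:
    initial: str
    goals: frozenset

    def actions(self, s):
        # An action is "drive to neighbouring city X", named by X itself.
        return sorted(ROADS[s])

    def result(self, s, a):
        # Transition model: driving to city `a` puts the agent in `a`.
        return a

    def action_cost(self, s, a, s2):
        return ROADS[s][s2]

    def goal_test(self, s):
        return s in self.goals

    def path_cost(self, states):
        # Additive costs: sum the cost of each consecutive step.
        # The action is named after its destination, so a == s2 here.
        return sum(self.action_cost(s, s2, s2) for s, s2 in zip(states, states[1:]))


problem = RouteProblem(initial="Arad", goals=frozenset({"Bucharest"}))
via_fagaras = ["Arad", "Sibiu", "Fagaras", "Bucharest"]
via_pitesti = ["Arad", "Sibiu", "Rimnicu Vilcea", "Pitesti", "Bucharest"]
print(problem.path_cost(via_fagaras))  # 140 + 99 + 211 = 450
print(problem.path_cost(via_pitesti))  # 140 + 80 + 97 + 101 = 418
```

Both paths are solutions, but under these sample distances the second has the lower path cost even though it takes more actions.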

When we are interested in solving real-world problems, it’s important for us to use the right level of abstraction to develop our algorithms. Modeling all of the world is not entirely useful, and often impossible.

For example, in a search problem for finding the best route between two geographical locations, we need to decide on the level of abstraction for the action of movement. If the goal state is multiple cities away from the initial state, would driving 1km be a reasonable level of abstraction for the action? Most likely not.

We want our abstract models to have the following properties:

- valid - it is possible to elaborate any abstract solution into a solution in the more detailed world
- useful - carrying out each of the actions in the solution is easier than the original problem

A search algorithm takes a search problem as input and returns a solution (ideally the optimal one) or an indication of failure.

We’ll learn about algorithms that do this by superimposing a search tree over the state-space graph. Each node in the search tree corresponds to a state in the state-space graph, and the edges correspond to actions. The root of the tree corresponds to the initial state of the problem.

As the algorithm seeks the optimal solution, it explores the nodes in the search tree. A node is expanded by considering all the available actions for its state, using the `result` function to see where those actions lead, and generating a new child node for each resulting state. Deciding which node to expand next, and which ones (if any) to disregard, is the very essence of a search algorithm.

This exploration divides the state space into three regions:

- Explored: the interior region, where every state has already been expanded
- Frontier: states that have been reached but not yet expanded. The frontier separates the explored region from the unexplored region
- Unexplored: no nodes have been generated for these states yet
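To make the three regions concrete, here is a small sketch that tracks them while nodes are expanded. The graph `GRAPH` and the helper `regions_after` are hypothetical, invented only for this illustration.

```python
# Hypothetical toy graph: each state maps to the states its actions lead to.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}


def regions_after(initial, expansion_order):
    """Return (expanded, frontier, unexplored) after expanding states in order."""
    reached = {initial}   # every state a node has been generated for
    expanded = set()      # states whose children have been generated
    for s in expansion_order:
        if s not in reached or s in expanded:
            raise ValueError(f"{s!r} is not on the frontier")
        expanded.add(s)
        reached.update(GRAPH[s])
    frontier = reached - expanded       # reached but not yet expanded
    unexplored = set(GRAPH) - reached   # no node generated yet
    return expanded, frontier, unexplored


expanded, frontier, unexplored = regions_after("A", ["A", "B"])
print(sorted(expanded))    # ['A', 'B']
print(sorted(frontier))    # ['C', 'D']
print(sorted(unexplored))  # ['E', 'F']
```

Every path from an expanded state to an unexplored state has to pass through a frontier state, which is what lets the frontier act as the boundary between the other two regions.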

A generic algorithm for deciding which node to expand next is to choose a node n with a minimum value of some evaluation function. The algorithm is as follows:

On each iteration:

- choose a node on the frontier with a minimum f(n) value
- return it if its state is a goal state: `goaltest(s)`
- otherwise, apply expand to generate its child nodes
- add each child node to the frontier if its state has not been reached before, or if it is now reached by a path with a lower cost

By using different f(n) for determining the minimum cost, we can develop different algorithms.

Pseudocode:

```
function best-first-search(problem, f) returns a solution node or failure
    node <- Node(State=problem.Initial)
    frontier <- a priority queue ordered by f, with node as its only element
    reached <- a lookup table, with one entry: problem.Initial -> node

    while frontier is not empty:
        node <- Pop(frontier)
        if goaltest(node.State) return node

        for each child in expand(problem, node):
            s <- child.State
            if s is not in reached or child.PathCost < reached[s].PathCost:
                reached[s] <- child
                add child to frontier

    return failure

function expand(problem, node) yields nodes
    s <- node.State
    for each action in problem.Actions(s):
        s' <- result(s, action)
        cost <- node.PathCost + stepcost(s, action, s')
        yield Node(State=s', Parent=node, Action=action, PathCost=cost)
```
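The pseudocode translates fairly directly into Python. The following is a sketch rather than a reference implementation: the `GraphProblem` class, the `EDGES` graph, and the `path_states` helper are assumptions made for the example, and `heapq` with a counter as tie-breaker is just one way to build the priority queue.

```python
import heapq
from itertools import count

# Hypothetical weighted graph: EDGES[s][s2] is the cost of stepping from s to s2.
EDGES = {"S": {"A": 1, "B": 4}, "A": {"B": 2, "G": 6}, "B": {"G": 1}, "G": {}}


class GraphProblem:
    def __init__(self, initial, goal, edges):
        self.initial, self.goal, self.edges = initial, goal, edges

    def actions(self, s):
        return list(self.edges[s])  # an action is named by its destination

    def result(self, s, action):
        return action

    def step_cost(self, s, action, s2):
        return self.edges[s][s2]

    def goal_test(self, s):
        return s == self.goal


class Node:
    def __init__(self, state, parent=None, action=None, path_cost=0):
        self.state, self.parent = state, parent
        self.action, self.path_cost = action, path_cost


def expand(problem, node):
    s = node.state
    for action in problem.actions(s):
        s2 = problem.result(s, action)
        cost = node.path_cost + problem.step_cost(s, action, s2)
        yield Node(s2, node, action, cost)


def best_first_search(problem, f):
    node = Node(problem.initial)
    tie = count()  # tie-breaker so the heap never has to compare Node objects
    frontier = [(f(node), next(tie), node)]
    reached = {problem.initial: node}
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if problem.goal_test(node.state):
            return node
        for child in expand(problem, node):
            s = child.state
            if s not in reached or child.path_cost < reached[s].path_cost:
                reached[s] = child
                heapq.heappush(frontier, (f(child), next(tie), child))
    return None  # failure


def path_states(node):
    # Follow parent pointers back to the root, then reverse.
    states = []
    while node is not None:
        states.append(node.state)
        node = node.parent
    return states[::-1]


problem = GraphProblem("S", "G", EDGES)
goal = best_first_search(problem, f=lambda n: n.path_cost)
print(path_states(goal), goal.path_cost)  # ['S', 'A', 'B', 'G'] 4
```

Here `f` is the path cost, so the cheapest node is always expanded first. Passing a different `f` changes the expansion order without touching the rest of the function.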

The frontier is best stored in a queue, because the operations we need to perform on it (pop, add, and so on) are queue operations.

Kinds of queues:

- a priority queue - pops the node with the lowest cost first, according to an evaluation function f; used in best-first search
- a FIFO queue - pops the node that was added first; used in breadth-first search
- a LIFO queue (stack) - pops the node that was added last; used in depth-first search
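The difference between the three queues shows up in the order they pop the same items. This sketch uses Python's standard `heapq` and `collections.deque`; the item names and costs are made up for the example.

```python
import heapq
from collections import deque

items = [("c", 3), ("a", 1), ("b", 2)]  # (node name, cost), in insertion order

# Priority queue: pops the entry with the smallest cost first.
pq = []
for name, cost in items:
    heapq.heappush(pq, (cost, name))
priority_order = [heapq.heappop(pq)[1] for _ in range(len(items))]

# FIFO queue: pops the entry that was added first.
fifo = deque(name for name, _ in items)
fifo_order = [fifo.popleft() for _ in range(len(items))]

# LIFO queue (stack): pops the entry that was added last.
stack = [name for name, _ in items]
lifo_order = [stack.pop() for _ in range(len(items))]

print(priority_order)  # ['a', 'b', 'c']
print(fifo_order)      # ['c', 'a', 'b']
print(lifo_order)      # ['b', 'a', 'c']
```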

As we expand and explore our search tree, we could run into redundant paths that do not contribute to reaching our goal state. There are three strategies to deal with them:

- Remember all previously reached states: as done in the best-first search algorithm, the reached states can be stored in memory and accessed using a lookup structure such as a hash table. This ensures we do not explore them again. This is suitable when there are many redundant paths and the reached states fit in memory.
- Do not care: for some search problems, redundant paths are not possible. For example, in an assembly line problem, going backwards is not a valid action. In such scenarios, we use tree-like search algorithms, which do not check for redundant paths. Algorithms that do check are called graph search algorithms.
- A compromise: only check for cycles, but not for redundant paths in general. As each node has a pointer to its parent, it is possible to traverse up the chain of ancestors to determine whether a certain state appeared before. Certain implementations detect short cycles this way and rely on other mechanisms for longer cycles.
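The cycle check in the third strategy can be sketched by walking up the parent pointers. The `Node` class and the `is_cycle` helper, including its optional `max_depth` cut-off for checking only short cycles, are assumptions made for this example.

```python
class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent


def is_cycle(node, max_depth=None):
    """Return True if node.state already appears among node's ancestors.

    If max_depth is given, only that many ancestors are inspected, so only
    short cycles are detected.
    """
    ancestor = node.parent
    steps = 0
    while ancestor is not None and (max_depth is None or steps < max_depth):
        if ancestor.state == node.state:
            return True
        ancestor = ancestor.parent
        steps += 1
    return False


# The path A -> B -> C -> A revisits state A.
looping = Node("A", Node("C", Node("B", Node("A"))))
print(is_cycle(looping))               # True
print(is_cycle(looping, max_depth=2))  # False: only C and B are inspected
print(is_cycle(looping.parent))        # False: A -> B -> C has no repeat
```

This needs no extra memory beyond the nodes themselves, but it only catches a repeated state on the current path, not a second, redundant route to the same state.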

We can evaluate the algorithm’s performance in four ways:

- Completeness: is it guaranteed to find a solution when one exists, and to correctly report failure when none does?
- Cost optimality: does it find the solution with the lowest path cost?
- Time complexity: how long does it take to find a solution?
- Space complexity: how much memory is needed to run the algorithm?

Measuring completeness: to guarantee that a search algorithm will find a solution in an infinite state space, it must explore that space systematically, making sure it can eventually reach any state that is connected to the initial state.

References:

Russell, S., & Norvig, P. (2018). Artificial Intelligence: A Modern Approach. Prentice Hall.

Georgia Tech Lectures: