Searching Algorithms

Searching Algorithms

Searching algorithms belong to a category of widely used algorithms in Computer Science. Searching algorithms are mainly used to lookup an entry from data structures, such as linked lists, trees, graphs, etc. In this blog, we will discuss the working, types, and a few implementations of searching algorithms.

Searching Algorithm:

A searching algorithm is used to locate and retrieve an element – say x – from a data structure where it is stored – say a list. The algorithm returns the location of the searched element or indicates if it is not present in the data structure.

Searching algorithms are categorized into two types depending upon their search strategy.

Sequential Search: These algorithms linearly search the whole data structure by visiting every element. These algorithms are, therefore, comparatively slower. An example is a linear search algorithm.

Interval Search: these algorithms perform searching in a sorted data structure. The idea is to target the center of the data structure and then perform a search operation on the two halves. These algorithms are efficient. An example is a binary search algorithm.

Next, let’s discuss some commonly used searching algorithms.

Linear Search:

The very basic searching algorithm traverses the data structure sequentially. The algorithm visits every element one by one until the match is found. If the element is not present in the data structure, -1 is returned. Linear search applies to both sorted and unsorted data structures. Although the implementation is quite easy, linear search has a time complexity of O(n), where n is the size of the data structure. It means it is linearly dependent on the size and quite slow in terms of efficiency.

Binary Search:

Binary search works on a sorted data structure. It finds the middle of the list. If the middle element is the required element, it is returned as a result. Otherwise, it is compared with the required element. If the middle element is smaller than the required element, the right half is considered for further search, else the left half is considered. Again, the middle of the selected half is selected and compared with the required value. The process is repeated until the required value is found.

Binary search with time complexity of O(nlogn) is an efficient search algorithm, especially for large and sorted data records.

Interpolation Search:

Interpolation search is a combination of binary and linear search algorithms. Also called an extrapolation search, this algorithm first divides the data structure into two halves (like binary search) and then sequentially searches in the required half (linear search). In this way, it saves from dividing the data structure into two halves every time as in a binary search.

For an array, arr[], x is the element to be searched. The ‘start’ is the starting index of the array and ‘end’ is the last index. The position of the element x is calculated by the formula:

position = start + [ (x-arr[start])*(end-start) / (arr[end]-list[start]) ]

The position is calculated and returned if it is a match. If the result is less than the required element, the position in the left sub-array is calculated. Else, the position is calculated in the right sub-array. The process is repeated until the element is found or the array reduces to zero.

Interpolation search works when the data structure is sorted. When the elements are evenly distributed, the average run time of interpolation search is O(log(log(n))).

Breadth-First Search Algorithm:

BFS is used to search for a value in a graph or tree data structure. It is a recursive algorithm and traverses the vertices to reach the required value. The algorithm maintains two lists of visited vertices and non-visited vertices. It is implemented using a queue data structure. The idea is to visit every node at one level (breadth-wise) before moving to the next level of the graph.

The algorithm selects the first element from the queue and adds it to the visited list. All the adjacent vertices (neighbors) of this element are traversed. If a neighbor is not present in the visited list, it means it is not traversed yet. It is placed at the back of the queue and in the visited list. If a neighbor is present in the visited list already, it means it has been traversed and is ignored. The process is repeated until the required element is found or the queue is empty.

The following example will make the concept clear of breadth-first traversal.

 

 

 

Depth First Search Algorithm:

This is another recursive search algorithm used for graphs and trees. In contrast to the BFS algorithm, DFS works on the strategy of backtracking. The algorithm keeps on moving to the next levels (depth-wise) until there are no more nodes. The algorithm then backtracks to the upper level and repeats for the other node. The DFS algorithm is implemented using a stack data structure.

The algorithm pops an element from the top of the stack and places it in the visited list. All the nodes adjacent to this element are also selected. If any adjacent node value is not present in the visited list, it means it is not traversed yet, and it is placed at the stack top. Steps are repeated until the element is found or the stack is empty.

See the following example for a better understanding of depth-first traversal.

 

 

 

Next, let’s discuss some common algorithm solutions for searching problems.

Problem 1: Find the largest element in an array of unique elements.

Algorithm: We can use Linear Search to find the largest element in an array of size n. The time complexity would be O(n).

Initialize an integer value as max = 0. In a loop, visit each element in an array and compare it with the max value. If the element is greater than the max, swap it with the max value. When the loop ends, the max will contain the largest element of the array.

Problem 2: Find the position of an element in an infinite sorted array

Algorithm: A sorted array indicates that a binary search algorithm can work here. Another point to note is that the array is infinite. It means we don’t know the maximum index or upper bound of the array. To solve such a problem, we will first find the upper index and then apply the binary algorithm on the array.

To find the upper bound/index of the infinite array:

Initialize integer min, max, and value for minimum, maximum, and first array element. Set min, max, and value as 0, 1, and arr[0].

Run a loop until the value is less than the element we are looking for. If the value is less than the element swap min with the max, double the max, and store arr[max] in value. At the end of the loop, we get lower and upper bounds of the array in min and max. We can call binary search on these values.

For binary search:

If the max value is greater than 1, find mid of the array by min + (max -1)/2.

If mid is the value we are looking for, return mid. If mid is less than the search value, call binary search on the left half of the array. Take mid-1 as the maximum bound. Else, call binary search on the right half of the array, and take mid+1 as the minimum bound.

 

Problem 3: Find the number of occurrence for an element in a sorted array

Algorithm: To find how many times the element appears in a sorted array, we can use both linear and binary search.

In linear search, maintain two integer variables result and key. Loop over the entire array, on every occurrence of the key, increment the result by 1. At the end of the loop, the result contains the number of occurrences of the key.

Problem 4: Find if the given array is a subset of another array:

Algorithm: For this problem, we don’t know if the array is sorted or not.

The algorithm for linear search includes running two loops for array 1 and array 2. The outer loop selects elements of array 2 one by one. The inner loop performs a linear search in array 1 for array 2’s element. The complexity of this algorithm is O(mxn), where m and n are the sizes of array 1 and array 2, respectively.

The problem can also be solved using binary search after sorting the array. First sort array 1. The time complexity of sorting is O(m log m). For each element of array 2, call binary search on array 1. Time complexity is O(n log m). The total time complexity for the algorithm is O(m logm + n logm), where m and n are the sizes of arrays 1 and 2, respectively.

Ending Note:

In this blog on searching algorithms, we discussed a very basic algorithm type. Search algorithms are the most commonly used algorithms, as the concept of searching data and records is quite common in software development. The records from databases, servers, files, and other storage locations are searched and retrieved frequently. Efficient searching algorithms with efficient time and storage complexity play a key role here.

Get one step closer to your dream job!

 

Prepare for your programming interview with Data Structures & Algorithms Interview Questions You’ll Most Likely Be Asked. This book has a whopping 200 technical interview questions on lists, queues, stacks, and algorithms and 77 behavioral interview questions crafted by industry experts and interviewers from various interview panels.