Sorting & Searching Algorithms

Table of Contents
Please excuse the mess, my TOC can't handle more than 2 tags; need to fix that before I try to organize this mess of a notes page :')
Sorting Algorithms
Divide and Conquer Algorithms (to-do)
Merge Sort
Intro
Merge Sort is a divide and conquer sorting algorithm that sorts an array through recursion:
- Given an array, split it into two halves
- Keep splitting until you hit the base case of one element per list
- Recurse back up, merging and sorting 2 lists at a time
- Return a fully sorted list of the original length
Diagram
Splitting Phase:
Level 0: [8, 4, 7, 3, 2, 6, 5, 1] 2^0 = 1 subarrays (n elements, O(n))
/ \
Level 1: [8, 4, 7, 3] [2, 6, 5, 1] 2^1 = 2 subarrays (n/2 elements, O(n/2))
/ \ / \
Level 2: [8, 4] [7, 3] [2, 6] [5, 1] 2^2 = 4 subarrays (n/4 elements, O(n/4))
/ \ / \ / \ / \
Level 3: [8] [4] [7] [3] [2] [6] [5] [1] 2^3 = 8 subarrays (n/8 = 1 element each, O(1))
- In an index-based merge sort, the splitting does not contribute O(n) per level, since we simply pass indices indicating the bounds of the subarray, which is O(1). (The slicing version in the code below does copy on each split, but the overall O(n log n) bound is unchanged.)
Merging Phase:
Level 3: [8] [4] [7] [3] [2] [6] [5] [1] (base case, single elements)
\ / \ / \ / \ /
Level 2: [4, 8] [3, 7] [2, 6] [1, 5] (merge and sort pairs)
\ / \ /
Level 1: [3, 4, 7, 8] [1, 2, 5, 6] (merge sorted halves)
\ /
Level 0: [1, 2, 3, 4, 5, 6, 7, 8] (final merge)
- In merge sort, the merging step is where all the real work happens.
- At each level l, you have 2^l subarrays.
- Each subarray has size n/2^l.
- Total merge work done at each level = (num of subarrays) * (size per subarray) = 2^l * (n/2^l) = n
- Total merge work done = (merge work at each level) * (num of levels) = n * log(n) = O(n log n)
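To make the arithmetic concrete for the 8-element array above: there are log2(8) = 3 merge levels, each level touches all 8 elements, so the merges perform roughly 8 * 3 = 24 element-steps, matching n log n.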
Recursive Memory Lifecycle:
- We can also go step by step through the recursion to demonstrate why the space complexity is O(n).
Recursion Phase 1:
Level 0: [8, 4, 7, 3, 2, 6, 5, 1] 2^0 = 1 subarrays (n elements, O(n))
/ \
Level 1: [8, 4, 7, 3] [2, 6, 5, 1] 2^1 = 2 subarrays (n/2 elements, O(n/2))
/
Level 2: [8, 4] 2^2 = 4 subarrays (n/4 elements, O(n/4))
/ \
Level 3: [8] [4] 2^3 = 8 subarrays (n/8 = 1 element each, O(1))
Recursion Phase 2:
Level 0: [8, 4, 7, 3, 2, 6, 5, 1] 2^0 = 1 subarrays (n elements, O(n))
/ \
Level 1: [8, 4, 7, 3] [2, 6, 5, 1] 2^1 = 2 subarrays (n/2 elements, O(n/2))
/ \
Level 2: [4, 8] [7, 3] 2^2 = 4 subarrays (n/4 elements, O(n/4))
/ \
Level 3: [7] [3] 2^3 = 8 subarrays (n/8 = 1 element each, O(1))
Recursion Phase 3:
Level 0: [8, 4, 7, 3, 2, 6, 5, 1] 2^0 = 1 subarrays (n elements, O(n))
/ \
Level 1: [8, 4, 7, 3] [2, 6, 5, 1] 2^1 = 2 subarrays (n/2 elements, O(n/2))
/ \
Level 2: [4, 8] [3, 7] 2^2 = 4 subarrays (n/4 elements, O(n/4))
Level 3: 2^3 = 8 subarrays (n/8 = 1 element each, O(1))
Recursion Phase 4:
Level 0: [8, 4, 7, 3, 2, 6, 5, 1] 2^0 = 1 subarrays (n elements, O(n))
/ \
Level 1: [3, 4, 7, 8] [2, 6, 5, 1] 2^1 = 2 subarrays (n/2 elements, O(n/2))
Level 2: 2^2 = 4 subarrays (n/4 elements, O(n/4))
Level 3: 2^3 = 8 subarrays (n/8 = 1 element each, O(1))
- Notice how peak memory usage at any moment involves only a few active arrays, whose sizes sum to roughly n.
- Therefore, memory is not cumulative across all calls, but localized to the active recursive path
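As a rough sanity check (assuming the index-based version where each active call holds one temporary buffer): the live temporaries along the active recursion path have sizes around n/2 + n/4 + ... + 1, a geometric series bounded by n, so the extra memory stays O(n) no matter how deep the recursion goes.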
Code
def isAnagram(self, s: str, t: str) -> bool:
    if len(s) != len(t):
        return False

    # Splitting list and recursively sorting
    def mergeSort(string):
        if len(string) <= 1:
            return string
        mid = len(string) // 2
        left = mergeSort(string[:mid])
        right = mergeSort(string[mid:])
        return merge(left, right)

    # merging and sorting left + right sorted arrays
    def merge(left, right):
        merged = []  # avoids shadowing the built-in sorted()
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        # Explicitly handle any remaining elements
        if i < len(left):  # If elements remain in `left`
            merged.extend(left[i:])
        else:  # If elements remain in `right`
            merged.extend(right[j:])
        return merged

    # Sorting both s and t strings and comparing
    sorted_s = mergeSort(list(s))
    sorted_t = mergeSort(list(t))
    return sorted_s == sorted_t
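Quick sanity check of the helpers above (hypothetical standalone calls, with mergeSort pulled out of the method for illustration):

mergeSort(list("cat"))                # ['a', 'c', 't']
mergeSort([8, 4, 7, 3, 2, 6, 5, 1])   # [1, 2, 3, 4, 5, 6, 7, 8]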
Space & Time Complexity
Aspect | Time Complexity | Space Complexity | Time Explanation | Space Explanation |
---|---|---|---|---|
Dividing the Array | O(log n) | O(log n) | The array is halved until subarrays of size 1 remain, which takes O(log n) levels of splitting | The recursive function calls use stack memory proportional to the depth of recursion, which is O(log n) |
Merging at 1 level | O(n) | O(n) | A merge operation at a single level combines two sorted subarrays by iterating over both sorted arrays and adding elements in order to the result array. For a merge at level l, the work is (num of subarrays * work per subarray) = O(2^l * n/2^l) = O(n) | The temporary arrays created for the merges at one level hold O(n) elements in total |
Merging across all levels | O(n log n) | O(n) | Over log n levels of recursion, each level processes n elements, leading to O(n log n) | While we do create arrays of size n/4, n/8, and so forth, only a few of them exist at any given time. For example, we start by creating two arrays of size n/2. Then, we recurse down only one of the arrays and create two arrays of size n/4. Then we recurse down again and create two arrays of size n/8... And once we return from a recursive call, we destroy the temporary arrays created in that call. |
Overall | O(n log n) | O(n) | Merging across all levels dominates time complexity, leading to O(n log n) | Memory allocated for temporary arrays during merging dominates space complexity, leading to O(n) |
Best, Average, Worst Case
Time Complexity of merge sort is O(n log n) across all 3 cases.
This is because merge sort recursively divides and merges the array regardless of whether the initial array is sorted or unsorted.
It still goes through every dividing and merging step across all levels, leading to the same time and space complexity for the best, average, and worst case.
Summary
Merge sort's behavior does not depend on the input order. It is a predictable sorting algorithm whose main drawback is its O(n) space requirement compared to in-place sorting algorithms like quicksort or heapsort.
Insertion Sort (to-do)
Intro
Insertion sort is a comparison-based sorting algorithm that builds the final sorted array one element at a time.
- Treat the first element of the array as a sorted partition
- Work from left to right, comparing each new element to the rightmost element in the sorted partition
- If a new element is out of order relative to the partition, traverse the partition right to left until the element's correct position is found
- Continue until the sorted partition covers the whole array (see the trace below)
Diagram
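A short trace on [8, 4, 7, 3] (illustrative example; the bar marks the end of the sorted partition):
[8 | 4, 7, 3]    partition starts as just the first element
[4, 8 | 7, 3]    4 shifts left past 8
[4, 7, 8 | 3]    7 slots in between 4 and 8
[3, 4, 7, 8 | ]  3 shifts all the way to the front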
Code
def insertion_sort(arr):
    # time complexity: iterate over int array O(n)
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # compare new element to sorted partition right to left, until correct position is found
        # Best time complexity: O(1), array is sorted and no shifting is necessary
        # Average time complexity: O(n/2) = O(n), compares with half of the sorted partition on average
        # Worst time complexity: O(n), compares with all elements in sorted partition
        while j >= 0 and arr[j] > key:
            # Shift elements of the sorted partition to the right if they're greater than the key
            arr[j + 1] = arr[j]
            j -= 1
        # Insert the key into its correct position
        arr[j + 1] = key
    # overall: time complexity - Best O(n) - Average O(n^2) - Worst O(n^2)
    # overall: space complexity - Best O(1) - Average O(1) - Worst O(1)
    return arr
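Quick usage check (illustrative):

insertion_sort([8, 4, 7, 3])  # returns [3, 4, 7, 8]; the list is also sorted in place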
Space & Time Complexity
Best, Average, Worst Case
Aspect | Time Complexity | Space Complexity | Time Explanation | Space Explanation |
---|---|---|---|---|
Best Case | O(n) | O(1) | Array is already sorted: each element is compared once against the rightmost element of the partition and never shifted | Sorting happens in place; only key and j are stored |
Average Case | O(n^2) | O(1) | Each element shifts past roughly half of the sorted partition on average, which is still quadratic overall | Same in-place behavior, O(1) extra space |
Worst Case | O(n^2) | O(1) | Array is reverse sorted: each element shifts past the entire sorted partition | Same in-place behavior, O(1) extra space |
Summary
Insertion sort is O(n^2) on average, but it is simple, in-place, and fast on small or nearly sorted inputs, which is why hybrids like TimSort use it on short runs.
Tim Sort
Intro
TimSort is a hybrid sorting algorithm derived from merge sort and insertion sort. It splits the array into small runs of a fixed length (typically 32 or 64), sorts each run with insertion sort, and then merges the sorted runs pairwise, leveraging the benefits of both insertion sort and merge sort.
Diagram
Run Identification and Sorting Phase:
Level 0: [8, 4, 7, 3, 2, 6, 5, 1] 1 array (n elements, O(n))
/ \
Step 1: [8, 4, 7, 3] [2, 6, 5, 1] Identify runs (min_run size, e.g., 4)
\ /
Step 2: [3, 4, 7, 8] [1, 2, 5, 6] Runs (sorted via insertion sort)
Level 1: [3, 4, 7, 8] [1, 2, 5, 6] Merge pairs of runs (O(n))
\ /
Level 0: [1, 2, 3, 4, 5, 6, 7, 8] Final merge (O(n))
Code
from typing import List

# sorts array in place using runs of 32
def timSort(arr: List[int], run: int = 32) -> None:
    n = len(arr)
    # Sort small runs using Insertion Sort
    for i in range(0, n, run):
        insertionSort(arr, i, min(i + run - 1, n - 1))
    # Iterate over array and merge sorted small runs
    size = run
    while size < n:
        # merge neighboring pairs of runs of length 'size'
        for start in range(0, n, size * 2):
            mid = min(start + size - 1, n - 1)
            end = min(start + size * 2 - 1, n - 1)
            # merge if two valid runs exist (mid < end)
            if mid < end:
                merge(arr, start, mid, end)
        # merged two runs of length 'size'
        # update new run length = old run len + old run len
        size *= 2

# sorts a small segment of the array using Insertion Sort
def insertionSort(arr: List[int], start: int, end: int) -> None:
    # insertion sort algo on small segment [start+1, end]
    for i in range(start + 1, end + 1):
        key = arr[i]
        j = i - 1
        while j >= start and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key

# merges two sorted subarrays into a single sorted segment (the merge step of Merge Sort)
def merge(arr: List[int], start: int, mid: int, end: int) -> None:
    # temporary arrays for left and right halves
    left = arr[start:mid + 1]
    right = arr[mid + 1:end + 1]
    # pointers for left and right arrays
    i = j = 0
    # pointer for main array where merged elements are placed
    k = start
    # merge elements into arr[start...end]
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
        k += 1
    # place leftover elements within the array; cannot use extend() since we write in place
    while i < len(left):
        arr[k] = left[i]
        i += 1
        k += 1
    while j < len(right):
        arr[k] = right[j]
        j += 1
        k += 1
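Quick usage check (illustrative; run=4 forces one insertion sort pass per half and a single merge on this small array):

arr = [8, 4, 7, 3, 2, 6, 5, 1]
timSort(arr, run=4)
print(arr)  # [1, 2, 3, 4, 5, 6, 7, 8]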
Space & Time Complexity
Best, Average, Worst Case
Real TimSort (which detects naturally ordered runs and uses galloping merges) is O(n) in the best case and O(n log n) on average and in the worst case, with O(n) auxiliary space. The simplified version above always performs every insertion sort pass and every merge pass, so it runs in O(n log n) for all three cases, with O(n) space for the temporary merge arrays.
Bucket Sort (to-do)
Quick Sort (to-do)
Searching Algorithms
Binary Search
Intro
Binary Search is a divide and conquer algorithm for finding a target element in a sorted array. It halves the search space in each iteration.
- Start with two pointers, left and right, at the beginning and end of the array
- Calculate the middle index, mid, in between left and right
- Compare the target with the element at mid:
- if the target equals numbers[mid], the target has been found
- if the target is smaller, move the right pointer to mid - 1
- if the target is larger, move the left pointer to mid + 1
- Repeat until left > right or target has been found
Diagram
Find Target = 7
---------
Initial Array: [2, 3, 5, 7, 9, 11, 15]
Initial L & R: L^ R^ (left = 0, right = 6)
mid = (L + R) // 2 -> (0 + 6) // 2 = 3
Compare: numbers[mid] 7 == target 7
Match found!
Find Target = 9
---------
Initial Array: [2, 3, 5, 7, 9, 11, 15]
Initial L & R: L^ R^ (left = 0, right = 6)
mid = (L + R) // 2 -> (0 + 6) // 2 = 3
compare: num[mid] ? target -> 7 < 9
result: mid less than target, update L -> mid + 1 = 4
---------
Array [2, 3, 5, 7, 9, 11, 15]
L & R: L^ R^ (left = 4, right = 6)
mid = (4 + 6) // 2 = 5
compare: 11 > 9
result: mid greater than target, update R -> mid - 1 = 4
---------
Array [2, 3, 5, 7, 9, 11, 15]
L & R: L/R^ (left = 4, right = 4)
mid = (4 + 4) // 2 = 4
compare: 9 == 9
result: target found!
Code
from typing import List

def binarySearch(numbers: List[int], target: int, left: int, right: int) -> int:
    # base case: target not found
    if left > right:
        return -1
    # calculate mid index
    mid = (left + right) // 2
    # time complexity: O(1) for each comparison
    if numbers[mid] == target:
        return mid  # Target found
    # recurse on the right half; the range halves each call: O(log n)
    elif numbers[mid] < target:
        return binarySearch(numbers, target, mid + 1, right)
    # recurse on the left half; the range halves each call: O(log n)
    else:
        return binarySearch(numbers, target, left, mid - 1)
# overall: time complexity O(log n)
# overall: space complexity O(log n) from the recursion stack
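A minimal iterative sketch (an added variant, not from the recursive code above; reuses the List import): a loop replaces the recursion stack, dropping space to O(1) while time stays O(log n):

def binarySearchIter(numbers: List[int], target: int) -> int:
    left, right = 0, len(numbers) - 1
    while left <= right:
        mid = (left + right) // 2
        if numbers[mid] == target:
            return mid         # target found
        elif numbers[mid] < target:
            left = mid + 1     # discard the left half
        else:
            right = mid - 1    # discard the right half
    return -1                  # target not found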
Space & Time Complexity
Aspect | Time Complexity | Space Complexity | Time Explanation | Space Explanation |
---|---|---|---|---|
Recursive | O(log n) | O(log n) | The search space halves on each call, giving at most log n comparisons | Each recursive call adds a stack frame, up to O(log n) deep |
Iterative | O(log n) | O(1) | Same halving behavior as the recursive version | Only the left, right, and mid variables are stored |
Best, Average, Worst Case
Best case is O(1), when the target sits at the very first mid. Average and worst cases are O(log n), since the search space halves each step until a single element remains.
Summary
Binary search is fast and predictable, but it requires the input array to already be sorted.
Graph Algorithms
Performance
Time & Space Complexity
Amortized Complexity (to-do)
Caching (to-do)
Dynamic Programming (to-do)
Double For Loop
Code
def example(self, nums: List[int]) -> bool:
    # compare every element with every element that comes after it
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                return True
    return False
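Quick check (illustrative, self omitted): on [1, 2, 3, 1] the loops compare the pairs (0,1), (0,2), (0,3)... and return True at (i = 0, j = 3), since nums[0] == nums[3].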
Diagram
Outer Loop (i) | Inner Loop Start: (j = i + 1) | Inner Loop End: (j) | Total Inner Loop Iterations | Total Outer Loop Iterations (Remaining) |
---|---|---|---|---|
i = 0 | j = 1 | j = n - 1 | n - 1 | n |
i = 1 | j = 2 | j = n - 1 | n - 2 | n - 1 |
i = 2 | j = 3 | j = n - 1 | n - 3 | n - 2 |
... | ... | ... | ... | ... |
i = n - 2 | j = n - 1 | j = n - 1 | 1 | 2 |
i = n - 1 | j = n | None (empty range) | 0 | 1 |
Time Complexity Derivation
The total number of iterations in the inner loop across each iteration of the outer loop is: (n − 1) + (n − 2) + (n − 3) + ... + 3 + 2 + 1
We can write this series backwards: 1 + 2 + 3 + ... + (n - 3) + (n - 2) + (n − 1)
Now if we add 2 of the same series:
[Forward series] + [Backward series] =
[(n − 1) + (n − 2) + (n − 3) + ... + 3 + 2 + 1] + [1 + 2 + 3 + ... + (n - 3) + (n - 2) + (n − 1)] =
We can pair up matching terms:
[(n − 1) + 1] + [(n − 2) + 2] + [(n − 3) + 3] + ...
Each pair sums to n, and there are n − 1 pairs, so the sum of the two series is:
n * (n − 1)
And since we added the series twice, dividing by 2 gives:
[n * (n − 1)] / 2
Now if we simplify and apply Big O, we arrive at O(n^2) time complexity.
Combinatorics
We can also interpret the double for loop as counting all pairs (i, j), which is equivalent to n choose 2:
C(n, 2) = n(n − 1) / 2
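A quick numeric check (illustrative): for n = 5 the inner-loop total is 4 + 3 + 2 + 1 = 10, and Python's math.comb agrees:

from math import comb
print(comb(5, 2))  # 10 = (5 * 4) / 2 pairs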
Triple For Loop
Code
def example(self, nums: List[int]) -> bool:
    # trying to find triplets
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            for k in range(j + 1, len(nums)):
                if nums[i] == nums[j] and nums[j] == nums[k]:
                    return True
    return False
Diagram
Too much to show!
Time Complexity Derivation
Similar to the double for loop, we can interpret a triple for loop as counting all triples (i, j, k), which is equivalent to n choose 3:
C(n, 3) = n(n − 1)(n − 2) / 6
Simplifying and applying Big O, we arrive at O(n^3) time complexity.
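And the same check for triples (illustrative): for n = 5, math.comb confirms the count:

from math import comb
print(comb(5, 3))  # 10 = (5 * 4 * 3) / 6 triples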