Sorting & Searching Algorithms

Table of Contents
Please excuse the mess, my TOC can't handle more than 2 tags; need to fix that before I try to organize this mess of a notes page :')
Sorting Algorithms
Divide and Conquer Algorithms (to-do)
Merge Sort
Intro
Merge Sort is a divide and conquer sorting algorithm that sorts an array through recursion:
- Given an array, split it into two halves
- Keep splitting until you hit the base case of one element per list
- Recurse back up, merging and sorting 2 lists at a time
- Return a fully sorted list of the original length
Diagram
Splitting Phase:
Level 0: [8, 4, 7, 3, 2, 6, 5, 1] 2^0 = 1 subarrays (n elements, O(n))
/ \
Level 1: [8, 4, 7, 3] [2, 6, 5, 1] 2^1 = 2 subarrays (n/2 elements, O(n/2))
/ \ / \
Level 2: [8, 4] [7, 3] [2, 6] [5, 1] 2^2 = 4 subarrays (n/4 elements, O(n/4))
/ \ / \ / \ / \
Level 3: [8] [4] [7] [3] [2] [6] [5] [1] 2^3 = 8 subarrays (n/8 = 1 element each, O(1))
- In an index-based merge sort, the splitting does not contribute O(n) per level, since we simply pass indices indicating the bounds of the subarray, which is O(1). (The slicing version in the code below does copy on each split, but the overall O(n log n) bound is unchanged.)
Merging Phase:
Level 3: [8] [4] [7] [3] [2] [6] [5] [1] (base case, single elements)
\ / \ / \ / \ /
Level 2: [4, 8] [3, 7] [2, 6] [1, 5] (merge and sort pairs)
\ / \ /
Level 1: [3, 4, 7, 8] [1, 2, 5, 6] (merge sorted halves)
\ /
Level 0: [1, 2, 3, 4, 5, 6, 7, 8] (final merge)
- In merge sort, the merging step is where all the real work happens.
- At each level l, you have 2^l subarrays.
- Each subarray has size n/2^l.
- Total merge work done at each level = (num of subarrays) * (size per subarray) = 2^l * (n/2^l) = n
- Total merge work done = (merge work at each level) * (num of levels) = n * log(n) = O(n log n)
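To make the arithmetic concrete for the 8-element array above: there are log2(8) = 3 merge levels, each level touches all 8 elements, so the merges perform roughly 8 * 3 = 24 element-steps, matching n log n.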
Recursive Memory Lifecycle:
- We can also go step by step through the recursion to demonstrate why the space complexity is O(n).
Recursion Phase 1:
Level 0: [8, 4, 7, 3, 2, 6, 5, 1] 2^0 = 1 subarrays (n elements, O(n))
/ \
Level 1: [8, 4, 7, 3] [2, 6, 5, 1] 2^1 = 2 subarrays (n/2 elements, O(n/2))
/
Level 2: [8, 4] 2^2 = 4 subarrays (n/4 elements, O(n/4))
/ \
Level 3: [8] [4] 2^3 = 8 subarrays (n/8 = 1 element each, O(1))
Recursion Phase 2:
Level 0: [8, 4, 7, 3, 2, 6, 5, 1] 2^0 = 1 subarrays (n elements, O(n))
/ \
Level 1: [8, 4, 7, 3] [2, 6, 5, 1] 2^1 = 2 subarrays (n/2 elements, O(n/2))
/ \
Level 2: [4, 8] [7, 3] 2^2 = 4 subarrays (n/4 elements, O(n/4))
/ \
Level 3: [7] [3] 2^3 = 8 subarrays (n/8 = 1 element each, O(1))
Recursion Phase 3:
Level 0: [8, 4, 7, 3, 2, 6, 5, 1] 2^0 = 1 subarrays (n elements, O(n))
/ \
Level 1: [8, 4, 7, 3] [2, 6, 5, 1] 2^1 = 2 subarrays (n/2 elements, O(n/2))
/ \
Level 2: [4, 8] [3, 7] 2^2 = 4 subarrays (n/4 elements, O(n/4))
Level 3: 2^3 = 8 subarrays (n/8 = 1 element each, O(1))
Recursion Phase 4:
Level 0: [8, 4, 7, 3, 2, 6, 5, 1] 2^0 = 1 subarrays (n elements, O(n))
/ \
Level 1: [3, 4, 7, 8] [2, 6, 5, 1] 2^1 = 2 subarrays (n/2 elements, O(n/2))
Level 2: 2^2 = 4 subarrays (n/4 elements, O(n/4))
Level 3: 2^3 = 8 subarrays (n/8 = 1 element each, O(1))
- Notice how peak memory usage at any moment involves only a few active arrays, whose sizes sum to roughly n.
- Therefore, memory is not cumulative across all calls, but localized to the active recursive path
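As a rough sanity check (assuming the index-based version where each active call holds one temporary buffer): the live temporaries along the active recursion path have sizes around n/2 + n/4 + ... + 1, a geometric series bounded by n, so the extra memory stays O(n) no matter how deep the recursion goes.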
Code
def isAnagram(self, s: str, t: str) -> bool:
    if len(s) != len(t):
        return False

    # Splitting list and recursively sorting
    def mergeSort(string):
        if len(string) <= 1:
            return string
        mid = len(string) // 2
        left = mergeSort(string[:mid])
        right = mergeSort(string[mid:])
        return merge(left, right)

    # merging and sorting left + right sorted arrays
    def merge(left, right):
        merged = []  # avoids shadowing the built-in sorted()
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        # Explicitly handle any remaining elements
        if i < len(left):  # If elements remain in `left`
            merged.extend(left[i:])
        else:  # If elements remain in `right`
            merged.extend(right[j:])
        return merged

    # Sorting both s and t strings and comparing
    sorted_s = mergeSort(list(s))
    sorted_t = mergeSort(list(t))
    return sorted_s == sorted_t
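Quick sanity check of the helpers above (hypothetical standalone calls, with mergeSort pulled out of the method for illustration):

mergeSort(list("cat"))                # ['a', 'c', 't']
mergeSort([8, 4, 7, 3, 2, 6, 5, 1])   # [1, 2, 3, 4, 5, 6, 7, 8]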
Space & Time Complexity
Aspect | Time Complexity | Space Complexity | Time Explanation | Space Explanation |
---|---|---|---|---|
Dividing the Array | O(log n) | O(log n) | The array is halved until subarrays of size 1 remain, which takes O(log n) levels of splitting | The recursive function calls use stack memory proportional to the depth of recursion, which is O(log n) |
Merging at 1 level | O(n) | O(n) | A merge operation at a single level combines two sorted subarrays by iterating over both sorted arrays and adding elements in order to the result array. For a merge at level l, the work is (num of subarrays * work per subarray) = O(2^l * n/2^l) = O(n) | The temporary arrays created for the merges at one level hold O(n) elements in total |
Merging across all levels | O(n log n) | O(n) | Over log n levels of recursion, each level processes n elements, leading to O(n log n) | While we do create arrays of size n/4, n/8, and so forth, only a few of them exist at any given time. For example, we start by creating two arrays of size n/2. Then, we recurse down only one of the arrays and create two arrays of size n/4. Then we recurse down again and create two arrays of size n/8... And once we return from a recursive call, we destroy the temporary arrays created in that call. |
Overall | O(n log n) | O(n) | Merging across all levels dominates time complexity, leading to O(n log n) | Memory allocated for temporary arrays during merging dominates space complexity, leading to O(n) |
Best, Average, Worst Case
Time Complexity of merge sort is O(n log n) across all 3 cases.
This is because merge sort recursively divides and merges the array regardless of whether the initial array is sorted or unsorted.
It still goes through every dividing and merging step across all levels, leading to the same time and space complexity for the best, average, and worst case.
Summary
Merge sort's behavior does not depend on the input order. It is a predictable sorting algorithm whose main drawback is its O(n) space requirement compared to in-place sorting algorithms like quicksort or heapsort.
Insertion Sort (to-do)
Intro
Insertion sort is a comparison-based sorting algorithm that builds the final sorted array one element at a time.
- Treat the first element of the array as a sorted partition
- Work from left to right, comparing each new element to the rightmost element in the sorted partition
- If a new element is out of order relative to the partition, traverse the partition right to left until the element's correct position is found
- Continue until the sorted partition covers the whole array (see the trace below)
Diagram
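A short trace on [8, 4, 7, 3] (illustrative example; the bar marks the end of the sorted partition):
[8 | 4, 7, 3]    partition starts as just the first element
[4, 8 | 7, 3]    4 shifts left past 8
[4, 7, 8 | 3]    7 slots in between 4 and 8
[3, 4, 7, 8 | ]  3 shifts all the way to the front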
Code
def insertion_sort(arr):
    # time complexity: iterate over int array O(n)
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # compare new element to sorted partition right to left, until correct position is found
        # Best time complexity: O(1), array is sorted and no shifting is necessary
        # Average time complexity: O(n/2) = O(n), compares with half of the sorted partition on average
        # Worst time complexity: O(n), compares with all elements in sorted partition
        while j >= 0 and arr[j] > key:
            # Shift elements of the sorted partition to the right if they're greater than the key
            arr[j + 1] = arr[j]
            j -= 1
        # Insert the key into its correct position
        arr[j + 1] = key
    # overall: time complexity - Best O(n) - Average O(n^2) - Worst O(n^2)
    # overall: space complexity - Best O(1) - Average O(1) - Worst O(1)
    return arr
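Quick usage check (illustrative):

insertion_sort([8, 4, 7, 3])  # returns [3, 4, 7, 8]; the list is also sorted in place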
Space & Time Complexity
Best, Average, Worst Case
Aspect | Time Complexity | Space Complexity | Time Explanation | Space Explanation |
---|---|---|---|---|
Best Case | O(n) | O(1) | Array is already sorted: each element is compared once against the rightmost element of the partition and never shifted | Sorting happens in place; only key and j are stored |
Average Case | O(n^2) | O(1) | Each element shifts past roughly half of the sorted partition on average, which is still quadratic overall | Same in-place behavior, O(1) extra space |
Worst Case | O(n^2) | O(1) | Array is reverse sorted: each element shifts past the entire sorted partition | Same in-place behavior, O(1) extra space |
Summary
Insertion sort is O(n^2) on average, but it is simple, in-place, and fast on small or nearly sorted inputs, which is why hybrids like TimSort use it on short runs.
Tim Sort
Intro
TimSort is a hybrid sorting algorithm derived from merge sort and insertion sort. It splits the array into small runs of a fixed length (typically 32 or 64), sorts each run with insertion sort, and then merges the sorted runs pairwise, leveraging the benefits of both insertion sort and merge sort.
Diagram
Run Identification and Sorting Phase:
Level 0: [8, 4, 7, 3, 2, 6, 5, 1] 1 array (n elements, O(n))
/ \
Step 1: [8, 4, 7, 3] [2, 6, 5, 1] Identify runs (min_run size, e.g., 4)
\ /
Step 2: [3, 4, 7, 8] [1, 2, 5, 6] Runs (sorted via insertion sort)
Level 1: [3, 4, 7, 8] [1, 2, 5, 6] Merge pairs of runs (O(n))
\ /
Level 0: [1, 2, 3, 4, 5, 6, 7, 8] Final merge (O(n))
Code
from typing import List

# sorts array in place using runs of 32
def timSort(arr: List[int], run: int = 32) -> None:
    n = len(arr)
    # Sort small runs using Insertion Sort
    for i in range(0, n, run):
        insertionSort(arr, i, min(i + run - 1, n - 1))
    # Iterate over array and merge sorted small runs
    size = run
    while size < n:
        # merge neighboring pairs of runs of length 'size'
        for start in range(0, n, size * 2):
            mid = min(start + size - 1, n - 1)
            end = min(start + size * 2 - 1, n - 1)
            # merge if two valid runs exist (mid < end)
            if mid < end:
                merge(arr, start, mid, end)
        # merged two runs of length 'size'
        # update new run length = old run len + old run len
        size *= 2

# sorts a small segment of the array using Insertion Sort
def insertionSort(arr: List[int], start: int, end: int) -> None:
    # insertion sort algo on small segment [start+1, end]
    for i in range(start + 1, end + 1):
        key = arr[i]
        j = i - 1
        while j >= start and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key

# merges two sorted subarrays into a single sorted segment (the merge step of Merge Sort)
def merge(arr: List[int], start: int, mid: int, end: int) -> None:
    # temporary arrays for left and right halves
    left = arr[start:mid + 1]
    right = arr[mid + 1:end + 1]
    # pointers for left and right arrays
    i = j = 0
    # pointer for main array where merged elements are placed
    k = start
    # merge elements into arr[start...end]
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
        k += 1
    # place leftover elements within the array; cannot use extend() since we write in place
    while i < len(left):
        arr[k] = left[i]
        i += 1
        k += 1
    while j < len(right):
        arr[k] = right[j]
        j += 1
        k += 1
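Quick usage check (illustrative; run=4 forces one insertion sort pass per half and a single merge on this small array):

arr = [8, 4, 7, 3, 2, 6, 5, 1]
timSort(arr, run=4)
print(arr)  # [1, 2, 3, 4, 5, 6, 7, 8]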
Space & Time Complexity
Best, Average, Worst Case
Real TimSort (which detects naturally ordered runs and uses galloping merges) is O(n) in the best case and O(n log n) on average and in the worst case, with O(n) auxiliary space. The simplified version above always performs every insertion sort pass and every merge pass, so it runs in O(n log n) for all three cases, with O(n) space for the temporary merge arrays.
Bucket Sort (to-do)
Quick Sort (to-do)
Searching Algorithms
Binary Search
Intro
Binary Search is a divide and conquer algorithm for finding a target element in a sorted array. It halves the search space in each iteration.
- Start with two pointers, left and right, at the beginning and end of the array
- Calculate the middle index, mid, in between left and right
- Compare the target with the element at mid:
- if the target equals numbers[mid], the target has been found
- if the target is smaller, move the right pointer to mid - 1
- if the target is larger, move the left pointer to mid + 1
- Repeat until left > right or target has been found
Diagram
Find Target = 7
---------
Initial Array: [2, 3, 5, 7, 9, 11, 15]
Initial L & R: L^ R^ (left = 0, right = 6)
mid = (L + R) // 2 -> (0 + 6) // 2 = 3
Compare: numbers[mid] 7 == target 7
Match found!
Find Target = 9
---------
Initial Array: [2, 3, 5, 7, 9, 11, 15]
Initial L & R: L^ R^ (left = 0, right = 6)
mid = (L + R) // 2 -> (0 + 6) // 2 = 3
compare: num[mid] ? target -> 7 < 9
result: mid less than target, update L -> mid + 1 = 4
---------
Array [2, 3, 5, 7, 9, 11, 15]
L & R: L^ R^ (left = 4, right = 6)
mid = (4 + 6) // 2 = 5
compare: 11 > 9
result: mid greater than target, update R -> mid - 1 = 4
---------
Array [2, 3, 5, 7, 9, 11, 15]
L & R: L/R^ (left = 4, right = 4)
mid = (4 + 4) // 2 = 4
compare: 9 == 9
result: target found!
Code
from typing import List

def binarySearch(numbers: List[int], target: int, left: int, right: int) -> int:
    # base case: target not found
    if left > right:
        return -1
    # calculate mid index
    mid = (left + right) // 2
    # time complexity: O(1) for each comparison
    if numbers[mid] == target:
        return mid  # Target found
    # recurse on the right half; the range halves each call: O(log n)
    elif numbers[mid] < target:
        return binarySearch(numbers, target, mid + 1, right)
    # recurse on the left half; the range halves each call: O(log n)
    else:
        return binarySearch(numbers, target, left, mid - 1)
# overall: time complexity O(log n)
# overall: space complexity O(log n) from the recursion stack
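A minimal iterative sketch (an added variant, not from the recursive code above; reuses the List import): a loop replaces the recursion stack, dropping space to O(1) while time stays O(log n):

def binarySearchIter(numbers: List[int], target: int) -> int:
    left, right = 0, len(numbers) - 1
    while left <= right:
        mid = (left + right) // 2
        if numbers[mid] == target:
            return mid         # target found
        elif numbers[mid] < target:
            left = mid + 1     # discard the left half
        else:
            right = mid - 1    # discard the right half
    return -1                  # target not found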
Space & Time Complexity
Aspect | Time Complexity | Space Complexity | Time Explanation | Space Explanation |
---|---|---|---|---|
Recursive | O(log n) | O(log n) | The search space halves on each call, giving at most log n comparisons | Each recursive call adds a stack frame, up to O(log n) deep |
Iterative | O(log n) | O(1) | Same halving behavior as the recursive version | Only the left, right, and mid variables are stored |
Best, Average, Worst Case
Best case is O(1), when the target sits at the very first mid. Average and worst cases are O(log n), since the search space halves each step until a single element remains.
Summary
Binary search is fast and predictable, but it requires the input array to already be sorted.
Graph Algorithms
Performance
Time & Space Complexity
Amortized Complexity (to-do)
Caching (to-do)
Dynamic Programming (to-do)
Double For Loop
Code
def example(self, nums: List[int]) -> bool:
    # compare every element with every element that comes after it
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                return True
    return False
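Quick check (illustrative, self omitted): on [1, 2, 3, 1] the loops compare the pairs (0,1), (0,2), (0,3)... and return True at (i = 0, j = 3), since nums[0] == nums[3].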
Diagram
Outer Loop (i) | Inner Loop Start: (j = i + 1) | Inner Loop End: (j) | Total Inner Loop Iterations | Total Outer Loop Iterations (Remaining) |
---|---|---|---|---|
i = 0 | j = 1 | j = n - 1 | n - 1 | n |
i = 1 | j = 2 | j = n - 1 | n - 2 | n - 1 |
i = 2 | j = 3 | j = n - 1 | n - 3 | n - 2 |
... | ... | ... | ... | ... |
i = n - 2 | j = n - 1 | j = n - 1 | 1 | 2 |
i = n - 1 | j = n | None (empty range) | 0 | 1 |
Time Complexity Derivation
The total number of iterations in the inner loop across each iteration of the outer loop is: (n − 1) + (n − 2) + (n − 3) + ... + 3 + 2 + 1
We can write this series backwards: 1 + 2 + 3 + ... + (n - 3) + (n - 2) + (n − 1)
Now if we add 2 of the same series:
[Forward series] + [Backward series] =
[(n − 1) + (n − 2) + (n − 3) + ... + 3 + 2 + 1] + [1 + 2 + 3 + ... + (n - 3) + (n - 2) + (n − 1)] =
We can pair up matching terms:
[(n − 1) + 1] + [(n − 2) + 2] + [(n − 3) + 3] + ...
Each pair sums to n, and there are n − 1 pairs, so the sum of the two series is:
n * (n − 1)
And since we added the series twice, dividing by 2 gives:
[n * (n − 1)] / 2
Now if we simplify and apply Big O, we arrive at O(n^2) time complexity.
Combinatorics
We can also interpret the double for loop as counting all pairs (i, j), which is equivalent to n choose 2:
C(n, 2) = n(n − 1) / 2
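A quick numeric check (illustrative): for n = 5 the inner-loop total is 4 + 3 + 2 + 1 = 10, and Python's math.comb agrees:

from math import comb
print(comb(5, 2))  # 10 = (5 * 4) / 2 pairs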
Triple For Loop
Code
def example(self, nums: List[int]) -> bool:
    # trying to find triplets
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            for k in range(j + 1, len(nums)):
                if nums[i] == nums[j] and nums[j] == nums[k]:
                    return True
    return False
Diagram
Too much to show!
Time Complexity Derivation
Similar to the double for loop, we can interpret a triple for loop as counting all triples (i, j, k), which is equivalent to n choose 3:
C(n, 3) = n(n − 1)(n − 2) / 6
Simplifying and applying Big O, we arrive at O(n^3) time complexity.
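And the same check for triples (illustrative): for n = 5, math.comb confirms the count:

from math import comb
print(comb(5, 3))  # 10 = (5 * 4 * 3) / 6 triples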