DSA Notes in Python

A complete guide for students, interviewees, and enthusiasts. This document covers major Data Structures and Algorithms with explanations, Python implementations, and complexity analysis.

Table of Contents

  1. Introduction to DSA & Big O
  2. Arrays
  3. Strings
  4. Linked Lists
  5. Stacks
  6. Queues
  7. Hashing
  8. Trees
  9. Heap
  10. Trie (Prefix Tree)
  11. Graphs
  12. Searching Algorithms
  13. Sorting Algorithms
  14. Algorithm Paradigms

Introduction to DSA & Big O

Data Structures are formats for organizing, managing, processing, and storing data. Algorithms are step-by-step procedures for solving problems or accomplishing tasks.

Complexity Analysis (Big O Notation)

Big O notation describes the performance or complexity of an algorithm as an upper bound on its growth rate. It is most often used for the worst-case scenario, and it can describe either the execution time required or the space used (e.g., in memory or on disk) by an algorithm.

O(1) - Constant: The algorithm takes the same amount of time regardless of the input size. Ex: Accessing an array element by index.
O(log n) - Logarithmic: Time increases logarithmically with input size. Common in algorithms that divide the problem in half, like Binary Search.
O(n) - Linear: Time is directly proportional to the input size. Ex: Iterating through a list.
O(n log n) - Log-Linear: Common in efficient sorting algorithms like Merge Sort and Quick Sort.
O(n^2) - Quadratic: Time is proportional to the square of the input size. Ex: Nested loops, Bubble Sort.
O(2^n) - Exponential: Time doubles with each addition to the input data set. Often seen in recursive solutions like finding all subsets.
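
As an illustration of the growth rates above, here is a minimal sketch (function names are ours, chosen for this example) of what O(1), O(n), and O(n^2) look like in code:

def constant_access(arr):      # O(1): one step regardless of len(arr)
    return arr[0]

def linear_sum(arr):           # O(n): touches every element exactly once
    total = 0
    for x in arr:
        total += x
    return total

def quadratic_pairs(arr):      # O(n^2): nested loops over the same input
    pairs = []
    for a in arr:
        for b in arr:
            pairs.append((a, b))
    return pairs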

Arrays

An array is a collection of items stored at contiguous memory locations. In Python, the `list` type serves as a dynamic array.

Key Characteristics

  • Indexed: Elements are accessed via an index, starting from 0.
  • Contiguous Memory: Elements are stored next to each other in memory, which allows for fast O(1) access.
  • Dynamic (in Python): Python's lists can grow or shrink in size.

Complexity Analysis

| Operation | Time Complexity | Space Complexity | Notes |
| --- | --- | --- | --- |
| Access (by index) | O(1) | O(1) | e.g., `my_list[i]` |
| Search (by value) | O(n) | O(1) | Must potentially check every element. |
| Append (add to end) | O(1) amortized | O(1) | Occasionally requires resizing (O(n)), but on average it's constant. |
| Insert (at arbitrary index) | O(n) | O(1) | Requires shifting all subsequent elements. |
| Delete (at arbitrary index) | O(n) | O(1) | Requires shifting all subsequent elements. |

Python Code Example


# Python's list is a dynamic array
arr = [10, 20, 30, 40, 50]

# Access (O(1))
print(f"Element at index 2: {arr[2]}") # Output: 30

# Append (Amortized O(1))
arr.append(60)
print(f"After append: {arr}") # Output: [10, 20, 30, 40, 50, 60]

# Insert (O(n))
arr.insert(1, 15) # Insert 15 at index 1
print(f"After insert: {arr}") # Output: [10, 15, 20, 30, 40, 50, 60]

# Delete (O(n))
arr.pop(3) # Deletes element at index 3
print(f"After pop: {arr}") # Output: [10, 15, 20, 40, 50, 60]

# Search (O(n))
try:
    index = arr.index(40)
    print(f"Index of 40: {index}") # Output: 3
except ValueError:
    print("Element not found")

Strings

A string is a sequence of characters. In Python, strings are immutable, meaning they cannot be changed after they are created. Any operation that appears to modify a string actually creates a new one.

Key Characteristics

  • Immutable: Cannot be changed in place. `my_string[0] = 'a'` will raise a `TypeError`.
  • Ordered: Characters have a defined sequence.
  • Iterable: You can loop through the characters.

Complexity Analysis

| Operation | Time Complexity | Space Complexity | Notes |
| --- | --- | --- | --- |
| Access (by index) | O(1) | O(1) | e.g., `my_string[i]` |
| Slicing | O(k) | O(k) | `k` is the length of the slice. A new string is created. |
| Concatenation | O(n + m) | O(n + m) | `n` and `m` are the lengths of the strings. A new string is created. |
| Search (substring) | O(n * m) | O(1) | Can be faster with algorithms like KMP. Python's `in` operator is highly optimized. |

Python Code Example


s = "hello"

# Immutability
# s[0] = 'H'  # This will cause a TypeError

# Slicing (O(k))
new_s = s[1:4] # "ell"
print(f"Slice: {new_s}")

# Concatenation (O(n+m))
s2 = s + " world"
print(f"Concatenated: {s2}")

# Search (Optimized in Python)
if "ell" in s:
    print("Substring 'ell' found")

# String methods create new strings
s_upper = s.upper()
print(f"Original: {s}, Uppercase: {s_upper}")

Linked Lists

A linked list is a linear data structure where elements are not stored at contiguous memory locations. The elements (nodes) are linked using pointers.

Diagram: A -> B -> C -> NULL
Each box (A, B, C) is a node containing `data` and a `next` pointer.

Types of Linked Lists

  • Singly Linked List (SLL): Each node points only to the next node. Traversal is unidirectional.
  • Doubly Linked List (DLL): Each node points to both the next and the previous node. Traversal can be bidirectional.
  • Circular Linked List (CLL): The last node's `next` pointer points back to the first node, forming a circle.
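
For contrast with the singly linked implementation below, a minimal doubly linked node (a sketch only) just adds a `prev` pointer:

class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None  # link to the previous node
        self.next = None  # link to the next node

# Linking two nodes in both directions
a, b = DNode(1), DNode(2)
a.next, b.prev = b, a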

Singly Linked List Implementation


class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):
        new_node = Node(data)
        if not self.head:
            self.head = new_node
            return
        last_node = self.head
        while last_node.next:
            last_node = last_node.next
        last_node.next = new_node

    def prepend(self, data):
        new_node = Node(data)
        new_node.next = self.head
        self.head = new_node

    def delete(self, key):
        curr = self.head
        # Case 1: the head node holds the key
        if curr and curr.data == key:
            self.head = curr.next
            return

        # Case 2: walk the list, tracking the previous node
        prev = None
        while curr and curr.data != key:
            prev = curr
            curr = curr.next

        if curr is None:
            return # Key not found

        # Unlink the node; Python's garbage collector reclaims it
        prev.next = curr.next
        
    def display(self):
        elements = []
        curr = self.head
        while curr:
            elements.append(str(curr.data))
            curr = curr.next
        print(" -> ".join(elements))

# Example usage
sll = SinglyLinkedList()
sll.append(1)
sll.append(2)
sll.append(3)
sll.prepend(0)
sll.display() # Output: 0 -> 1 -> 2 -> 3
sll.delete(2)
sll.display() # Output: 0 -> 1 -> 3

Comparison: Array vs. Linked List

| Feature | Array (Python List) | Linked List |
| --- | --- | --- |
| Access time | O(1) | O(n) |
| Insertion/deletion (at start) | O(n) | O(1) |
| Insertion/deletion (at end) | O(1) amortized | O(n), or O(1) if a tail pointer is maintained |
| Insertion/deletion (in middle) | O(n) | O(n) search + O(1) link change |
| Memory | Contiguous block | Scattered; requires extra memory for pointers |

Stacks

A stack is a linear data structure that follows the LIFO (Last-In, First-Out) principle. The last element added to the stack will be the first one to be removed.

Analogy: A stack of plates. You add a plate to the top and remove a plate from the top.

Core Operations

  • Push: Add an element to the top of the stack.
  • Pop: Remove the top element from the stack.
  • Peek/Top: View the top element without removing it.
  • isEmpty: Check if the stack is empty.

Implementation using Python List


class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        if not self.is_empty():
            return self.items.pop()
        return "Stack is empty"

    def peek(self):
        if not self.is_empty():
            return self.items[-1]
        return "Stack is empty"

    def is_empty(self):
        return len(self.items) == 0

    def size(self):
        return len(self.items)

# Example usage
stack = Stack()
stack.push(10)
stack.push(20)
stack.push(30)
print(f"Top element: {stack.peek()}") # Output: 30
print(f"Popped element: {stack.pop()}") # Output: 30
print(f"Is empty? {stack.is_empty()}") # Output: False

In Python, you can simply use a `list` as a stack (`append()` for push, `pop()` for pop). For better performance with a large number of push/pop operations, `collections.deque` is recommended.
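
A minimal sketch of the `deque`-based version:

from collections import deque

stack = deque()
stack.append(10)    # push
stack.append(20)
print(stack[-1])    # peek -> 20
print(stack.pop())  # pop  -> 20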

Complexity Analysis

| Operation | Time Complexity | Space Complexity |
| --- | --- | --- |
| Push | O(1) | O(1) |
| Pop | O(1) | O(1) |
| Peek | O(1) | O(1) |
| Search | O(n) | O(1) |

Queues

A queue is a linear data structure that follows the FIFO (First-In, First-Out) principle. The first element added will be the first one to be removed.

Analogy: A checkout line at a store. The first person in line is the first person to be served.

Implementation with `collections.deque`

Using a Python `list` for a queue is inefficient for removals from the front (O(n)). The `collections.deque` (double-ended queue) is optimized for appends and pops from both ends (O(1)).


from collections import deque

queue = deque()

# Enqueue (add to the right)
queue.append(10)
queue.append(20)
queue.append(30)
print(f"Queue: {queue}")

# Dequeue (remove from the left)
item = queue.popleft()
print(f"Dequeued item: {item}") # Output: 10
print(f"Queue after dequeue: {queue}")

# Peek
print(f"Front of queue: {queue[0]}") # Output: 20

Variations of Queues

  • Circular Queue: A fixed-size queue where the last position is connected back to the first, making it a circle. It efficiently uses a block of memory (see the sketch after this list).
  • Deque (Double-Ended Queue): Elements can be added or removed from either the front or the rear. `collections.deque` is Python's implementation.
  • Priority Queue: Each element has a priority associated with it. An element with high priority is dequeued before an element with low priority. In Python, this is implemented using the `heapq` module (which uses a min-heap).
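
A minimal circular queue sketch (class and method names are ours), backed by a fixed-size Python list:

class CircularQueue:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0   # index of the front element
        self.size = 0

    def enqueue(self, item):
        if self.size == self.capacity:
            raise OverflowError("queue is full")
        tail = (self.head + self.size) % self.capacity  # wrap around
        self.buf[tail] = item
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("queue is empty")
        item = self.buf[self.head]
        self.head = (self.head + 1) % self.capacity     # wrap around
        self.size -= 1
        return item

cq = CircularQueue(3)
cq.enqueue(1); cq.enqueue(2); cq.enqueue(3)
print(cq.dequeue())  # Output: 1
cq.enqueue(4)        # wraps around into the freed slot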

Priority Queue (`heapq`) Example


import heapq

# A min-priority queue
pq = []

# heapq treats the list 'pq' as a min-heap
heapq.heappush(pq, (2, 'Task B')) # (priority, item)
heapq.heappush(pq, (1, 'Task A'))
heapq.heappush(pq, (3, 'Task C'))

print(f"Priority queue: {pq}")

# Pop the item with the smallest priority
highest_priority_task = heapq.heappop(pq)
print(f"Processing: {highest_priority_task}") # Output: (1, 'Task A')

Hashing

Hashing is a technique for mapping keys of arbitrary size to fixed-size values (hash codes). A hash function generates this mapping. Hash tables (or hash maps) are data structures that implement the associative array abstract data type, which maps keys to values.

Key Concepts

  • Hash Function: A function that takes a key and computes an index (a hash code) into an array of buckets or slots. A good hash function distributes keys uniformly.
  • Hash Table: An array where data is stored according to its computed hash code.
  • Collision: Occurs when two different keys hash to the same index.

Collision Resolution Techniques

  1. Separate Chaining: Each bucket in the hash table points to a linked list of records that share the same hash code. This is the most common method (see the sketch after this list).
    Diagram:
    Index 0: -> NULL
    Index 1: -> KeyA -> KeyC -> NULL
    Index 2: -> KeyB -> NULL
  2. Open Addressing (Probing): All elements are stored in the hash table itself. When a collision occurs, we probe for the next empty slot.
    • Linear Probing: Check the next slot `(hash(key) + i) % size`. Suffers from primary clustering.
    • Quadratic Probing: Check slots `(hash(key) + i^2) % size`.
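
A toy separate-chaining hash table, as a sketch (class and method names are ours; Python lists stand in for the per-bucket linked lists):

class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)  # map key to a bucket

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key exists: update in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: chain onto the bucket

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

t = ChainedHashTable()
t.put("apple", 1)
t.put("banana", 2)
print(t.get("banana"))  # Output: 2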

Python's `dict` and `set`

Python's built-in `dict` (dictionary) and `set` are implemented using hash tables. They are highly optimized for performance.

Complexity Analysis (Average Case)

| Operation | Time Complexity | Space Complexity |
| --- | --- | --- |
| Insert | O(1) | O(1) |
| Delete | O(1) | O(1) |
| Search/Lookup | O(1) | O(1) |

*Worst-case complexity is O(n) if all keys hash to the same bucket, but this is extremely rare with good hash functions.


# Dictionary (Hash Map)
my_dict = {'name': 'Alice', 'age': 30}
my_dict['city'] = 'New York' # Insert/Update - O(1)
print(f"Age: {my_dict['age']}") # Lookup - O(1)
del my_dict['age'] # Delete - O(1)
print(f"Dictionary: {my_dict}")

# Set
my_set = {1, 2, 3, 4}
my_set.add(5) # Insert - O(1)
my_set.remove(2) # Delete - O(1)
print(f"Is 3 in set? {3 in my_set}") # Lookup - O(1)
print(f"Set: {my_set}")

Trees

A tree is a non-linear hierarchical data structure that consists of nodes connected by edges. Each tree has a root node, and all other nodes are descendants of the root.

Tree Traversal Algorithms

  • In-order (Left, Root, Right): Visits nodes in ascending order for a BST.
  • Pre-order (Root, Left, Right): Useful for creating a copy of the tree.
  • Post-order (Left, Right, Root): Useful for deleting nodes.
  • Level-order (Breadth-First Search): Visits nodes level by level.

Binary Search Tree (BST)

A binary tree where for each node, all values in its left subtree are less than its own value, and all values in its right subtree are greater.


class TreeNode:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key

def insert(root, key):
    if root is None:
        return TreeNode(key)
    else:
        if root.val < key:
            root.right = insert(root.right, key)
        else:
            root.left = insert(root.left, key)
    return root

def inorder_traversal(root):
    if root:
        inorder_traversal(root.left)
        print(root.val, end=' ')
        inorder_traversal(root.right)

# Example
root = TreeNode(50)
root = insert(root, 30)
root = insert(root, 20)
root = insert(root, 40)
root = insert(root, 70)
root = insert(root, 60)
root = insert(root, 80)

print("In-order traversal of the BST:")
inorder_traversal(root) # Output: 20 30 40 50 60 70 80 
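
Building on the `root` constructed above, here is a sketch of level-order (BFS) traversal using `collections.deque` as the queue:

from collections import deque

def level_order_traversal(root):
    if root is None:
        return
    queue = deque([root])
    while queue:
        node = queue.popleft()       # visit the front of the queue
        print(node.val, end=' ')
        if node.left:
            queue.append(node.left)  # enqueue children left-to-right
        if node.right:
            queue.append(node.right)

print("\nLevel-order traversal of the BST:")
level_order_traversal(root)  # Output: 50 30 70 20 40 60 80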

Balanced vs. Unbalanced Trees

A standard BST can become unbalanced (like a linked list) in the worst case, leading to O(n) for search/insert/delete operations. Self-balancing trees like AVL trees and Red-Black trees perform rotations to maintain a balanced height, ensuring O(log n) performance.

Heap

A heap is a specialized tree-based data structure that satisfies the heap property. In a Min-Heap, every parent node is less than or equal to its children. In a Max-Heap, every parent is greater than or equal to its children.

Heaps are commonly used to implement Priority Queues. Python's `heapq` module provides a min-heap implementation.
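
A brief sketch: `heapq.heapify` rearranges a list into a min-heap in place, and repeatedly calling `heappop` yields elements in ascending order (the idea behind heap sort):

import heapq

data = [5, 1, 8, 3]
heapq.heapify(data)  # O(n): rearrange into a min-heap in place
print(data[0])       # Output: 1 (the minimum is always at index 0)

# Popping repeatedly yields ascending order
print([heapq.heappop(data) for _ in range(len(data))])  # Output: [1, 3, 5, 8]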

Trie (Prefix Tree)

A Trie is a tree-like data structure that is efficient for retrieving keys in a dictionary-like dataset. It's commonly used for autocomplete features and spell checkers. Each node represents a character, and paths from the root to a node represent prefixes.
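
A minimal Trie sketch (class and method names are ours) supporting insert, exact-word search, and prefix lookup:

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # marks the end of a complete word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def _walk(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

    def search(self, word):         # exact word match
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):  # is any stored word prefixed by this?
        return self._walk(prefix) is not None

trie = Trie()
trie.insert("apple")
print(trie.search("apple"))     # Output: True
print(trie.search("app"))       # Output: False
print(trie.starts_with("app"))  # Output: True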

Graphs

A graph is a non-linear data structure consisting of a set of vertices (nodes) and a set of edges connecting these vertices.

Graph Representations

  1. Adjacency Matrix: A 2D array where `matrix[i][j] = 1` if there is an edge from vertex `i` to `j`. Space complexity is O(V^2). Good for dense graphs.
  2. Adjacency List: An array of lists. `adj[i]` contains a list of all vertices adjacent to vertex `i`. Space complexity is O(V + E). Good for sparse graphs.

# Adjacency List Representation in Python using a dictionary
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': [],
    'E': ['F'],
    'F': []
}

Graph Traversal Algorithms

  • Breadth-First Search (BFS): Explores neighbor nodes first, level by level. Uses a queue. Useful for finding the shortest path in an unweighted graph.
  • Depth-First Search (DFS): Explores as far as possible along each branch before backtracking. Uses a stack (or recursion). Useful for topological sorting and detecting cycles.

# BFS Implementation
from collections import deque

def bfs(graph, start_node):
    visited = set()
    queue = deque([start_node])
    visited.add(start_node)
    
    while queue:
        vertex = queue.popleft()
        print(vertex, end=" ")
        
        for neighbor in graph.get(vertex, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)

print("BFS traversal:")
bfs(graph, 'A') # Example: A B C D E F
print("\n")

# DFS Implementation (Recursive)
def dfs(graph, start_node, visited=None):
    if visited is None:
        visited = set()
    
    visited.add(start_node)
    print(start_node, end=" ")
    
    for neighbor in graph.get(start_node, []):
        if neighbor not in visited:
            dfs(graph, neighbor, visited)

print("DFS traversal:")
dfs(graph, 'A') # Example: A B D E F C

Important Graph Algorithms

  • Dijkstra's Algorithm: Finds the shortest path from a source node to all other nodes in a weighted graph with non-negative weights (a sketch follows this list).
  • Bellman-Ford Algorithm: Finds the shortest path, and can handle negative edge weights.
  • Floyd-Warshall Algorithm: Finds the shortest paths between all pairs of nodes.
  • Prim's & Kruskal's Algorithms: Finds the Minimum Spanning Tree (MST) of a graph.
  • Topological Sort: Linear ordering of vertices in a Directed Acyclic Graph (DAG) such that for every directed edge from vertex `u` to `v`, `u` comes before `v` in the ordering.
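
As one example from this list, here is a sketch of Dijkstra's algorithm using `heapq`; the `weighted` example graph is made up for illustration:

import heapq

def dijkstra(graph, source):
    # graph: dict mapping node -> list of (neighbor, weight) pairs
    dist = {source: 0}
    pq = [(0, source)]  # (distance, node) min-heap
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue    # stale entry, skip
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w                  # relax the edge
                heapq.heappush(pq, (dist[v], v))
    return dist

weighted = {
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('D', 5)],
    'C': [('D', 1)],
    'D': [],
}
print(dijkstra(weighted, 'A'))  # Output: {'A': 0, 'B': 1, 'C': 3, 'D': 4}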

Searching Algorithms

Linear Search

Iterates through a collection one element at a time. It can be used on any iterable, sorted or unsorted.

Time Complexity: O(n) | Space Complexity: O(1)
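
A minimal sketch of linear search:

def linear_search(arr, target):
    for i, value in enumerate(arr):
        if value == target:
            return i  # found: return the first matching index
    return -1         # not found

print(linear_search([4, 2, 7, 1], 7))  # Output: 2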

Binary Search

A highly efficient search algorithm that works on sorted collections. It repeatedly divides the search interval in half.

Time Complexity: O(log n) | Space Complexity: O(1) (iterative), O(log n) (recursive)


def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    
    while left <= right:
        mid = (left + right) // 2
        
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
            
    return -1 # Not found

sorted_arr = [10, 20, 30, 40, 50, 60]
index = binary_search(sorted_arr, 40)
print(f"Index of 40 is: {index}") # Output: 3

Sorting Algorithms

Sorting arranges items in a collection into a specific order.

Comparison of Sorting Algorithms

| Algorithm | Time (Best) | Time (Average) | Time (Worst) | Space Complexity | Stable? |
| --- | --- | --- | --- | --- | --- |
| Bubble Sort | O(n) | O(n^2) | O(n^2) | O(1) | Yes |
| Selection Sort | O(n^2) | O(n^2) | O(n^2) | O(1) | No |
| Insertion Sort | O(n) | O(n^2) | O(n^2) | O(1) | Yes |
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| Quick Sort | O(n log n) | O(n log n) | O(n^2) | O(log n) | No |
| Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | No |

Merge Sort Implementation


def merge_sort(arr):
    if len(arr) > 1:
        mid = len(arr) // 2
        left_half = arr[:mid]
        right_half = arr[mid:]

        merge_sort(left_half)
        merge_sort(right_half)

        i = j = k = 0
        # Merge the two halves
        while i < len(left_half) and j < len(right_half):
            if left_half[i] < right_half[j]:
                arr[k] = left_half[i]
                i += 1
            else:
                arr[k] = right_half[j]
                j += 1
            k += 1

        while i < len(left_half):
            arr[k] = left_half[i]
            i += 1
            k += 1

        while j < len(right_half):
            arr[k] = right_half[j]
            j += 1
            k += 1

my_list = [54, 26, 93, 17, 77, 31, 44, 55, 20]
merge_sort(my_list)
print(f"Sorted list: {my_list}")

Algorithm Paradigms

Divide and Conquer

This paradigm involves three steps:

  1. Divide: Break the problem into smaller subproblems.
  2. Conquer: Solve the subproblems recursively.
  3. Combine: Combine the solutions to solve the original problem.
Examples: Merge Sort, Quick Sort, Binary Search.

Greedy Algorithms

Makes the locally optimal choice at each step with the hope of finding a global optimum. It doesn't always find the best solution but is fast.
Example: Finding minimum number of coins for a given change amount (for standard coin denominations).
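
A sketch of greedy change-making for US-style denominations (greedy is optimal for this particular coin system, though not for arbitrary ones):

def greedy_coin_change(amount, denominations=(25, 10, 5, 1)):
    coins = []
    for coin in denominations:  # try the largest coin first
        while amount >= coin:
            coins.append(coin)
            amount -= coin
    return coins

print(greedy_coin_change(63))  # Output: [25, 25, 10, 1, 1, 1]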

Dynamic Programming (DP)

Solves complex problems by breaking them down into simpler subproblems, solving each subproblem only once, and storing their solutions. It is used for problems with overlapping subproblems and optimal substructure.

  • Memoization (Top-Down): Recursive approach where you store the results of expensive function calls and return the cached result when the same inputs occur again.
  • Tabulation (Bottom-Up): Iterative approach where you solve subproblems starting from the smallest ones and build up to the desired solution.

# Fibonacci with Memoization
memo = {}
def fib_memo(n):
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    result = fib_memo(n - 1) + fib_memo(n - 2)
    memo[n] = result
    return result

print(f"Fibonacci (memoization): {fib_memo(10)}")

Backtracking

A general algorithmic technique for finding all (or some) solutions to computational problems, notably constraint satisfaction problems. It incrementally builds candidates for a solution and abandons a candidate ("backtracks") as soon as it determines that the candidate cannot be completed to a valid solution.
Example: N-Queens problem, solving Sudoku.
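
As a sketch of the pattern, here is a compact N-Queens solver that counts valid placements; the column/diagonal bookkeeping is one standard formulation:

def solve_n_queens(n):
    count = 0
    cols, diag1, diag2 = set(), set(), set()  # attacked columns/diagonals

    def place(row):
        nonlocal count
        if row == n:      # all rows filled: a valid solution
            count += 1
            return
        for col in range(n):
            if col in cols or (row + col) in diag1 or (row - col) in diag2:
                continue  # attacked: prune this candidate
            cols.add(col); diag1.add(row + col); diag2.add(row - col)
            place(row + 1)
            # Backtrack: undo the choice before trying the next column
            cols.remove(col); diag1.remove(row + col); diag2.remove(row - col)

    place(0)
    return count

print(solve_n_queens(4))  # Output: 2
print(solve_n_queens(8))  # Output: 92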