Python Data Structures: Core Concepts

Mastering Python's built-in data structures for efficient programming

1. Lists and Tuples
2. Sets and Dictionaries
3. List Comprehensions
4. Generator Expressions
5. Performance & Optimization
6. Specialized Collections
7. Memory Management

1. Lists and Tuples

Lists (Mutable Sequences)

Basics

Lists are ordered, mutable collections that can hold heterogeneous data types.

# List creation
numbers = [1, 2, 3, 4]
mixed = [1, "two", 3.0, [4, 5]]

# Indexing and slicing
print(numbers[0])     # 1 (first element)
print(numbers[-1])    # 4 (last element)
print(numbers[1:3])   # [2, 3] (slice from index 1 to 2)

Common Methods

Method	Description	Example
`append(x)`	Add item to end	`numbers.append(5)`
`extend(iterable)`	Add all items from iterable	`numbers.extend([6,7])`
`insert(i, x)`	Insert item at position	`numbers.insert(0, 0)`
`remove(x)`	Remove first occurrence	`numbers.remove(3)`
`pop([i])`	Remove and return item	`last = numbers.pop()`

List Operations

# Concatenation
combined = [1, 2] + [3, 4]  # [1, 2, 3, 4]

# Repetition
repeated = [0] * 5  # [0, 0, 0, 0, 0]

# Membership testing
print(2 in numbers)  # True

Advanced Operations

Shallow vs Deep Copying

import copy

original = [[1, 2], [3, 4]]

# Shallow copy (only copies references)
shallow = original.copy()
shallow[0][0] = 99  # Affects original!

# Deep copy (copies all nested objects)
deep = copy.deepcopy(original)
deep[0][0] = 100  # Doesn't affect original

Memory Allocation

Python lists are dynamic arrays that overallocate memory for growth:

import sys

lst = []
for i in range(10):
    print(f"Length: {len(lst)}, Size: {sys.getsizeof(lst)} bytes")
    lst.append(i)

Tuples (Immutable Sequences)

Basics

Tuples are immutable, ordered sequences typically used for heterogeneous data.

# Tuple creation
point = (10, 20)
colors = ("red", "green", "blue")

# Single-element tuple (note trailing comma)
single = (42,)

# Common gotcha - not a tuple
not_a_tuple = (42)  # Just an integer

Immutability and Use Cases

Dictionary keys (must be hashable/immutable)
Function arguments and return values
Data integrity (can't be modified accidentally)
Generally more memory efficient than lists

Advanced Concepts

Named Tuples

from collections import namedtuple

# Create a class-like structure
Point = namedtuple("Point", ["x", "y"])
p = Point(10, y=20)
print(p.x, p.y)  # 10 20

Tuple Unpacking

# Basic unpacking
x, y = (10, 20)

# Extended unpacking (Python 3+)
first, *rest = (1, 2, 3, 4)  # first=1, rest=[2, 3, 4]

# Swapping variables
a, b = b, a

Performance Note: Tuples can be more memory efficient than lists for small collections, but the difference is negligible for most use cases.

2. Sets and Dictionaries

Sets (Unique, Unordered Collections)

Basics

Sets are mutable collections of unique, hashable elements with fast membership testing.

# Set creation
primes = {2, 3, 5, 7}
empty_set = set()  # {} creates empty dict!

# Basic operations
primes.add(11)      # Add element
primes.remove(3)    # Remove element (raises KeyError if missing)
primes.discard(99)  # Remove if present (no error)

# Set operations
odds = {1, 3, 5, 7, 9}
print(primes | odds)   # Union
print(primes & odds)   # Intersection
print(primes - odds)   # Difference
print(primes ^ odds)   # Symmetric difference

Frozensets

Immutable version of sets (can be used as dictionary keys):

immutable = frozenset([1, 2, 3])

Advanced Usage

Set Comprehensions

squares = {x**2 for x in range(10)}

Performance

Sets provide O(1) average case membership testing:

# Much faster than checking lists for membership
if 999999 in huge_set:  # Fast
    print("Found")

if 999999 in huge_list:  # Slow (O(n))
    print("Found")

Use Cases

Removing duplicates from a sequence: list(set(duplicates))
Membership testing (faster than lists)
Mathematical set operations
Finding unique elements

Dictionaries (Key-Value Stores)

Basics

Dictionaries are mutable mappings from unique keys to values.

# Dictionary creation
person = {"name": "Alice", "age": 25}
empty_dict = {}

# Accessing values
print(person["name"])      # Alice
print(person.get("age"))   # 25
print(person.get("job", "unemployed"))  # Default if missing

# Adding/modifying
person["job"] = "Engineer"
person["age"] = 26

# Membership testing
print("name" in person)  # True

Dictionary Methods

Method	Description	Example
`keys()`	View of dictionary keys	`person.keys()`
`values()`	View of dictionary values	`person.values()`
`items()`	View of (key, value) pairs	`person.items()`
`update()`	Merge with another dict	`person.update({"city": "NY"})`
`setdefault()`	Get value or set default	`person.setdefault("age", 30)`

Advanced Topics

Dictionary Comprehensions

squares = {x: x**2 for x in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Default Dictionaries

from collections import defaultdict

# Automatically initialize missing keys
word_counts = defaultdict(int)
for word in text.split():
    word_counts[word] += 1

Ordered Dictionaries

As of Python 3.7+, regular dictionaries preserve insertion order. For additional features:

from collections import OrderedDict

# Remember insertion order (useful before Python 3.7)
od = OrderedDict()
od["a"] = 1
od["b"] = 2
od.move_to_end("a")  # Move to end

Memory Optimization

Dictionaries consume more memory than lists. For memory-critical applications:

Use __slots__ in classes to prevent dynamic dictionary creation
Consider array.array or NumPy arrays for homogeneous numeric data
Use sys.getsizeof() to measure memory usage

3. List Comprehensions

Basic Syntax

Concise way to create lists by applying operations to iterables.

# Basic comprehension
squares = [x**2 for x in range(10)]

# With condition
even_squares = [x**2 for x in range(10) if x % 2 == 0]

# Equivalent to:
even_squares = []
for x in range(10):
    if x % 2 == 0:
        even_squares.append(x**2)

Advanced Patterns

Nested Comprehensions

# Flatten a 2D list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [num for row in matrix for num in row]
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

Multiple Loops

# Cartesian product
colors = ["red", "green"]
sizes = ["S", "M", "L"]
tshirts = [(color, size) for color in colors for size in sizes]
# [('red', 'S'), ('red', 'M'), ('red', 'L'), 
#  ('green', 'S'), ('green', 'M'), ('green', 'L')]

Performance Considerations

List comprehensions are generally faster than equivalent loops:

Optimized at the C level in Python
More concise and often more readable
But can become unreadable if overly complex

Readability Tip: If a comprehension spans multiple lines or has complex logic, consider using a traditional loop for better readability.

4. Generator Expressions

Basics

Similar to list comprehensions but produce items lazily (one at a time).

# Generator expression
squares_gen = (x**2 for x in range(10))

# Consume generator
for num in squares_gen:
    print(num)

# Can only be consumed once
print(list(squares_gen))  # [] (already exhausted)

Memory Efficiency

Generators don't store all values in memory at once:

# Compare memory usage
import sys

list_comp = [x**2 for x in range(10000)]
gen_expr = (x**2 for x in range(10000))

print(sys.getsizeof(list_comp))  # Large (stores all values)
print(sys.getsizeof(gen_expr))   # Small (constant size)

Advanced Usage

Chaining Generators

def integers():
    """Infinite generator of integers"""
    i = 1
    while True:
        yield i
        i += 1

def squares():
    for i in integers():
        yield i * i

# Take first 5 squares
first_5 = (x for _, x in zip(range(5), squares()))

yield vs Generator Expressions

Generator functions (using yield) are more flexible but generator expressions are more concise:

# Generator function
def squares(n):
    for i in range(n):
        yield i ** 2

# Generator expression equivalent
squares = (i**2 for i in range(n))

Use Cases

Processing large files line by line
Memory-efficient pipelines
Infinite sequences
Anywhere you only need to process items once

5. Performance & Optimization

Time Complexity

Understanding Big-O notation helps choose the right data structure:

Operation	List	Set/Dict	Deque
Index Access	O(1)	N/A	O(1) (ends only)
Search	O(n)	O(1)	O(n)
Insert at End	O(1)*	N/A	O(1)
Insert at Start	O(n)	N/A	O(1)
Delete Item	O(n)	O(1)	O(n)

* Amortized O(1) due to occasional resizing

Choosing the Right Data Structure

Lists: When you need ordered, indexable data that changes frequently
Tuples: For immutable sequences, dictionary keys, or when memory matters
Sets: For membership testing and uniqueness guarantees
Dictionaries: For key-value associations and fast lookups
Deques: When you need fast appends/pops from both ends

6. Specialized Collections

collections.deque

Double-ended queue with fast O(1) appends and pops from both ends:

from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)  # [0, 1, 2, 3]
d.pop()          # Removes 3
d.popleft()      # Removes 0

collections.Counter

Specialized dictionary for counting hashable objects:

from collections import Counter

words = ["apple", "banana", "apple", "orange", "banana", "apple"]
word_counts = Counter(words)
print(word_counts.most_common(2))  # [('apple', 3), ('banana', 2)]

heapq

Priority queue implementation using a binary heap:

import heapq

nums = [5, 7, 9, 1, 3]
heapq.heapify(nums)  # Convert to heap

print(heapq.heappop(nums))  # 1 (smallest element)
heapq.heappush(nums, 2)     # Add new element

7. Memory Management

Internal Storage

Python data structures have different memory layouts:

Lists: Dynamic array of pointers to objects
Tuples: Fixed-size array of pointers
Dictionaries: Hash tables with open addressing
Sets: Similar to dictionaries but without values

Memory Measurement

import sys

lst = list(range(1000))
tup = tuple(range(1000))

print(sys.getsizeof(lst))  # ~9024 bytes
print(sys.getsizeof(tup))  # ~8040 bytes

Memory Profiling

For detailed memory analysis:

# Install memory_profiler first: pip install memory-profiler
from memory_profiler import profile

@profile
def process_data():
    data = [i**2 for i in range(100000)]
    return sum(data)

process_data()

DSA | Basics of Python Developer

Python Data Structures: Core Concepts

Table of Contents

1. Lists and Tuples

Lists (Mutable Sequences)

Basics

Common Methods

List Operations

Advanced Operations

Shallow vs Deep Copying

Memory Allocation

Tuples (Immutable Sequences)

Basics

Immutability and Use Cases

Advanced Concepts

Named Tuples

Tuple Unpacking

2. Sets and Dictionaries

Sets (Unique, Unordered Collections)

Basics

Frozensets

Advanced Usage

Set Comprehensions

Performance

Use Cases

Dictionaries (Key-Value Stores)

Basics

Dictionary Methods

Advanced Topics

Dictionary Comprehensions

Default Dictionaries

Ordered Dictionaries

Memory Optimization

3. List Comprehensions

Basic Syntax

Advanced Patterns

Nested Comprehensions

Multiple Loops

Performance Considerations

4. Generator Expressions

Basics

Memory Efficiency

Advanced Usage

Chaining Generators

yield vs Generator Expressions

Use Cases

5. Performance & Optimization

Time Complexity

Choosing the Right Data Structure

6. Specialized Collections

collections.deque

collections.Counter

heapq

7. Memory Management

Internal Storage

Memory Measurement

Memory Profiling

Post a Comment