Python Data Structures: Core Concepts
Mastering Python's built-in data structures for efficient programming
Table of Contents
1. Lists and Tuples
Lists (Mutable Sequences)
Basics
Lists are ordered, mutable collections that can hold heterogeneous data types.
# List creation
numbers = [1, 2, 3, 4]
mixed = [1, "two", 3.0, [4, 5]]
# Indexing and slicing
print(numbers[0]) # 1 (first element)
print(numbers[-1]) # 4 (last element)
print(numbers[1:3]) # [2, 3] (slice from index 1 to 2)
Common Methods
Method | Description | Example |
---|---|---|
append(x) |
Add item to end | numbers.append(5) |
extend(iterable) |
Add all items from iterable | numbers.extend([6,7]) |
insert(i, x) |
Insert item at position | numbers.insert(0, 0) |
remove(x) |
Remove first occurrence | numbers.remove(3) |
pop([i]) |
Remove and return item | last = numbers.pop() |
List Operations
# Concatenation
combined = [1, 2] + [3, 4] # [1, 2, 3, 4]
# Repetition
repeated = [0] * 5 # [0, 0, 0, 0, 0]
# Membership testing
print(2 in numbers) # True
Advanced Operations
Shallow vs Deep Copying
import copy
original = [[1, 2], [3, 4]]
# Shallow copy (only copies references)
shallow = original.copy()
shallow[0][0] = 99 # Affects original!
# Deep copy (copies all nested objects)
deep = copy.deepcopy(original)
deep[0][0] = 100 # Doesn't affect original
Memory Allocation
Python lists are dynamic arrays that overallocate memory for growth:
import sys
lst = []
for i in range(10):
print(f"Length: {len(lst)}, Size: {sys.getsizeof(lst)} bytes")
lst.append(i)
Tuples (Immutable Sequences)
Basics
Tuples are immutable, ordered sequences typically used for heterogeneous data.
# Tuple creation
point = (10, 20)
colors = ("red", "green", "blue")
# Single-element tuple (note trailing comma)
single = (42,)
# Common gotcha - not a tuple
not_a_tuple = (42) # Just an integer
Immutability and Use Cases
- Dictionary keys (must be hashable/immutable)
- Function arguments and return values
- Data integrity (can't be modified accidentally)
- Generally more memory efficient than lists
Advanced Concepts
Named Tuples
from collections import namedtuple
# Create a class-like structure
Point = namedtuple("Point", ["x", "y"])
p = Point(10, y=20)
print(p.x, p.y) # 10 20
Tuple Unpacking
# Basic unpacking
x, y = (10, 20)
# Extended unpacking (Python 3+)
first, *rest = (1, 2, 3, 4) # first=1, rest=[2, 3, 4]
# Swapping variables
a, b = b, a
Performance Note: Tuples can be more memory efficient than lists for small collections, but the difference is negligible for most use cases.
2. Sets and Dictionaries
Sets (Unique, Unordered Collections)
Basics
Sets are mutable collections of unique, hashable elements with fast membership testing.
# Set creation
primes = {2, 3, 5, 7}
empty_set = set() # {} creates empty dict!
# Basic operations
primes.add(11) # Add element
primes.remove(3) # Remove element (raises KeyError if missing)
primes.discard(99) # Remove if present (no error)
# Set operations
odds = {1, 3, 5, 7, 9}
print(primes | odds) # Union
print(primes & odds) # Intersection
print(primes - odds) # Difference
print(primes ^ odds) # Symmetric difference
Frozensets
Immutable version of sets (can be used as dictionary keys):
immutable = frozenset([1, 2, 3])
Advanced Usage
Set Comprehensions
squares = {x**2 for x in range(10)}
Performance
Sets provide O(1) average case membership testing:
# Much faster than checking lists for membership
if 999999 in huge_set: # Fast
print("Found")
if 999999 in huge_list: # Slow (O(n))
print("Found")
Use Cases
- Removing duplicates from a sequence:
list(set(duplicates))
- Membership testing (faster than lists)
- Mathematical set operations
- Finding unique elements
Dictionaries (Key-Value Stores)
Basics
Dictionaries are mutable mappings from unique keys to values.
# Dictionary creation
person = {"name": "Alice", "age": 25}
empty_dict = {}
# Accessing values
print(person["name"]) # Alice
print(person.get("age")) # 25
print(person.get("job", "unemployed")) # Default if missing
# Adding/modifying
person["job"] = "Engineer"
person["age"] = 26
# Membership testing
print("name" in person) # True
Dictionary Methods
Method | Description | Example |
---|---|---|
keys() |
View of dictionary keys | person.keys() |
values() |
View of dictionary values | person.values() |
items() |
View of (key, value) pairs | person.items() |
update() |
Merge with another dict | person.update({"city": "NY"}) |
setdefault() |
Get value or set default | person.setdefault("age", 30) |
Advanced Topics
Dictionary Comprehensions
squares = {x: x**2 for x in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
Default Dictionaries
from collections import defaultdict
# Automatically initialize missing keys
word_counts = defaultdict(int)
for word in text.split():
word_counts[word] += 1
Ordered Dictionaries
As of Python 3.7+, regular dictionaries preserve insertion order. For additional features:
from collections import OrderedDict
# Remember insertion order (useful before Python 3.7)
od = OrderedDict()
od["a"] = 1
od["b"] = 2
od.move_to_end("a") # Move to end
Memory Optimization
Dictionaries consume more memory than lists. For memory-critical applications:
- Use
__slots__
in classes to prevent dynamic dictionary creation - Consider
array.array
or NumPy arrays for homogeneous numeric data - Use
sys.getsizeof()
to measure memory usage
3. List Comprehensions
Basic Syntax
Concise way to create lists by applying operations to iterables.
# Basic comprehension
squares = [x**2 for x in range(10)]
# With condition
even_squares = [x**2 for x in range(10) if x % 2 == 0]
# Equivalent to:
even_squares = []
for x in range(10):
if x % 2 == 0:
even_squares.append(x**2)
Advanced Patterns
Nested Comprehensions
# Flatten a 2D list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [num for row in matrix for num in row]
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
Multiple Loops
# Cartesian product
colors = ["red", "green"]
sizes = ["S", "M", "L"]
tshirts = [(color, size) for color in colors for size in sizes]
# [('red', 'S'), ('red', 'M'), ('red', 'L'),
# ('green', 'S'), ('green', 'M'), ('green', 'L')]
Performance Considerations
List comprehensions are generally faster than equivalent loops:
- Optimized at the C level in Python
- More concise and often more readable
- But can become unreadable if overly complex
Readability Tip: If a comprehension spans multiple lines or has complex logic, consider using a traditional loop for better readability.
4. Generator Expressions
Basics
Similar to list comprehensions but produce items lazily (one at a time).
# Generator expression
squares_gen = (x**2 for x in range(10))
# Consume generator
for num in squares_gen:
print(num)
# Can only be consumed once
print(list(squares_gen)) # [] (already exhausted)
Memory Efficiency
Generators don't store all values in memory at once:
# Compare memory usage
import sys
list_comp = [x**2 for x in range(10000)]
gen_expr = (x**2 for x in range(10000))
print(sys.getsizeof(list_comp)) # Large (stores all values)
print(sys.getsizeof(gen_expr)) # Small (constant size)
Advanced Usage
Chaining Generators
def integers():
"""Infinite generator of integers"""
i = 1
while True:
yield i
i += 1
def squares():
for i in integers():
yield i * i
# Take first 5 squares
first_5 = (x for _, x in zip(range(5), squares()))
yield vs Generator Expressions
Generator functions (using yield
) are more flexible but generator expressions are more concise:
# Generator function
def squares(n):
for i in range(n):
yield i ** 2
# Generator expression equivalent
squares = (i**2 for i in range(n))
Use Cases
- Processing large files line by line
- Memory-efficient pipelines
- Infinite sequences
- Anywhere you only need to process items once
5. Performance & Optimization
Time Complexity
Understanding Big-O notation helps choose the right data structure:
Operation | List | Set/Dict | Deque |
---|---|---|---|
Index Access | O(1) | N/A | O(1) (ends only) |
Search | O(n) | O(1) | O(n) |
Insert at End | O(1)* | N/A | O(1) |
Insert at Start | O(n) | N/A | O(1) |
Delete Item | O(n) | O(1) | O(n) |
* Amortized O(1) due to occasional resizing
Choosing the Right Data Structure
- Lists: When you need ordered, indexable data that changes frequently
- Tuples: For immutable sequences, dictionary keys, or when memory matters
- Sets: For membership testing and uniqueness guarantees
- Dictionaries: For key-value associations and fast lookups
- Deques: When you need fast appends/pops from both ends
6. Specialized Collections
collections.deque
Double-ended queue with fast O(1) appends and pops from both ends:
from collections import deque
d = deque([1, 2, 3])
d.appendleft(0) # [0, 1, 2, 3]
d.pop() # Removes 3
d.popleft() # Removes 0
collections.Counter
Specialized dictionary for counting hashable objects:
from collections import Counter
words = ["apple", "banana", "apple", "orange", "banana", "apple"]
word_counts = Counter(words)
print(word_counts.most_common(2)) # [('apple', 3), ('banana', 2)]
heapq
Priority queue implementation using a binary heap:
import heapq
nums = [5, 7, 9, 1, 3]
heapq.heapify(nums) # Convert to heap
print(heapq.heappop(nums)) # 1 (smallest element)
heapq.heappush(nums, 2) # Add new element
7. Memory Management
Internal Storage
Python data structures have different memory layouts:
- Lists: Dynamic array of pointers to objects
- Tuples: Fixed-size array of pointers
- Dictionaries: Hash tables with open addressing
- Sets: Similar to dictionaries but without values
Memory Measurement
import sys
lst = list(range(1000))
tup = tuple(range(1000))
print(sys.getsizeof(lst)) # ~9024 bytes
print(sys.getsizeof(tup)) # ~8040 bytes
Memory Profiling
For detailed memory analysis:
# Install memory_profiler first: pip install memory-profiler
from memory_profiler import profile
@profile
def process_data():
data = [i**2 for i in range(100000)]
return sum(data)
process_data()