left-icon

Essential Python Succinctly®
by Ed Freitas

Previous
Chapter

of
A
A
A

CHAPTER 8

Data Structures

Data Structures


Quick intro

In Python, efficient data handling is essential for performance and scalability, especially when working with large datasets or complex data processing tasks.

This is where Python’s advanced data structures come into play, particularly those available in the collections module, such as deque, defaultdict, and namedtuple.

Structures like sets and heaps are also invaluable for fast lookups and sorting data. Understanding and using these data structures will help you write optimized, maintainable code.

Note: You can run the following code examples by opening the built-in terminal within VS Code and executing the command: python <script>.py. Replace <script> with the name of the respective Python file to execute.

Advanced collections

The collections module in Python offers several valuable data structures for managing data efficiently. Here’s a look at three important ones: deque, defaultdict, and namedtuple.

Fast insertions and deletions

A double-ended queue, also known as a deque, is a data structure optimized for fast insertions and deletions from both ends.

Unlike lists optimized for random access, a deque is ideal when you frequently need to add or remove elements at the beginning or end. Let’s have a look at an example.

Code Listing 8-a: deque.py

from collections import deque

# Initialize a deque with some elements

dq = deque([1, 2, 3, 4, 5])

# Append elements to the right and left ends

dq.append(6)

dq.appendleft(0)

# Pop elements from the right and left ends

dq.pop()

dq.popleft()

print("Final deque state:", dq)

Explanation:

·     Initialize a deque: deque([1, 2, 3, 4, 5]) creates a deque with initial elements.

·     Appending elements: dq.append(6) adds 6 to the right end, while dq.appendleft(0) adds 0 to the left.

·     Popping elements: dq.pop() removes the rightmost element, and dq.popleft() removes the leftmost element.

·     Efficiency: Unlike lists, these operations are optimized in deque, making it a good choice when you need fast queue or stack operations.

Simplified dictionary management

A defaultdict is like a regular dictionary but provides a default value for a nonexistent key, which can prevent common errors and reduce code complexity. Let’s look at the code.

Code Listing 8-b: defaultdict.py

from collections import defaultdict

# Initialize a defaultdict with default type of int (for counting)

word_count = defaultdict(int)

# Count occurrences of each word in a list

words = ["apple", "banana", "apple", "orange", "banana", "apple"]

for word in words:

    word_count[word] += 1

print("Word counts:", word_count)

Explanation:

·     Initialize a defaultdict: defaultdict(int) creates a dictionary where each missing key automatically has a default value of 0 (since int() returns 0).

·     Counting elements: Each time a word appears in the list words, word_count[word] += 1 increments its count.

·     Efficiency: Using defaultdict prevents checking if a key exists, making code cleaner and less error-prone.

Lightweight object-like data structures

In Python, a namedtuple allows you to create a lightweight, immutable object type with named fields, providing a clean and memory-efficient alternative to classes when you need simple data storage. Let’s look at some code.

Code Listing 8-c: namedtuple.py

from collections import namedtuple

# Define a namedtuple type for a point in 2D space

Point = namedtuple("Point", ["x", "y"])

# Create instances of Point

p1 = Point(3, 4)

p2 = Point(5, 9)

# Access the fields

print("Point 1:", p1)

print("Point 1 x coordinate:", p1.x)

print("Point 2 y coordinate:", p2.y)

Explanation:

·     Defining a namedtuple type: Point = namedtuple("Point", ["x", "y"]) defines a named tuple type called Point with fields x and y.

·     Creating instances: Point(3, 4) and Point(5, 9) create immutable Point objects, which you can access by attribute name.

·     Efficiency: Named tuples are more memory-efficient than dictionaries and classes, making them ideal for simple, structured data storage.

Sets for fast membership testing

Beyond the collections module, other data structures like sets and heaps provide efficient ways to handle extensive data.

Sets store unique elements and allow fast membership testing, making them useful for tasks like deduplication and filtering. Let’s have a look.

Code Listing 8-d: sets.py

# Define two sets of numbers

set_a = {1, 2, 3, 4, 5}

set_b = {4, 5, 6, 7, 8}

# Perform set operations

union = set_a | set_b  # Union of sets

intersection = set_a & set_b  # Intersection of sets

difference = set_a - set_b  # Difference of sets

print("Union:", union)

print("Intersection:", intersection)

print("Difference:", difference)

Explanation:

·     Creating sets: {1, 2, 3, 4, 5} creates a set with unique elements.

·     Set operations: set_a | set_b finds the union, set_a & set_b finds the intersection, and set_a - set_b finds elements in set_a not in set_b.

·     Efficiency: Sets provide constant time complexity for membership tests (in), making them efficient for filtering and deduplication.

Heaps for efficient data sorting

A heap is a binary tree-based data structure that maintains a sorted list where the smallest or largest element can be accessed immediately. Python’s heapq module provides tools for working with heaps. Let’s look at the following example.

Code Listing 8-e: heaps.py

import heapq

# Create a list of unsorted numbers

numbers = [20, 5, 15, 10, 30]

# Convert the list to a heap

heapq.heapify(numbers)

print("Heapified list:", numbers)

# Add a new number to the heap

heapq.heappush(numbers, 7)

print("Heap after adding 7:", numbers)

# Pop the smallest element

smallest = heapq.heappop(numbers)

print("Smallest element:", smallest)

print("Heap after popping smallest element:", numbers)

Explanation:

·     Heapify: heapq.heapify(numbers) converts the list numbers into a heap, arranging elements so that the smallest element is at the root.

·     Adding to the heap: heapq.heappush(numbers, 7) adds seven while maintaining heap order, ensuring the smallest element is immediately accessible.

·     Removing the smallest element: heapq.heappop(numbers) removes the smallest element, reordering the heap to keep it in sorted order.

Recap

Understanding these data structures can significantly improve your code’s efficiency. You can handle complex data organization effectively with deque, defaultdict, and namedtuple.

Sets allow fast filtering, while heaps provide efficient ways to sort and access data. Mastering these structures empowers you to handle more advanced and large-scale data tasks in Python.

Scroll To Top
Disclaimer
DISCLAIMER: Web reader is currently in beta. Please report any issues through our support system. PDF and Kindle format files are also available for download.

Previous

Next



You are one step away from downloading ebooks from the Succinctly® series premier collection!
A confirmation has been sent to your email address. Please check and confirm your email subscription to complete the download.