
Tutorial @ PyCon USA 2026. May 14, 2026. Long Beach, CA, USA.

The source code repository for this tutorial is available on GitHub at:

1. Introduction

This tutorial peels back the layers of modern image processing by teaching you how to manipulate digital media through direct interaction with individual pixels. The Pillow library will be used as a lightweight bridge for file handling, allowing you to bypass automated methods and implement your own algorithms from scratch by treating images as raw numerical data. The session covers practical applications such as converting color to grayscale, mapping pixel coordinates for image scaling, and creating multi-panel pop art through programmatic color re-mapping. You will also explore the world of steganography by learning how to hide secret messages within pixel data so they remain invisible to the naked eye. Designed for early-intermediate Pythonistas, this tutorial focuses on building low-level data manipulation skills to help you become a more versatile programmer and problem solver.

2. Digital Color Foundations

2.1. The Pixel

A pixel (short for “picture element”) is the smallest unit of a digital image. It is a single point in a grid that stores specific color or intensity information. When thousands of these points are arranged in a matrix, they form the images we see.

2.2. The RGB Color Model

Color models are mathematical systems that represent colors as sets of numbers. This tutorial focuses on the RGB color model, an additive system used by light-emitting devices like monitors and phones. It creates various hues (the specific shade or color, such as scarlet or turquoise) by starting with black and adding red, green, and blue light together.

In digital imaging, each of these three channels is typically represented by an 8-bit integer, ranging from 0 (no intensity) to 255 (full intensity).

  • Color Depth: With 8 bits per channel, we have \(256 \times 256 \times 256 \approx 16.7\) million possible colors.

  • Neutral Colors: Whenever the three values are equal \((R=G=B)\), the color is a shade of gray.

2.3. Common RGB Colors

The following table shows how primary and secondary colors are constructed using the additive RGB model, providing the specific 8-bit integer tuples.

Table 1. RGB Tuples for Common Colors

Color     RGB Tuple (R, G, B)   Note
Black     (0, 0, 0)             No light added
White     (255, 255, 255)       All light at max intensity
Red       (255, 0, 0)           Primary light
Green     (0, 255, 0)           Primary light
Blue      (0, 0, 255)           Primary light
Cyan      (0, 255, 255)         Green + Blue
Magenta   (255, 0, 255)         Red + Blue
Yellow    (255, 255, 0)         Red + Green

The HTML Color Codes website provides an interactive suite of tools, including color pickers, charts, and libraries, to help designers and developers easily find, convert, and generate RGB color codes.

3. The Pillow Library

3.1. Overview

Pillow serves as a modern, actively maintained fork of the original Python Imaging Library (PIL), which was first released by Fredrik Lundh in 1995. Created in 2010 by Jeffrey “Alex” Clark and contributors, Pillow acts as a powerful and user-friendly successor, bringing the core capabilities of PIL into the current Python ecosystem.

As the industry standard for image processing, it provides extensive file format support for everything from JPEGs and PNGs to more specialized types. While it includes many built-in filters and transformations, its real strength for this tutorial lies in its ability to open a file and hand off the pixel information as a manageable object. Because it is built on an optimized C backend, it provides the necessary performance to load and save data efficiently, allowing you to focus your energy on the logic of your custom algorithms.

Pillow provides full read and write support for a wide range of formats including BMP, GIF, ICO, JPEG, PNG, PPM, TIFF, and WebP.

3.2. Installation

For most users, the standard package manager pip is the way to go. Open your terminal or command prompt and run:

pip install Pillow

If the pip command is not recognized, invoke pip through the Python interpreter instead:

python -m pip install Pillow

If the standard commands still fail, try pip3 or python3. Some environments, particularly on macOS and Linux, use the 3 suffix to distinguish the modern interpreter from legacy system versions and prevent version conflicts.

Run the following command at the terminal to verify that Pillow was successfully installed:

python -c "import PIL; print('Pillow installed successfully')"

If everything is OK, you should see the message Pillow installed successfully printed in your terminal.

3.3. The Anatomy of a Digital Image

This section explores the fundamental structure of a digital image, detailing how Pillow organizes pixel data to represent visual information in a computer’s memory.

  • The Image Object: This is the primary container in Pillow. An Image object is more than just the visual data; it contains the metadata (dimensions, format, and resolution) required to interpret the pixels.

  • Modes: The mode defines the type and depth of a pixel in the image. For this tutorial we will use Modes 1, L, and RGB (see Bands below).

  • Bands: A band or channel represents a single color component of an image.

    • A single-bit image has only one band, representing black and white pixels. In Pillow, this is referred to as Mode 1, where each pixel carries a single bit of information (black or white) rather than a full byte's range of values.

    • A grayscale image also has only one band, which represents the brightness (luminance) of each pixel. In Pillow, this is called Mode L, where a single 8-bit value determines the shade of gray from black (0) to white (255).

    • An RGB image has three bands: red, green, and blue. Each band uses 8 bits per pixel to represent the intensity of that specific color. In Pillow, this is known as Mode RGB, the standard for full-color digital images.

  • The Coordinate System: Pillow uses a Cartesian coordinate system where \((0, 0)\) is the top-left corner. The x‑coordinate (column) increases from left to right. The y‑coordinate (row) increases from top to bottom. This distinction is vital when moving between 2D grid logic and 1D “stream” logic.

  • Lossy vs. Lossless Compression: When saving an image to a file, the computer uses compression to reduce its size. The method chosen dictates whether the original pixel data is preserved perfectly or not.

    • Lossless Compression ensures that every single pixel is preserved exactly as it is. When you open a lossless file, the image is mathematically identical to the original.

      • Examples: PNG (standard for web graphics) and TIFF (standard for high-quality printing).

      • Best for: Images with sharp edges, text, or when you plan to perform multiple rounds of editing.

    • Lossy Compression achieves much smaller file sizes by permanently discarding “unnecessary” visual information that the human eye is less likely to notice. Each time you save a lossy image, a small amount of quality is lost.

      • Example: JPEG (the standard for digital photography).

      • Best for: Photos and complex scenes where a slight loss in detail is an acceptable trade-off for a significantly smaller file.

3.4. API Reference

Table 2 summarizes the key components of the Pillow library that we will use throughout this tutorial, providing a concise API reference for the factory methods, attributes, and instance methods required for our image manipulation projects.

Ensure that the following import statement is included at the beginning of your code to provide access to the necessary functions and classes:

from PIL import Image

Table 2. Key Classes, Methods, and Attributes

\(\texttt{Image}\)

The Image class is the central component of the Pillow library, acting as an object-oriented container that encapsulates a digital image’s pixel data along with its metadata, such as size, format, and color mode. However, one should be careful not to confuse the Image class (the blueprint for a specific image object) with the Image module (the namespace containing factory functions like open and new), as the latter is used to create or manipulate instances of the former.

\(\texttt{Image.open(} \textit{filename} \texttt{)}\)

This function is a factory method used to load an image from a file by accepting its \(\textit{filename}\) as a string. It automatically identifies the format and creates a new Image class instance in a “lazy” state that keeps the file open until the pixel data is actually accessed.

\(\texttt{Image.new(} \textit{mode} \texttt{,} \textit{size} \texttt{)}\)

This function is a factory method used to create a blank image from scratch by specifying its \(\textit{mode}\) (such as "1", "L", or "RGB") and its \(\textit{size}\) as a \((\textit{width}, \textit{height})\) tuple. This is particularly useful for generating canvas backgrounds or creating a target image.

\(\textit{img} \texttt{.size}\)

This attribute is a read-only property that returns a tuple containing the width and height of \(\textit{img}\) in pixels. It is often used to set up the limits for loops when iterating through an image to perform pixel-by-pixel transformations.

\(\textit{img} \texttt{.convert(} \textit{mode} \texttt{)}\)

This method is used to create a new copy of \(\textit{img}\) by transforming it into a different pixel format. It accepts a string representing the desired \(\textit{mode}\) (such as "1", "L", or "RGB") and is recommended as a safety measure to ensure the image is in a consistent, predictable format before you begin processing its pixels.

\(\textit{img} \texttt{.getdata()}\)

This method returns the pixel values of \(\textit{img}\) as a flattened, sequence-like object, providing the entire image’s contents in a single continuous stream. This is useful for processing all pixels in bulk.

\(\textit{img} \texttt{.putdata(} \textit{sequence} \texttt{)}\)

This method is used to copy pixel data from \(\textit{sequence}\) into an existing \(\textit{img}\). It populates the image starting from the upper-left corner and continuing row by row, making it an efficient choice for applying bulk changes to an entire image after processing its data as a single stream.

\(\textit{img} \texttt{.load()}\)

This method is used to allocate storage for the image \(\textit{img}\) and load its pixel data into memory, returning a pixel access object that allows for high-speed reading and modification of individual pixels via coordinate indices. While Pillow often loads data lazily, calling this method explicitly ensures that the file is read and the image remains available in memory even after the original file handle is closed.

The returned pixel access object can be used like a two-dimensional array. You can read a pixel value using the syntax px[x, y] and modify it by assigning a new value, such as px[x, y] = (r, g, b).

\(\textit{img} \texttt{.save(} \textit{filename} \texttt{)}\)

This method is used to write an image to a file by accepting its \(\textit{filename}\) as a string. It automatically determines the storage format based on the file extension and persists the image data to permanent storage, effectively concluding the manipulation process.
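To see how these pieces fit together before diving into the exercises, here is a minimal sketch (the file names input.png and output.png are illustrative) that loads an image, normalizes it to RGB, reads its pixels both as a flat stream and through a coordinate-indexed access object, and saves a copy:

from PIL import Image

# Load the file and normalize it to a predictable three-channel format.
img = Image.open("input.png").convert("RGB")
width, height = img.size

# Bulk access: the whole image as a flat sequence of (R, G, B) tuples.
pixels = list(img.getdata())
print(f"{width}x{height} image, first pixel: {pixels[0]}")

# Coordinate access: read and modify individual pixels via [x, y].
px = img.load()
px[0, 0] = (255, 0, 0)  # paint the top-left pixel red

# Persist the result; the format is inferred from the file extension.
img.save("output.png")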

4. Exercises

You can find all the necessary image files for the tutorial’s exercises in this ZIP archive:

4.1. Negative

In digital image processing, a negative transformation is a point-processing operation that inverts the intensity values of an image. It is often referred to as a “1D” or linear operation because the transformation is applied to each pixel independently of its neighbors, treating the entire image as a flat, sequential stream of data.

For an 8-bit color channel, the transformation maps an input intensity \(s\) to an output intensity \(s'\) using the linear equation:

\[s' = 255 - s\]

The constant 255 is the maximum intensity for a standard 8-bit color channel. Since an 8-bit channel allocates 1 byte of memory, it can represent \(2^8 = 256\) distinct values, ranging from 0 to 255. Subtracting the input from 255 effectively reverses its position on this scale.

This process flips the brightness scale: light areas become dark, dark areas become light, and vibrant colors are swapped with their complementary hues. One way to implement the transformation is to flatten the image into a 1D sequence of RGB pixels, manipulate the values, and rebuild the final image, as sketched after the exercise statement below.

Write a complete Python script that processes an input image file and outputs its corresponding negative transformation image file.
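A minimal sketch of one possible solution, using the flat-stream approach described above (file names match the example below):

from PIL import Image

# Flatten the image to a 1D stream, invert every channel, and rebuild.
img = Image.open("woman.png").convert("RGB")
negative = [(255 - r, 255 - g, 255 - b) for (r, g, b) in img.getdata()]

out = Image.new("RGB", img.size)
out.putdata(negative)
out.save("negative_woman.png")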

For example, given the following input image:

woman
Figure 1. woman.png

The resulting negative transformation should look like this:

negative woman
Figure 2. negative_woman.png

4.2. Horizontal Mirroring

A horizontal flip transformation is a spatial operation that mirrors an image from left to right along its vertical centerline. Unlike point-processing operations, it alters the geometric arrangement of the data rather than its color values. It is often referred to as a “2D” operation because the transformation relies on the exact coordinate grid position of each individual pixel.

For an image with a width \(W\), the transformation maps an input horizontal coordinate \(x\) to an output horizontal coordinate \(x'\) while keeping the vertical coordinate \(y\) completely unchanged, using the linear coordinate equation:

\[x' = W - 1 - x\]

Because digital images utilize 0-based indexing, the valid pixel columns across the horizontal plane range from 0 to \(W - 1\). Subtracting the current index from the maximum index (\(W - 1\)) effectively reverses the pixel order across each individual row.

This process creates a perfect mirror reflection: elements on the left side of the frame swap places with elements on the right, causing text to appear backward.

Write a complete Python script that processes an input image file and outputs its corresponding horizontal flip transformation image file.
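One possible sketch, using the coordinate mapping above (file names match the example below):

from PIL import Image

img = Image.open("puppy.png").convert("RGB")
width, height = img.size

out = Image.new("RGB", img.size)
src = img.load()
dst = out.load()

# Send each source column x to the mirrored column W - 1 - x.
for y in range(height):
    for x in range(width):
        dst[width - 1 - x, y] = src[x, y]

out.save("mirror_puppy.png")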

For example, given the following input image:

puppy
Figure 3. puppy.png

The resulting horizontal flip transformation should look like this:

mirror puppy
Figure 4. mirror_puppy.png

4.3. Shades of Gray

A grayscale conversion is a point-processing operation that strips color information from an image, transforming it into a monochrome representation consisting entirely of shades of gray. This is achieved by mapping the three distinct color channels — Red \((R)\), Green \((G)\), and Blue \((B)\) — of each pixel into a single intensity value \(Y\) that is then replicated across all three channels.

Depending on how human visual perception is modeled, this transformation can be approached in two distinct ways:

  • RGB Arithmetic Average

    The simplest mathematical approach computes the unweighted arithmetic mean of the three color components. It treats every color channel as having equal structural importance:

    \[Y = \lfloor \frac{R + G + B}{3} \rfloor\]
  • Luma-Weighted Average

    Because the human eye contains different concentrations of color-sensitive cone cells, we do not perceive all colors as equally bright. Humans are highly sensitive to green light, moderately sensitive to red light, and significantly less sensitive to blue light. To create a grayscale image that accurately matches human visual perception of brightness (luminance), standard video and imaging protocols (such as ITU-R BT.601) utilize a weighted formula:

    \[Y = \texttt{int}(0.299R + 0.587G + 0.114B)\]

    The weights sum to exactly 1.0, ensuring that the resulting intensity \(Y\) remains safely within the standard 8-bit color channel range of 0 to 255. This process eliminates chromaticity while preserving the structural luminance of the scene: vibrant primary colors are converted into their corresponding light or dark gray equivalents.

Implement a Python program that processes an input image file and outputs two separate grayscale image files: one using the arithmetic average approach and the other using the Luma-Weighted average approach.

When generating a grayscale image, utilize mode "L" to achieve a more efficient, smaller file size, noting that its output pixel data must be a flat list of single integer intensity values rather than three-element RGB tuples.
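A minimal sketch covering both formulas (file names match the example below):

from PIL import Image

img = Image.open("snake.png").convert("RGB")
pixels = list(img.getdata())

# Unweighted arithmetic mean of the three channels.
average = [(r + g + b) // 3 for (r, g, b) in pixels]

# ITU-R BT.601 luma weights, matched to perceived brightness.
luma = [int(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in pixels]

for data, name in [(average, "grayscale_average_snake.png"),
                   (luma, "grayscale_luma_snake.png")]:
    out = Image.new("L", img.size)  # mode "L": one 8-bit value per pixel
    out.putdata(data)
    out.save(name)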

For example, given the following input image:

snake
Figure 5. snake.png

The resulting arithmetic average transformation should look like this:

grayscale average snake
Figure 6. grayscale_average_snake.png

The resulting Luma-Weighted average transformation should look like this:

grayscale luma snake
Figure 7. grayscale_luma_snake.png

4.4. Steganography

Steganography is the art and science of hiding secret messages, files, or data within another, non-secret file (like images, audio, or video) to avoid detection. One of the most common techniques for hiding digital data in raw pixels is Least Significant Bit (LSB) insertion.

In this exercise, you will step into the shoes of a digital forensic analyst. It turns out that the snake.png asset you have been working with all along has been harboring a secret. The image appears completely normal on the surface, but a subtle anomaly has been hiding deep within its raw color channels from the very beginning.

In an 8-bit color channel (such as the Green channel of an RGB pixel), values range from 0 to 255. The binary representation of these numbers looks like this:

  • 0 = 00000000 (Even)

  • 1 = 00000001 (Odd)

  • 2 = 00000010 (Even)

  • 3 = 00000011 (Odd)

  • …​

  • 254 = 11111110 (Even)

  • 255 = 11111111 (Odd)

Notice that altering only the absolute rightmost bit — the least significant bit — changes the numerical intensity value by at most 1 unit. To the human eye, this is completely imperceptible. This makes the LSB channel the perfect hiding spot for encoding a secret binary matrix.

To extract the 1-bit binary image hidden inside a color host, your code needs to isolate the LSB of every single targeted pixel. Mathematically, this can be achieved by using the modulus operation (%) with a divisor of 2, which checks if the intensity value is even or odd.

Write a Python script that processes our original RGB asset, extracts the hidden 1-bit data layer from the Green channel of every pixel, and reconstructs the hidden matrix into a distinct single-bit image (Mode 1) output file.

Your program should follow these steps:

  1. Load the familiar host image file (snake.png) and explicitly convert it to an RGB color space to ensure a reliable three-tuple stream.

  2. Retrieve the flat pixel sequence. Iterate through each \((R, G, B)\) tuple, apply the modulus operation with a divisor of 2 to isolate the LSB of the Green channel, and store the binary result (0 or 1) in a list.

  3. Create a brand new, empty canvas in single-bit image mode ("1") matching the precise structural dimensions of the original image, populate it with your extracted bit list, and save the output file.

If you successfully extract the data layer, the output file will be an image with a classic piece of Python history that was tucked away in your source file all along.
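A minimal extraction sketch following these three steps (the output name hidden_message.png is illustrative):

from PIL import Image

# Step 1: load the host image and force a reliable (R, G, B) stream.
host = Image.open("snake.png").convert("RGB")

# Step 2: isolate the least significant bit of the Green channel.
# G % 2 is 0 for even intensities and 1 for odd ones; scaling by 255
# maps those bits to black (0) and white (255) pixels.
bits = [255 * (g % 2) for (_, g, _) in host.getdata()]

# Step 3: rebuild the hidden layer as a single-bit (Mode 1) image.
hidden = Image.new("1", host.size)
hidden.putdata(bits)
hidden.save("hidden_message.png")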

4.5. Posterization

A posterization transformation is a point-processing operation that reduces the continuous gradation of color tones in an image to a small, distinct selection of flat colors. Historically, this term comes from the traditional printing process used to create mass-media posters, where limited ink palettes forced smooth photographic gradients to be split into sharp, solid regions of contrast.

Because this transformation alters each pixel independently based solely on its own data—completely ignoring the spatial layout or neighboring pixel values—it is treated as a flat, sequential 1D operation stream.

To achieve this effect, the continuous intensity spectrum of an input pixel must be categorized into discrete intervals. A common approach is to first reduce the three-dimensional color data down to a single representative grayscale brightness value \(Y\) using the arithmetic average:

\[Y = \lfloor \frac{R + G + B}{3} \rfloor\]

Alternatively, you can extract a single channel like green, pick the max or midrange of the components, or apply a human-centric Luma-Weighted average as explained in a previous exercise.

Once the overall brightness \(Y\) is computed, a conditional control structure categorizes the pixel value and maps it into one of three specific artistic palette colors.

To help you visualize the implementation, the following configuration serves as a structural example mapping the intensity scale into distinct shadow, midtone, and highlight zones:

Brightness Range (Example)   Visual Tone Class   Output Tuple Color (Example)
\(Y < 50\)                   Dark Shadows        Brandy: (120, 41, 15)
\(50 \le Y < 130\)           Midtones            Harvest Orange: (255, 125, 0)
\(Y \ge 130\)                Highlights          Papaya Whip: (255, 236, 209)

This process eliminates the fine subtleties of shadows and highlights, transforming a standard photographic image into a stark, stylized vector-like graphic piece.

You are not restricted to a three-tier palette; you can expand your conditional structure to accommodate four, five, or as many distinct colors as your artistic vision requires.

Code a Python program that processes an input image file and outputs the posterized transformation image file. Feel free to define your own boundary conditions and select any three colors that match your aesthetic goals.
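A minimal sketch using the example palette above (boundaries and colors are yours to change):

from PIL import Image

SHADOW = (120, 41, 15)       # Brandy
MIDTONE = (255, 125, 0)      # Harvest Orange
HIGHLIGHT = (255, 236, 209)  # Papaya Whip

def classify(r, g, b):
    y = (r + g + b) // 3     # arithmetic-average brightness
    if y < 50:
        return SHADOW
    if y < 130:
        return MIDTONE
    return HIGHLIGHT

img = Image.open("woman.png").convert("RGB")
out = Image.new("RGB", img.size)
out.putdata([classify(r, g, b) for (r, g, b) in img.getdata()])
out.save("poster_woman.png")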

For example, given the following input image:

woman
Figure 8. woman.png

The resulting posterized transformation should look like this:

poster woman
Figure 9. poster_woman.png

For inspiration on choosing your custom colors, visit the Coolors web site to browse thousands of beautifully curated, trending color palettes that you can adapt directly into your program’s logic. Because the colors on this web site are represented using hexadecimal values, you will need to convert them to match the tuple format expected by your Python code. To convert a hexadecimal color string like D4A373 to a Python tuple, split the string into three two-character pairs representing the red, green, and blue components, and prefix each pair with 0x to define them as hexadecimal integers, yielding (0xD4, 0xA3, 0x73).
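A small helper can automate that conversion (the function name hex_to_rgb is illustrative):

def hex_to_rgb(hex_color):
    """Convert a hex string like 'D4A373' to an (R, G, B) tuple."""
    return tuple(int(hex_color[i:i + 2], 16) for i in range(0, 6, 2))

print(hex_to_rgb("D4A373"))  # (212, 163, 115), i.e. (0xD4, 0xA3, 0x73)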

4.6. Shrinking

Downsampling (or shrinking) is a geometric spatial transformation that reduces the physical dimensions of an image. Unlike point-processing operations that only modify color intensity, spatial resizing alters the total pixel count by mapping a large coordinate grid onto a smaller one. It is inherently a 2D operation, as it requires navigating the structural rows and columns of the pixel matrix.

The most straightforward way to reduce an image’s size without interpolating or inventing new pixel values is through nearest-neighbor subsampling. To shrink an image by an integer scale factor \(S\), we step through the original image grid and sample exactly one pixel at regular intervals of \(S\), discarding the intermediate data.

For an input image with a width \(W\) and a height \(H\), scaling the image down by a factor of \(S\) yields a new width \(W'\) and height \(H'\) calculated using integer division:

\[W' = \lfloor \frac{W}{S} \rfloor, \quad H' = \lfloor \frac{H}{S} \rfloor\]

To populate a coordinate \((x', y')\) in the brand-new, smaller destination image, we map backward to find the source pixel \((x, y)\) to sample from the original input image, using the scale equation:

\[x = x' \times S, \quad y = y' \times S\]

Because digital image grids are non-continuous and bound to discrete coordinates, this structural skipping effectively drops rows and columns. While computationally trivial and highly efficient, uniform skipping can introduce aliasing or “jaggies” because high-frequency details falling between the sampling strides are entirely lost.

Implement this spatial reduction by writing a script that drops rows and columns at regular intervals, effectively shrinking the source image down to exactly one-tenth of its original width and height (i.e. \(S = 10\)).
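A minimal sketch of this nearest-neighbor subsampling (file names match the example below):

from PIL import Image

S = 10  # integer scale factor

img = Image.open("tree.png").convert("RGB")
width, height = img.size

out = Image.new("RGB", (width // S, height // S))
src = img.load()
dst = out.load()

# Fill each destination pixel by sampling every S-th source pixel.
for y in range(height // S):
    for x in range(width // S):
        dst[x, y] = src[x * S, y * S]

out.save("shrink_tree.png")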

The example images below are not rendered to scale.

For example, given the following input image:

tree
Figure 10. tree.png

The resulting downsampled transformation should look like this:

shrink tree
Figure 11. shrink_tree.png

4.7. Tiling

In digital asset management, print publishing, and computer vision, a common spatial task is combining multiple independent images into a single composite layout. Rather than mixing or overlapping pixels, a tiling grid composition positions separate, uniformly sized images edge-to-edge across a larger coordinate canvas.

This operation relies on an anchor-based translation mechanism. Every pixel location \((x, y)\) in an independent source image is mapped to a designated target coordinate \((x', y')\) on a shared output canvas using a specific upper-left starting offset \((\Delta x, \Delta y)\):

\[x' = \Delta x + x\]
\[y' = \Delta y + y\]

By varying the offset \((\Delta x, \Delta y)\) based on the uniform width \(\textit{W}\) and height \(\textit{H}\) of the source images, you can precisely position assets into individual quadrants. For a symmetric 2x2 grid, the geometric layout dictates four distinct anchor positions across the expanded canvas dimensions:

Quadrant Location   Horizontal Canvas Offset \((\Delta x)\)   Vertical Canvas Offset \((\Delta y)\)
Top-Left            0                                          0
Top-Right           \(W\)                                      0
Bottom-Left         0                                          \(H\)
Bottom-Right        \(W\)                                      \(H\)

Implement this spatial composition process by writing a program that reads four individual source images of matching dimensions and systematically arranges them into a uniform 2x2 grid output file.

To ensure your solution scales cleanly and avoids repetitive code blocks, this program can be decomposed into two distinct, cooperative functions:

  • A localized helper function responsible for translating and copying a single source image’s pixel matrix directly onto a mutable pixel map of a target canvas at a specified coordinate offset.

  • A main orchestration function that determines the dimensions of the constituent assets, constructs an empty canvas exactly double the width and height of a single image, and triggers the helper function sequentially across all four quadrants before saving the final composite.
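A minimal sketch of this two-function decomposition (file names match the example below):

from PIL import Image

def copy_into(src, dst_px, dx, dy):
    """Copy every pixel of src onto the target pixel map at offset (dx, dy)."""
    src_px = src.load()
    width, height = src.size
    for y in range(height):
        for x in range(width):
            dst_px[dx + x, dy + y] = src_px[x, y]

def tile_2x2(filenames, out_name):
    images = [Image.open(name).convert("RGB") for name in filenames]
    width, height = images[0].size  # all four images are assumed to match

    canvas = Image.new("RGB", (2 * width, 2 * height))
    canvas_px = canvas.load()

    # Anchor offsets for the four quadrants, per the table above.
    offsets = [(0, 0), (width, 0), (0, height), (width, height)]
    for image, (dx, dy) in zip(images, offsets):
        copy_into(image, canvas_px, dx, dy)

    canvas.save(out_name)

tile_2x2(["puppy.png", "snake.png", "tree.png", "woman.png"], "tile2x2.png")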

The example images below are not rendered to scale.

For example, given these four separate input images:

puppy
Figure 12. puppy.png
snake
Figure 13. snake.png
tree
Figure 14. tree.png
woman
Figure 15. woman.png

The resulting 2x2 grid composition should look like this:

tile2x2
Figure 16. tile2x2.png

4.8. The Warhol Effect

In the 1960s, visual artist Andy Warhol revolutionized the Pop Art movement by taking singular, iconic portrait photographs and reproducing them across a grid using starkly contrasting silkscreen color palettes. From an algorithmic standpoint, creating this classic aesthetic requires a sequential pipeline that merges two fundamental image processing concepts you have already used: posterization (tonal color mapping) and tiling (spatial grid composition).

Instead of writing a new application from scratch, your task is to integrate the functional components of the programs you developed in the previous exercises to transform a single continuous-tone source image into a vibrant, multi-quadrant creative piece.

Your program should combine your previous code to form this two-step pipeline:

  1. The Transformation Phase: Reuse your posterization logic to process a single source image four separate times. In each iteration, map the image pixels into a unique, customized three-color palette.

  2. The Spatial Composition Phase: Pass those four distinct, stylized color variations directly into the coordinate translation logic of your 2x2 tiling grid program to arrange them edge-to-edge on an expanded canvas.
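A compact sketch of the full pipeline, with four illustrative palettes (reuse your own posterization and tiling functions if you prefer):

from PIL import Image

# Four example (shadow, midtone, highlight) palettes; pick your own.
PALETTES = [
    [(20, 20, 90), (230, 50, 50), (255, 230, 90)],
    [(40, 10, 60), (0, 160, 130), (250, 240, 200)],
    [(90, 20, 20), (240, 130, 0), (255, 250, 220)],
    [(10, 60, 30), (200, 60, 160), (230, 255, 240)],
]

def posterize(img, palette):
    """Map each pixel to one of three palette colors by brightness."""
    def classify(pixel):
        y = sum(pixel) // 3
        if y < 50:
            return palette[0]
        return palette[1] if y < 130 else palette[2]
    out = Image.new("RGB", img.size)
    out.putdata([classify(p) for p in img.getdata()])
    return out

source = Image.open("woman.png").convert("RGB")
width, height = source.size

# Arrange the four stylized variants on a double-sized canvas.
canvas = Image.new("RGB", (2 * width, 2 * height))
canvas_px = canvas.load()
offsets = [(0, 0), (width, 0), (0, height), (width, height)]
for palette, (dx, dy) in zip(PALETTES, offsets):
    tile = posterize(source, palette)
    tile_px = tile.load()
    for y in range(height):
        for x in range(width):
            canvas_px[dx + x, dy + y] = tile_px[x, y]

canvas.save("warhol.png")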

The resulting image should look similar to this (you are welcome to use any source image and color choices you prefer):

warhol
Figure 17. warhol.png

4.9. Alpha Blending

In this exercise, you will implement a fundamental computer graphics technique known as Alpha Blending. Your goal is to take two separate images and mathematically combine them to create a new, translucent composite where one image appears to glow through the other.

Alpha blending works by calculating a weighted average of the color values for each corresponding pixel in two images, referred to here as \(\textit{Foreground}\) and \(\textit{Background}\). We use a factor called alpha \((\alpha)\), which represents the opacity. The value of \(\alpha\) ranges from 0.0 (completely transparent) to 1.0 (completely opaque).

For every color channel (Red, Green, and Blue), you must apply the following formula:

\[\textit{Result} = \textit{Foreground} \times \alpha + \textit{Background} \times (1 - \alpha)\]

By applying this formula to every pixel, you effectively “cross-fade” the two images. For instance, setting \(\alpha\) to 0.5 produces a perfect 50/50 mix of both sources.

Write a Python script that loads two images of identical dimensions and blends them together using the weighted average formula.

Because the result of your calculation will be a floating-point number, you must cast the final color values back to integers using int() before storing them in your output list.
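A minimal sketch (file names match the example below):

from PIL import Image

ALPHA = 0.5  # 0.0 keeps only the background, 1.0 only the foreground

fg = Image.open("woman.png").convert("RGB")
bg = Image.open("sunset.png").convert("RGB")

# Weighted per-channel average, cast back to int for storage.
blended = [
    tuple(int(f * ALPHA + b * (1 - ALPHA)) for f, b in zip(fp, bp))
    for fp, bp in zip(fg.getdata(), bg.getdata())
]

out = Image.new("RGB", fg.size)
out.putdata(blended)
out.save("woman_sunset.png")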

For example, given these two input images:

woman
Figure 18. woman.png (foreground)
sunset
Figure 19. sunset.png (background)

The resulting alpha blend, with \(\alpha = 0.5\), should look like this:

woman sunset
Figure 20. woman_sunset.png

5. License and Credits

  • Copyright © 2026 by Ariel Ortiz.

  • This work is licensed under a CC BY-NC-SA 4.0 license.

  • This document was prepared using the Asciidoctor text processor.

  • The topics and examples for this tutorial were inspired by the Media Computation approach developed by Mark Guzdial and Barbara Ericson in their book, “Introduction to Computing and Programming in Python: A Multimedia Approach”. While the original text uses a specific student-oriented library called JES (Jython Environment for Students), this tutorial has been adapted to use the modern, industry-standard Pillow image library.

  • Stock photographs by Pexels:

  • Icons by Flaticon by Magnific.

  • The author utilized Gemini, a large language model by Google, for drafting assistance and technical review of these tutorial notes.