CS 180: Project 1, Steven Luo
Overview
Sergei Mikhailovich Prokudin-Gorskii was a Russian chemist and photographer known for his pioneering work in color photography and his efforts to document early 20th-century Russia. He recorded three exposures of each scene onto a glass plate using red, green, and blue filters, and envisioned that special projectors would combine the exposures to display a color image. Though he left Russia in 1918, these glass plates were purchased by the Library of Congress and recently digitized.
Our goal is to turn a stitched negative into a color image. A stitched negative consists of the three color exposures stacked in one image file.
Examples of digitized glass plates: tobolsk, monastery, and cathedral.
We want to use image processing techniques to align these plates to produce a color image with minimal artifacts. I used an alignment algorithm with Euclidean distance scoring and implemented an image pyramid to generate an initial set of color images, then further refined the output by using edges as features, adding white balancing, and automatically cropping images to remove undesired borders resulting from stacking the glass plates.
Approach
We start by cropping the image into thirds to get the red, green, and blue plates, then stack them on top of each other to get a color image. A quick experiment shows that this naive stack has many visual artifacts that distort the subject of the image.
Three plates stacked without alignment.
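To make this concrete, here is a minimal sketch of the split-and-stack step, assuming NumPy and scikit-image, a placeholder filename, and plates stacked blue, green, red from top to bottom (as in the digitized scans):

```python
import numpy as np
import skimage.io as skio

# Read the stacked negative and convert to floats in [0, 1].
im = skio.imread("cathedral.jpg").astype(np.float64) / 255.0

# Crop into thirds: the scans stack blue, green, red from top to bottom.
height = im.shape[0] // 3
b = im[:height]
g = im[height:2 * height]
r = im[2 * height:3 * height]

# Naively stack the plates into an (H, W, 3) RGB image -- no alignment yet.
color = np.dstack([r, g, b])
skio.imsave("cathedral_unaligned.jpg", (color * 255).astype(np.uint8))
```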
I implemented an alignment function that finds the best displacement on each axis, then stacks the shifted plates. Aligning the green plate to the blue plate and the red plate to the blue plate, we can stack the displaced green, displaced red, and blue plates together to get an improved color image. To evaluate the quality of an alignment, I use a scoring method that calculates the Euclidean distance between two plates for every possible (x, y) displacement within a specified search range; my align function returns the displacement that minimizes this distance. I also implemented normalized cross-correlation as a scoring method, but it resulted in both worse alignment and worse runtime.
To prevent the borders of the image from skewing the score computation, I only scored images based on the center 90% (removing 5% of the image size from each side), which slightly improved alignments. I considered preprocessing the images by directly cropping the initial plates, but decided against it to preserve the full image for smarter cropping later.
In order to find the best alignment, I began with an exhaustive search over every possible pair of x and y displacements from a fixed range, for example [-20, 20]. Not only is exhaustive search slow (there are (range width)² combinations to check, here 41² = 1,681), it also risks missing the optimal displacement pair if the optimum falls outside the search range, which is especially likely for a higher-resolution image.
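Here is a rough sketch of the scoring and exhaustive search; the helper names are illustrative, and np.roll is just one simple way to implement the shift:

```python
import numpy as np

def center_crop(im, frac=0.05):
    """Drop `frac` of the image from each side so borders don't skew the score."""
    h, w = im.shape
    dh, dw = int(h * frac), int(w * frac)
    return im[dh:h - dh, dw:w - dw]

def align(moving, reference, search=20):
    """Exhaustively try every (x, y) displacement in [-search, search],
    returning the one that minimizes the Euclidean (SSD) distance."""
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            diff = center_crop(shifted) - center_crop(reference)
            score = np.sum(diff ** 2)  # squared Euclidean distance
            if score < best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift
```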
In order to speed up the alignment process, I implemented an image pyramid that recursively searches a small range of pixel shifts, starting from a low resolution and working up to the highest resolution. The smaller search range and improved search method were vital for aligning the larger, higher-resolution .tif images, reducing the per-image runtime from tens of minutes to tens of seconds. The pyramid also improves the search heuristic: by centering the search range around the optimal displacement pair found at half resolution, we can shrink the range significantly while still finding, or landing near, the true optimal displacement pair, a speedup of several orders of magnitude.
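A minimal sketch of the pyramid, building on the align sketch above; the 400-pixel base case and the small refinement window are illustrative values:

```python
import numpy as np
import skimage.transform as sktr

def pyramid_align(moving, reference, search=20):
    """Estimate the displacement at half resolution, then refine it
    with a small search window at the current resolution."""
    # Base case: the image is small enough for a plain exhaustive search.
    if min(moving.shape) <= 400:
        return align(moving, reference, search)
    # Recurse on half-resolution copies; shifts roughly double on the way up.
    coarse = pyramid_align(
        sktr.rescale(moving, 0.5, anti_aliasing=True),
        sktr.rescale(reference, 0.5, anti_aliasing=True),
        search,
    )
    dx, dy = 2 * coarse[0], 2 * coarse[1]
    # Apply the coarse estimate, then search a tiny window around it.
    prealigned = np.roll(moving, (dy, dx), axis=(0, 1))
    fine = align(prealigned, reference, search=2)
    return (dx + fine[0], dy + fine[1])
```

Each level of recursion quarters the number of pixels scored, so almost all of the work happens on small images.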
As I improved my alignment and scoring methods, I noticed several images were consistently tricky to align: melon, self-portrait, church, and especially emir. These images are difficult for reasons such as their complex composition and inconsistent brightness between plates. The order of plate alignment may also have played a role: emir in particular would probably have benefited from aligning to a different reference plate, such as red instead of blue, since the blue plate's brightness is so inconsistent with the other two. However, implementing the image pyramid and the later bells and whistles successfully minimized artifacts on all of these images.
Results
Displacement: G (x_g, y_g); R (x_r, y_r) indicates that the green plate is shifted by x_g pixels on the x-axis and y_g pixels on the y-axis, and the red plate by x_r and y_r pixels, respectively.
Low Quality Images (.jpg)

Image | Displacement | Runtime (s) |
---|---|---|
Cathedral | G (2, 5); R (3, 12) | 0.13 |
Monastery | G (2, -3); R (2, 3) | 0.11 |
Tobolsk | G (2, 3); R (3, 6) | 0.12 |
Full Quality Images (.tif)

Image | Displacement | Runtime (s) |
---|---|---|
Church | G (3, 25); R (-5, 58) | 10.94 |
Emir | G (24, 49); R (43, 88) | 10.41 |
Harvesters | G (16, 60); R (13, 24) | 10.38 |
Icon | G (17, 41); R (23, 89) | 11.21 |
Lady | G (8, 55); R (12, 111) | 10.49 |
Melons | G (9, 82); R (11, 177) | 11.00 |
Onion-Shaped Church | G (26, 51); R (36, 108) | 11.31 |
Sculpture | G (-11, 33); R (-27, 140) | 11.02 |
Self-Portrait | G (29, 79); R (34, 175) | 11.83 |
Three Generations | G (13, 55); R (10, 112) | 11.52 |
Train | G (6, 42); R (32, 87) | 11.8 |
Bells and Whistles
Better Features: Edge Detection with the Sobel Operator
Instead of aligning colors, what if we aligned shapes? It turns out the Sobel operator allows us to do just that: it produces an “edge map” (a black-and-white image emphasizing edges), and by aligning the edge maps we can better align our images based on the subjects in them. Several examples are provided below!
The Sobel operator convolves the input image with two special 3x3 kernels to approximate the derivatives in the horizontal and vertical directions. This is a good way to find edges because it reveals sudden changes in pixel intensity, like when one object ends and another begins, in both directions. A large-magnitude derivative indicates a large shift in pixel intensities, which can be indicative of an edge between objects. However, because pixel values are discrete, we cannot calculate a true first derivative; instead, the Sobel convolutions estimate the gradient of intensity values at each pixel. We interpret pixels whose gradients have large magnitude as likely to be part of an edge in the image.
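A sketch of the edge-map computation with scikit-image's Sobel filters (filters.sobel would give the gradient magnitude in one call; the two-kernel version below mirrors the description above):

```python
import numpy as np
from skimage import filters

def edge_map(plate):
    """Approximate the intensity gradient at each pixel; large magnitudes
    mark sudden intensity changes, i.e. likely edges."""
    gy = filters.sobel_h(plate)  # derivative along rows: horizontal edges
    gx = filters.sobel_v(plate)  # derivative along columns: vertical edges
    return np.sqrt(gx ** 2 + gy ** 2)
```

Alignment then proceeds exactly as before, just on edge maps instead of raw plates, e.g. pyramid_align(edge_map(g), edge_map(b)).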
Here’s what we get when we apply a Sobel filter to each plate of onion_church.tif. The strong, consistent edges suggest that aligning the edge maps will produce a precisely aligned image.
Edges of each glass plate found with a Sobel filter.
This example shows the clear improvement in emir.tif when moving from aligning based on RGB similarity to aligning based on edges:
Edges of each glass plate found with a Sobel filter.
RGB | Edge |
---|---|
Displacement: G (24, 49); R (43, 88) Runtime: 10.41 seconds | Displacement: G (23, 49); R (40, 107) Runtime: 11.36 seconds |
Aligning via edges works great when the subject is “busy”, with edges in all directions to align, but performance is not as strong on images that are “flat”, with fewer edges mostly running in the same direction. Intuitively, alignment is harder with fewer reference points. Notice how in the following images, after cropping to the center 90%, we really only have one horizontal edge with which to align the three plates. This helps with alignment along the y-axis but doesn’t give us much to work with on the x-axis (you may need to zoom in to see the lines).
Edges of each glass plate found with a Sobel filter.
On the final image, look at the sun on the horizon and notice how the red plate is slightly misaligned along the x-axis. Note that all “original” images below are generated using the Sobel operator to align the glass plates.
Just For Fun
I was curious about what other photos the Library of Congress had in their collection (and whether my code would work on them), so I downloaded 15 more images from the catalog that caught my eye and seemed hard to align. In my comparisons on this new set of images, I found that my earlier findings held.
For example, using edges as a feature significantly improved alignment on tricky images, especially when the brightness of the three color channels was not consistent. This example shows the clear improvement in yurt.tif when moving from aligning based on RGB similarity to aligning based on edges:
RGB | Edge |
---|---|
Displacement: G (38, 48); R (-9, 115) Runtime: 10.57 seconds | Displacement: G (37, 57); R (56, 107) Runtime: 12.5 seconds |
Edge-aligned results are found in the following table:
Full Quality Images (.tif)

Image | Displacement | Runtime (s) |
---|---|---|
Workshop | G (15, 73); R (14, 154) | 13.3 |
Greenhouse | G (28, 59); R (24, 126) | 12.6 |
Ditches | G (-27, 33); R (-55, 74) | 10.85 |
Vases | G (5, 25); R (9, 110) | 11.69 |
Forest | G (-41, 75); R (-69, 114) | 11.64 |
Sunset | G (2, 52); R (-35, 119) | 11.28 |
Lugano | G (-17, 41); R (-29, 91) | 12.35 |
Poppies | G (-1, 12); R (-3, 98) | 11.15 |
Carpet | G (-16, 48); R (-52, 91) | 12.12 |
Italy | G (22, 38); R (36, 77) | 12.1 |
Lilacs | G (-11, 43); R (-36, 95) | 12.35 |
Peony | G (20, 76); R (31, 155) | 13.42 |
Hill | G (5, 39); R (4, 130) | 12.03 |
Barracks | G (1, 11); R (0, 73) | 11.22 |
White Balancing
In order to adjust the intensities of colors in the image to look more realistic, I implemented automatic white balancing with the gray world assumption. Unlike our eyes, our images do not naturally white balance themselves!
The gray world assumption holds that all the colors in an image average out to a neutral gray, because the presence of any color tends to be balanced by the presence of its complement. For most pictures this is a reasonable assumption, given how colors are typically distributed. White balancing involves two problems: first estimating the illuminant, then manipulating the colors to counteract it and simulate a neutral illuminant. Under the gray world assumption, we estimate the illuminant as the average value of each color channel, then scale the channels so that their averages become equal, i.e. a neutral gray.
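A minimal sketch of gray world white balancing, assuming a float RGB image with values in [0, 1]:

```python
import numpy as np

def gray_world(im):
    """Scale each channel so its mean matches the overall mean intensity."""
    channel_means = im.mean(axis=(0, 1))  # estimated illuminant (R, G, B)
    gray = channel_means.mean()           # target neutral level
    balanced = im * (gray / channel_means)
    return np.clip(balanced, 0.0, 1.0)
```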
Original | Balanced |
---|---|
Cathedral, original | Cathedral, white balanced |
Self-Portrait, original | Self-Portrait, white balanced |
Lady, original | Lady, white balanced |
Church, original | Church, white balanced |
Carpet, original | Carpet, white balanced |
Our white balancing algorithm indeed appears to work better on images with a wide variety of colors and an evenly distributed color spectrum. It performs worse when there are large areas of a single dominant color — cases where the average color is not gray.
Automatic Cropping
In order to eliminate black and white borders from the resulting color images, I implemented an algorithm to automatically detect and remove these borders.
For each side of the picture, I select the outermost column or row of pixels and average them to see whether the result is close enough to 0 (black) or 1 (white). Since my images have pixel values in [0, 1], the threshold defines a range of acceptable averages, [threshold, 1 - threshold]. If the average pixel value along a border falls outside this range, that border gets cropped. I upper-bounded the threshold by the average value a group of pixels would have if it were purely red, green, or blue, settling on a threshold of 0.16.
However, cropping one row of pixels at a time is very slow, especially for large images, and is not robust to outliers, like a single column of a different color in the middle of a solid black border. Instead, I take the average of a larger strip of pixels along a border, check it against the threshold, and if it falls within the croppable range I crop off 2.5% of the image, reducing the number of comparisons needed. In the worst case of a faulty threshold comparison (such as when an image feature is too similar to an undesired border) or a thin border, a 2.5% crop is still a small adjustment and preserves the bulk of the image.
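A sketch of this border-trimming loop; the 0.16 threshold and 2.5% step come from above, while the loop structure itself is illustrative:

```python
def auto_crop(im, threshold=0.16, step=0.025):
    """Shave `step` of the image off any side whose border strip averages
    close to black or white, i.e. outside [threshold, 1 - threshold].
    `im` is a float array with values in [0, 1]."""
    def looks_like_border(strip):
        mean = strip.mean()
        return mean < threshold or mean > 1 - threshold

    cropped = True
    while cropped:
        cropped = False
        h, w = im.shape[:2]
        dh, dw = max(1, int(h * step)), max(1, int(w * step))
        if looks_like_border(im[:dh]):        # top strip
            im, cropped = im[dh:], True
        elif looks_like_border(im[-dh:]):     # bottom strip
            im, cropped = im[:-dh], True
        elif looks_like_border(im[:, :dw]):   # left strip
            im, cropped = im[:, dw:], True
        elif looks_like_border(im[:, -dw:]):  # right strip
            im, cropped = im[:, :-dw], True
    return im
```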
Original | Cropped |
---|---|
Church, original | Church, cropped |
Tobolsk, original | Tobolsk, cropped |
Cathedral, original | Cathedral, cropped |
Monastery, original | Monastery, cropped |
With more time, I would have tried this approach on segments of each side. Because artifacts distort how uniformly black or white the solid borders are, checking whether individual pieces of a border fall within the threshold could result in better crops.
three_generations.tif is a good example of an image that likely could have benefited from this approach:
Original | Cropped |
---|---|
Three Generations, original | Three Generations, cropped |
The right border was not cropped because the average of that group of pixels was about 0.5, safely in the acceptable range. Further experimentation with higher thresholds might also improve automatic crops.
All Together, Now!
Below is a selection of edge-aligned, white-balanced, and cropped images.
Gallery | |
---|---|
Melon | Emir |
Church | Cathedral |
Ditches | Forest |
Barracks | Workshop |