The Prokudin-Gorskii photo collection is a historical archive containing more than 10,000 glass plate images captured between 1908 and 1915 by Russian photographer Sergei Mikhailovich Prokudin-Gorskii. These photographs offer a rare view into the daily life, architecture, and landscapes of the Russian Empire during a time of profound political and social transformation. The Library of Congress digitized the negatives of these glass plates and made them available to the public. In this project, we use image plates selected from the Prokudin-Gorskii collection to experiment with image processing techniques.
The naive approach exhaustively searches every displacement in both the x and y directions within a window of ±15 pixels. Fixing one channel as the reference, we shift each of the other channels and evaluate an alignment score at every offset. We use two metrics: the L2 (Euclidean) distance and normalized cross-correlation (NCC).
For the L2 metric, we compute the sum of squared differences (SSD) between the pixels of the base image and the shifted image, which is the squared Euclidean distance between them. A lower value indicates a better match.
For NCC, we treat each image as a vector, subtract its mean, divide it by its norm, and take the dot product of the two resulting unit vectors. The score lies between -1 and 1, and a higher value indicates a better match.
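The two metrics can be sketched in NumPy as follows. This is a minimal illustration, not the project's original code; the function names ssd_score and ncc_score are hypothetical.

```python
import numpy as np

def ssd_score(base: np.ndarray, shifted: np.ndarray) -> float:
    """Sum of squared differences (squared L2 distance); lower is better."""
    diff = base.astype(np.float64) - shifted.astype(np.float64)
    return float(np.sum(diff * diff))

def ncc_score(base: np.ndarray, shifted: np.ndarray) -> float:
    """Normalized cross-correlation in [-1, 1]; higher is better."""
    a = base.astype(np.float64).ravel()
    b = shifted.astype(np.float64).ravel()
    a = a - a.mean()  # zero-center each vector
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # a constant image has no defined correlation
    return float(np.dot(a, b) / denom)
```

Because NCC zero-centers and normalizes both vectors, it is insensitive to a global brightness offset or contrast scaling between channels, whereas SSD is not.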
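The pyramid search algorithm recursively downscales the image by a factor of two and runs the naive alignment on the scaled images, starting from the coarsest level. This improves efficiency and lets the search handle larger displacements, since a one-pixel shift at a coarse level corresponds to many pixels at full resolution.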
In our implementation, we fix the blue channel as the reference and shift the green and red channels. For simple alignment, I implemented a function named find_displacement_naive(image_1, image_2, window_size=15, NCC=False), which exhaustively computes the score for every combination of displacements. The final displacement is the one that minimizes the L2 distance or maximizes the NCC score. NCC performed consistently better than L2 distance in my experiments.
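A naive exhaustive search with the same signature might look like the sketch below. It is a hypothetical re-creation, not the original function: it assumes circular shifts via np.roll, returns the shift as (dy, dx) in (row, column) order, and scores only an interior region so that pixels wrapped around by np.roll are ignored.

```python
import numpy as np

def _ssd(a, b):
    d = a - b
    return float(np.sum(d * d))

def _ncc(a, b):
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def find_displacement_naive(image_1, image_2, window_size=15, NCC=False):
    """Try every shift in [-window_size, window_size] along both axes.

    Returns (dy, dx) such that np.roll(image_2, (dy, dx), axis=(0, 1))
    best matches image_1. Illustrative sketch, not the original code.
    """
    base = image_1.astype(np.float64)
    moving = image_2.astype(np.float64)
    m = window_size  # margin: skip rows/columns that np.roll wraps around
    inner = (slice(m, base.shape[0] - m), slice(m, base.shape[1] - m))
    best_score, best_shift = None, (0, 0)
    for dy in range(-window_size, window_size + 1):
        for dx in range(-window_size, window_size + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            a, b = base[inner], shifted[inner]
            # Negate SSD so that "larger is better" holds for both metrics.
            score = _ncc(a, b) if NCC else -_ssd(a, b)
            if best_score is None or score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

With window_size=15 this evaluates 31 × 31 = 961 candidate shifts per channel, which is why the cost grows quickly on full-resolution plates.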
For pyramid alignment, I implemented the find_displacement_pyramid function on top of find_displacement_naive, which reduced runtime and improved alignment on high-resolution images. Using NCC improved the results further. The main challenge was choosing the number of pyramid levels. A fixed number of levels is not ideal because different image sizes require different depths, so I computed the number of levels from a logarithm of the image size, ensuring that the smallest image was still at least 16 to 32 pixels across. We precompute the images for all pyramid levels and iteratively refine the displacement from the coarsest level to the finest.
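The coarse-to-fine procedure can be sketched as below. This is an illustrative version, not the original find_displacement_pyramid: it assumes 2×2 block averaging for downscaling (a stand-in for a proper rescale), NCC scoring on the central 80% of the image, and hypothetical parameters min_size, coarse_window, and refine_window. The level count is the floor of log2(shorter side / min_size), which keeps the coarsest image between min_size and 2 × min_size pixels on its shorter side.

```python
import math
import numpy as np

def num_levels(shape, min_size=16):
    """Number of halvings that keep the shorter side at least min_size pixels."""
    smallest = min(shape[:2])
    if smallest < min_size:
        return 0
    return int(math.floor(math.log2(smallest / min_size)))

def downsample2(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def interior(img, frac=0.1):
    """Drop a border of frac * size on each side before scoring."""
    dh, dw = int(img.shape[0] * frac), int(img.shape[1] * frac)
    return img[dh:img.shape[0] - dh, dw:img.shape[1] - dw]

def ncc(a, b):
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def search(base, moving, center, radius):
    """Best NCC shift within +/- radius of center (row, col)."""
    best_score, best = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ncc(interior(base), interior(shifted))
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

def find_displacement_pyramid(base, moving, min_size=16, coarse_window=4, refine_window=2):
    base = base.astype(np.float64)
    moving = moving.astype(np.float64)
    levels = num_levels(base.shape, min_size)
    # Precompute every pyramid level; index 0 is full resolution.
    pyr_b, pyr_m = [base], [moving]
    for _ in range(levels):
        pyr_b.append(downsample2(pyr_b[-1]))
        pyr_m.append(downsample2(pyr_m[-1]))
    # Wide search at the coarsest level, then double and refine at each finer level.
    dy, dx = search(pyr_b[-1], pyr_m[-1], (0, 0), coarse_window)
    for k in range(levels - 1, -1, -1):
        dy, dx = search(pyr_b[k], pyr_m[k], (2 * dy, 2 * dx), refine_window)
    return dy, dx
```

For a plate whose shorter side is 3000 pixels, num_levels returns 7, so the coarsest image is about 23 pixels across, which falls inside the 16 to 32 pixel range described above.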
I also experimented with enlarging the window size as the scaling factor increased, but this lengthened the runtime without a noticeable improvement in alignment. To balance runtime and accuracy, I settled on a small constant window for the refinement levels (e.g., 2 pixels).
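A rough bound helps explain why a small constant refinement window can suffice. Assume, for illustration only, a search radius of $w_c$ pixels at the coarsest of $L$ halvings and a constant radius $r$ at every finer level (these symbols are introduced here and are not from the original implementation). A one-pixel shift at level $k$ corresponds to $2^k$ pixels at full resolution, so the largest displacement the coarse-to-fine search can reach is:

$$
d_{\max} = w_c \, 2^{L} + \sum_{k=0}^{L-1} r \, 2^{k} = w_c \, 2^{L} + r \left(2^{L} - 1\right)
$$

With $r = 2$ and $L = 7$, the refinement levels add at most $2(2^7 - 1) = 254$ pixels of reach, so most of the range comes from the coarse search. The refinement windows mainly need to absorb the roughly one-pixel error introduced each time the estimate is doubled, which a radius of 2 covers.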
To improve the alignment, and hence make the result look more natural, I employed two additional techniques: gradient-based edge detection and cropping. The Sobel operator acts as an edge-detection filter: it highlights regions where the image intensity changes sharply. Aligning on these edges improves accuracy because it focuses on structurally significant features rather than raw brightness, which can differ between color channels. In addition, the plate borders typically do not help the alignment, so cropping them removes noise that is inconsistent with the main image content.
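Both preprocessing steps can be sketched with plain NumPy. The helper names sobel_magnitude and crop_border, the edge-replicated padding, and the 10% crop fraction are illustrative assumptions rather than the project's actual choices; scipy.ndimage.sobel or skimage.filters.sobel would be drop-in alternatives for the gradient.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels, with edge-replicated borders."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    # Horizontal derivative: right column minus left column, rows weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical derivative: bottom row minus top row, columns weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)

def crop_border(img, frac=0.1):
    """Remove frac * height rows and frac * width columns from each side."""
    h, w = img.shape[:2]
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]
```

In a pipeline like the one above, each channel would be cropped and converted to an edge map before the displacement search, and the resulting shift would then be applied to the original, unfiltered channel.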
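Displacements (in pixels) of the green (G) and red (R) channels relative to the blue reference channel for each result image:
Displacement G: (1, -1), R: (7, -1)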
Displacement G: (-3, 0), R: (3, 1)
Displacement G: (3, 3), R: (6, 3)
Displacement G: (25, 4), R: (58, -4)
Displacement G: (49, 24), R: (107, 40)
Displacement G: (60, 17), R: (124, 14)
Displacement G: (42, 17), R: (90, 23)
Displacement G: (38, 22), R: (77, 36)
Displacement G: (-3, -2), R: (76, -8)
Displacement G: (41, -17), R: (92, -29)
Displacement G: (80, 10), R: (177, 13)
Displacement G: (78, 29), R: (176, 37)
Displacement G: (49, -6), R: (96, -24)
Displacement G: (54, 12), R: (111, 9)
Displacement G: (57, -5), R: (125, -24)
Displacement G: (31, 21), R: (91, 24)
Displacement G: (56, 34), R: (125, 60)