[ComputerVision] 11.
Parametric (global) warping
- Transformation T is coordinate-changin machine : $p' = T(p)$
- What does it mean that $T$ is global?
> is the same for any point $p$
> can be described by just a few numbers (parameters)
- For linear transformations, we can represent T as a matrix $p' = Tp$
Common transformations
- Translation
- Rotation
- Aspect
- Affine
- Perspective
Scaling
- Scaling a coordinate means multiplying each of its components by a scalar
- Uniform scaling means this scalar is the same for all components
- Non-uniform scaling different scalars per component
- Scaling operation : $x'=ax$, $y' = by$
2-D Rotation
- $x' = x\cos \theta - y\sin \theta$ , $y' = x \sin \theta + y \cos \theta$
- Even though $\sin$ and $\cos$ are nonlinear funtions of $\theta$,
> $x'$ is a linear combination of $x$ and $y$
> $y'$ is a linear combination of $x$ and $y$
- What is the inverse transformation?
> Rotation by $-\theta$
> For rotation matrices $R^{-1} = R^$
Affine Transformations
- Affine transformations are combinations of
> Linear transformations
> Translation
- Properties of affine transformations
> Lines map to lines
> Parallel lines remain parallel
> Ratios are preserved
> Closed under composition
Projective Transformations
- Projective transformations are combos of
> Affine transformations
> Projective warps
- Properties of projective transformations
> Lines map to lines
> Parallel lines do not necessarily remain parallel
> Ratios are not preserved
> Closed under composition
> Models change of basis
> Proejctive matrix is defined up to a scale
Example) Solving for translation
- Given matched points in ${A}$ and ${B}$, estimate the translation of the object
> $x_i^B = x_i^A + t_x$ , $y_i^B = y_i^A + t_y$
- Least squares solution
- Write down objective function
- Derived solution
- Compute derivative
- Compute solution
- Computational solution
- Write in form $Ax = b$
- Solve using pseudo-inverse or eigenvalue decomposition
- RANSAC solution
- Sample a set of matching points ( 1 pair )
- Solve for transformation parameters
- Score parameters with nuber of inliers
- Repeat steps 1-3 N times
- Hough transform solution
- Initialize a grid of parameter values
- Each matched pair casts a vote for consistent values
- Find the parameters with the most votes
- Solve using least squares with inliers
Fitting and Alignment : Methods
- Global optimization / Search for parameters
- Least squares fit
- Robust least squares
- Other parameter search methods
- Hypothesize and test
- Generalized Hough transform
- RANSAC
- Iterative Closest Points (ICP)
Algorithm Summary
- Least Squares Fit
- closed form solution
- robust to noise
- not robust to outliers
- Robust Least Squares
- improves robustness to outliers
- requires iterative optimization
- Hough transform
- robust to noise and outliers
- cna fit multiple models
- only works for a few parameters ( 1-4 typically )
- RANSAC
- robust to noise and outliers
- works with a moderate number of parameters
- Iterative Closest Point
- For local alignment only : does not require initial correspondences
Mosaic
Allignment Applications
- Humna visual system has a field of view of aroung 135x200 degrees
- Typical camera has a field of view of only 35x50 degrees
- A look into the past
- Instance recognition
Feature-based alignment
- Search sets of feature matches that agree in terms of
> Local appearance
> Geometric configuration
Feature-based alignment : Overview
- Alignment as fitting
> Affine transformations
> Homographies
- Robust alignment
> Descriptor-based feature matching
> RANSAC
- Large-sacle alignment
> Inverted indexing
> Vocabulary trees
Alignment as fitting
- Previous lectures : Fitting a model to features in one image
> Find model $M$ that minimizes $\sum_i \text{residual} (x_i, M)$
- Alignment : fitting a model to a transformation between pairs of features (matches) in two images
> Find transformation $T$ that minimizes $\sum_i \text{residual} (T(x_i), x_i')$
Let's start with affine transformations
- Simple fitting procedure ( linear least squares)
- Approximates viewpoint changes for roughly planar objects and roughly orthographic cameras
( Perspective Projection, Orthographic projection )
- Can be used to initialize fitting for more complex models
Fitting an affine transformation
- Assume we know tha correspondences, how do we get the transformation?
- Linear system with six unknowns
- each match gives us two linearly independent equations need at least three to solve for the transformation parameters
Fitting a plane projective transformation
- Homography : plane projective transformation ( transformation taking a quad to another arbitrary quad)
Homography
- definition
> Gerneral mathematics : Homography = projective linear transformation
> Vision (most common usage) : homography = linear transformation betwen two image palnes
- Examples
> Project 3D surface into frontal view
> Relate two views that differ only by rotation
Fitting a homography
- Equation for homography
> $\lambda x_i' = H x_i$ -> 8 unknowns
> $x_i' \times H x_i = 0$
> $Ah = 0$, $A$ : (2xn) x 9
- H has 8 degrees of freedom
- One match gives us two linearly independent equations
- Homogeneous least squares : find h minimizing $||Ah||^2$
> eigenvector of $A^TA$ corresponding to smallest eigenvalue
> Four matches meeded for a minimal solution
Image stitching algorithm overview
- Detect keypoints ( SIFT )
- Match keypoints (1st/2nd NN < thresh)
- Estimate homography with four matched keypoints (RANSAC)
- Combine images
Computing homography
- Assume we have four matched points : How do we compute homography H?
- Direct Linear Transformation ( DLT)
> Apply SVD : $UDV^T = A$ -> h = smallest singular vector
- Normalized DLT
> Normalized coordinates for each image
>> translate for zero mean
>> Scale so that average distance to origin is ~sqrt(2) $\tilde{x} = Tx$, $\tilde{x}' = T'x'$ -> This makes problem better behaved umerically
> Compute $\tilde{H}$ using DLT in normalized coordinates
> Unnormalized : $H = T'^{-1}\tilde{H}T$, $x_i' = H x_i$
----
- Assume we have matched points with outliers : How do we compute homography H?
- Automatic Homography Estimation with RANSAC
> Choose number of samples N
> Choose 4 random potential matches
> Compute H using normalzied DLT
> Project points from x to x' for each potentially matching pair : $x_i' = Hx_i$
> Count points with projected distance < t
> Repeat steps 2-5 N times - Choose H with most inliers
Automatic Image Stitching
- Compute interest points on each image
- Find candidate matches
- Estimate homography H using matched points and RANSAC with normalized DLT
- Project each image onto the same surface and blend
Robust feature-based alignment
- Extract features
- Compute putative matches
- Loop
> Hypothesize transformation T
> Verify transformation ( search for other matches consistent with T)
Generating putative correspondences
- Need to compute feature descriptors of local patches surrounding interset points
Feature Descriptors
- Simplest descriptor : vector of raw intensity values
- How to compute two such vectors?
> sum of squared fifferences (SSD) : $SSD(u,v) = \sum_i(u_i - v_i)^2$
- Normalized correlation
Disadvantage of intensity vectors as descriptor
- Small deformations can affect the matching score a lot
Feature descriptors : SIFT
- Descriptor computation
> Divide patch into 4x4 sub-patches
> Compute histogram of gradient orientations ( 8 reference angles ) inside each sub-patch
> Resulting descriptor : 4x4x8 = 128 dimensions
- Advantage over raw vectors of pixel values
> Gradients less sensitive to illumination change
> Pooling of gradients over the sub-patches achieves robustness to small shifts, but still preserves some spatial information
Feature Matching
- Generating putative matches : for each patch in one image, find a short list of patches in the other image that could match it based solely on appearance
Rejection of unreliable matches
- How can we tell which putative matches are more reliable?
- Heuristic : compare distance of nearest neighbor to that of second nearest neighbor
> Ratio of closest distance to second-closest distance will be high for features that are not distinctive
RANSAC
- The set of putative matches contatins a very high percentage of outliers
Loop :
- Randomly select a seed group of matches
- Compute transformation from seed group
- Find inliers to this transformation
- If the number of inliers is sufficiently large, re-compute least-squares estimate of transformation on all of the inliers
Chossing seams
- Easy method
> Assign each pixel to image with nearest center ( seamless )
> Create a mask :mask(y, x) = 1 iff pixel should come from image 1
> Smooth boundaries (call feathering) : mask_sm = imfilter(mask, gausfil)
> Composite : imblend = im1_c.*mask + im2_c.*(1-mask)
Gain compensation
- Simple gain adjustment
> Compute average RGB intensity of each image in overlapping region
> Normalize intensities by ratio of averages
Image Blending vs Feathering
Affect of window size
Multi-band Blending
- Two-band blending
> only use two bands : high freq(엣지). and low freq(밝기).
> blends low freq. smoothly
> Bend high freq. with no smoothing : use binary mask
Multiband Blending with Laplacian Pyramid
- At low frequencies, blend slowly
- At high frequencies, blend quickly
- Visual representation of an image pyramid with 5 levels
- Laplacian pyramid is very similar to a Gaussian pyramid but saves the difference image of the blurred versions between each levels.
- Only the smallest level is not a difference image to enable reconstruction of the high resolution image using the difference images on higher levels
- used in image compression
Multiband blending
1. compute laplacian pyramid of image and mask
2. create blended image at each level of pyramid
3. reconstruct complete image