학교수업/컴퓨터비전

[ComputerVision] 11.

hwijin97 2022. 6. 2. 14:27

Parametric (global) warping

- Transformation T is coordinate-changin machine : $p' = T(p)$

- What does it mean that $T$ is global?

  > is the same for any point $p$

  > can be described by just a few numbers (parameters)

- For linear transformations, we can represent T as a matrix $p' = Tp$

 

Common transformations

- Translation

- Rotation

- Aspect

- Affine

- Perspective

 

Scaling

- Scaling a coordinate means multiplying each of its components by a scalar

- Uniform scaling means this scalar is the same for all components

- Non-uniform scaling different scalars per component

- Scaling operation : $x'=ax$, $y' = by$

 

2-D Rotation

- $x' = x\cos \theta - y\sin \theta$ , $y' = x \sin \theta + y \cos \theta$

- Even though $\sin$ and $\cos$ are nonlinear funtions of $\theta$,

  > $x'$ is a linear combination of $x$ and $y$

  > $y'$ is a linear combination of $x$ and $y$

- What is the inverse transformation?

  > Rotation by $-\theta$

  > For rotation matrices $R^{-1} = R^$

 

Affine Transformations

- Affine transformations are combinations of

  > Linear transformations

  > Translation

- Properties of affine transformations

  > Lines map to lines

  > Parallel lines remain parallel

  > Ratios are preserved

  > Closed under composition

Projective Transformations

- Projective transformations are combos of

  > Affine transformations

  > Projective warps

- Properties of projective transformations

  > Lines map to lines

  > Parallel lines do not necessarily remain parallel

  > Ratios are not preserved

  > Closed under composition

  > Models change of basis

  > Proejctive matrix is defined up to a scale

 

Example) Solving for translation

- Given matched points in ${A}$ and ${B}$, estimate the translation of the object

  > $x_i^B = x_i^A + t_x$ , $y_i^B = y_i^A + t_y$

- Least squares solution

  • Write down objective function
  • Derived solution
    • Compute derivative
    • Compute solution
  • Computational solution
    • Write in form $Ax = b$
    • Solve using pseudo-inverse or eigenvalue decomposition

- RANSAC solution

  • Sample a set of matching points ( 1 pair )
  • Solve for transformation parameters
  • Score parameters with nuber of inliers
  • Repeat steps 1-3 N times

- Hough transform solution

  • Initialize a grid of parameter values
  • Each matched pair casts a vote for consistent values
  • Find the parameters with the most votes
  • Solve using least squares with inliers

Fitting and Alignment : Methods

- Global optimization / Search for parameters

  • Least squares fit
  • Robust least squares
  • Other parameter search methods

- Hypothesize and test

  • Generalized Hough transform
  • RANSAC

- Iterative Closest Points (ICP)

 

Algorithm Summary

  • Least Squares Fit
    • closed form solution
    • robust to noise
    • not robust to outliers
  • Robust Least Squares
    • improves robustness to outliers
    • requires iterative optimization
  • Hough transform
    • robust to noise and outliers
    • cna fit multiple models
    • only works for a few parameters ( 1-4 typically )
  • RANSAC
    • robust to noise and outliers
    • works with a moderate number of parameters
  • Iterative Closest Point
    • For local alignment only : does not require initial correspondences

 

Mosaic

Allignment Applications

- Humna visual system has a field of view of aroung 135x200 degrees

- Typical camera has a field of view of only 35x50 degrees

- A look into the past

- Instance recognition

 

Feature-based alignment

- Search sets of feature matches that agree in terms of

  > Local appearance

  > Geometric configuration

Feature-based alignment : Overview

- Alignment as fitting

  > Affine transformations

  > Homographies

- Robust alignment

  > Descriptor-based feature matching

  > RANSAC

- Large-sacle alignment

  > Inverted indexing

  > Vocabulary trees

 

Alignment as fitting

- Previous lectures : Fitting a model to features in one image

  > Find model $M$ that minimizes $\sum_i \text{residual} (x_i, M)$

- Alignment : fitting a model to a transformation between pairs of features (matches) in two images

  > Find transformation $T$ that minimizes $\sum_i \text{residual} (T(x_i), x_i')$

 

Let's start with affine transformations

- Simple fitting procedure ( linear least squares)

- Approximates viewpoint changes for roughly planar objects and roughly orthographic cameras

( Perspective Projection, Orthographic projection ) 

- Can be used to initialize fitting for more complex models

 

Fitting an affine transformation

- Assume we know tha correspondences, how do we get the transformation?

- Linear system with six unknowns

- each match gives us two linearly independent equations need at least three to solve for the transformation parameters

 

Fitting a plane projective transformation

- Homography : plane projective transformation ( transformation taking a quad to another arbitrary quad)

 

Homography

- definition

  > Gerneral mathematics : Homography = projective linear transformation

  > Vision (most common usage) : homography = linear transformation betwen two image palnes

- Examples

  > Project 3D surface into frontal view

  > Relate two views that differ only by rotation

 

Fitting a homography

- Equation for homography

  > $\lambda x_i' = H x_i$ -> 8 unknowns

  > $x_i' \times H x_i = 0$

  > $Ah = 0$, $A$ : (2xn) x 9

- H has 8 degrees of freedom

- One match gives us two linearly independent equations

- Homogeneous least squares : find h minimizing $||Ah||^2$

  > eigenvector of $A^TA$ corresponding to smallest eigenvalue

  > Four matches meeded for a minimal solution

 

Image stitching algorithm overview

- Detect keypoints ( SIFT )

- Match keypoints (1st/2nd NN < thresh)

- Estimate homography with four matched keypoints (RANSAC)

- Combine images

 

Computing homography

- Assume we have four matched points : How do we compute homography H?

- Direct Linear Transformation ( DLT)

  > Apply SVD : $UDV^T = A$ -> h = smallest singular vector

- Normalized DLT

  > Normalized coordinates for each image

      >> translate for zero mean

      >> Scale so that average distance to origin is ~sqrt(2) $\tilde{x} = Tx$, $\tilde{x}' = T'x'$ -> This makes problem better behaved umerically

  > Compute $\tilde{H}$ using DLT in normalized coordinates

  > Unnormalized : $H = T'^{-1}\tilde{H}T$, $x_i' = H x_i$

----

- Assume we have matched points with outliers : How do we compute homography H?

- Automatic Homography Estimation with RANSAC

  > Choose number of samples N

  > Choose 4 random potential matches

  > Compute H using normalzied DLT

  > Project points from x to x' for each potentially matching pair : $x_i' = Hx_i$

  > Count points with projected distance < t

  > Repeat steps 2-5 N times - Choose H with most inliers

 

Automatic Image Stitching

- Compute interest points on each image

- Find candidate matches

- Estimate homography H using matched points and RANSAC with normalized DLT

- Project each image onto the same surface and blend

 

Robust feature-based alignment

- Extract features

- Compute putative matches

- Loop

  > Hypothesize transformation T

  > Verify transformation ( search for other matches consistent with T)

 

Generating putative correspondences

- Need to compute feature descriptors of local patches surrounding interset points

 

Feature Descriptors

- Simplest descriptor : vector of raw intensity values

- How to compute two such vectors?

  > sum of squared fifferences (SSD) : $SSD(u,v) = \sum_i(u_i - v_i)^2$

- Normalized correlation

 

Disadvantage of intensity vectors as descriptor

- Small deformations can affect the matching score a lot

Feature descriptors : SIFT

- Descriptor computation

  > Divide patch into 4x4 sub-patches

  > Compute histogram of gradient orientations ( 8 reference angles ) inside each sub-patch

  > Resulting descriptor : 4x4x8 = 128 dimensions

- Advantage over raw vectors of pixel values

  > Gradients less sensitive to illumination change

  > Pooling of gradients over the sub-patches achieves robustness to small shifts, but still preserves some spatial information

 

Feature Matching

- Generating putative matches : for each patch in one image, find a short list of patches in the other image that could match it based solely on appearance

 

Rejection of unreliable matches

- How can we tell which putative matches are more reliable?

- Heuristic : compare distance of nearest neighbor to that of second nearest neighbor

  > Ratio of closest distance to second-closest distance will be high for features that are not distinctive

 

RANSAC

- The set of putative matches contatins a very high percentage of outliers

Loop :

- Randomly select a seed group of matches

- Compute transformation from seed group

- Find inliers to this transformation

- If the number of inliers is sufficiently large, re-compute least-squares estimate of transformation on all of the inliers

 

Chossing seams

- Easy method

  > Assign each pixel to image with nearest center ( seamless )

  > Create a mask :mask(y, x) = 1 iff pixel should come from image 1

  > Smooth boundaries (call feathering) : mask_sm = imfilter(mask, gausfil)

  > Composite : imblend = im1_c.*mask + im2_c.*(1-mask)

Gain compensation

- Simple gain adjustment

  > Compute average RGB intensity of each image in overlapping region

  > Normalize intensities by ratio of averages

 

Image Blending vs Feathering

Affect of window size

Multi-band Blending

- Two-band blending

  > only use two bands : high freq(엣지). and low freq(밝기).

  > blends low freq. smoothly

  > Bend high freq. with no smoothing : use binary mask

 

Multiband Blending with Laplacian Pyramid

- At low frequencies, blend slowly

- At high frequencies, blend quickly

- Visual representation of an image pyramid with 5 levels

- Laplacian pyramid is very similar to a Gaussian pyramid but saves the difference image of the blurred versions between each levels.

- Only the smallest level is not a difference image to enable reconstruction of the high resolution image using the difference images on higher levels

- used in image compression

 

Multiband blending

1. compute laplacian pyramid of image and mask

2. create blended image at each level of pyramid

3. reconstruct complete image