[ComputerVision] 11.

학교수업/컴퓨터비전

[ComputerVision] 11.

hwijin97 2022. 6. 2. 14:27

Parametric (global) warping

- Transformation T is coordinate-changin machine : $p' = T(p)$

- What does it mean that $T$ is global?

> is the same for any point $p$

> can be described by just a few numbers (parameters)

- For linear transformations, we can represent T as a matrix $p' = Tp$

Common transformations

- Translation

- Rotation

- Aspect

- Affine

- Perspective

Scaling

- Scaling a coordinate means multiplying each of its components by a scalar

- Uniform scaling means this scalar is the same for all components

- Non-uniform scaling different scalars per component

- Scaling operation : $x'=ax$, $y' = by$

2-D Rotation

- $x' = x\cos \theta - y\sin \theta$ , $y' = x \sin \theta + y \cos \theta$

- Even though $\sin$ and $\cos$ are nonlinear funtions of $\theta$,

> $x'$ is a linear combination of $x$ and $y$

> $y'$ is a linear combination of $x$ and $y$

- What is the inverse transformation?

> Rotation by $-\theta$

> For rotation matrices $R^{-1} = R^$

Affine Transformations

- Affine transformations are combinations of

> Linear transformations

> Translation

- Properties of affine transformations

> Lines map to lines

> Parallel lines remain parallel

> Ratios are preserved

> Closed under composition

Projective Transformations

- Projective transformations are combos of

> Affine transformations

> Projective warps

- Properties of projective transformations

> Lines map to lines

> Parallel lines do not necessarily remain parallel

> Ratios are not preserved

> Closed under composition

> Models change of basis

> Proejctive matrix is defined up to a scale

Example) Solving for translation

- Given matched points in ${A}$ and ${B}$, estimate the translation of the object

> $x_i^B = x_i^A + t_x$ , $y_i^B = y_i^A + t_y$

- Least squares solution

Write down objective function
Derived solution
- Compute derivative
- Compute solution
Computational solution
- Write in form $Ax = b$
- Solve using pseudo-inverse or eigenvalue decomposition

- RANSAC solution

Sample a set of matching points ( 1 pair )
Solve for transformation parameters
Score parameters with nuber of inliers
Repeat steps 1-3 N times

- Hough transform solution

Initialize a grid of parameter values
Each matched pair casts a vote for consistent values
Find the parameters with the most votes
Solve using least squares with inliers

Fitting and Alignment : Methods

- Global optimization / Search for parameters

Least squares fit
Robust least squares
Other parameter search methods

- Hypothesize and test

Generalized Hough transform
RANSAC

- Iterative Closest Points (ICP)

Algorithm Summary

Least Squares Fit
- closed form solution
- robust to noise
- not robust to outliers
Robust Least Squares
- improves robustness to outliers
- requires iterative optimization
Hough transform
- robust to noise and outliers
- cna fit multiple models
- only works for a few parameters ( 1-4 typically )
RANSAC
- robust to noise and outliers
- works with a moderate number of parameters
Iterative Closest Point
- For local alignment only : does not require initial correspondences

Mosaic

Allignment Applications

- Humna visual system has a field of view of aroung 135x200 degrees

- Typical camera has a field of view of only 35x50 degrees

- A look into the past

- Instance recognition

Feature-based alignment

- Search sets of feature matches that agree in terms of

> Local appearance

> Geometric configuration

Feature-based alignment : Overview

- Alignment as fitting

> Affine transformations

> Homographies

- Robust alignment

> Descriptor-based feature matching

> RANSAC

- Large-sacle alignment

> Inverted indexing

> Vocabulary trees

Alignment as fitting

- Previous lectures : Fitting a model to features in one image

> Find model $M$ that minimizes $\sum_i \text{residual} (x_i, M)$

- Alignment : fitting a model to a transformation between pairs of features (matches) in two images

> Find transformation $T$ that minimizes $\sum_i \text{residual} (T(x_i), x_i')$

Let's start with affine transformations

- Simple fitting procedure ( linear least squares)

- Approximates viewpoint changes for roughly planar objects and roughly orthographic cameras

( Perspective Projection, Orthographic projection )

- Can be used to initialize fitting for more complex models

Fitting an affine transformation

- Assume we know tha correspondences, how do we get the transformation?

- Linear system with six unknowns

- each match gives us two linearly independent equations need at least three to solve for the transformation parameters

Fitting a plane projective transformation

- Homography : plane projective transformation ( transformation taking a quad to another arbitrary quad)

Homography

- definition

> Gerneral mathematics : Homography = projective linear transformation

> Vision (most common usage) : homography = linear transformation betwen two image palnes

- Examples

> Project 3D surface into frontal view

> Relate two views that differ only by rotation

Fitting a homography

- Equation for homography

> $\lambda x_i' = H x_i$ -> 8 unknowns

> $x_i' \times H x_i = 0$

> $Ah = 0$, $A$ : (2xn) x 9

- H has 8 degrees of freedom

- One match gives us two linearly independent equations

- Homogeneous least squares : find h minimizing $||Ah||^2$

> eigenvector of $A^TA$ corresponding to smallest eigenvalue

> Four matches meeded for a minimal solution

Image stitching algorithm overview

- Detect keypoints ( SIFT )

- Match keypoints (1st/2nd NN < thresh)

- Estimate homography with four matched keypoints (RANSAC)

- Combine images

Computing homography

- Assume we have four matched points : How do we compute homography H?

- Direct Linear Transformation ( DLT)

> Apply SVD : $UDV^T = A$ -> h = smallest singular vector

- Normalized DLT

> Normalized coordinates for each image

>> translate for zero mean

>> Scale so that average distance to origin is ~sqrt(2) $\tilde{x} = Tx$, $\tilde{x}' = T'x'$ -> This makes problem better behaved umerically

> Compute $\tilde{H}$ using DLT in normalized coordinates

> Unnormalized : $H = T'^{-1}\tilde{H}T$, $x_i' = H x_i$

----

- Assume we have matched points with outliers : How do we compute homography H?

- Automatic Homography Estimation with RANSAC

> Choose number of samples N

> Choose 4 random potential matches

> Compute H using normalzied DLT

> Project points from x to x' for each potentially matching pair : $x_i' = Hx_i$

> Count points with projected distance < t

> Repeat steps 2-5 N times - Choose H with most inliers

Automatic Image Stitching

- Compute interest points on each image

- Find candidate matches

- Estimate homography H using matched points and RANSAC with normalized DLT

- Project each image onto the same surface and blend

Robust feature-based alignment

- Extract features

- Compute putative matches

- Loop

> Hypothesize transformation T

> Verify transformation ( search for other matches consistent with T)

Generating putative correspondences

- Need to compute feature descriptors of local patches surrounding interset points

Feature Descriptors

- Simplest descriptor : vector of raw intensity values

- How to compute two such vectors?

> sum of squared fifferences (SSD) : $SSD(u,v) = \sum_i(u_i - v_i)^2$

- Normalized correlation

Disadvantage of intensity vectors as descriptor

- Small deformations can affect the matching score a lot

Feature descriptors : SIFT

- Descriptor computation

> Divide patch into 4x4 sub-patches

> Compute histogram of gradient orientations ( 8 reference angles ) inside each sub-patch

> Resulting descriptor : 4x4x8 = 128 dimensions

- Advantage over raw vectors of pixel values

> Gradients less sensitive to illumination change

> Pooling of gradients over the sub-patches achieves robustness to small shifts, but still preserves some spatial information

Feature Matching

- Generating putative matches : for each patch in one image, find a short list of patches in the other image that could match it based solely on appearance

Rejection of unreliable matches

- How can we tell which putative matches are more reliable?

- Heuristic : compare distance of nearest neighbor to that of second nearest neighbor

> Ratio of closest distance to second-closest distance will be high for features that are not distinctive

RANSAC

- The set of putative matches contatins a very high percentage of outliers

Loop :

- Randomly select a seed group of matches

- Compute transformation from seed group

- Find inliers to this transformation

- If the number of inliers is sufficiently large, re-compute least-squares estimate of transformation on all of the inliers

Chossing seams

- Easy method

> Assign each pixel to image with nearest center ( seamless )

> Create a mask :mask(y, x) = 1 iff pixel should come from image 1

> Smooth boundaries (call feathering) : mask_sm = imfilter(mask, gausfil)

> Composite : imblend = im1_c.*mask + im2_c.*(1-mask)

Gain compensation

- Simple gain adjustment

> Compute average RGB intensity of each image in overlapping region

> Normalize intensities by ratio of averages

Image Blending vs Feathering

Affect of window size

Multi-band Blending

- Two-band blending

> only use two bands : high freq(엣지). and low freq(밝기).

> blends low freq. smoothly

> Bend high freq. with no smoothing : use binary mask

Multiband Blending with Laplacian Pyramid

- At low frequencies, blend slowly

- At high frequencies, blend quickly

- Visual representation of an image pyramid with 5 levels

- Laplacian pyramid is very similar to a Gaussian pyramid but saves the difference image of the blurred versions between each levels.

- Only the smallest level is not a difference image to enable reconstruction of the high resolution image using the difference images on higher levels

- used in image compression

Multiband blending

1. compute laplacian pyramid of image and mask

2. create blended image at each level of pyramid

3. reconstruct complete image

저작자표시