Computer Vision: A Collection of Common Questions

Date: 2025-01-13 10:08:40

Here is a set of questions I compiled while studying CV; it may be useful for review. Since these notes were made quite a while ago, I am leaving them mostly as they are for now.

CV Question

  1. what’s machine vision?

Machine vision (MV) is the technology and methods used to provide imaging-based automatic inspection and analysis for applications such as automatic inspection, process control, and robot guidance in industry.

Input: image, video, Output: inspection and analysis

Goal: give computers super human-level perception

  1. Typical perception channel

Representation -> ‘fancy math’ -> output, and representation and output are the parts we are most interested in.

  1. Common Applications

Automated visual inspection, object recognition, face detection, face makeovers, vision in cars, image stitching, virtual fitting, VR, Kinect Fusion, 3D reconstruction.

  1. Subject connection

Image processing: digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing.

Computer Graphics: Computer graphics is the discipline of generating images with the aid of computers.

Pattern Recognition: Pattern recognition is the automated recognition of patterns and regularities in data.

Computer Vision: Computer vision is an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from digital images or videos.

Difference between Computer Vision and Machine Vision: Computer vision refers to automation of the capture and processing of images, with an emphasis on image analysis. In other words, CV’s goal is not only to see, but also to process and provide useful results based on the observation. Machine vision refers to the use of computer vision in industrial environments, making it a subcategory of computer vision.

Artificial intelligence: Computer science defines AI research as the study of “intelligent agents”: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.[1] A more elaborate definition characterizes AI as “a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation.”

  1. Vision Process
  • Feature extraction and region segmentation (low level)
  • Modeling and schema representation (mid level)
  • Description and understanding (high level)
  1. Difficulties faced by Machine Vision
  • Image ambiguity: When a 3D scene is projected into a 2D image, depth information and the invisible parts are lost, so 3D objects of different shapes projected onto the image plane may produce the same image.
  • Environment factors: Factors in the scene such as lighting, object shapes, surface colors, cameras, and changes in spatial relationships, etc.
  • Knowledge guidance: Under different knowledge guidance, the same image will produce different recognition results.
  • Large amounts of data: Grayscale, color, and depth images carry a huge amount of data, which requires large storage space and is not easy to process quickly.
  1. Human Vision System

Physical structure: the HVS is composed of the optical system, the retina, and the visual pathway.

TODO: I don’t want to study the HVS material in depth right now, so I skip it. If I have extra time later, I will fill in the remaining knowledge.

  1. Key tech in Computer Vision System

    1. Image process( Smooth denoising, Standardization, Missing/Outlier Value Process )
    2. Image feature extraction( Shape, Texture, Color, Spatial Relations )
    3. Image Recognition( GoogLeNet, ResNet… )
  2. Image formation

The randomness of the imaging process and the complexity of the imaged object mean that an image is by nature a random signal.

An image basically consists of:

  • Illumination component $i(x, y)$
  • Reflection component $r(x, y)$

So, the 2D function representation of the image is:
$$f(x, y) = i(x, y) \cdot r(x, y)$$

  1. Human eye brightness perception range

Total range: $10^{-2}$ to $10^6$, so the contrast is $c = B_{max} / B_{min} = 10^8$, and the relative contrast is $c_r = 100\% \times (B - B_0) / B_0$, where $B_0$ is the background brightness and $B$ is the object brightness.

Relationship between subjective brightness S and actual brightness B:
$$S = K \ln{B} + K_0$$

  1. Brightness adaptability

Vision is sensitive to contrast, not to the absolute brightness value itself.

Weber theorem:

If the brightness of an object differs from its surrounding background brightness $I$ by $\Delta I$, the ratio $\Delta I / I$ is approximately constant within a certain range of brightness, with a value of about 0.02; this constant is called the Weber ratio.
$$\frac{\Delta I}{I} \approx 0.02$$
Mach effect: The visual system is less sensitive to high and low spatial frequencies and more sensitive to intermediate spatial frequencies; a brightness overshoot occurs at a sudden change in brightness, which enhances the outlines of the scene seen by the human eye.

  1. Color imaging model

Light energy itself is colorless. Color is a physiological and psychological phenomenon that occurs when people’s eyes perceive light.

Lightwave: Light is an electromagnetic wave that radiates according to its wavelength.

Young–Helmholtz theory(trichromatic theory): the three types of cone photoreceptors could be classified as short-preferring (violet), middle-preferring (green), and long-preferring (red).

  1. Color property

Hue: the degree to which a stimulus can be described as similar to or different from stimuli that are described as red, green, blue, and yellow.

Saturation: colorfulness of an area judged in proportion to its brightness.

Intensity: Refers to the degree of light and darkness that the human eye feels due to color stimuli.

Grassman Laws:

First law: Two colored lights appear different if they differ in either dominant wavelength, luminance or purity. Corollary: For every colored light there exists a light with a complementary color such that a mixture of both lights either desaturates the more intense component or gives uncolored (grey/white) light.
Second law: The appearance of a mixture of light made from two components changes if either component changes. Corollary: A mixture of two colored lights that are non-complementary result in a mixture that varies in hue with relative intensities of each light and in saturation according to the distance between the hues of each light.
Third law: There exist lights with different spectral power distributions that appear identical. First corollary: such identical-appearing lights must have identical effects when added to a mixture of light. Second corollary: such identical-appearing lights must have identical effects when subtracted (i.e., filtered) from a mixture of light.
Fourth law: The intensity of a mixture of lights is the sum of the intensities of the components.
  1. Color

The result of interaction between physical light in the environment and our visual system

  1. Color Space
  • Linear color space
  • RGB color space
  • HSV color space
  • CIE XYZ
  1. White Balance

White balance (WB) is the process of removing unrealistic color casts, so that objects which appear white in person are rendered white in your photo.

Color temperature describes the spectrum of light which is radiated from a “blackbody” with that surface temperature.

Von Kries adaptation:

  • Multiply each channel by a gain factor
  • A more general transformation would correspond to an arbitrary 3x3 matrix

Best way: gray card:

  • Take a picture of a neutral object
  • Deduce the weight of each channel

Brightest pixel assumption (non-saturated)

  • Highlights usually have the color of the light source
  • Use weights inversely proportional to the values of the brightest pixels

Gamut mapping

  • Gamut: convex hull of all pixel colors in an image
  • Find the transformation that matches the gamut of the image to the gamut of a “typical” image under white light
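As a rough illustration of the per-channel gain idea above (Von Kries adaptation, here with a gray-world assumption standing in for an actual gray card), a minimal NumPy sketch; the function name and the 8-bit value range are my assumptions, not from the notes:

```python
import numpy as np

def gray_world_white_balance(img):
    """Scale each channel so its mean matches the overall gray mean (gray-world assumption)."""
    img = img.astype(np.float64)
    channel_means = img.reshape(-1, 3).mean(axis=0)   # per-channel means over all pixels
    gains = channel_means.mean() / channel_means      # von Kries-style per-channel gain factors
    balanced = img * gains                            # multiply each channel by its gain
    return np.clip(balanced, 0, 255).astype(np.uint8)
```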
  1. Mathematical representation of an image

The optical radiation power of wavelength $\lambda$ received on the imaging target surface of the camera:
$$I = f(x, y, \lambda, t)$$
Common image types:

  • Binary image
  • Grayscale image
  • Index image
  • RGB image
  1. Common concepts

pixel neighborhood: 4-neighborhood ($N_{4}(p)$), 8-neighborhood ($N_{8}(p)$);

pixel adjacency ===> pixel connectivity;

Template(filter, mask) + convolution ===> filtering, smoothing, sharpening;

Convolution operation properties:

  • Smoothness: Make the fine structure of each function smooth
  • Diffusivity: Interval expansion, Diffusion of energy distribution

Application of convolution:

  • Deconvolution
  • Remove noise
  • Feature enhancement
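A minimal sketch of template-based filtering as a convolution (a neighborhood operation), using SciPy; the 3×3 averaging mask and the random stand-in image are only illustrative:

```python
import numpy as np
from scipy.ndimage import convolve

img = np.random.rand(64, 64)           # stand-in grayscale image
mask = np.full((3, 3), 1 / 9.0)        # 3x3 averaging template (smoothing mask)
smoothed = convolve(img, mask, mode='reflect')   # neighborhood operation: smoothing by convolution
```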
  1. Pixels distance

Distance measurement function characteristics:

  • $D(p, q) \ge 0$ (non-negativity)
  • $D(p, q) = 0 \Leftrightarrow p = q$ (identity of indiscernibles)
  • $D(p, q) = D(q, p)$ (symmetry)
  • $D(p, r) \le D(p, q) + D(q, r)$ (subadditivity, the triangle inequality)

Common distance metric functions:

  • Euclidean distance: $D_{E}(p, q) = [(x - s)^2 + (y - t)^2]^{\frac{1}{2}}$
  • City-block distance: $D_{4}(p, q) = |x - s| + |y - t|$
  • Chessboard distance: $D_{8}(p, q) = \max(|x - s|, |y - t|)$

p-norm: $\|x\|_p = \left(\sum_i{|x_i|^p}\right)^{\frac{1}{p}}$

Frobenius norm: $\|A\|_F = \sqrt{\sum_i \sum_j |a_{ij}|^2}$
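A small NumPy sketch of these distance and norm definitions on concrete values (the points and arrays are arbitrary examples):

```python
import numpy as np

p = np.array([0, 0]); q = np.array([3, 4])
d_euclid = np.sqrt(((p - q) ** 2).sum())     # Euclidean distance D_E = 5.0
d_city   = np.abs(p - q).sum()               # city-block distance D_4 = 7
d_chess  = np.abs(p - q).max()               # chessboard distance D_8 = 4

x = np.array([1.0, -2.0, 3.0])
p_norm = (np.abs(x) ** 3).sum() ** (1 / 3)   # p-norm with p = 3
A = np.arange(6.0).reshape(2, 3)
fro = np.sqrt((A ** 2).sum())                # Frobenius norm, same as np.linalg.norm(A, 'fro')
```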

In an image, the L-2 norm constraint does not distinguish between the edge tangent direction and the gradient direction, and does not reflect the difference between textured areas and flat areas, so the edges of the image are blurred during the restoration process.

The L-1 norm constraint diffuses only along the edge tangent direction, not along the gradient direction, with the goal of preserving image edges as much as possible; however, it suppresses noise less effectively and produces staircase artifacts in the restored result.

  1. Statistical characteristics of images
  • Information entropy: $H = -\sum_{i=1}^k p_i \log_2{p_i}$

  • Gray average: $\bar{f} = \frac{\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} f(i, j)}{MN}$, reflects the average reflection intensity of the different parts of the object in the image.

  • Gray mode

  • Median grayscale

  • Gray variance: $S = \frac{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}[f(i, j) - \bar{f}]^2}{MN}$

  • Grayscale range: $f_{range}(i, j) = f_{max}(i, j) - f_{min}(i, j)$

  • Covariance: for two $M \times N$ images f and g, $S_{gf}^2 = S_{fg}^2 = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1}[f(i,j) - \bar{f}][g(i, j)-\bar{g}]$

  • Correlation coefficient: $r_{fg} = \frac{S^2_{fg}}{S_{ff}S_{gg}}$

  • Histogram: a function of gray level that gives the number of pixels at each gray level in the image, reflecting the frequency of each gray level. Histograms are additive.

  • Integrated optical density: reflects the combination of image area and density, $IOD = \int_{0}^{\max(x)} \int_{0}^{\max(y)} D(x, y)\,dx\,dy$. For an object with a threshold area of M, the average of its internal gray levels is $MGL = \frac{IOD(M)}{A(M)} = \frac{\int_{M}^{\infty}{D\,H(D)\,dD}}{\int_{M}^{\infty} H(D)\,dD}$
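As a brief sketch of how a few of these statistics can be computed for an 8-bit grayscale image with NumPy (the function and variable names are my own, not from the notes):

```python
import numpy as np

def image_statistics(f):
    """Entropy, gray mean/variance, and histogram of an 8-bit grayscale image f."""
    hist = np.bincount(f.ravel(), minlength=256)      # histogram: count of each gray level 0..255
    p = hist / hist.sum()                             # gray-level probabilities p_i
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))   # information entropy H
    mean = f.mean()                                    # gray average
    var = ((f - mean) ** 2).mean()                     # gray variance
    return entropy, mean, var, hist
```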

  1. Method for converting color image into grayscale image
  • Weighted average method
  • Average method
  • Maximum method
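A hedged sketch of the three conversion methods; the weighted-average coefficients shown are the common ITU-R BT.601 luma weights, which the notes themselves do not specify:

```python
import numpy as np

def rgb_to_gray(img):
    """Weighted average, average, and maximum methods for RGB-to-grayscale conversion."""
    r, g, b = (img[..., 0].astype(float), img[..., 1].astype(float), img[..., 2].astype(float))
    weighted = 0.299 * r + 0.587 * g + 0.114 * b   # weighted average method (BT.601 weights)
    average  = (r + g + b) / 3.0                   # average method
    maximum  = img.max(axis=-1)                    # maximum method (brightest channel per pixel)
    return weighted, average, maximum
```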
  1. Image enhancement

Emphasize or sharpen certain features of the image, such as edges, contours, contrast, etc., for display, observation, or further analysis and processing

  1. Basic operations on digital images
  • point operation
  • algebra operation: Remove superimposed noise, Generate image overlay effects
  • logical operation
  • geometric operation
  1. Interpolation
  • Nearest neighbor interpolation
  • Bilinear interpolation
  • Cubic interpolation
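A minimal sketch of bilinear interpolation at a single fractional position (nearest-neighbor would simply round the coordinates; the function itself is my own illustration):

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a grayscale image at a fractional position (x, y) by bilinear interpolation."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    dx, dy = x - x0, y - y0
    top    = (1 - dx) * img[y0, x0] + dx * img[y0, x1]   # interpolate along x on the top row
    bottom = (1 - dx) * img[y1, x0] + dx * img[y1, x1]   # interpolate along x on the bottom row
    return (1 - dy) * top + dy * bottom                  # then interpolate along y
```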
  1. Image noise

Image noise is random variation of brightness or color information in images, and is usually an aspect of electronic noise.

Influences: 1. makes the image blurry; 2. overwhelms image features; 3. makes image analysis difficult

Features:

  • Irregular distribution and size with randomness
  • Correlation between noise and image
  • Noise is additive

Noise classification:

  • Additive and multiplicative noise
    • Additive noise generally refers to thermal noise and shot noise
    • Multiplicative noise is generally caused by channel imperfections. Multiplicative randomness is considered to be caused by the time-varying nature of the system (such as fading or Doppler) or non-linearity.
  • External noise and internal noise
  • Stationary and non-stationary noise

Image noise model:

  • Gaussian noise

    The probability density function p of a Gaussian random variable z is given by:
    $$p_G(z) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(z - \mu)^2}{2\sigma^2}}$$

  • Salt-and-pepper noise(Impulse noise)

  • Rayleigh noise, Erlang noise, Exponentially distributed noise, Uniform noise…
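A short NumPy sketch that generates the two most common noise models above; the sigma and probability values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, sigma=10.0):
    """Additive Gaussian noise with zero mean and standard deviation sigma."""
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_and_pepper(img, prob=0.02):
    """Impulse noise: with probability prob a pixel becomes 0 (pepper) or 255 (salt)."""
    noisy = img.copy()
    mask = rng.random(img.shape)
    noisy[mask < prob / 2] = 0
    noisy[mask > 1 - prob / 2] = 255
    return noisy
```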

Common noise removal methods:

  • Image enhancement ( Improve image recognizability )
  • Image recovery or restoration ( Compensating for the effects of noise )

Classification of image denoising algorithms:

  • Spatial domain filtering
  • Transform domain filtering
  • Morphological noise filter ( combining opening and closing operations can filter out noise )
  • Partial Differential Equations ( partial differential equations are anisotropic; applied to image denoising, they can remove noise while preserving edges well )
  • Variational method ( Determine the energy function of the image, and make the image smooth by minimizing the energy function )
  1. Image filtering

Image filtering highlights spatial information in the image and suppresses or removes noise and irrelevant information; it is an image correction or enhancement technique. The essence of image filtering is a neighborhood operation.

Classification:

  • Spatial filtering ( Through window or convolution kernel, neighborhood range is small)
  • Frequency domain filtering ( Filtering the Fourier Transformed Spectrum Image, Large neighborhoods or removing periodic noise)
  1. Spatial filter
  • Smooth spatial filter ———— blurring and noise reduction
    • Smoothing linear filter ———— Mean filter
    • Statistical sorting filter ———— Median filter
  • Sharpen spatial filter ———— Highlight details in an image or enhance blurred details
    • Laplace operator
    • Sobel operator

Definition of linear filters:

Linear filters process time-varying input signals to produce output signals, subject to the constraint of linearity.

Primary linear spatial filter:

  • Low-pass filter: Smooth images and remove noise
  • High-pass filter: Edge enhancement, edge extraction
  • Band-pass filter: Remove specific frequencies, rarely used in enhancement

Primary non-linear spatial filter:

  • Median filtering: Smooth images and remove noise
  • Maximum filtering: Find the bright point
  • Minimum filtering: Find the darkest point

Main purpose of the smoothing filter:

  • Delete useless small details before processing large images
  • Connect broken lines and curves
  • Reduce noise
  • Smooth processing to recover over-sharpened images
  • Image creation( Shadows, soft edges, hazy effects )
  1. Smoothing Filter —— Mean Filter

The neighborhood averaging method is based on the assumption that there is a high spatial correlation between adjacent pixels in the image, and the noise is relatively independent.

Advantage:

  • A typical linear denoising method
  • Simple and fast, and can effectively remove Gaussian noise

Disadvantage:

  • Blur the image while reducing noise, especially at edges and details. And the larger the neighborhood, the greater the degree of blurring while the denoising ability is enhanced.

Improvement:

  • Overcoming the shortcomings of simple local averaging
  • The starting points are all focused on how to choose the size, shape and direction of the neighborhood, the number of points to participate in the average, and the weight coefficient of each point in the neighborhood.
  1. Smoothing Filter —— Order Statistical Filter

Main characteristics:

  • Invariance to certain input signals: monotonic, periodic
  • Denoising
  • Spectrum characteristics: non-linear, there is no one-to-one correspondence
  • Effective at smoothing impulse noise, Protect sharp edges of images
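A brief SciPy sketch contrasting the linear mean filter with the order-statistic median filter (the image is a random stand-in; size=3 means a 3×3 neighborhood):

```python
import numpy as np
from scipy.ndimage import uniform_filter, median_filter

img = np.random.randint(0, 256, (64, 64)).astype(float)   # stand-in noisy image
mean_filtered   = uniform_filter(img, size=3)    # 3x3 neighborhood averaging (linear smoothing)
median_filtered = median_filter(img, size=3)     # 3x3 median (order statistic, preserves edges better)
```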
  1. Differential filter —— first order differential

Horizontal direction: $H = \begin{bmatrix} 1 & -1 \end{bmatrix}$ (first difference along the row direction); vertical direction: $H = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ (first difference along the column direction).

Post-processing:

  • Add a positive integer as a whole to ensure that all pixel values are positive.
  • Take all pixel values as absolute values.
  1. Undirected first order sharpening

Cross differential algorithm (Roberts operator):
$$g(i, j) = |f(i + 1, j + 1) - f(i, j)| + |f(i+1, j) - f(i, j+1)|$$
Sobel sharpening:
$$g(i, j) = \sqrt{d_x^2(i, j) + d_y^2(i, j)}, \qquad d_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad d_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$

Prewitt sharpening:
$$g(i, j) = \sqrt{d_x^2(i, j) + d_y^2(i, j)}, \qquad d_x = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \qquad d_y = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$
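A minimal NumPy/SciPy sketch of the Sobel templates above; the image is a stand-in, and $|d_x| + |d_y|$ is used as the usual cheap approximation of the gradient magnitude:

```python
import numpy as np
from scipy.ndimage import convolve

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T   # vertical-gradient template

def sobel_magnitude(img):
    """Approximate gradient magnitude |dx| + |dy| using the Sobel templates."""
    dx = convolve(img.astype(float), sobel_x, mode='reflect')
    dy = convolve(img.astype(float), sobel_y, mode='reflect')
    return np.abs(dx) + np.abs(dy)
```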
Canny operator:

Three guidelines:

  • Low error rate: the edge operator should respond only to edges and find all of them, while not responding to non-edges.
  • Positioning accuracy: the distance between the edge pixels found by the edge operator and the true edge pixels should be as small as possible.
  • Single response: a single edge should produce only one response, not multiple responses.

Canny edge detection algorithm:

  • step1: Smooth the image with a Gaussian filter
  • step2: Calculate the magnitude and direction of the gradient using the finite difference of the first-order partial derivative
  • step3: Non-maximum suppression of gradient amplitude
    • First determine whether the gradient magnitude of pixel C is a local maximum along the gradient direction within its 8-neighborhood (compare it with the two neighboring points along the gradient direction)
    • If the value at C is smaller than either of these two points, C is not a local maximum and can be excluded as an edge
    • After non-maximum suppression, an image is obtained in which the values of all non-edge points are 0
  • step4: Use double threshold algorithm to detect and connect edges
    • Double threshold algorithm detection
      • Apply two thresholds th1 and th2 to the non-maximum-suppressed image, where $th_1 = 0.4 \times th_2$
      • Set to 0 the pixels whose gradient value is less than $th_1$, obtaining image 1.
      • Set to 0 the pixels whose gradient value is less than $th_2$, obtaining image 2.
      • Use image 2 as the basis, supplemented by image 1, to link the edges of the image.
    • connect edges
      • Step1: Scan image 2. When a non-zero pixel p(x, y) is encountered, track the contour line starting at p(x, y) until its end point q(x, y).
      • Step2: In image 1, consider the 8-neighborhood of the point s(x, y) at the position corresponding to q(x, y) in image 2. If a non-zero pixel r(x, y) exists in this 8-neighborhood, include it in image 2. Starting from r(x, y), repeat the first step until tracking can continue in neither image 1 nor image 2.
      • Step3: When the linking of the contour line containing p(x, y) is complete, mark the contour line as visited. Return to the first step and look for the next contour line.
      • Step4: Repeat steps 1, 2 and 3 until no new contour lines are found in image 2.
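A hedged usage sketch with OpenCV's Canny implementation (assuming opencv-python is available); the file name and threshold values are my assumptions, chosen so that $th_1 = 0.4 \times th_2$:

```python
import cv2

img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)   # assumed input file
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)           # step 1: Gaussian smoothing
edges = cv2.Canny(blurred, 40, 100)                     # steps 2-4: gradient, NMS, double-threshold linking
```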
  1. Differential filter —— second order differential

Laplace sharpening operator:

The Laplacian is a scalar rather than a vector. It is linear and rotation invariant, i.e., isotropic, and is often used in image processing.
$$\frac{\partial^2 f}{\partial x^2} = f(i+1, j) - 2 f(i, j) + f(i-1, j), \qquad \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$
Common Laplacian high-pass templates:
$$H_1 = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix} \text{ (4-neighborhood)}, \qquad H_2 = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix} \text{ (8-neighborhood)}$$
Isotropic filter: its response is independent of the direction of discontinuities in the filtered image.

Analysis of the effect of several templates on sharpening edges:
Comparing four Laplacian-based sharpening templates $H_1$, $H_2$, $H_3$ and $H_4$:
The effects of H1 and H2 are basically the same. The effect of H3 is the worst, and H4 is the closest to the original.

The Laplacian operator is more sensitive to noise, and it produces a double response at some edges in the image.

Improvement strategy:

  • Improve edge detection based on human visual characteristics
  • The image is generally smoothed first, usually a Laplacian operator and a smoothing operator are combined to generate a new template
  1. Laplacian of Gauss(LoG)

First use Gaussian function to smooth the image, and then use Laplacian operator to form Laplacian-Gauss algorithm:
$$LoG(x, y) = \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right) \frac{1}{2 \pi \sigma^2} \exp\left(- \frac{x^2 + y^2}{2 \sigma^2 }\right) = \frac{-1}{2 \pi \sigma^4} \left(2 - \frac{x^2 + y^2}{\sigma^2}\right) \exp\left(- \frac{x^2 + y^2}{2\sigma^2}\right)$$
The main ideas and steps of the algorithm are as follows:

  • Filtering: First smooth the image f (x, y). Its filtering function is selected as a Gaussian function according to the characteristics of human vision.
  • Enhancement: Laplacian on smooth image g (x, y)
  • Detection: The edge detection criterion is the zero-crossing point of the second derivative (the point at h (x, y) = 0) and corresponds to the larger peak of the first derivative.
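A brief sketch of the LoG response using SciPy's gaussian_laplace (the image and sigma are placeholders; edge candidates sit at zero crossings, here checked only along rows for brevity):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

img = np.random.rand(64, 64)                    # stand-in grayscale image
log_response = gaussian_laplace(img, sigma=2)   # Gaussian smoothing followed by the Laplacian
# Edge candidates are the zero crossings of the LoG response (sign changes between neighbors).
zero_cross = (np.sign(log_response[:, :-1]) * np.sign(log_response[:, 1:])) < 0
```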
  1. Wallis algorithm

Since human visual characteristics include a logarithmic stage, logarithmic processing is added to improve the sharpening:
$$g(i, j) = \log{f(i, j)} - \frac{1}{4}S, \qquad S = \log{f(i-1, j)} + \log{f(i+1, j)} + \log{f(i, j-1)} + \log{f(i, j+1)}$$

The corresponding template is:

$$H = \begin{bmatrix} 0 & -\frac{1}{4} & 0 \\ -\frac{1}{4} & 1 & -\frac{1}{4} \\ 0 & -\frac{1}{4} & 0 \end{bmatrix}$$
Note:
  • In order to prevent taking the logarithm of 0, $\log(f(i, j) + 1)$ is actually used in the calculation
  • Because logarithmic values are small ($\ln{256} \approx 5.55$), $46 \times \log(f(i, j) + 1)$ is used in the calculation so that the result spans roughly the 0–255 range
  1. Comparison of Sobel and Laplacian algorithms

The boundary obtained with the Sobel operator is relatively coarse and carries less boundary information, but it is clearer.

The boundary obtained with the Laplacian operator is more detailed and carries much more detail information, but it is less clear.

  1. Comparison of spatial domain sharpening methods

Because Prewitt and Sobel operators are first-order differential operators, they have a certain suppression effect on noise. Among them, Sobel operators have higher sensitivity to gradients than Prewitt operators, and the detection effect is better.

The Laplacian operator is a second-order differential operator, which is very sensitive to gradient changes and very sensitive to noise.

  1. Frequency domain enhancement principle

What is the frequency domain: Space defined by frequency variables (u, v).

A function f(t) with period T that satisfies the Dirichlet conditions on $[-T/2, T/2]$ can be expanded into a Fourier series on $[-T/2, T/2]$:
$$f_T(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left(a_n\cos{n\omega t} + b_n\sin{n\omega t}\right) = \sum_{n=-\infty}^{\infty} c_n e^{jn\omega t}$$

Dirichlet conditions include three aspects:

  • Continuous, or with a finite number of discontinuities of the first kind, within one period
  • A finite number of maxima and minima within one period
  • Absolutely integrable over one period

How fast a signal changes is reflected in its frequency content:

  • Noise, edges, and jumps represent high-frequency components of the image
  • Background area and slow-changing parts represent low-frequency components of the image

The frequency domain processing of image information has the following characteristics:

  • Energy conservation, but energy redistribution
  • Conducive to extracting certain features of the image
  • Fast algorithms in the frequency domain can greatly reduce the amount of calculation and improve processing efficiency

The steps for frequency domain image enhancement are:

  • Multiply the input image by $(-1)^{x + y}$ to center the transform
  • Compute the two-dimensional Fourier transform of the result to obtain F(u, v)
  • Multiply F(u, v) by the designed transfer function H(u, v): $G(u, v) = H(u, v)F(u, v)$
  • Compute the inverse Fourier transform of the result of step (3): $F^{-1}[G(u, v)]$
  • Take the real part of the result of step (4) and multiply it by $(-1)^{x + y}$ to obtain g(x, y)
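A compact NumPy sketch of these five steps; the transfer function H is assumed to be an array of the same shape as the image, built on a centered frequency grid elsewhere:

```python
import numpy as np

def frequency_domain_filter(f, H):
    """Apply a centered transfer function H(u, v) following the five steps above."""
    x, y = np.meshgrid(np.arange(f.shape[1]), np.arange(f.shape[0]))
    centered = f * (-1.0) ** (x + y)          # step 1: center the spectrum
    F = np.fft.fft2(centered)                 # step 2: 2D Fourier transform
    G = H * F                                 # step 3: multiply by the transfer function
    g = np.fft.ifft2(G)                       # step 4: inverse transform
    return np.real(g) * (-1.0) ** (x + y)     # step 5: real part, undo the centering
```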

Design of the transfer function H (u, v):

  • Filters are largely specified intuitively.
  • Select the frequency filter by using the correspondence between the frequency components and the appearance of the image
  • Design of two-dimensional digital filters based on approximation of mathematical and statistical criteria.
  1. Frequency domain filtering

Frequency-domain filtering highlights or weakens various spatial frequencies of the image, and changes the image data by modifying the image frequency components to achieve the purpose of suppressing noise or improving image quality

Low-pass filter:

  • Rectangular filter: $f(x) = \mathrm{rect}\left(\frac{x}{2a}\right) \Rightarrow F(s) = 2a\,\frac{\sin(2\pi a s)}{2\pi a s}$
  • Triangle filter
    • Ideal low-pass filter: $H(u, v) = \begin{cases} 1, & \text{if } D(u, v) \le D_0 \\ 0, & \text{if } D(u, v) > D_0 \end{cases}$, where $D(u, v) = \sqrt{u^2 + v^2}$
    • Butterworth low-pass filter: $H(u, v) = \frac{1}{1 + [D(u, v)/D_0]^{2n}}$, where n is the order of the filter
    • Exponential filter: $H(u, v) = e^{-[\frac{D(u, v)}{D_0}]^{n}}$. There is a smooth transition between high and low frequencies, so the ringing phenomenon is relatively weak. H(u, v) generally decays faster as the frequency increases, so it filters high-frequency components more strongly and blurs more.
    • Gaussian low-pass filter (a special case of the exponential filter): $H(u, v) = e^{-D^2(u, v)/2\sigma^2}$
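A minimal NumPy sketch that builds the ideal, Butterworth, and Gaussian transfer functions above on a centered frequency grid (D0 and the order n are illustrative values):

```python
import numpy as np

def lowpass_transfer_functions(M, N, D0=30.0, n=2):
    """Ideal, Butterworth, and Gaussian low-pass H(u, v) on an M x N centered grid."""
    u = np.arange(M) - M / 2
    v = np.arange(N) - N / 2
    V, U = np.meshgrid(v, u)
    D = np.sqrt(U ** 2 + V ** 2)                      # distance from the center frequency
    ideal       = (D <= D0).astype(float)             # ideal low-pass: 1 inside D0, 0 outside
    butterworth = 1.0 / (1.0 + (D / D0) ** (2 * n))   # Butterworth low-pass of order n
    gaussian    = np.exp(-D ** 2 / (2 * D0 ** 2))     # Gaussian low-pass (sigma = D0)
    return ideal, butterworth, gaussian
```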

High-pass filter:

The edges and details of the image are mainly at high frequencies, and the image blur is caused by weak high-frequency components.

The transfer function of the high-pass filtering in the frequency domain:
$$H_{hp}(u, v) = 1 - H_{lp}(u, v)$$
The larger $D_0$ is, the stronger the sharpening.

Butterworth high-pass filter: $H(u, v) = \frac{1}{1 + [D_0/D(u, v)]^{2n}}$

High-frequency enhancement filter:

Add a constant to the transfer function of the high-pass filter in the frequency domain to add some low-frequency components back.
$$G_e(u, v) = G(u, v) + cF(u, v), \quad c \in [0, 1]$$

High-boost filter:

Multiply the original image by an enlargement factor A and subtract the low-pass image
$$G_{HB}(u, v) = A \times F(u, v) - F_L(u, v) = (A-1)F(u, v) + F_H(u, v)$$
A = 1: high-pass filter; A > 1: high-frequency enhancement filter.

Band-pass and band-stop filter:

Band stop filter:

Definition of an n-th order radially symmetric Butterworth band-stop filter:
$$H(u, v) = \frac{1}{1 + \left[\frac{D(u, v)\,W}{D^2(u, v) - D_0^2}\right]^{2n}}$$
where W is the stopband bandwidth and D0 is the center radius of the stopband.

Homomorphic filtering:

Homomorphic filtering is achieved by filtering homomorphic systems using the principle of generalized superposition.

Homomorphic filtering is a frequency processing method capable of compressing the gray scale range and enhancing contrast.

Basic idea:

Enhance high frequencies, reduce low frequencies, increase contrast to eliminate multiplicative noise.

Step:

  • Logarithm: $f(x, y) = i(x, y)\,r(x, y) \Rightarrow \ln{f(x, y)} = \ln{i(x, y)} + \ln{r(x, y)}$
  • Fourier transform: $F(u, v) = I(u, v) + R(u, v)$
  • Design H(u, v) and filter: $H(u, v)F(u, v) = H(u, v)I(u, v) + H(u, v)R(u, v)$
  • Inverse Fourier transform: $h_f(x, y) = h_i(x, y) + h_r(x, y)$
  • Exponential: $g(x, y) = \exp{h_f(x, y)} = \exp{h_i(x, y)} \cdot \exp{h_r(x, y)}$

H (u, v) design:
$$H_{homo}(u, v) = [H_H - H_L]\,H_{high}(u, v) + H_L$$
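A hedged NumPy sketch of this whole pipeline; using a Gaussian-shaped high-pass for $H_{high}$ and all parameter values here are my assumptions, not from the notes:

```python
import numpy as np

def homomorphic_filter(f, D0=30.0, H_L=0.5, H_H=2.0, c=1.0):
    """ln -> FFT -> H(u, v) -> inverse FFT -> exp, with a high-emphasis Gaussian-shaped H."""
    M, N = f.shape
    log_f = np.log1p(f.astype(float))                 # logarithm (log(1 + f) avoids log 0)
    F = np.fft.fftshift(np.fft.fft2(log_f))           # Fourier transform, spectrum centered
    u = np.arange(M) - M / 2
    v = np.arange(N) - N / 2
    V, U = np.meshgrid(v, u)
    D2 = U ** 2 + V ** 2
    H = (H_H - H_L) * (1 - np.exp(-c * D2 / D0 ** 2)) + H_L   # H_homo = (H_H - H_L) H_high + H_L
    g = np.fft.ifft2(np.fft.ifftshift(H * F))          # filter and invert
    return np.expm1(np.real(g))                        # exponential undoes the logarithm
```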

  1. Comparison between frequency domain technology and spatial domain technology
  • Spatial-domain techniques operate on local pixels, while frequency-domain techniques operate on the image globally

  • (function)spatial domain: smoothing filter and Sharpening filter, Frequency domain: low-pass filtering and high-pass filtering

  • (Algorithms)spatial domain: convolving images and templates, Frequency domain: multiply.

  1. Short-time Fourier transform

A small window is applied to the signal and the transform focuses on the signal inside that window, so it reflects the local characteristics of the signal.

Defects:

  • The size and shape of the window function are independent of time and frequency and remain fixed, which is not good for analyzing time-varying signals.
  • High-frequency components last only a short time while low-frequency components last longer, so ideally we would use a small time window to analyze high frequencies and a large time window for low frequencies; the STFT cannot do this.
  • It cannot form an orthogonal basis, which is inconvenient for numerical calculation.
  1. Wavelet Transform

Advantage:

  • Inherited and developed the localization of STFT
  • Overcomes the STFT's shortcomings that the window size does not change with frequency and that it lacks a discrete orthogonal basis

Extract changes in “specified time” and “specified frequency” in a signal.

Orthogonal basis:

  • Basis functions that are mutually independent (their inner product is zero) are called orthogonal
  • If these bases can completely represent all objects, they are called complete feature bases.

Wavelet:

A special waveform with a limited length and an average value of 0.

Two properties: 1. finite duration with abruptly changing frequency and amplitude; 2. zero average over that finite duration.

Features: 1. compact or approximately compact support in the time domain; 2. alternating positive and negative “fluctuation” with a DC component of 0.

Wavelet analysis:

  • A signal is decomposed into a series of wavelets after the mother wavelet is scaled and translated, so the wavelet is the basis function of the wavelet transform
  • The wavelet transform can be understood as replacing the sine and cosine basis of the Fourier transform with a series of scaled and translated wavelet functions

The continuous wavelet transform is expressed by the following formula:
$$C(scale, position) = \int_{-\infty}^{+\infty}f(t)\,\psi(scale, position, t)\,dt$$
The transform result of CWT is many wavelet coefficients C.

CWT wavelet transform steps:

  • Take a wavelet and compare it to the front part of the signal
  • Calculate the correlation factor C, where C represents the correlation between the wavelet and this data
  • Calculate the coefficient value C after translation. Move the wavelet to the right, repeat steps 1 and 2 to traverse the entire data
  • Calculate the scaled coefficient value C. Scale wavelet, repeat steps 1 to 3.
  • Repeat for all wavelet scales

Discrete Wavelet Transform:

If both the scale factor and the translation parameter are chosen as powers of two ($2^j$), the wavelet transform that uses such scale factors and translation parameters is called a dyadic wavelet transform, which is a form of the DWT. (Usually DWT refers to the dyadic wavelet transform.)

Mallat Algorithm:

It is actually a signal decomposition method, often called dual-channel subband coding in digital signal processing.

The dyadic wavelet is used for image edge detection, image compression, and reconstruction.

One filter is a low-pass filter, which yields the approximation coefficients A (Approximations) of the signal.

The other is a high-pass filter, which yields the detail coefficients D (Detail), computed with a small scale factor.

Wavelet Decomposition Tree:

When this pair of filters is applied to a real digital signal, the resulting data is twice as long as the original signal.

According to the Nyquist sampling theorem, a down-sampling method is proposed, that is, one sample is taken from every two samples in each channel, and the coefficients of the discrete wavelet transform obtained are expressed by cD and cA.
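A brief sketch with PyWavelets (assumed installed); pywt.dwt2 performs one level of the 2D dyadic decomposition described above, and pywt.idwt2 performs the reconstruction discussed next:

```python
import numpy as np
import pywt   # PyWavelets, assumed available

img = np.random.rand(64, 64)                   # stand-in grayscale image
cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')      # one-level 2D DWT: approximation + 3 detail bands
reconstructed = pywt.idwt2((cA, (cH, cV, cD)), 'haar')   # reconstruction from the coefficients
```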

Wavelet Reconstruction:

Using the wavelet decomposition coefficients of the signal to restore the original signal, this process is called wavelet reconstruction or Wavelet Synthesis:

The approximation or the detail part of the signal can be reconstructed separately by setting the detail coefficients or the approximation coefficients, respectively, to zero before reconstruction.

Selection of filters in signal reconstruction:

Quadrature Mirror Filters, QMF system.

Wavelet Packet:

  • Image Compression
  • The details, like the approximations, can themselves be decomposed further
  • An N-level decomposition produces $2^N$ different paths
  1. Commonly used wavelet functions

The wavelet function satisfies the requirements:

  • The wavelet must be oscillating
  • Wavelet amplitude can only be non-zero over a short period of time, which is local

Haar Wavelet:
$$\psi(t) = \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise} \end{cases} \qquad \hat{\psi}(\omega) = i\,\frac{4}{\omega}\,e^{-i\omega/2}\sin^2(\omega/4)$$
advantage:

  • Tight support in time domain, non-zero interval is (0,1);

  • Orthogonal wavelet

  • Symmetry

  • Currently the only orthogonal wavelet with both symmetry and limited support

Defect: Discontinuous

Daubechies wavelet: Orthogonal wavelet, (also called db wavelet)

Morlet wavelet: Seismic signal analysis, A single-frequency complex sine function with a Gaussian envelope, Symmetry

Gaussian wavelet: $\psi(t) = -\frac{1}{\sqrt{2\pi}}\,t\,e^{-t^2/2}$, $\hat{\psi}(\omega) = i\omega e^{-\omega^2/2}$; symmetric; used for extracting step-type boundaries.

Marr wavelet:
$$\psi(t) = \frac{2}{\sqrt{3\sqrt{\pi}}}(1 - t^2)e^{-t^2/2}, \qquad \hat{\psi}(\omega) = \frac{2\sqrt{2\sqrt{\pi}}}{\sqrt{3}}\,\omega^2 e^{-\omega^2/2}$$

It is the second derivative of the Gaussian function and has important applications in the edge extraction of signals and images.

It is mainly used for the extraction of roof-type boundaries and Dirac edges.

Features: exponential decay, non-compact support; very good time-frequency localization; symmetric about the 0 axis.

…there are many more wavelets than can be listed here.

  1. Application of discrete wavelet transform in image processing
  • Image feature extraction
  • Image Compression
  • Data hiding and image watermarking
  • Image fusion
  1. Image pyramid

An image pyramid is a series of images, all derived from the same original image, whose resolution decreases level by level, arranged in the shape of a pyramid.

Gaussian pyramid, Laplacian pyramid.
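A short OpenCV sketch of building a Gaussian pyramid and one Laplacian level (the file name is a placeholder; pyrDown blurs and halves the image, pyrUp upsamples it back so the difference can be taken):

```python
import cv2

img = cv2.imread('input.png')                  # assumed input image
pyramid = [img]
for _ in range(3):                              # three successively halved Gaussian levels
    pyramid.append(cv2.pyrDown(pyramid[-1]))    # Gaussian blur + downsample by 2

# A Laplacian pyramid level is the difference between a level and the upsampled next level.
h, w = pyramid[0].shape[:2]
lap0 = cv2.subtract(pyramid[0], cv2.pyrUp(pyramid[1], dstsize=(w, h)))
```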