Training Image Operators from Samples - Research at IME-USP

Introduction to Image Operator Learning

A large number of Image Processing and Computer Vision problems, such as Optical Character Recognition (OCR), Face Detection, Object Detection and Document Analysis, are essentially Machine Learning problems. These problems involve predicting a class either from parts of an image (in OCR, regions of the image are classified as letters) or from the whole image (in Document Analysis, it may be desired to differentiate document types). Although many different methods exist to solve these problems, most algorithms are developed by exploiting characteristics specific to their domains of application. These algorithms also require deep knowledge of both Image Processing/Computer Vision and the domain of application, making them hard for non-experts to approach. Image Operator Learning proposes an alternative: instead of treating each problem individually, we use the following generic formulation based on Machine Learning.

Given pairs of images, each consisting of an input image Ii and its processed form Oi, estimate the transformation f(Ii) = Oi with minimum error. The estimate must also have small error on images belonging to the same domain as Ii. The transformation f is called an Image Operator.

Image Operator Learning studies how to estimate the transformation f. In particular, W-Operator Learning, which is the focus of TRIOS, represents f as a local transformation: to determine the value of pixel p in the output, only a fixed neighborhood W around p is analysed. The local transformation f : 𝒫(W) → 𝓛 is called a W-Operator and its neighborhood W is a Window. 𝓛 is the range of values present in the output images.
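
To make the definition concrete, below is a minimal sketch (in Python, not TRIOS code) of a hand-written binary W-Operator; the cross-shaped Window, the erosion rule and all function names are illustrative assumptions of this sketch.

import numpy as np

# Illustrative only: a binary W-Operator on the cross-shaped Window
# W = {(-1,0), (0,-1), (0,0), (0,1), (1,0)}. The local rule f is an
# erosion: the output pixel is 1 iff every pixel seen through W is 1.
W = [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]

def apply_w_operator(image, window, f):
    """Slide the Window over every pixel and apply the local rule f."""
    h, w = image.shape
    out = np.zeros_like(image)
    for row in range(1, h - 1):            # borders skipped for brevity
        for col in range(1, w - 1):
            pattern = tuple(image[row + dr, col + dc] for dr, dc in window)
            out[row, col] = f(pattern)
    return out

erosion = lambda pattern: int(all(pattern))    # one particular W-Operator

img = np.array([[0, 0, 0, 0, 0],
                [0, 1, 1, 1, 0],
                [0, 1, 1, 1, 0],
                [0, 1, 1, 1, 0],
                [0, 0, 0, 0, 0]], dtype=np.uint8)
print(apply_w_operator(img, W, erosion))       # only the center pixel remains 1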

W-Operator Learning

The estimation process of W-Operators is divided into three main steps. In the first step, Feature Extraction, we compute the joint frequency of all the patterns that occur in the input images together with their respective outputs. In the second step, Optimization, we minimize the error on the training images by assigning the optimal label to every observed pattern. The third step, Generalization, uses a classifier to assign labels to the unobserved patterns. This process was first presented by Edward Dougherty in [1] and [2].


Figure 1. Training process for Image Operator Learning.

Suppose that the sequences of images {Ii} and {Oi} come from jointly stationary processes (I, O). Let W be a finite Window. The Mean Square Error (MSE) of the W-Operator fW defined by a Window W with respect to (I, O) is

MSE(fW) = E[(fW(I-z|W) - O(z))²],

where I-z denotes the translation of I by z and I-z|W is its restriction to the Window W. The configuration wz = I-z|W contains all pixels around z that belong to the Window W. wz is called a window configuration and, together with O(z), comprises all the information analysed by the training process. The extraction of the pairs (wz, O(z)) for every pixel z in every pair (Ii, Oi) is done in the Feature Extraction step.

Figure 2. (a) Cross-shaped Window. (b) Window positioned on a pixel. (c) Extracted window configuration.
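
A minimal sketch of the Feature Extraction step, under the same conventions as the previous snippet (plain Python, illustrative names, borders skipped):

from collections import Counter

def extract_frequencies(inputs, outputs, window):
    """Feature Extraction: count the joint frequency of every observed
    pair (window configuration wz, output value O(z)) over all training
    image pairs. `window` is a list of (dr, dc) offsets."""
    freq = Counter()
    for img, out in zip(inputs, outputs):
        h, w = img.shape
        for row in range(1, h - 1):
            for col in range(1, w - 1):
                wz = tuple(img[row + dr, col + dc] for dr, dc in window)
                freq[(wz, int(out[row, col]))] += 1
    return freq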

Let Xz = I-z|W; then Xz is a random vector. Let Yz = O(z) be a random variable that holds the value of O at the pixel z. Since I and O are stationary, z may be dropped, and (X, Y) characterizes (I, O) locally with respect to W. Therefore,

MSE(fW) = E[(fW(X) - Y)²].

The optimal MSE estimator is E[Y | X]. In other words, an Image Operator F is optimal with respect to the MSE if F(x) = ∑y y P(y | x). In practice, the distribution P(X, Y) is unknown, but it can be estimated from the frequency of the pairs (wz, O(z)). The assignment of the labels that minimize the MSE is done in the Optimization step.
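
The Optimization step can then be sketched as computing this conditional mean from the frequencies collected above (again an illustrative sketch, not TRIOS code):

from collections import defaultdict

def optimize_labels(freq):
    """Optimization: assign to each observed pattern wz the label that
    minimizes the MSE, i.e. the conditional mean sum_y y * P(y | wz)
    estimated from the joint frequencies."""
    totals = defaultdict(int)     # number of times wz was observed
    sums = defaultdict(float)     # sum of y weighted by its frequency
    for (wz, y), n in freq.items():
        totals[wz] += n
        sums[wz] += y * n
    return {wz: sums[wz] / totals[wz] for wz in totals}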

Finally, it is possible that some relevant patterns do not appear in the input images. The Generalization step feeds a classifier with the data produced by the Optimization step to obtain labels for the unobserved patterns. It is important that the classifier does not change the decision for the observed patterns.

Algorithm for W-Operator Learning

TRIOS implements the following algorithm for W-Operator learning.

for each image pair (i, o) in the training set do:
    for each pixel z in i do:
        wz = i-z|W
        freq[(wz, o(z))] += 1
for each observed pattern wz do:
    label[wz] = ∑y y · freq[(wz, y)] / ∑y freq[(wz, y)]
train a classifier using the pairs (wz, label[wz]); the classifier must keep the labels assigned in the previous step.
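
The algorithm can be sketched in Python by chaining the extract_frequencies and optimize_labels helpers from the snippets above. The Generalization step here uses scikit-learn's DecisionTreeRegressor as a stand-in classifier; this is an assumption of the sketch, not necessarily the classifier TRIOS employs. An unpruned tree reproduces the training labels exactly, which satisfies the requirement that observed patterns keep their assigned labels.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_w_operator(inputs, outputs, window):
    """Full training sketch: Feature Extraction, Optimization, then
    Generalization with a decision tree."""
    freq = extract_frequencies(inputs, outputs, window)      # step 1
    label = optimize_labels(freq)                            # step 2
    patterns = list(label)
    X = np.array(patterns)                                   # observed patterns
    y = np.array([label[p] for p in patterns])               # optimal labels
    clf = DecisionTreeRegressor()                            # step 3: an unpruned
    clf.fit(X, y)                                            # tree fits the observed
    return clf                                               # labels exactly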

The Incremental Splitting of Intervals (ISI) algorithm (described in [3]) produces binary W-Operators with good performance. Decision Trees are used for Gray-level Operators.

The application of the Image Operator follows the same steps: first, the patterns wz are extracted by positioning the Window W at each pixel z; then the value of pixel z in the output is the result of classifying the pattern wz.
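
Application can be sketched with the same helpers; here the patterns of all pixels are classified in a single batch:

def apply_operator(clf, image, window):
    """Application mirrors training: extract the pattern wz at each pixel
    and let the trained classifier produce the output value."""
    h, w = image.shape
    patterns, coords = [], []
    for row in range(1, h - 1):            # borders skipped, as in training
        for col in range(1, w - 1):
            patterns.append([image[row + dr, col + dc] for dr, dc in window])
            coords.append((row, col))
    out = np.zeros(image.shape, dtype=float)
    for (row, col), value in zip(coords, clf.predict(np.array(patterns))):
        out[row, col] = value
    return out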

Challenges and Open Problems

W-Operator Learning for gray-level images is still a challenge due to the size of the space of possibilities: there are k^(k^|W|) possible local transformations, where |W| is the size of the Window and k is the number of gray levels in the images. Even for binary images (k = 2) and a 3×3 Window, there are 2^(2^9) = 2^512 possible operators. In this setting the Generalization error is very high, since most of the relevant patterns are never observed. Aperture Operators [4] try to mitigate this effect by restricting the range of gray levels considered. However, it is still necessary to determine an adequate range of gray levels and an adequate Window.

As seen previously, the Window W used in the Feature Extraction step determines the probability distribution P(X, Y). If W is too small, the resulting W-Operator will fail to discriminate larger patterns. If W is too large, too many relevant patterns will not be observed, lowering the quality of the estimation of P(X, Y) and increasing the Generalization error. Window Selection is thus a Feature Selection problem and remains one of the open problems in W-Operator learning; a naive selection scheme is sketched below.
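
As a naive illustration of Window Selection (not the procedure used by TRIOS), one could evaluate candidate Windows on held-out images and keep the one with the smallest empirical MSE, reusing the sketches above:

def select_window(train_pairs, val_pairs, candidate_windows):
    """Pick the candidate Window whose trained operator has the smallest
    empirical MSE on held-out validation image pairs."""
    best_window, best_mse = None, float("inf")
    for window in candidate_windows:
        clf = train_w_operator([i for i, _ in train_pairs],
                               [o for _, o in train_pairs], window)
        mse = np.mean([np.mean((apply_operator(clf, i, window) - o) ** 2)
                       for i, o in val_pairs])
        if mse < best_mse:
            best_window, best_mse = window, mse
    return best_window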

Finally, Multi-Level Operator training [5] improves on the W-Operator training presented above by combining the results of several Image Operators designed with different windows. The result is a W-Operator that effectively uses a large window without incurring a large Generalization error. It is still necessary, however, to determine (1) the number of operators combined and (2) the window of each individual Image Operator.
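
A simplified two-level sketch in the spirit of [5] (the construction in the paper also applies a Window over the first-level outputs; here they are combined pixel by pixel for brevity, again reusing the helpers above):

def train_two_level(inputs, outputs, windows):
    """Train one first-level W-Operator per Window, then a second-level
    classifier that combines their outputs pixel by pixel."""
    first = [(train_w_operator(inputs, outputs, w), w) for w in windows]
    X, y = [], []
    for img, out in zip(inputs, outputs):
        results = [apply_operator(clf, img, w) for clf, w in first]
        h, w_ = img.shape
        for row in range(1, h - 1):
            for col in range(1, w_ - 1):
                X.append([r[row, col] for r in results])
                y.append(out[row, col])
    combiner = DecisionTreeRegressor().fit(np.array(X), np.array(y))
    return first, combiner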

Bibliography

[1] Edward R. Dougherty. Optimal mean-square N-observation digital morphological filters: I. Optimal binary filters. CVGIP: Image Understanding 55, 1 (January 1992), 36-54. DOI: 10.1016/1049-9660(92)90005-N

[2] Edward R. Dougherty. Optimal mean-square N-observation digital morphological filters: II. Optimal gray-scale filters. CVGIP: Image Understanding 55, 1 (January 1992), 55-72. DOI: 10.1016/1049-9660(92)90006-O

[3] Hirata, N. S. T., Hirata Jr., R., Barrera, J. Basis computation algorithms. In: Mathematical Morphology and its Applications to Signal and Image Processing (Proceedings of the 8th International Symposium on Mathematical Morphology), 2007, 15-26.

[4] Hirata Jr, R., Brun, M., Barrera, J., & Dougherty, E. R.: Aperture filters: theory, application and multiresolution analysis. Advances in Nonlinear Signal and Image Processing, (2006) 15-45.

[5] Nina S. T. Hirata, "Multilevel Training of Binary Morphological Operators," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 4, pp. 707-720, April, 2009.