# Introduction to Image Operator Learning

A large number of Image Processing and Computer Vision problems, such as Optical
Character Recognition (OCR), Face Detection, Object Detection and Document
Analysis, are essentially Machine Learning problems. These problems involve
predicting a **class** either from parts of an image (in the case of OCR,
regions of the images are *classified* as letters) or from the whole image (in
Document Analysis, it may be desired to differentiate documents types). Although
many different methods exists to solve these problems, most of the algorithms
are developed exploring characteristics implicit to their domains of
application. Also, these algorithms require deep knowledge of both Image
Processing/Computer Vision and the domain of application, making them hard to
approach by non-experts. **Image Operator Learning** proposes an alternative
approach. Instead of treating each problem individually, we use the following
generic formulation based on Machine Learning.

**Given pairs of images containing an input image I_{i} and its processed form O_{i}, estimate the transformation f(I_{i}) = O_{i} with minimum error. The estimation must also have small error in images belonging to the same domain as I_{i}. The transformation f is called an Image Operator.**

*Image Operator Learning* studies how to estimate the transformation *f*. In particular, *W-Operator Learning*, which is the focus of TRIOS, represents *f* as a local transformation. To determine the value of a pixel *p* in the output, only a fixed neighborhood *W* around *p* is analysed. The local transformation *f* : 𝒫(*W*) → 𝓛 is called a *W-Operator* and its neighborhood *W* is a *Window*. 𝓛 is the range of values present in the output images.
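The locality of a *W-Operator* can be sketched in a few lines of Python. This is an illustrative sketch, not TRIOS code: the names `WINDOW`, `window_config`, and `apply_operator` are assumptions, and pixels outside the image are assumed to be 0.

```python
# Cross-shaped window given as (row, col) offsets around the centre pixel.
WINDOW = [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]

def window_config(img, z, window):
    """Extract the window configuration around pixel z (0 outside the image)."""
    r, c = z
    h, w = len(img), len(img[0])
    return tuple(
        img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
        for dr, dc in window
    )

def apply_operator(img, f, window=WINDOW):
    """Apply the local transformation f independently at every pixel."""
    return [
        [f(window_config(img, (r, c), window)) for c in range(len(img[0]))]
        for r in range(len(img))
    ]

# Example operator: output 1 only when the whole cross is 1 (an erosion-like rule).
img = [[0, 1, 0],
       [1, 1, 1],
       [0, 1, 0]]
out = apply_operator(img, lambda w: int(all(w)))  # → [[0,0,0],[0,1,0],[0,0,0]]
```

The key point is that `f` only ever sees the tuple of window values, never the whole image.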

#### W-Operator Learning

The estimation process of *W-Operators* is divided into three main steps. In the first step, **Feature Extraction**, we calculate the joint frequency of all the patterns that occur in the input images and their respective outputs. In the second step, **Optimization**, we minimize the error on the training images by assigning the optimum label to each observed pattern. The third step, **Generalization**, uses a classifier to assign labels to the unobserved patterns. This process was first presented by Edward Dougherty in [1] and [2].

Figure 1. Training process for Image Operator Learning.

Suppose that the image sequences {*I*_{i}} and {*O*_{i}} come from jointly stationary processes (*I*, *O*), and let *W* be a finite Window. The Mean Square Error of the *W-Operator* *f*_{W} defined by the Window *W* with respect to (*I*, *O*) is

MSE(*f*_{W}) = E[(*f*_{W}(*I*_{-z} ∩ *W*) - *O*(z))^{2}],

where *I*_{-z} denotes the translation of *I* by *z*. The set w_{z} = *I*_{-z} ∩ *W* contains all pixels around *z* that belong to the Window *W*. w_{z} is a *window configuration* and, together with *O*(z), comprises all the information that will be analysed by the training process. The extraction of the pairs (w_{z}, *O*(z)) for all pixels *z* in all pairs (*I*_{i}, *O*_{i}) is done at the **Feature Extraction step**.
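A minimal sketch of this extraction, assuming a cross-shaped window and 0 outside the image (the helper name `extract_pairs` is illustrative, not TRIOS's API):

```python
from collections import Counter

# Cross-shaped window as (row, col) offsets; pixels outside the image count as 0.
WINDOW = [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]

def extract_pairs(inp, out, window=WINDOW):
    """Joint frequency of (window configuration w_z, output value O(z)) pairs."""
    h, w = len(inp), len(inp[0])
    freq = Counter()
    for r in range(h):
        for c in range(w):
            wz = tuple(
                inp[r + dr][c + dc]
                if 0 <= r + dr < h and 0 <= c + dc < w else 0
                for dr, dc in window
            )
            freq[(wz, out[r][c])] += 1
    return freq

# One training pair: every pixel contributes one (pattern, output) observation.
freq = extract_pairs([[0, 1], [1, 1]], [[0, 1], [1, 1]])
```

Each pixel of each training pair contributes exactly one observation, so the counts sum to the total number of pixels.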

(a) Cross-shaped Window. (b) Window positioned on a pixel. (c) Window configuration extracted.

Let *X*_{z} = *I*_{-z} ∩ *W*; then *X*_{z} is a random vector. Let *Y*_{z} = *O*(z) be a random variable that holds the value of *O* at the pixel *z*. Since *I* and *O* are stationary, the subscript *z* may be dropped and (*X*, *Y*) characterizes (*I*, *O*) locally with respect to *W*. Therefore,

MSE(*f*_{W}) = E[(*f*_{W}(*X*) - *Y*)^{2}].

The optimal MSE estimator is E[*Y* | *X*]. In other words, an Image Operator *F* is optimum with respect to the MSE if *F*(x) = ∑_{y} y P(y | x). In practice, the distribution P(*X*, *Y*) is unknown, but it can be estimated using the frequency of the pairs (w_{z}, *O*(z)). The assignment of the labels that minimize the MSE is done in the **Optimization step**.
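As a small worked example (the pattern and counts below are hypothetical), suppose one pattern was observed four times: three times with output 0 and once with output 255. Its MSE-optimal label is the conditional mean of the outputs:

```python
from collections import Counter

# Hypothetical observed frequencies for one three-pixel pattern w.
w = (0, 1, 0)
freq = Counter({(w, 0): 3, (w, 255): 1})

total = sum(n for (wz, y), n in freq.items() if wz == w)              # freq[w] = 4
label = sum(y * n for (wz, y), n in freq.items() if wz == w) / total  # (0*3 + 255*1)/4
print(label)  # 63.75
```

Note the division by `total`: it is the conditional frequency freq[(w, y)]/freq[w], not the joint frequency, that estimates P(y | x).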

Finally, it is possible that some relevant patterns do not appear in the input images. The **Generalization step** feeds a classifier with the data produced by the **Optimization step** to obtain labels for the unobserved patterns. It is important that the classifier does not change the decision for the observed patterns.

#### Algorithm for W-Operator learning

TRIOS implements the following algorithm for *W-Operator* learning.

- **for each** image pair (*i*, *o*) in the training set **do**:
    - **for each** pixel *z* in *i* **do**:
        - w_{z} = *i*_{-z} ∩ *W*
        - *freq*[(w_{z}, *o*(z))] += 1
- **for each** observed pattern w_{z} **do**:
    - *label*[w_{z}] = ∑_{y} y (*freq*[(w_{z}, y)] / *freq*[w_{z}])
- train a classifier using the pairs (w_{z}, *label*[w_{z}]). It is necessary to maintain the labels assigned in the previous step.
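The first two phases of this loop translate almost directly to Python. The sketch below is illustrative only — `train_wop` and the border convention (pixels outside the image count as 0) are assumptions, not TRIOS's actual API — and it stops before the classifier-training step:

```python
from collections import Counter

def train_wop(pairs, window):
    """Sketch of the training loop: Feature Extraction, then Optimization.

    pairs  : list of (input image, output image), images as lists of rows
    window : list of (row, col) offsets
    Returns a dict mapping each observed pattern w_z to the label
    sum_y y * freq[(w_z, y)] / freq[w_z] that minimizes the MSE.
    """
    freq = Counter()                        # joint frequency of (w_z, o(z))
    for inp, out in pairs:                  # Feature Extraction
        h, w = len(inp), len(inp[0])
        for r in range(h):
            for c in range(w):
                wz = tuple(
                    inp[r + dr][c + dc]
                    if 0 <= r + dr < h and 0 <= c + dc < w else 0
                    for dr, dc in window
                )
                freq[(wz, out[r][c])] += 1

    total, weighted = Counter(), Counter()  # Optimization
    for (wz, y), n in freq.items():
        total[wz] += n
        weighted[wz] += y * n
    return {wz: weighted[wz] / total[wz] for wz in total}

# Toy example: with a single-pixel window, the identity operator is learned.
labels = train_wop([([[0, 1], [1, 0]], [[0, 1], [1, 0]])], window=[(0, 0)])
```

In the toy example the pattern `(0,)` is always followed by output 0 and `(1,)` by output 1, so the learned table is exactly the identity.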

The *Incremental Splitting of Intervals (ISI)* algorithm (described in [3])
produces binary *W-Operators* with good performance. Decision Trees are used for
Gray-level Operators.

The application of the *Image Operator* follows the same steps. First, the patterns w_{z} are extracted by positioning the Window *W* at each pixel *z*. Then the value of pixel *z* in the output is the result of the classification of the pattern w_{z}.
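A sketch of this application step, under the same assumptions as before (illustrative names, 0 outside the image). The lookup table plays the role of the labels fixed at training time, and the classifier — here a trivial constant fallback, purely for illustration — handles the unseen patterns:

```python
def apply_wop(img, window, label, classify):
    """Slide the window over img; look each pattern up in the trained table
    and fall back to the classifier for patterns never observed in training."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            wz = tuple(
                img[r + dr][c + dc]
                if 0 <= r + dr < h and 0 <= c + dc < w else 0
                for dr, dc in window
            )
            out[r][c] = label[wz] if wz in label else classify(wz)
    return out

# With a single-pixel window and an identity lookup table:
result = apply_wop([[1, 0], [0, 1]], [(0, 0)], {(0,): 0, (1,): 1},
                   classify=lambda wz: 0)  # → [[1, 0], [0, 1]]
```

Checking `label` before calling the classifier enforces the requirement that observed patterns keep the labels assigned during Optimization.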

#### Challenges and Open Problems

*W-Operator Learning* for Gray Level images is still a challenge due to the size of the space of possibilities. There are **k^{k^{|W|}}** possible local transformations, where **|W|** is the size of the window and **k** is the number of gray levels of the images; even a binary 3 × 3 window (k = 2, |W| = 9) already admits 2^{2^{9}} = 2^{512} operators. In this case the **Generalization** error is very high, since most of the relevant patterns are not observed.

*Aperture Operators* [4] try to mitigate this effect by restricting the range of gray levels considered. However, it is still necessary to determine an adequate range of gray levels and an adequate Window.

As seen previously, the Window *W* used in the **Feature Extraction step** determines the probability distribution P(*X*, *Y*). If *W* is too small, the resulting *W-Operator* will fail to discriminate larger patterns. If *W* is too large, too many relevant patterns will not be observed, lowering the quality of the estimation of P(*X*, *Y*) and increasing the **Generalization** error. **Window Selection** is a *Feature Selection* problem and is one of the open problems in *W-Operator* learning.

Finally, *Multi-Level Operator* training [5] improves the *W-Operator* training presented previously by combining the results of many *Image Operators* designed with different windows. The result is a *W-Operator* that effectively uses a large window but does not have a large **Generalization** error. It is necessary, however, to determine (1) the number of operators combined and (2) the window of each individual *Image Operator*.

#### Bibliography

[1] Edward R. Dougherty. 1992. Optimal mean-square N-observation digital morphological filters: i. optimal binary filters. CVGIP: Image Underst. 55, 1 (January 1992), 36-54. DOI=10.1016/1049-9660(92)90005-N http://dx.doi.org/10.1016/1049-9660(92)90005-N

[2] Edward R. Dougherty. 1992. Optimal mean-square N-observation digital morphological filters: ii. optimal gray-scale filters. CVGIP: Image Underst. 55, 1 (January 1992), 55-72. DOI=10.1016/1049-9660(92)90006-O http://dx.doi.org/10.1016/1049-9660(92)90006-O

[3] Hirata, N.S.T., Hirata Jr., R., Barrera, J.: Basis computation algorithms. In: Mathematical Morphology and its Applications to Signal and Image Processing (Proceedings of the 8th International Symposium on Mathematical Morphology). (2007) 15–26

[4] Hirata Jr, R., Brun, M., Barrera, J., & Dougherty, E. R.: Aperture filters: theory, application and multiresolution analysis. Advances in Nonlinear Signal and Image Processing, (2006) 15-45.

[5] Nina S. T. Hirata, "Multilevel Training of Binary Morphological Operators," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 4, pp. 707-720, April, 2009.