This is a highly cited paper. The proposed context-aware saliency is based on four principles, which can be explained as follows:
1. Areas that have distinctive colors or patterns should obtain high saliency;
2. Frequently occurring features should be suppressed;
3. The salient pixels should be grouped together and not spread over the image;
4. High-level factors such as priors on the salient object location and object detection are useful.
Steps:
1. Local-global single-scale saliency (Principles 1-3)
$d_{position}(p_i, p_j)$ is the Euclidean distance between the positions of the two patches, and $d_{color}(p_i, p_j)$ is the Euclidean distance between the two patches in CIE L*a*b color space. The dissimilarity measure between two patches is

$$d(p_i, p_j) = \frac{d_{color}(p_i, p_j)}{1 + c \cdot d_{position}(p_i, p_j)},$$

which is proportional to the color difference and inversely proportional to the positional distance ($c = 3$ in the paper).
Finding the K most similar patches $\{q_k\}_{k=1}^{K}$ to the patch $p_i^r$ centered at the current pixel and summing their dissimilarities, the single-scale saliency value at scale $r$ is defined as

$$S_i^r = 1 - \exp\!\left(-\frac{1}{K}\sum_{k=1}^{K} d\!\left(p_i^r, q_k^r\right)\right).$$
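As a rough illustration, here is a minimal NumPy sketch of this single-scale computation (the function names, the 7x7 patch size, K, and the constant c are illustrative choices; the exhaustive patch scan is only for clarity and is far slower than the approximate nearest-neighbor search a practical implementation would use):

```python
import numpy as np

def patch_dissimilarity(patch_a, patch_b, pos_a, pos_b, norm, c=3.0):
    """d(p_i, p_j): color distance in Lab, damped by the (normalized) positional distance."""
    d_color = np.linalg.norm(patch_a - patch_b)
    d_position = np.linalg.norm(np.subtract(pos_a, pos_b, dtype=float)) / norm
    return d_color / (1.0 + c * d_position)

def single_scale_saliency(lab, i, j, patch=7, K=64, c=3.0):
    """Saliency of interior pixel (i, j): 1 - exp(-mean dissimilarity to its K most similar patches)."""
    h, w, _ = lab.shape
    r = patch // 2
    norm = float(max(h, w))                         # normalize positions by the larger image dimension
    center = lab[i - r:i + r + 1, j - r:j + r + 1]
    dists = []
    for y in range(r, h - r):
        for x in range(r, w - r):
            if (y, x) == (i, j):
                continue
            other = lab[y - r:y + r + 1, x - r:x + r + 1]
            dists.append(patch_dissimilarity(center, other, (i, j), (y, x), norm, c))
    most_similar = np.sort(np.asarray(dists))[:K]   # K smallest dissimilarities = K most similar patches
    return 1.0 - np.exp(-most_similar.mean())
```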
2. Multiscale saliency enhancement
For every patch at scale $r$, we search for its most similar patches among patches whose scales range over $R_q = \{r, \tfrac{r}{2}, \tfrac{r}{4}\}$. Hence, the saliency of each pixel at scale $r$ can be rewritten as

$$S_i^r = 1 - \exp\!\left(-\frac{1}{K}\sum_{k=1}^{K} d\!\left(p_i^r, q_k^{r_k}\right)\right), \qquad r_k \in R_q.$$
The saliency map at each scale is normalized to [0, 1]. Instead of considering only a single scale $r$ for each patch, we represent each pixel at multiple scales $R = \{r_1, \ldots, r_M\}$ (M scales). The saliency is then the mean over scales:

$$\bar{S}_i = \frac{1}{M}\sum_{r \in R} S_i^r.$$
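Assuming the single-scale maps have already been computed (for example on copies of the image resized to each scale in R), the multi-scale combination is just a per-scale normalization followed by an average; a small sketch with helper names of my own:

```python
import numpy as np

def normalize01(s):
    """Rescale one saliency map to [0, 1]."""
    s = s.astype(float)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def multi_scale_saliency(per_scale_maps):
    """S_bar_i = (1/M) * sum over scales of the normalized S_i^r."""
    return np.mean([normalize01(s) for s in per_scale_maps], axis=0)
```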
3. Including the immediate context (Principle 3)
The main purpose of this step is to give more attention to areas close to the foci of attention while attenuating those far away from them.
To get the foci of attention, we threshold the saliency map at each scale (the paper uses a threshold of 0.8); the pixels above the threshold are the most attended ones at that scale. Let $d_{foci}^r(i)$ be the Euclidean positional distance between pixel $i$ and the closest focus-of-attention pixel at scale $r$, normalized to [0, 1]. The saliency of pixel $i$ is redefined as

$$\hat{S}_i = \frac{1}{M}\sum_{r \in R} S_i^r \left(1 - d_{foci}^r(i)\right).$$
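A sketch of this refinement, assuming the per-scale maps are already normalized to [0, 1]; SciPy's Euclidean distance transform stands in for computing the distance to the nearest focus pixel, and the function name and the no-focus fallback are my own choices:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def attenuate_by_foci(per_scale_maps, threshold=0.8):
    """S_hat_i = (1/M) * sum_r S_i^r * (1 - d_foci^r(i))."""
    refined = []
    for s in per_scale_maps:
        foci = s > threshold                   # most attended pixels at this scale
        if not foci.any():                     # no focus found: leave this scale unchanged
            refined.append(s)
            continue
        d = distance_transform_edt(~foci)      # distance of each pixel to the nearest focus pixel
        d = d / (d.max() + 1e-12)              # normalize d_foci to [0, 1]
        refined.append(s * (1.0 - d))
    return np.mean(refined, axis=0)
```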
4. Center prior (Principle 4)
This step enhances pixels near the image center while suppressing those farther away.
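The paper's exact form of this prior is not reproduced in this note; one common way to realize a center prior is a Gaussian weight map peaking at the image center, as in the sketch below (the sigma value is an arbitrary assumption):

```python
import numpy as np

def center_prior(shape, sigma=0.33):
    """Gaussian weight map that peaks at the image center (sigma as a fraction of image size)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dy = (ys - (h - 1) / 2.0) / h              # normalized vertical offset from the center
    dx = (xs - (w - 1) / 2.0) / w              # normalized horizontal offset from the center
    return np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))

# Weight a saliency map by the prior:
# saliency = saliency * center_prior(saliency.shape)
```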
5. High-level factors (Principle 4)
For example, one could incorporate a face detection algorithm that outputs 1 for face pixels and 0 otherwise. The saliency map can then be modified by taking the pixel-wise maximum of the saliency map and the face map. This part is excluded in this paper.
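The combination described above is just a pixel-wise maximum; a one-line sketch (the face detector itself is assumed to be an external component that produces the binary face map):

```python
import numpy as np

def add_high_level_map(saliency, face_map):
    """Combine the saliency map with a binary face map (1 on faces, 0 elsewhere) by pixel-wise maximum."""
    return np.maximum(saliency, face_map.astype(float))
```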