Earl Levine, earl (at) stanfordalumni.org

- 16 x 16 (that is, a single motion vector for the whole macroblock)
- 16 x 8 (two motion vectors)
- 8 x 16 (two motion vectors)
- 8 x 8 (four motion vectors)

Presumably the extra bits needed to transmit more motion vector information could be offset in some macroblocks by the reduced residual gained from specifying the border of a moving region more exactly. For example, in a simple scene of a large diamond-shaped object moving uniformly in front of a static background, being able to specify a very good approximation of the diagonal boundaries of the foreground object could allow reducing the residual to near zero.
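For concreteness, the partition modes listed above and their motion vector counts can be tabulated in a short sketch (the mode names and dictionary layout are illustrative, not taken from any codec source):

```python
# Macroblock partition modes and the motion vectors each requires.
# One motion vector per partition, so a 16 x 16 macroblock split into
# w x h partitions needs (16 // w) * (16 // h) vectors.
MB_PARTITIONS = {"16x16": (16, 16), "16x8": (16, 8),
                 "8x16": (8, 16), "8x8": (8, 8)}

def motion_vector_count(mode: str) -> int:
    w, h = MB_PARTITIONS[mode]
    return (16 // w) * (16 // h)

for mode in MB_PARTITIONS:
    print(mode, motion_vector_count(mode))
```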

Each partitioning tried takes significant computation in the encoder. The goal of this work is to investigate one technique for reducing the number of partitionings searched, thereby reducing complexity with acceptable loss of quality in the encoding.

Intuition suggests that if a motion boundary crosses through a macroblock, the cost difference between a fine partitioning and coarser partitionings will be significant. Therefore, if the search begins with the smaller subpartitionings (4 x 4, 4 x 8, 8 x 4, and 8 x 8) and the best cost found so far is much better than that of the coarsest of these (that is, 8 x 8), then searching the larger partitionings (8 x 16, 16 x 8, and 16 x 16) is likely a waste of time. With this in mind, the algorithm tried for each macroblock is this:

- Compute the cost of all partitionings using 8 x 8 or smaller. Don't compute the costs of the larger partitionings (16 x 8, 8 x 16, or 16 x 16) yet.
- Compute a decision metric which is the sum, over all four 8 x 8 subblocks of the macroblock, of the difference between the cost of that subblock's largest (i.e. 8 x 8) subpartitioning and the cost of the optimal subpartitioning. Note that this metric is nonnegative, and could be 0 if 8 x 8 is optimal for all subblocks.
- Compare this metric to a threshold. If the metric is larger than the threshold, skip searching and comparing to the larger partitionings, otherwise go ahead and search them as well.
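A minimal sketch of this per-macroblock decision, assuming hypothetical stand-in cost inputs in place of the encoder's actual motion search:

```python
def decide_partitioning(subblock_costs, large_costs_fn, threshold):
    """Early-termination decision for one macroblock.

    subblock_costs: list of four dicts, one per 8 x 8 subblock, mapping each
        sub-partitioning ("8x8", "8x4", "4x8", "4x4") to its search cost.
    large_costs_fn: callable returning a dict of costs for the larger
        partitionings ("16x8", "8x16", "16x16"); only invoked when needed.
    Both inputs are hypothetical stand-ins, not the JM reference code.
    """
    # Decision metric: sum over subblocks of cost(8x8) minus the cost of the
    # optimal sub-partitioning; nonnegative, and 0 when 8x8 wins everywhere.
    metric = sum(c["8x8"] - min(c.values()) for c in subblock_costs)
    best = sum(min(c.values()) for c in subblock_costs)
    if metric > threshold:
        return best  # skip searching the larger partitionings entirely
    return min(best, min(large_costs_fn().values()))
```

When the metric is large, the subblocks disagree strongly with a uniform 8 x 8 choice, suggesting a motion boundary, so the larger partitionings are never evaluated.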

This algorithm is just a simple scheme based on intuition and would be expected to be beaten by an algorithm derived from an analysis of cost data over a sample set. The goal here, however, is to check experimentally whether this intuition is roughly right.

The specific version of the reference software used was JM 7.3, and the default encoder configuration was used except for disabling rate-distortion decision for the partitioning search. This simplified the cost comparisons in that software.

In order to get a rough idea of what threshold values to try, simple statistics were collected on the value of the decision metric for the cases in which the optimal choice was a large partition size and in which it was smaller subpartitions. For those two cases respectively, the mean of the metric over the data used was 45.7 and 145.1.
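This calibration step can be sketched as follows, given logged pairs of (metric value, whether the full search chose a large partitioning); the data format is an assumption, not something produced by the reference software:

```python
def metric_means(samples):
    """samples: iterable of (metric, large_was_optimal) pairs collected
    from a full-search run. Returns the mean metric for macroblocks where
    a large partitioning won and where smaller subpartitions won;
    thresholds worth trying should fall somewhere between the two means.
    """
    large = [m for m, flag in samples if flag]
    small = [m for m, flag in samples if not flag]
    return sum(large) / len(large), sum(small) / len(small)
```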

Threshold | Execution time (s) | Execution time savings (%) | Macroblocks with reduced search (%) | Avg. loss per MB (internal cost) | SNR Y (dB) | SNR U (dB) | SNR V (dB) | Bitrate (kb/s)
--------- | ------------------ | -------------------------- | ----------------------------------- | -------------------------------- | ---------- | ---------- | ---------- | --------------
infinite (full search) | 155.74 | 0 | 0 | 0 | 35.67 | 39.01 | 40.63 | 89.21
120 | 152.03 | 2.4 | 16.5 | 2.62 | 35.67 | 38.99 | 40.67 | 89.69
100 | 152.87 | 1.8 | 24.6 | 5.85 | 35.63 | 39.00 | 40.62 | 89.42
80 | 148.07 | 4.9 | 33.25 | 8.96 | 35.64 | 38.98 | 40.67 | 90.58
70 | 147.54 | 5.3 | 37.4 | 11.2 | 35.59 | 38.98 | 40.66 | 91.71
60 | 147.27 | 5.4 | 41.6 | 14.1 | 35.60 | 38.98 | 40.69 | 90.83
50 | 147.06 | 5.6 | 46.3 | 17.2 | 35.59 | 38.98 | 40.66 | 91.71
40 | 146.50 | 5.9 | 50.4 | 20.2 | 35.60 | 38.95 | 40.68 | 93.69
30 | 144.42 | 7.3 | 54.8 | 23.9 | 35.62 | 38.95 | 40.66 | 92.87
20 | 142.91 | 8.2 | 60 | 27.9 | 35.62 | 38.94 | 40.67 | 95.44

The overall trend that emerges from this data is that the reduced-search decision can be made affirmatively for a significant portion of macroblocks while having a less significant impact on encoding quality. For example, when the threshold is set to 50, the reduced search is performed on about half of the macroblocks while increasing bitrate by only about 3 percent.
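As a check of that threshold-50 figure, the bitrate penalty can be read off the table and computed directly:

```python
BASELINE = 89.21  # kb/s, full-search row of the table above

def bitrate_increase_pct(bitrate_kbps):
    """Percentage bitrate increase relative to the full-search baseline."""
    return 100.0 * (bitrate_kbps - BASELINE) / BASELINE

print(round(bitrate_increase_pct(91.71), 1))  # threshold-50 row -> 2.8
```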

The reference model software does not show large runtime gains from reducing the number of partitionings searched, because much of the execution time (typically about 45%) is occupied by precomputation for motion search (i.e. subsampling). In a system where that task is accelerated by special-purpose hardware, the portion of execution time taken by the search over motion partitionings would be more significant.

The algorithm used here for deciding whether to avoid further searching is quite simple and could certainly be improved upon. A good next step would be to gather statistics over a larger set of video sequences and use that data to determine optimal parameters of a predictor, for example to predict the loss incurred by skipping the full search.

There is no reason to assume that the correct order to search is the one used here or that only a single decision should be made. For example, one might first try 16 x 16, 8 x 16, and 16 x 8 partitionings, then based on those costs make a decision whether to try 8 x 8. If the decision is made to continue, after determining the cost of the 8 x 8 partitioning another decision might be made about whether to continue to the smaller partitionings and so on.
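One such coarse-to-fine variant can be sketched as follows; `search_cost` and `keep_refining` are hypothetical stand-ins for the encoder's motion search and for a decision rule such as a threshold test:

```python
def coarse_to_fine(mb, search_cost, keep_refining):
    """Search large partitionings first, refining only while it seems useful.

    search_cost(mb, modes) -> best cost over the given partitioning modes.
    keep_refining(best_so_far) -> True if finer partitionings should be tried.
    Both callables are assumed interfaces, not part of the JM software.
    """
    best = search_cost(mb, ["16x16", "16x8", "8x16"])
    if keep_refining(best):
        best = min(best, search_cost(mb, ["8x8"]))
        if keep_refining(best):
            best = min(best, search_cost(mb, ["8x4", "4x8", "4x4"]))
    return best
```

This structure allows a separate stop/continue decision at each refinement level rather than the single decision studied above.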

Iain E G Richardson, "H.264 / MPEG-4 Part 10 White Paper: Prediction of Inter Macroblocks in P-slices", http://www.vcodex.fsnet.co.uk/h264_interpred.pdf

Karsten Suhring, editor, H.264/AVC reference software, http://bs.hhi.de/~suehring/tml/

TML Project Video Sequences, http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm