Methods for detecting Optical Flow

Optical flow attempts to describe the motion of objects and of the observer between two pictures, or two frames of a video, by tracking the brightness or color of the pixels that represent those objects. The assumption is that pixels belonging to the same object keep the same brightness or color from the first image to the second. Finding the matching pixels in the second image then reveals the movement.

This can be done in color or grayscale ("black and white"). For the purposes of simplicity and understanding, we will assume grayscale images.

Horn-Schunck (HS) method¹: If the motion in the image is smooth, and few objects move against each other or over large distances, then pixels near each other are likely to move in the same way. Also, for each pixel in the image, it is easier to check only the neighboring pixels for sameness. To apply this over an image with many pixels, we can down-sample the source and destination images into smaller and smaller versions by averaging each 2x2 block of 4 pixels into a single pixel in the new image. Repeating this until we have very small images with few pixels allows us to look at each pixel in the small source image, compare it to the same and neighboring pixels in the small destination image, and find the one which is the best match. This gives us a coarse idea of the motion between the images.
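The down-sampling and neighbor-matching steps described above can be sketched in a few lines of plain Python. This is only an illustration of those two steps, assuming grayscale images stored as lists of rows with even dimensions. The function names (downsample, build_pyramid, best_neighbor_match) and the min_size parameter are made up for this example and do not come from any library.

```python
def downsample(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image, min_size=2):
    """Return [full size, half size, ...], stopping at min_size pixels per side."""
    levels = [image]
    while min(len(levels[-1]), len(levels[-1][0])) // 2 >= min_size:
        levels.append(downsample(levels[-1]))
    return levels


def best_neighbor_match(source, destination, row, col):
    """Return the (d_row, d_col) step, each in -1..1, whose destination
    pixel is closest in brightness to source[row][col]."""
    target = source[row][col]
    best_step = (0, 0)  # staying in place wins any tie
    best_diff = abs(destination[row][col] - target)
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            r, c = row + d_row, col + d_col
            if 0 <= r < len(destination) and 0 <= c < len(destination[0]):
                diff = abs(destination[r][c] - target)
                if diff < best_diff:
                    best_step, best_diff = (d_row, d_col), diff
    return best_step
```

Running best_neighbor_match on every pixel of the smallest pair of images in the two pyramids gives the coarse motion estimate. Comparing single pixels by absolute brightness difference is the simplest possible match score and is chosen here only for clarity.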

We can now modify the source image, one level up in resolution, by moving the areas represented by each pixel of the smaller image in the direction of apparent flow. We are attempting to make the slightly larger source image look more like the larger destination image, based on the single pixel motion we see in the smaller set of images. For example, if an entire area of 4 pixels (2x2) in the larger source image moved 2 pixels to the right in the destination image, then we will see that change as a shift of 1 pixel when comparing the related single pixels in the smaller images. By actually moving those areas in the larger source image, we remove (but remember) that motion at the larger level. Now, if one of the pixels in that 4 pixel set actually moved relative to the others, we would have missed it when comparing the original larger images, because they were offset by 2 pixels. After making the source more like the destination, the slight motion of the single pixel can be detected.

We repeat this single pixel flow detection, and the modification of the larger source image based on that flow, back up the set of larger and larger images until we have (1) made the original source image look just like the destination image (except for any single pixel changes in that image) and (2) recorded all the changes required to make that happen.

Limitations: If objects move against each other, the edges will be smoothed into each other. HS expects things to move together so works very well when the camera is moving or the focus is on a single object against a simple background. Also, small objects moving large distances will be completely missed.

Lucas-Kanade (LK) method¹,²,³: Uses a window around each pixel to compare that region between frames. The method of comparison is complex, based on gradients.

Assume that we watch a scene through a square hole in a mask. The intensity visible through the hole is variable as we move the mask around. As we move the mask down and right, we see the intensity increase.

If, in the next frame, the intensity of the pixel at that same location has increased then it would be sensible to assume that a displacement of the underlying object to the left and up has occurred so that the new intensity is now visible under the square hole.

If we know that the increase in brightness per pixel at pixel (x, y) is I_x(x, y) in the x direction and I_y(x, y) in the y direction, then a movement by u pixels in the x direction and v pixels in the y direction gives a total increase in brightness of:
I_x(x, y) · u + I_y(x, y) · v

This must account for the change in intensity at that location between the two frames, which we call I_t(x, y), so that:
I_x(x, y) · u + I_y(x, y) · v = -I_t(x, y)

The negative sign is necessary because for positive I_x, I_y, and I_t we have a movement to the left and up, that is, u and v are negative.
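The relation I_x · u + I_y · v = -I_t, where I_t is the change in brightness at a fixed pixel between the two frames, can be checked numerically on a made-up scene. The sketch below assumes a brightness ramp that rises to the right and downward, as in the mask example, with x increasing to the right and y increasing downward. The function names are hypothetical.

```python
def brightness(x, y):
    """First frame: a linear ramp, brighter to the right and downward."""
    return 2 * x + 3 * y


def constraint_terms(u, v, x=5, y=5):
    """Return (I_x*u + I_y*v, I_t) at (x, y) when the scene shifts by (u, v)."""
    i_x = brightness(x + 1, y) - brightness(x, y)  # increase per pixel in x
    i_y = brightness(x, y + 1) - brightness(x, y)  # increase per pixel in y
    # Second frame: the same scene displaced by (u, v), sampled at (x, y).
    i_t = brightness(x - u, y - v) - brightness(x, y)
    return i_x * u + i_y * v, i_t


# The scene moves one pixel left and one pixel up (u = v = -1).
spatial, temporal = constraint_terms(u=-1, v=-1)
print(spatial, temporal)  # prints: -5 5
```

The pixel gets brighter (I_t = 5) while the gradient term is -5, so the two sides agree only with the negative sign in place.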

If Image[t+Vt, x+Vx, y+Vy] = Image[t, x, y], where t is time and Vt, Vx, Vy are increments, then we know Vt but need to find which Vx and Vy are the best match. If we expand this function as a Taylor series and assume the higher-order terms are 0 (or small enough not to matter), then the result (after taking a derivative with respect to t) is:
dI/dx * Vx + dI/dy * Vy + dI/dt = 0
Dropping the higher-order terms amounts to assuming that any small region of the image is basically a linear gradient. The flow direction then lies along that gradient, and its length is based on the rate at which the intensity is changing relative to the steepness of the image gradient.
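The Taylor-series step can be written out as follows. This is the usual first-order expansion, with I standing for Image and the same increments Vt, Vx, Vy as in the text; taking Vt as one frame is an assumption made here so that the last line matches the equation above.

```latex
\begin{aligned}
I(t+V_t,\; x+V_x,\; y+V_y)
  &\approx I(t,x,y)
   + \frac{\partial I}{\partial x} V_x
   + \frac{\partial I}{\partial y} V_y
   + \frac{\partial I}{\partial t} V_t
   && \text{(higher-order terms dropped)} \\
I(t+V_t,\; x+V_x,\; y+V_y) &= I(t,x,y)
   && \text{(brightness is unchanged)} \\
\Rightarrow\quad
\frac{\partial I}{\partial x} V_x
   + \frac{\partial I}{\partial y} V_y
   + \frac{\partial I}{\partial t} V_t &= 0 \\
\frac{\partial I}{\partial x} V_x
   + \frac{\partial I}{\partial y} V_y
   + \frac{\partial I}{\partial t} &= 0
   && \text{(with } V_t = 1 \text{ frame)}
\end{aligned}
```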

The Lucas-Kanade algorithm computes the three partial derivatives in the above linear equation (the gradients of a single image with respect to x and y, and the change in intensity of each pixel between images) and then solves a least-squares estimation problem over a window of pixels around each pixel to find the best-fitting Vx and Vy.
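A minimal sketch of that least-squares step for a single pixel, in plain Python, might look like this. It assumes grayscale frames stored as lists of rows (so x is the column and y is the row), central differences for the spatial gradients, and a simple frame difference for dI/dt. The function name lucas_kanade_at and the half_window parameter are made up for this example, and practical implementations usually add smoothing, window weighting, and image pyramids.

```python
def lucas_kanade_at(frame0, frame1, row, col, half_window=1):
    """Estimate (Vx, Vy) at one pixel by least squares over a square window.

    The window plus a one-pixel border must lie inside the image.
    Returns None when the window has too little texture to solve for both.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for r in range(row - half_window, row + half_window + 1):
        for c in range(col - half_window, col + half_window + 1):
            ix = (frame0[r][c + 1] - frame0[r][c - 1]) / 2.0  # dI/dx
            iy = (frame0[r + 1][c] - frame0[r - 1][c]) / 2.0  # dI/dy
            it = frame1[r][c] - frame0[r][c]                  # dI/dt
            # Accumulate the normal equations for ix*Vx + iy*Vy = -it.
            a11 += ix * ix
            a12 += ix * iy
            a22 += iy * iy
            b1 -= ix * it
            b2 -= iy * it
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-9:
        return None  # flat region or a single straight edge
    vx = (a22 * b1 - a12 * b2) / det
    vy = (a11 * b2 - a12 * b1) / det
    return vx, vy


# A smooth synthetic scene shifted by (0.2, -0.1) pixels between frames.
size, true_vx, true_vy = 11, 0.2, -0.1
frame0 = [[x * x + y * y for x in range(size)] for y in range(size)]
frame1 = [[(x - true_vx) ** 2 + (y - true_vy) ** 2 for x in range(size)]
          for y in range(size)]
estimate = lucas_kanade_at(frame0, frame1, row=5, col=5)
```

The estimate comes out close to, but not exactly, the true shift, because the scene is curved rather than a perfect linear gradient. A window with no texture returns None, since the two unknowns cannot be separated there.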

Limitations: This again assumes slow moving objects, or very small time increments between frames. It also assumes a natural scene containing textured objects exhibiting shades of gray (different intensity levels) which change smoothly. It does not force nearby areas to move in the same direction, which can be an advantage over HS. It works better with multiple slow moving objects and not as well if the camera is moving.
