The Library of Alexandria

Plan for video patch analysis study

Jim Carnicelli's AI Blog


Wednesday, 7/4/2007

I've done a lot of thinking about this idea of making a program that can characterize the motions of all parts of a video scene. Not surprisingly, I've concluded it's going to be a hard problem. But unlike other cases where I've smacked up against a brick wall, I can see what seems a clear path from here to there. It's just going to take a long time and a lot of steps. Here's an overview of my plan.

First, the goal. The most basic purpose, as I said above, is to make a program that can characterize the motions of all parts of a video scene. The program should be able to fill an entire scene with "patches". Each patch will lock onto the content beneath it in the frame where it is planted and follow that content throughout the video or until it can no longer be tracked. So if one patch is planted over the eye of a person walking through the scene, the patch should be able to follow that eye for at least as long as it's visible. Achieving this goal will be valuable because it will provide a sort of representation of the contents of the scene as fluidly moving but persistent objects. This seems a cornerstone of generalized visual perception, which has been entirely lacking in the history of AI research.

One key principle for all of this research will be the goal of constructing stable, generic views, elaborated by Donald D. Hoffman in Visual Intelligence. The dynamics of individual patches will be very ambiguous. Favoring stable interpretations of the world will help patches to make smarter guesses, especially when some lines of evidence strongly suggest non-stable ones.

One obvious challenge arises when a patch falls on a linear edge, like the side of a house, instead of a sharp point, like a roof peak. Even more challenging will be patches that fall on homogeneous textures, like grass, where independent tracking will be very difficult. It seems clear that an important key to the success of any single patch tracking its subject matter will be cooperating with its neighboring patches to get clues about what its own motion should be. Patches that follow sharp corners will have a high degree of confidence in their ability to follow their target content. Patches that follow edges will be less certain and will rely on higher-confidence patches nearby to help them make good guesses. Patches that follow homogeneous textures will have very low confidence and will rely almost exclusively on higher-confidence patches nearby to make reasonable guesses about how to follow their target content.
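To make the corner / edge / texture distinction concrete, here is a minimal sketch of one way a patch might score its own confidence and lean on its neighbors. It is only an illustration under stated assumptions: the score is the smaller eigenvalue of the patch's gradient structure tensor, a measure borrowed from standard corner detectors rather than anything this plan commits to, and the names `structure_confidence` and `blend_motion` are hypothetical. A flat texture and a straight edge both score near zero, while a corner scores high.

```python
import math

def structure_confidence(patch):
    """Smaller eigenvalue of the 2x2 gradient structure tensor of a patch.

    patch is a list of rows of grayscale values. This is an assumed
    confidence measure, not a method taken from the plan itself.
    """
    height, width = len(patch), len(patch[0])
    sxx = sxy = syy = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central-difference gradients at interior pixels.
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] are half_trace +/- root.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return half_trace - root

def blend_motion(own_motion, own_confidence, neighbors):
    """Confidence-weighted average of a patch's motion and its neighbors'.

    neighbors is a list of ((dx, dy), confidence) pairs. A patch with zero
    confidence ends up relying entirely on its neighbors.
    """
    total = own_confidence
    dx = own_motion[0] * own_confidence
    dy = own_motion[1] * own_confidence
    for (ndx, ndy), confidence in neighbors:
        total += confidence
        dx += ndx * confidence
        dy += ndy * confidence
    if total == 0:
        return own_motion  # No evidence from anyone; keep the original guess.
    return (dx / total, dy / total)
```

With this scoring, a grass-like patch would pass something close to zero as `own_confidence`, so `blend_motion` would move it almost entirely according to whatever nearby corner patches report.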

The algorithms for getting patches to cooperate will be a big challenge as it is. If the patches themselves aren't any good at following even strong points of interest, working on fabrics of patches will be a waste of time. Before any significant amount of time is spent on patch fabrics, I intend to focus attention on individual patches. A patch should be able to at least follow sharp points of interest. It should also be able to follow smooth edges laterally along the edge, like a buoy bobbing on water. Even this is a difficult challenge, though. Video of 3D scenes will include objects that move toward and away from the camera, so individual patches' target contents will sometimes shrink or expand. Nearby points of interest that look similar can confuse a patch if the target content is moving a lot. Changes in lighting, shadows from overhanging trees, rotation, and so on will pose a huge challenge. Some of the strongest points of interest lie on outer edges of 3D objects. As such an object moves against its background, part of the patch's pattern will naturally change. The patch needs to be able to detect its content as an object edge and learn quickly to ignore the background movements.
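As a baseline for what "following a sharp point of interest" could look like in code, here is a minimal sketch of plain block matching: the patch remembers its pixels as a template and, in the next frame, searches a small window around its last position for the best match by sum of squared differences. This is a deliberately naive stand-in, not the tracker this plan aims for. It handles none of the scaling, lighting, rotation, or background problems listed above, and the names `ssd` and `track_patch` and the `radius` parameter are hypothetical.

```python
def ssd(frame, template, top, left):
    """Sum of squared differences between template and frame at (top, left)."""
    total = 0
    for ty, row in enumerate(template):
        for tx, value in enumerate(row):
            diff = frame[top + ty][left + tx] - value
            total += diff * diff
    return total

def track_patch(frame, template, prev_top, prev_left, radius):
    """Find the best match for template within radius pixels of its last spot.

    frame and template are lists of rows of grayscale values.
    Returns (score, top, left); a lower score means a closer match.
    """
    height, width = len(template), len(template[0])
    # Clamp the search window so the template always fits inside the frame.
    first_top = max(0, prev_top - radius)
    last_top = min(len(frame) - height, prev_top + radius)
    first_left = max(0, prev_left - radius)
    last_left = min(len(frame[0]) - width, prev_left + radius)
    best = None
    for top in range(first_top, last_top + 1):
        for left in range(first_left, last_left + 1):
            score = ssd(frame, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best
```

The search radius is where the "nearby points of interest that look similar" problem shows up: a larger radius tolerates faster motion but gives the patch more look-alike candidates to be fooled by.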

It's apparent that solving each of these problems will require a lot of thought, coding, and testing. It's also apparent that these components may well work against each other. It's going to be important for the patch to be able to arbitrate differing opinions among the components about where to go with each moment. How best to arbitrate is a mystery to me at present. It seems logical, then, to begin my study by creating and testing the various analysis components of a single patch.

Once I have a better definition of the analysis tools a patch will have at its disposal for independent behavior, I should then have a tool kit of black boxes that an arbitration (and probably learning) algorithm can work with. Once I have a patch component that can do many analyses and come up with good guesses about the dynamics of its target content, then I can move on to constructing "fabrics" of patches so the patches can rely on their neighbors for additional evidence. The individual patches, if they have a generic arbitration mechanism, can use additional information from neighbors as just more evidence to arbitrate with.
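How best to arbitrate is still open, but the shape of a generic arbiter can at least be sketched. In the illustration below, each black box reports a proposed displacement and a confidence, and the arbiter picks the displacement with the most total confidence behind it. The voting rule, the function name `arbitrate`, and the source labels in the usage are all assumptions made up for the example. The point is only that a neighbor's hint can enter as one more opinion, with no special handling.

```python
from collections import defaultdict

def arbitrate(opinions):
    """Pick the displacement with the most total confidence behind it.

    opinions is a list of (source_name, (dx, dy), confidence) tuples, one per
    analysis component or neighboring patch. Returns (displacement, share),
    where share is the winner's fraction of all confidence offered.
    """
    support = defaultdict(float)
    for _source, move, confidence in opinions:
        support[move] += confidence
    total = sum(support.values())
    if total == 0:
        return None, 0.0  # Nobody has any evidence to offer.
    winner = max(support, key=support.get)
    return winner, support[winner] / total
```

For example, if a corner matcher proposes `(1, 0)` with confidence 0.6, an edge follower agrees with confidence 0.2, and a neighbor suggests `(0, 1)` with confidence 0.5, the arbiter returns `(1, 0)`. The returned share could double as the patch's own confidence when it reports to its neighbors.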

I have made a conscious choice this time not to worry about performance. If it takes a day to analyze a single frame of a video, that's fine. *shudder* Well, I probably will try to at least make my research tolerable, but the result of this will almost certainly not be practical for real-time processing of video using the equipment I have on hand. However, I believe that if I am successful at least in proving the concept I'm striving for and thus advancing research into visual perception in machines, other programmers will pick apart the algorithms and reproduce them in more efficient ways. Further, it is very clear to me that individual patches are so wonderfully self-contained that it will be possible to divvy up all the patches in a scene among as many processors as we can throw at the problem. This means that if one can make a patch fabric engine that processes one frame per second using a single processor, it should be fairly easy to make it process 30 frames per second with 30 processors.
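The arithmetic behind that estimate can be written down as a small sketch. It assumes the best case, where patches never wait on each other; any time patches spend exchanging evidence with neighbors on other processors would eat into the ideal figure. The function names `assign_patches` and `ideal_frame_rate` are hypothetical.

```python
def assign_patches(patch_ids, worker_count):
    """Deal patch ids out round-robin so each worker gets a near-equal share."""
    shares = [[] for _ in range(worker_count)]
    for index, patch_id in enumerate(patch_ids):
        shares[index % worker_count].append(patch_id)
    return shares

def ideal_frame_rate(single_processor_fps, worker_count):
    """Best-case frame rate, assuming perfectly independent patches."""
    return single_processor_fps * worker_count
```

Under that idealization, `ideal_frame_rate(1.0, 30)` gives 30.0 frames per second, matching the estimate above.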

I am also dispensing somewhat with the goal of mimicking human vision with this project. I do believe a lot of what I'm trying to do does go on in our visual systems. I don't have strong reason to believe, though, that we have little parts of our brains devoted to following patches wherever they will go as time passes. That doesn't seem to fit the fixed wiring of our brains very well. It may well be that we do patch following of a sort that lets the patch slide from neural patch to neural patch, which may imply some means of passing state information along those internal paths. I can hypothesize about that, but really, I don't know enough yet to say that this is literally what happens in the human visual system. I think it's enough to say that it could.

So that's my current plan of research for a while. I have to do this in such small bites that it's going to be a challenge keeping momentum. I just hope that I've broken the project up into small enough bites to make significant progress over the longer term.

    Copyright © 2010 The Library of Alexandria. All rights reserved.
    Produced in cooperation with Carnell Information Systems, Inc.