Go to Alexandria's home page
The Library of Alexandria

A standardized test of perceptual capability

Jim Carnicelli's AI Blog

Alexandria Home | Up One Level

ò
Switch to multi-page mode for smaller pages with cross-navigation.    Switch to single-page mode for all content in one page.

Thursday, 11/3/2005

A standardized test of perceptual capability

Back to
blog home
Listen to an
audio version
Notify me of
new entries
Subscribe to a full
RSS feed of this blog

I've been getting too lost in the idiosyncrasies of machine vision of late and missing my more important focus on intelligence, per se. I'm changing direction, now.

My recent experiences have shown me that one thing we haven't really done well is in the area of perceptual level intelligence. We have great sensors and cool algorithms for generating interesting but primitive information about the world. Edge detection, for example, can be used to generate a series of lines in a visual scene. But so what? Lines are just about as disconnected from intelligence as the raw pixel colors are.

Where do primitive visual features become percepts? Naturally, we have plenty of systems designed to instantly translate visual (or other sensory) information into known percepts. Put little red dots around a room, for instance, and a visual system can easily cue in on them as being key markers for a controlled-environment system. This is the sort of thinking that is used in vision-based quality control systems, too.

But what we don't have yet is a way for a machine to learn to recognize new percepts and learn to characterize and predict their behavior. I've been spending many years thinking about this problem. While I can't say I have a complete answer yet, I do have some ideas. I want to try them out. Recently, while thinking about the problem, I formulated an interesting way to test a perceptual-level machine's ability to learn and make predictions. I think it can be readily reproduced on many other systems and extended for ever more capable systems.

The test involves a very simplified, visual world composed of a black rectangular "planet" and populated by a white "ball". The ball, a small circle whose size never changes, moves around this 2D world in a variety of ways that, for the most part, are formulaic. One way, for example, might be thought of as a ball in a box in space. Another can be thought of as a ball in a box standing upright on Earth, meaning it bounces around in parabolic paths as though in the presence of a gravitational field. Other variants might involve random adjustments to velocity, just to make prediction more difficult.

The test "organism" would be able to see the whole of this world. It would have a "pointer". Its goal would be to move this pointer to wherever it believes the ball will be in the next moment. It would be able to tell where the pointer currently points using a direct sense separate from its vision.

Predicting where the ball will be in the future is a very interesting test of an organism's ability to learn to understand the nature of a percept. Measuring the competency of a test organism would be very easy, too. For each moment, there is a prediction, in the form of the pointer pointing to where it believes the ball will be in the next moment. When that moment comes, the distance between the predicted and actual positions of the ball is calculated. For any given set series of moments, the average distance would be the score of the organism in that context.

It would be easy for different researchers to compare their test organisms against others, but would require a little bit of care to put each test in a clear context. The context would be defined by a few variables. First is the ball behavior algorithm that is used. Each such behavior should be given a unique name and a formal description that can be easily implemented in code in just about any programming language. Second, the number of moments used to "warm up", which we'll call the "warm up period". That is, it should take a while for an organism to learn about the ball's behavior before it can be any good at making predictions. Third, the "test period"; i.e., the number of moments after the warm-up period is done in which test measurements are taken. The final score in this context, then, would be the average of all the distances measured between prediction and actual position.

There would be two standard figures that should be disclosed with any given test results. One is that the best possible score is 0, which means the predictions are always correct. The second is the best possible score for a "lazy" organism. In this case, a lazy organism is one that always guesses that the ball will be in the same place in the next moment that it is now. Naturally, a naive organism would do worse than this cheap approximation, but a competent organism should do better. The "lazy score" for a specific test run would be calculated as the average of all distances from each moment's ball position to its next moment's position. A weighted score for the organism could then be calculated as a ratio of actual score to lazy score. A value of zero would be the best possible. A value of one would indicate that the predictions are no better than the lazy score. A value greater than one would indicate that the predictions are actually worse than the lazy algorithm.

Some might quip that I'm just proposing a "blocks world" type experiment and that an "organism" competent to play this game wouldn't have to be very smart. I disagree. Yes, a programmer could preprogram an organism with all the knowledge it needs to solve the problem and even get a perfect score. A proper disclosure of the algorithm used would let fellow researchers quickly disqualify such trickery. So would testing that single program against novel ball behaviors. What's more, I think a sincere attempt to develop organisms that can solve this sort of problem in a generalizable way will result in algorithms that can be generalized to more sophisticated problems like vision in natural settings.

Naturally, this test can also be extended in sophistication. Perhaps there could be a series of levels defined for the test. This might be Level I. Level II might involve multiple balls of different colors. And so on.

I probably will draft a formal specification for this test soon. I welcome input from others interested in the idea. method="post" action="../../ai/feedback.asp">

Your Feedback
Name (optional):
Email (optional):

Prove Your Humanity:
Please enter the code you see here. This is designed to
protect our message board from spam posted by automated software.
Those programs can't easily read these codes like you and I can.

Subject: AI - Blog - A standardized test of perceptual capability
Or write me an email instead.         

Back to
blog home
Listen to an
audio version
Notify me of
new entries
Subscribe to a full
RSS feed of this blog


All Entries

(reverse date order)

  • 11/13/2007 - Confirmation bias as a tool of perception
  • 11/6/2007 - What bar code scanners can tell us about perception
  • 10/21/2007 - Perception as construction of stable interpretations
  • 10/14/2007 - Rebuttal of the Chinese Room Argument
  • 10/7/2007 - Video stabilizer
  • 9/27/2007 - "Conscious Realism" and "Multimodal User Interface" theories
  • 7/4/2007 - Plan for video patch analysis study
  • 7/1/2007 - Patch mapping in video
  • 6/27/2007 - Emotional and moral tagging of percepts and concepts
  • 6/22/2007 - A hypothetical blob-based vision system
  • 4/21/2007 - Abstraction in neuron banks
  • 4/12/2007 - Pattern Sniffer: a demonstration of neural learning
  • 4/7/2007 - A respectful critique of the Hierarchical Temporal Memory (HTM) concept
  • 11/10/2005 - Neuron banks and learning
  • 11/3/2005 - A standardized test of perceptual capability
  • 10/29/2005 - Using your face and a webcam to control a computer
  • 10/8/2005 - Stereo disparity edge maps
  • 9/25/2005 - Some stereo vision illusions
  • 9/21/2005 - Topics in Machine Vision
  • 8/26/2005 - Introduction to Machine Vision
  • 8/14/2005 - Bob Mottram, crafty fellow
  • 8/11/2005 - Stereo vision: measuring object distance using pixel offset
  • 8/7/2005 - Automatic alignment of stereo cameras
  • 8/7/2005 - DualCameras component
  • 7/30/2005 - Patch equivalence
  • 7/12/2005 - Machine vision: motion-based segmentation
  • 6/20/2005 - Machine vision: spindles
  • 6/16/2005 - Machine vision: smoothing out textures
  • 6/15/2005 - Machine vision: studying surface textures
  • 6/10/2005 - Machine vision: pixel morphing
  • 6/10/2005 - Machine vision: motion tracking
  • 6/10/2005 - Machine vision: tilting my head
  • 6/10/2005 - Machine vision: layer-based models
  • 6/9/2005 - Machine vision: 2D collages
  • 6/9/2005 - Machine vision: Hierarchy of regions
  • 6/9/2005 - Machine vision: cost-effective action
  • 6/9/2005 - Machine vision: overlooking shadow and light splotches on surfaces
  • 6/9/2005 - Machine vision: blob growth
  • 5/11/2005 - Review of "Visual Intelligence"
  • 5/4/2005 - The portable, hand-held learning laboratory
  • 4/27/2005 - Review of "On Intelligence"
  • 4/15/2005 - Bubble Vision
  • 2/26/2005 - Machine vision of GUIs
  • 1/23/2005 - The fallacy of bigger brains
  • 1/12/2005 - Follow-up on Pile
  • 1/12/2005 - A review of the premises behind Pile
  • 11/28/2004 - Thoughts on FLARE
  • 11/28/2004 - New project: Mechasphere
  • 11/14/2004 - Review of "Bicentennial Man"
  • 11/2/2004 - Neural network demo
  • 10/17/2004 - Roamer: recent updates
  • 10/13/2004 - New Roamer project
  • 10/9/2004 - First entry


    Go to Alexandria's home page Copyright © 2010 The Library of Alexandria. All rights reserved.
    Produced in cooperation with Carnell Information Systems, Inc.