The Library of Alexandria

Machine vision of GUIs

Jim Carnicelli's AI Blog


Saturday, 2/26/2005

Machine vision of GUIs


I just completed a brief foray into machine vision: a project focused on seeing and, to some degree, "understanding" windowed graphical user interfaces (GUIs) such as Microsoft Windows. I wrote a test program and an essay on the subject, so rather than repeat their contents here, I'd suggest you visit the project's home page. But I'll summarize briefly.

The base premise of my explorations is that most GUIs are composed of rectangular blocks within blocks. I called the core of the concept I was experimenting with "expansion" and "contraction" algorithms. "Expansion" here means starting with a test rectangle inside a block and, like a balloon, expanding it outward until it finds the outer bounds of that block. Similarly, "contraction" means starting with a rectangle just inside a block's outer bounds and gradually shrinking it inward until it wraps snugly around the one or more inner blocks that punctuate the otherwise smooth interior of the first block, like water filling a dry stream to expose the islands within it.
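To make the two ideas concrete, here is a minimal sketch in Python. It is not the code from my test program; it assumes a toy "screen" stored as a grid of integer color values, treats a block as a region of one uniform color, and uses function names (`expand`, `contract`) chosen just for this illustration. Note that this naive expansion only finds a block's bounds when no inner block gets in the balloon's way.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), all inclusive
Image = List[List[int]]           # image[y][x] is a color value


def expand(image: Image, x: int, y: int) -> Rect:
    """Grow a 1x1 rectangle at (x, y) until no edge can move outward.

    An edge may move only if the whole new row or column has the same
    color as the seed pixel (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    fill = image[y][x]
    left, top, right, bottom = x, y, x, y

    def row_ok(row: int) -> bool:
        return 0 <= row < height and all(
            image[row][c] == fill for c in range(left, right + 1))

    def col_ok(col: int) -> bool:
        return 0 <= col < width and all(
            image[r][col] == fill for r in range(top, bottom + 1))

    grew = True
    while grew:
        grew = False
        if row_ok(top - 1):
            top, grew = top - 1, True
        if row_ok(bottom + 1):
            bottom, grew = bottom + 1, True
        if col_ok(left - 1):
            left, grew = left - 1, True
        if col_ok(right + 1):
            right, grew = right + 1, True
    return (left, top, right, bottom)


def contract(image: Image, rect: Rect) -> Optional[Rect]:
    """Shrink rect inward until every edge touches a non-background pixel.

    The background color is taken from the rectangle's top-left pixel.
    Returns None if the rectangle contains nothing but background.
    """
    left, top, right, bottom = rect
    fill = image[top][left]
    while top <= bottom and all(
            image[top][c] == fill for c in range(left, right + 1)):
        top += 1
    if top > bottom:
        return None  # empty block: no inner content to wrap around
    while all(image[bottom][c] == fill for c in range(left, right + 1)):
        bottom -= 1
    while all(image[r][left] == fill for r in range(top, bottom + 1)):
        left += 1
    while all(image[r][right] == fill for r in range(top, bottom + 1)):
        right -= 1
    return (left, top, right, bottom)


# A toy screen: a border (9) around a panel (0) holding two inner blocks.
DEMO = [
    [9, 9, 9, 9, 9, 9, 9, 9],
    [9, 0, 0, 0, 0, 0, 0, 9],
    [9, 0, 5, 5, 0, 7, 0, 9],
    [9, 0, 5, 5, 0, 0, 0, 9],
    [9, 0, 0, 0, 0, 0, 0, 9],
    [9, 9, 9, 9, 9, 9, 9, 9],
]
```

With this toy screen, `expand(DEMO, 2, 2)` grows from a single pixel to the full 2x2 block of 5s, and `contract(DEMO, (1, 1, 6, 4))` shrinks the panel's interior down to the box that wraps both inner blocks. Real screens have gradients, borders, and anti-aliased text, so an exact color match like this one would need to be loosened considerably.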

The main point of analyzing a user's screen with expansion and contraction to find the boundaries of the UI blocks would be to carve up a complex screen into smaller units that can be processed by other, more traditional vision systems. An optical character recognition (OCR) system, for example, might be able to read the text on a button or in a text box. A neural network might be used to recognize an icon on a button. A neural net or classifier system could be used to draw conclusions about what a particular arrangement of blocks within blocks might represent. It might, for example, be able to distinguish a word processor from a web browser.
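As a rough picture of what "blocks within blocks" could look like once a screen has been carved up, here is a small Python sketch. The `Block` class, the `features` function, and the particular features it computes are all hypothetical, invented for this illustration; the point is only that a tree of rectangles yields simple structural numbers a classifier could take as input, while the leaf blocks are the pieces you would hand to something like OCR or an icon recognizer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Block:
    """A rectangular UI block and the blocks nested inside it."""
    rect: Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive
    children: List["Block"] = field(default_factory=list)


def leaves(block: Block) -> List[Block]:
    """Return the innermost blocks, e.g. candidates for OCR or icon matching."""
    if not block.children:
        return [block]
    found: List[Block] = []
    for child in block.children:
        found.extend(leaves(child))
    return found


def depth(block: Block) -> int:
    """Number of nesting levels, counting this block as level 1."""
    return 1 + max((depth(child) for child in block.children), default=0)


def features(block: Block) -> Dict[str, int]:
    """A crude structural summary of an arrangement of blocks."""
    return {
        "depth": depth(block),
        "leaf_count": len(leaves(block)),
        "direct_children": len(block.children),
    }


# A made-up window: a toolbar with three buttons above one large text area.
toolbar = Block((0, 0, 799, 39), [
    Block((4, 4, 35, 35)),
    Block((40, 4, 71, 35)),
    Block((76, 4, 107, 35)),
])
text_area = Block((0, 40, 799, 599))
window = Block((0, 0, 799, 599), [toolbar, text_area])
```

For this made-up window, `features(window)` reports three levels of nesting, four leaf blocks, and two direct children. Whether numbers like these are enough to tell a word processor from a web browser is an open question; they are just the simplest thing one could feed to a classifier.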

Ultimately, there could be all sorts of applications for a system that can reasonably grasp most of the basic elements of a windowed GUI. I had fun writing a simple demonstration system that illustrates some of the strengths and weaknesses of the concept, which I describe in the accompanying essay. I also made the source code of that program available for download.





    Copyright © 2010 The Library of Alexandria. All rights reserved.
    Produced in cooperation with Carnell Information Systems, Inc.