On 17 November 2014, research groups at Google and Stanford University jointly announced significant advances in image recognition software.
Image recognition has been pursued for many years. One of the first and still most widely deployed applications is facial recognition. Indeed, facial recognition systems have been employed in numerous settings:
- In numerous U.S. locations, such as airports. Many people presume, as TV police dramas suggest, that such cameras are ubiquitous, although this is not the case.
- In London, as part of the city's closed-circuit TV crime monitoring system.
- In various casinos, to recognize “card counters” and other unwanted gamblers.
- In the Australian and New Zealand customs services.
- By the U.S. Department of State, which has a database of over 75 million facial photos.
- By the German Federal Police, whose system permits voluntary subscribers to pass automated border controls at the Frankfurt International Airport.
These existing systems have generally been limited to recognizing a single object in a digital image. The new Google and Stanford systems can now operate on entire scenes, such as young people playing Frisbee or a herd of elephants.
Perhaps the most interesting aspect of the latest research is that it has been based in part on automated machine-learning techniques. For example, after being shown 10 million images, Google researchers’ software trained itself to recognize cats.
Both the Google and Stanford groups have employed neural networks in their work. Neural networks are software systems that mimic the basic scheme thought to be employed by the human brain. After the programs “learned” to see particular patterns in photos, they were able to perform very well on reams of new images, nearly matching human skills in some respects.
Oriol Vinyals, a co-author of the Google paper, says, “The field is just starting, and we will see a lot of increases.”
But in spite of these developments, software systems have only made limited progress in duplicating human vision, and even less in true “understanding” of what an image means. Your brain isn’t obsolete yet. Also, keep in mind that these are preliminary reports; they have not yet passed full-fledged peer review.
For additional details, see this New York Times report.