Thinking Tech

Has DARPA cracked the video search problem?

Text, image and audio search technologies have changed how we find content, but video search has lagged behind. A recent announcement by DARPA, however, suggests a breakthrough.

Video search is terrible. So terrible, in fact, that nobody really does it: videos hosted on sites like YouTube are titled and tagged, linked together by social connections and manually categorized by the same users who put them there. Video search online, in other words, isn't really video search. It's text search.

This is fine for a site driven by social interactions, where contributors want their content to be seen. Sharing is YouTube's raison d'être, and Google has done everything it can to make sure every video is contextualized and made available to any viewer who might want to see it. Finding a public YouTube video without any accompanying context is impossible; uploaded videos must be assigned a category tag, at the very least.

Without their respective nests of text and social data, videos would wallow in obscurity--there'd be no reliable way to search through them. Companies like Intel and Blinkx have tried their hand at search solutions based on direct video and audio analysis. The former doesn't seem to have released its technology publicly, and the latter's service, while certainly a useful complement to Google's video search, is far from perfect.

It's heartening, then, to get news from DARPA that its Video and Image Retrieval and Analysis Tool (VIRAT) program is moving forward. From a contract announcement published on the FedBizOpps website this week:

This sole source contract is for the integration of software code into multiple programs of record for full motion video (FMV) exploitation for the Video and Image Retrieval and Analysis Tool (VIRAT).  This third phase of the VIRAT program will demonstrate rapid refinement of query results and the capability to accommodate complex searches that include multiple, dynamic events within a single query and will transition the system to multiple DoD programs of record.  The period of performance is anticipated to be the date of award through 12 months thereafter.

DARPA has determined that [Lockheed Martin] is the only entity with the requisite knowledge of VIRAT's capabilities and access to/understanding of the transition partners' systems to effectively integrate and transition VIRAT.

Note that this announcement is about the integration of existing technology into practical applications. In other words, DARPA seems to think that the underlying VIRAT technology is ready to be put to work.

What DARPA has developed here isn't full video search, and couldn't be simply dropped into a consumer product like YouTube. It was developed with a narrow set of applications in mind, and a focus on parsing footage collected by unmanned surveillance drones. The original VIRAT solicitation called for software that could identify actions, events or activities in the following categories:

  • Single person: Digging, loitering, shooting, running, limping
  • Person-person: Gathering, moving as a group, shaking hands, carrying together
  • Person-vehicle: Driving, getting out, loading, crawling under
  • Vehicles: Accelerating, shooting, moving in a convoy

VIRAT's job is to scan countless hours of recorded drone footage for these activities, giving human analysts a starting point in their search for particular people or patterns. Recent news makes it easy to understand why such a capability is worth developing. (Analysis of drone and satellite imagery played a vital role in the killing of Osama Bin Laden.)
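
To make the idea of a query spanning multiple events concrete, here is a minimal Python sketch. It is an illustration only, not VIRAT's actual design: the `Detection` record, the `clips_matching_all` function and the sample clip ids are all hypothetical, and it assumes some upstream analysis step has already labeled each clip with activities. Only the category and activity names come from the solicitation list above.

```python
from dataclasses import dataclass

# Activity categories as listed in the original VIRAT solicitation.
ACTIVITIES = {
    "single_person": {"digging", "loitering", "shooting", "running", "limping"},
    "person_person": {"gathering", "moving as a group", "shaking hands",
                      "carrying together"},
    "person_vehicle": {"driving", "getting out", "loading", "crawling under"},
    "vehicles": {"accelerating", "shooting", "moving in a convoy"},
}


@dataclass(frozen=True)
class Detection:
    """Hypothetical record: one labeled activity found in one clip."""
    clip_id: str
    category: str
    activity: str
    start_s: float  # seconds from the start of the clip
    end_s: float


def clips_matching_all(detections, wanted):
    """Return sorted ids of clips that contain every (category, activity) in wanted."""
    # Reject queries that fall outside the solicitation's taxonomy.
    for category, activity in wanted:
        if activity not in ACTIVITIES.get(category, set()):
            raise ValueError(f"unknown activity: {category}/{activity}")

    # Group the labels seen in each clip.
    seen = {}
    for d in detections:
        seen.setdefault(d.clip_id, set()).add((d.category, d.activity))

    # Keep only clips whose labels cover the whole query.
    wanted_set = set(wanted)
    return sorted(clip for clip, found in seen.items() if wanted_set <= found)


# Made-up sample data for illustration.
detections = [
    Detection("clip-a", "single_person", "digging", 0.0, 5.0),
    Detection("clip-a", "person_vehicle", "loading", 10.0, 20.0),
    Detection("clip-b", "single_person", "digging", 3.0, 4.0),
]

query = [("single_person", "digging"), ("person_vehicle", "loading")]
print(clips_matching_all(detections, query))
```

The hard part, recognizing the activities in raw footage, is precisely what this sketch assumes away. Once that labeling exists, narrowing hours of video down to a handful of candidate clips is a simple filter, which is why it gives analysts a useful starting point.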

While VIRAT may not constitute a full video search engine, a technology that could accomplish the tasks listed above has obvious applications for all types of content. At the very least, a tool like this could be adapted to separate videos into broad categories--people, cars, landscapes, buildings, crowds, etc.

If VIRAT technology ever trickles out of DARPA's basement and into the technological mainstream, it could drastically change how we find and interact with content on the web. Face and landmark recognition has done wonders for consumer image search. It's about time we got the same performance in video search. [via Popular Science and PCW]

John Herrman

Contributing Editor John Herrman is a freelance writer based in New York City. He is also contributing editor at Gizmodo. He holds a degree from the University of Edinburgh.