I always hold my breath and get a sinking feeling in my stomach whenever a field in which I have expertise takes center stage in a news story or pop-culture piece. More often than not, there are misrepresentations of both sophisticated and not-so-sophisticated aspects of the field (e.g., see Wired Magazine).
Such errors are a common occurrence in movies and television: accuracy in details plays a secondary role to the story, and the vast majority of the audience has no idea whether the details are accurate or not. Pilots may object that the location of landing gear switches is not accurately portrayed in a movie, but does anyone else really care? (I recall the howls of protest that arose as outraged chess players complained about inaccuracies in the portrayal of competitive chess in the charming and underrated movie Searching for Bobby Fischer. Seriously, does anyone really care that the players weren’t writing down their moves, or that the games were actioned-up? Chess players worldwide should have been grateful that such a beautiful portrayal of the game was the framework for such a great family film.)
Thus, it was with surprise that I read a recently published article in the New Yorker on speech recognition by John Seabrook that provided an interesting and accurate tour of speech recognition, with brief asides on a variety of related fields (the physiology of speech production, the physiology of hearing, prosody of speech), all tied together by the promise of computer-based communication that HAL presented in 2001 when the article’s author was a little kid. I was also surprised to see a popular magazine reference John Pierce’s Acoustical Society letter Whither Speech Recognition, a scathing throwdown on the field of speech recognition in 1969 by the then executive director of research at Bell Laboratories. (In this highly debated letter, Pierce criticized the state of speech recognition research at the time for having a “scarcity in the field of people who behave like scientists and of results that look like science.”) I highly recommend this New Yorker article to anyone with an interest in the topic.
One odd aspect of the story is that it ends with a discussion of a company called Sound Intelligence, which has developed audio sensor technology that detects violent activity on city streets for use by police. The company is cited as an example of the successful application of the work that Seabrook detailed on detecting emotion in speech. An engineer at the company, whom I heard speak about their technology last year, is quoted as saying that Sound Intelligence grew out of auditory modeling research at the University of Groningen and its application to separating speech from background noise. It’s unclear to me how much the success of the technology depends on complex auditory models or on any of the science and technology the article had detailed up to that point. While I applaud Sound Intelligence’s success, the inclusion of their technology as the coda to an otherwise great review of the speech recognition field makes for an empty conclusion. I’m sure that the folks at Sound Intelligence, however, would disagree with me completely.