New Yorker on Speech Recognition

I always hold my breath and get a sinking feeling in my stomach whenever a field in which I have expertise takes center stage in a news story or pop-culture piece. More often than not, there are misrepresentations of both sophisticated and not-so-sophisticated aspects of the field (e.g., see Wired Magazine).

Such errors are common in movies and television, where accuracy in details plays a secondary role to the story and the vast majority of the audience has no idea whether the details are accurate. Pilots may object that the location of the landing gear switches is not accurately portrayed in a movie, but does anyone else really care? (I recall the howls of protest from outraged chess players who complained about inaccuracies in the portrayal of competitive chess in the charming and underrated movie Searching for Bobby Fischer. Seriously, does anyone really care that the players weren't writing down their moves, or that the games were actioned-up? Chess players worldwide should have been grateful that such a beautiful portrayal of the game was the framework for such a great family film.)

So it was with surprise that I read a recently published article in the New Yorker by John Seabrook that provided an interesting and accurate tour of speech recognition, with brief asides on a variety of related fields (the physiology of speech production, the physiology of hearing, the prosody of speech), all tied together by the promise of computer-based communication that HAL presented in 2001: A Space Odyssey when the article's author was a little kid. I was also surprised to see a popular magazine reference John Pierce's Acoustical Society letter "Whither Speech Recognition?", a scathing 1969 throwdown on the field by the then executive director of research at Bell Laboratories. (In this highly debated letter, Pierce criticized the speech recognition research of the time for its "scarcity in the field of people who behave like scientists and of results that look like science.") I highly recommend this New Yorker article to anyone with an interest in the topic.

One odd aspect of the story is that it ends with a discussion of a company called Sound Intelligence, which has developed audio sensor technology that detects violent activity on city streets for use by police. The company is cited as an example of the successful application of the work on detecting emotion in speech that Seabrook detailed. An engineer from the company, whom I heard speak about their technology last year, is quoted as saying that Sound Intelligence grew out of auditory modeling research at the University of Groningen and its application to separating speech from background noise. It's unclear to me how much the success of the technology depends on complex auditory models or on any of the science and technology the article had described up to that point. While I applaud Sound Intelligence's success, using their technology as the coda to an otherwise great review of the speech recognition field makes for an empty conclusion. I'm sure, however, that the folks at Sound Intelligence would disagree with me completely.

WSJ, Hearing and the Looming AAAS Conference

The Wall Street Journal today mentioned a conference session for which I am both a co-organizer and a speaker. The WSJ article has an interview with Stefan Heller, a Stanford University professor and one of the invited speakers in the session, on the damage to hearing caused by popular products such as the iPod, a topic I've posted on at length before. Dr. Heller's research is on using embryonic stem cells to restore hearing in people with sensorineural hearing loss. The WSJ article, however, simply discusses the potential for damage from current audio products and the fact that people don't know they are damaging their hearing until it's too late:

WSJ: Can you actually kill some cells just from listening to a single CD on an iPod at top volume?
Heller: There probably are some people that can turn the volume of their iPods up to the limit and never have a problem. But other people might do it once and wipe out their high frequencies. And once that damage is done, it will get progressively worse. But you can only know which group you are in after you’ve lost your hearing.

The conference at which Dr. Heller and I are both speaking is the annual meeting of the American Association for the Advancement of Science, the organization that publishes Science Magazine, possibly the most cited scientific publication in the world. The meeting is in San Francisco from Feb 15–19, 2007. This year's theme is Science and Technology for Sustainable Well-Being, and the session that I am co-organizing with Dr. Steven Greenberg is titled Hearing Health: The Looming Crisis and What Can Be Done. (For you loomers out there who found this post after googling "Loom": Welcome. Please link to me on your Looming site.) It looks like the conference will be an interesting one; a sampling of session titles is listed at the bottom of this post.

I believe that we're going to be reading a lot more about the prevalence of hearing damage and about hearing conservation efforts over the next few years. A small startup is addressing these issues with its recently launched iHearSafe earbuds, which have hearing protection built right into them. This accessory for the iPod and other audio products appears to take a more rigorous approach to hearing conservation than last year's iPod firmware upgrade that purported to address similar concerns. As further evidence of this trend, over 150 scientists and intellectuals responded to the web magazine Edge's New Year's question, "What are you optimistic about? Why?", and among responses such as Nathan Myhrvold's "The Power of Educated People to Make Important Innovations," Jared Diamond's "Good Choices Sometimes Prevail," and Steven Pinker's "The Decline of Violence" was David Myers's optimism about the benefits of hearing aids.

Back to the AAAS meeting: I’ll be speaking at the Hearing Health session about the application of hearing science to hearing technology. Due to an AAAS embargo on releasing presentation material before the session, I won’t be posting my talk or providing details from it until after the conference. This is done to ensure that the conference receives maximum press coverage, I suppose.

The program at the conference is extensive and incredibly diverse. For example, here are the symposia scheduled for Friday at 8:30am:

  • Achieving and Sustaining a Diverse Science Work Force
  • Addiction and the Brain: Are We Hard-Wired To Abuse Drugs?
  • Research Competitiveness Strategies of Small Countries
  • Communicating Climate Change: Strategies for Effective Engagement
  • Science, Society, and Shared Cyberinfrastructure: Discovery on the Grid
  • Smart Prosthetics: Interfaces to the Nervous System Help Restore Independence
  • The New Mars: Habitability of a Neighbor World
  • Tinkerers and Tipping Points: Invention and Diffusion of Marine Conservation Technology
  • The Crime Drop and Beyond: Explaining U.S. Crime Trends
  • Dynamics of Extinction
  • Achieving Sustainable Water Supplies in the Drought-Plagued West
  • National Innovation Strategies in the East Asian Region
  • Mixed Health Messages: Observational Versus Randomized Trials
  • Education in Developing Countries and the Global Science Web
  • Food Safety and Health: Whom Can You Trust?
  • Numbers and Nerves: Affect and Meaning in Risk Information
  • Teaching Sustainable Engineering
  • Anti-Evolutionism in Europe: Be Afraid, Be Very Afraid, or Not?

See you there.

How a Cochlear Implant Sounds

PBS has a demonstration of what a cochlear implant sounds like to a hearing-impaired person wearing one. The sounds were created by a friend of mine at the House Ear Institute, Bob Shannon, one of the leading cochlear implant scientists. The demonstration also provides a visual analogy that digital camera users will appreciate, showing how reducing the "audio pixel" resolution of sound affects its quality and intelligibility.

The human cochlea, the snail-shell-shaped organ of the inner ear, transduces sound from an acoustic wave into electrical nerve impulses that the brain can understand. The frequency resolution at which it does this is approximately 3500 "sound pixels" (the number of inner hair cells, for those of you who know the auditory system's biology). Cochlear implants attempt to replicate this transduction in place of cochleas that no longer function, but with a resolution of only around 20 "sound pixels" (channels, in the implant industry's nomenclature). Imagine how much quality and detail your digital photos would lose if their resolution dropped from 3000×2000 pixels to 30×20 pixels, and you get some idea of the difficulty facing implant designers and implant wearers.
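The "sound pixel" idea can be made concrete with a noise-band vocoder, a common way to simulate implant-like hearing for listeners with normal hearing: split the sound into a few frequency bands, keep only each band's slowly varying amplitude envelope, and use those envelopes to modulate noise restricted to the same bands. What follows is a minimal NumPy sketch under stated assumptions, not a description of how the PBS demo or any real implant processor works. The function name `noise_vocode`, the log-spaced band edges, the 160 Hz envelope cutoff, and the FFT-masking filters are all hypothetical illustration choices.

```python
import numpy as np


def noise_vocode(signal, fs, n_channels=4, f_lo=100.0, f_hi=8000.0,
                 env_cutoff=160.0, seed=0):
    """Hypothetical noise-band vocoder sketch: in each of `n_channels`
    frequency bands ("sound pixels"), discard the fine structure and keep
    only the band's slow amplitude envelope, imposed on band-limited noise."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    rng = np.random.default_rng(seed)

    # Assumption: log-spaced band edges; real implants use device-specific maps.
    edges = np.geomspace(f_lo, min(f_hi, fs / 2), n_channels + 1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    x_spec = np.fft.rfft(x)
    noise_spec = np.fft.rfft(rng.standard_normal(n))

    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        if not mask.any():
            continue  # band too narrow for this FFT resolution
        # Band-pass the input by zeroing FFT bins outside the band
        # (circular filtering: acceptable for a sketch, not for production).
        band = np.fft.irfft(x_spec * mask, n)
        # Envelope: full-wave rectify, then low-pass below env_cutoff (assumed value).
        env_spec = np.fft.rfft(np.abs(band))
        env_spec[freqs > env_cutoff] = 0.0
        env = np.maximum(np.fft.irfft(env_spec, n), 0.0)
        # Carrier: noise restricted to the same band, scaled to unit RMS.
        carrier = np.fft.irfft(noise_spec * mask, n)
        carrier_rms = np.sqrt(np.mean(carrier ** 2))
        if carrier_rms > 0:
            out += env * carrier / carrier_rms

    # Match the overall level (RMS) of the input.
    out_rms = np.sqrt(np.mean(out ** 2))
    if out_rms > 0:
        out *= np.sqrt(np.mean(x ** 2)) / out_rms
    return out


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs  # one second of audio
    # Crude stand-in for speech: an amplitude-modulated 220 Hz tone plus a 1500 Hz tone.
    x = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
    x += 0.3 * np.sin(2 * np.pi * 1500 * t)
    for n_ch in (4, 20):
        y = noise_vocode(x, fs, n_channels=n_ch)
        print(n_ch, "channels, output RMS:", round(float(np.sqrt(np.mean(y ** 2))), 4))
```

To hear the effect, one would run the function on a recorded speech file and compare `n_channels=4` with `n_channels=20`. FFT masking keeps the sketch dependency-free beyond NumPy, but a more faithful simulation would use proper band-pass and low-pass filters.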

I remember when I first heard this demo from Bob at a hearing science conference (Association for Research in Otolaryngology) about 15 years ago. I was eating lunch at Crabby Bills in St. Petersburg, Florida, when Bob came by with a portable tape player, saying, "Listen to this!" What I heard was a demo similar to the one on the PBS site. I, like everyone else who heard it, was amazed at how understandable speech was with only 4 "sound pixels", and the perceived potential for success with cochlear implants grew tremendously at that moment.