@SounDoer: A video from Microsoft Research in which three speakers analyze, from first principles, the sound technologies currently used to implement spatial audio for AR/VR.
Faculty Summit 2016 – Spatial Audio for Augmented and Virtual Reality
Head-mounted displays for virtual and augmented reality are a hot topic for research and product development. An integral part of these devices is the spatial audio rendering system. Unlike vision, where humans have approximately a 100° field of view, human hearing covers all directions in all three dimensions. This means the spatial audio system is expected to provide realistic rendering of sound objects in full 3D to complement the stereo rendering of the visual objects. This session discusses the problems and solutions around spatial audio systems in devices for virtual and augmented reality.
- First speaker: Mark Billinghurst, Spatial Audio For Augmented Reality
- Augmented Reality
1) Combines Real and Virtual Images. Both can be seen at the same time.
2) Interactive in real-time. The virtual content can be interacted with.
3) Registered in 3D. Virtual objects appear fixed in space.
- Pokemon GO: Handheld AR, Touch Input, GPS/Compass Sensors
- Hololens: Head Mounted Augmented Reality
Speech, Gesture Input, Stereo View
- 2 Eyes + 2 Ears = AR Spatial Interface
Visual Interface: see through HMD has ~ 30º – 90º field of view
Audio Interface: Binaural headphone has 360º field of hearing
- Benefits of Adding Spatial Audio to AR
Cognitive: More information display without additional cognitive load
Information: Simultaneous information display using multiple modalities; Use appropriate modality for information
Interface: Overcome limitations of limited visual display; Increase interface design options
- Spatial audio helps with information presentation
Spatial audio can direct user attention
Spatial audio cues can improve AR conferencing
Tools can be developed for spatial audio authoring
Spatial audio enables richer AR experiences
- Directions for Future Research
User Interface Metaphors: How to interact with hybrid interfaces? How to present information between modalities?
Collaborative Interfaces: Using spatial audio for sharing communication cues; Recording and sharing spatial audio.
Applications/Tools: Which AR applications should use spatial audio? AR spatial audio development tools
Technology: Using headphones vs. bone conducting transducers/other tech. Spatial audio algorithms (individual HRTF vs. generic HRTF, etc)
- Second speaker: Ramani Duraiswami, Physically Accurate Low Latency Audio for Virtual and Augmented Reality
- Virtual Reality
Create artificial world that the user believes is real
Simulation, Gaming and Entertainment
- Augmented Reality
Insert objects and information into the real world
Training, Surgery and Entertainment
Hardware: Moore’s law, displays, graphics, tracking
Software: Engines for creating virtual worlds
Visual Perception: Improved latency, using persistence
Most studied for vision/vestibular system interaction
Use of persistence and improved frame rates cited by Valve/Oculus as the primary improvements enabling VR
Smaller fields of view
Still lots of stories about gamers getting sick
- Fool the Visual System?
Visual system part of larger perceptual system, responsible for sense-making
Perceptual system is a sophisticated sensing, measuring and computing system
Designed by evolution to perform real time measurements and take quick decisions
*Fool this system into believing that it is perceiving an object that is not there
- Human Spatial Localization Ability
- Hypothesis: Render Sound Correctly
Get the sound right at the entrances to the ear canals
Approximately solve the audio propagation problem from sources in the scene to the ear canal
Do what graphics and vision did:
1) Move from emulation to approximate simulation
2) Use physics based models, appropriately simplified
3) Simplify based on knowledge of what is perceptible: focus attention on things that matter
4) Level of detail based on available computing power
5) Capture representations of the real world that allow rendering
Render not only objects but scenes
- Audible Sound Scattering: sound wavelengths comparable to human dimensions and dimensions of spaces we live in.
- Accurate Approximate Scattering
Linear systems can be characterized by impulse response (IR). Knowing IR, can compute response to general source by convolution
Response to impulsive source at a particular location. Scattering off person by Head Related Impulse Response (HRIR). Room scattering by Room Impulse Response (RIR).
Response differs according to source and receiver locations, thus encodes source location
HRTF and RTF are Fourier transforms of the impulse response. Convolution is cheaper in the Fourier domain (becomes a multiplication)
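The point above can be demonstrated numerically. This is a minimal sketch (all signals are random stand-ins, not measured data) showing that convolving a source with an impulse response in the time domain gives the same result as multiplying their Fourier transforms, which is much cheaper for long impulse responses:

```python
import numpy as np

rng = np.random.default_rng(0)
source = rng.standard_normal(1024)   # dry source signal (stand-in)
hrir = rng.standard_normal(256)      # stand-in for a measured HRIR

# Direct time-domain convolution: O(N*M)
direct = np.convolve(source, hrir)

# Fourier-domain rendering: transform, multiply, inverse-transform: O(N log N)
n = len(source) + len(hrir) - 1      # full convolution length
fast = np.fft.irfft(np.fft.rfft(source, n) * np.fft.rfft(hrir, n), n)

assert np.allclose(direct, fast)     # both methods agree
```

In a binaural renderer this would be done twice per source, once with the left-ear HRIR and once with the right-ear HRIR for the source's direction.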
- Creating Auditory Reality
VR/Gaming: Given a sound source and an environment build an engine that reproduces the cues
Augmented Reality: Capture sound remotely and rerender it by reintroducing cues that exist in the real world
Scattering of sound off the human: Head Related Transfer Functions
Scattering off the environment: Room Models
Head Motion: Head/Body Tracking
- Breaking up the Filter:
Convolution is linear
Early reflections are more important and time separated. Important for determining range.
Later reflections are a continuum. Important for spaciousness / envelopment / warmth, etc.
Create early reflections filter on the fly. Reflections of up to 5th or 6th order (depending on computational resources). These are convolved with their HRTF.
Tail of room impulse response is approximated depending on room size.
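The split described above can be sketched in code. This is a toy illustration (all delays, gains, and tail parameters are hypothetical, not from the talk): a handful of discrete early reflections are rendered individually, a decaying-noise tail approximates the late reverb, and linearity lets the two parts simply be summed:

```python
import numpy as np

fs = 48000
rng = np.random.default_rng(1)
source = rng.standard_normal(fs // 10)        # 100 ms of dry signal (stand-in)

# Early reflections as (delay in samples, gain) pairs, e.g. from an
# image-source model up to some reflection order.  Values are made up.
early = [(0, 1.0), (120, 0.5), (260, 0.35), (410, 0.25)]

# Late tail: exponentially decaying noise; its length would scale with room size.
tail_len = fs // 4
decay = np.exp(-6.0 * np.arange(tail_len) / tail_len)
tail = 0.1 * rng.standard_normal(tail_len) * decay

out = np.zeros(len(source) + max(d for d, _ in early) + tail_len)
for delay, gain in early:
    # Each early reflection rendered separately; in a real renderer it would
    # also be convolved with the HRIR for its direction of arrival.
    out[delay:delay + len(source)] += gain * source
out[:len(source) + tail_len - 1] += np.convolve(source, tail)  # diffuse tail
```

Because convolution is linear, summing the separately rendered early and late parts is equivalent to convolving with one long room filter, but the early part can be updated per frame while the tail stays fixed.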
- HRTFs are very individual; Need individual HRTFs for creating accurate virtual audio.
- A new method for fast HRTF measurement:
Turned-out headphone drivers
Array of tiny microphones
- The Oculus Audio SDK uses VisiSonics RealSpace3D technology
- HRTF Measurement:
- Rendering Approaches:
Channel-based Rendering (Note: the video uses Dolby Atmos as the example; strictly speaking, Atmos combines channel-based and object-based rendering.)
SounDoer – Focus On Sound Design