#SounDoer# Faculty Summit 2016 Lecture: A Discussion of AR/VR Spatial Audio Technology

@SounDoer: A video from Microsoft Research in which three speakers analyze, from first principles, the audio technologies currently used to implement spatial audio for AR/VR.
 

Faculty Summit 2016 – Spatial Audio for Augmented and Virtual Reality
Head mounted displays for virtual and augmented reality are a hot topic for research and product development. An integral part of these devices is the spatial audio rendering system. Unlike vision, where humans have an approximately 100° field of view, human hearing covers all directions in all three dimensions. This means that the spatial audio system is expected to provide realistic rendering of sound objects in full 3D to complement the stereo rendering of the visual objects. This session will discuss the problems and solutions around spatial audio systems in devices for virtual and augmented reality.
 
Summary:
  1. First speaker: Mark Billinghurst, Spatial Audio For Augmented Reality
  2. Augmented Reality
    1) Combines Real and Virtual Images. Both can be seen at the same time.
    2) Interactive in real-time. The virtual content can be interacted with.
    3) Registered in 3D. Virtual objects appear fixed in space.
  3. Pokemon GO: Handheld AR, Touch Input, GPS/Compass Sensors
  4. Hololens: Head Mounted Augmented Reality
    Speech, Gesture Input, Stereo View
  5. 2 Eyes + 2 Ears = AR Spatial Interface
    Visual Interface: a see-through HMD has a ~30°–90° field of view
    Audio Interface: binaural headphones provide a 360° field of hearing
  6. Benefits of Adding Spatial Audio to AR
    Cognitive: More information display without additional cognitive load
    Information: Simultaneous information display using multiple modalities; use the appropriate modality for each piece of information
    Interface: Overcome limitations of limited visual display; Increase interface design options
  7. Spatial audio helps with information presentation
    Spatial audio can direct user attention
    Spatial audio cues can improve AR conferencing
    Tools can be developed for spatial audio authoring
    Spatial audio enables richer AR experiences
  8. Directions for Future Research
    User Interface Metaphors: How to interact with hybrid interfaces? How to present information between modalities?
    Collaborative Interfaces: Using spatial audio for sharing communication cues; Recording and sharing spatial audio.
    Applications/Tools: Which AR applications should use spatial audio? AR spatial audio development tools
    Technology: Using headphones vs. bone conducting transducers/other tech. Spatial audio algorithms (individual HRTF vs. generic HRTF, etc)
  9. Second speaker: Ramani Duraiswami, Physically Accurate Low Latency Audio for Virtual and Augmented Reality
  10. Virtual Reality
    Create artificial world that the user believes is real
    Simulation, Gaming and Entertainment
  11. Augmented Reality
    Insert objects and information into the real world
    Training, Surgery and Entertainment
  12. Enablers
    Hardware: Moore’s law, displays, graphics, tracking
    Software: Engines for creating virtual worlds
    Visual Perception: Improved latency, using persistence
  13. Sickness:
    Most studied for vision/vestibular system interaction
    Use of persistence and improved frame rates mentioned by Valve/Oculus as primary improvements allowing VR
    Smaller fields of view
    Still lots of stories about gamers getting sick
  14. Fool the Visual System?
    Visual system part of larger perceptual system, responsible for sense-making
    Perceptual system is a sophisticated sensing, measuring and computing system
    Designed by evolution to perform real time measurements and take quick decisions
    *Fool this system into believing that it is perceiving an object that is not there
  15. Human Spatial Localization Ability
  16. Hypothesis: Render Sound Correctly
    Get the sound right at the entrances to the ear canals
    Approximately solve the audio propagation problem from sources in the scene to the ear canal
    Do what graphics and vision did:
    1) Move from emulation to approximate simulation
    2) Use physics based models, appropriately simplified
    3) Simplify based on knowledge of what is perceptible: focus attention on things that matter
    4) Level of detail based on available computing power
    5) Capture representations of the real world that allow rendering
    Render not only objects but scenes
  17. Audible Sound Scattering: sound wavelengths comparable to human dimensions and dimensions of spaces we live in.
  18. Accurate Approximate Scattering
    Linear systems can be characterized by impulse response (IR). Knowing IR, can compute response to general source by convolution
    Response to impulsive source at a particular location. Scattering off person by Head Related Impulse Response (HRIR). Room scattering by Room Impulse Response (RIR).
    Response differs according to source and receiver locations, thus encodes source location
    HRTF and RTF are Fourier transforms of the impulse response. Convolution is cheaper in the Fourier domain (becomes a multiplication)
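The claim that convolution becomes a cheap multiplication in the Fourier domain can be checked directly. This is a minimal sketch with hypothetical random signals standing in for a dry source and an HRIR (NumPy assumed):

```python
import numpy as np

# Hypothetical signals: a dry source and an impulse response (e.g. an HRIR).
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
hrir = rng.standard_normal(256)

# Time-domain convolution: O(N*M) multiply-adds.
direct = np.convolve(source, hrir)

# Frequency domain: zero-pad both to the full output length,
# multiply the spectra, and transform back.
n = len(source) + len(hrir) - 1
spectral = np.fft.irfft(np.fft.rfft(source, n) * np.fft.rfft(hrir, n), n)

# The two methods agree to numerical precision.
assert np.allclose(direct, spectral)
```

With FFTs the cost drops from O(N·M) to O(n log n), which is what makes long room-response filters affordable in real time.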
  19. Creating Auditory Reality
    VR/Gaming: Given a sound source and an environment build an engine that reproduces the cues
    Augmented Reality: Capture sound remotely and rerender it by reintroducing cues that exist in the real world
    Scattering of sound off the human: Head Related Transfer Functions
    Scattering off the environment: Room Models
    Head Motion: Head/Body Tracking
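The VR/gaming case above (source signal, HRTF cues, two ears) amounts to a per-ear convolution. A minimal sketch follows; the HRIRs here are crude placeholders that only mimic interaural time and level differences, not measured responses:

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono source with per-ear HRIRs to produce a binaural pair."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Placeholder HRIRs for a source to the listener's left: the sound reaches
# the left ear first and louder (crude ITD/ILD stand-in, not measured data).
hrir_l = np.zeros(64)
hrir_l[0] = 1.0
hrir_r = np.zeros(64)
hrir_r[8] = 0.5          # ~8 samples later and quieter at the right ear

mono = np.sin(2 * np.pi * 440 * np.arange(4800) / 48000)  # 0.1 s, 440 Hz
binaural = render_binaural(mono, hrir_l, hrir_r)
```

In a real renderer the HRIR pair would be selected (or interpolated) from a measured HRTF set based on the head-tracked source direction, and a room model would add the reflections discussed next.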
  20. Breaking up the Filter:
    Convolution is linear
    Early reflections are more important and time separated. Important for determining range.
    Later reflections are a continuum. Important for spaciousness / envelopment / warmth, etc.
    Create the early-reflections filter on the fly. Reflections up to 5th or 6th order (depending on computational resources). These are convolved with their HRTFs.
    Tail of room impulse response is approximated depending on room size.
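Because convolution is linear, an impulse response can be split into an early part and a tail, each convolved separately, and the delayed results summed; the total equals convolving with the full IR. A toy check of that identity (random signals, arbitrary split point):

```python
import numpy as np

def split_convolve(x, ir, split):
    """Convolve x with an IR split into early and late parts.

    Linearity guarantees that summing the two partial results (with the
    late part delayed by `split` samples) equals convolving with the
    full IR. In a renderer the early part would be rebuilt on the fly
    and the tail approximated from the room size.
    """
    early, late = ir[:split], ir[split:]
    y_early = np.convolve(x, early)
    y_late = np.convolve(x, late)
    out = np.zeros(len(x) + len(ir) - 1)
    out[:len(y_early)] += y_early
    out[split:split + len(y_late)] += y_late   # tail is delayed by `split`
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal(512)
ir = rng.standard_normal(300)
full = np.convolve(x, ir)
assert np.allclose(full, split_convolve(x, ir, split=60))
```

This is why the early reflections (perceptually important for range) can be recomputed per frame while the statistical tail (spaciousness, envelopment) is handled by a cheaper fixed approximation.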
  21. HRTFs are highly individual; individual HRTFs are needed to create accurate virtual audio.
  22. A new method for fast HRTF measurement:
    Turned-out headphone drivers
    Array of tiny microphones
  23. The Oculus Audio SDK uses VisiSonics RealSpace3D technology
  24. Audience question: how should the artistic creation of game sound be balanced against technically accurate reproduction?
  25. HRTF Measurement:
    Anthropometry-based Personalisation
    Model-based Personalisation
  26. Rendering Approaches:
    Object-based Rendering (currently used mainly in games)
    Channel-based Rendering (note: the video uses Dolby Atmos as the example; strictly speaking, Atmos is a combination of channel-based and object-based rendering)
    Parametric Approaches
    Modal Rendering (Ambisonics)
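A minimal sketch of the modal (Ambisonics) approach: first-order B-format encoding of a mono source at a given direction. The W = s/√2 weighting follows the traditional B-format convention; other conventions (e.g. SN3D/ACN) scale and order the channels differently:

```python
import numpy as np

def encode_fo_ambisonics(mono, azimuth, elevation=0.0):
    """Encode a mono signal into first-order B-format (W, X, Y, Z).

    Traditional convention: W is the omnidirectional channel scaled by
    1/sqrt(2); X, Y, Z are the directional (figure-of-eight) components.
    Azimuth is measured counterclockwise from the front, in radians.
    """
    w = mono / np.sqrt(2)
    x = mono * np.cos(azimuth) * np.cos(elevation)
    y = mono * np.sin(azimuth) * np.cos(elevation)
    z = mono * np.sin(elevation)
    return np.stack([w, x, y, z])

s = np.ones(4)                                   # toy constant signal
b = encode_fo_ambisonics(s, azimuth=np.pi / 2)   # source to the left
```

Unlike object-based rendering, the encoded sound field is independent of the playback setup: the same four channels can later be decoded to any speaker layout or rotated cheaply to follow head tracking.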
 
 
SounDoer – Focus On Sound Design
Compiled by @SounDoer; corrections of any errors are welcome. Please notify the author and credit the source when reposting.