Audio Demos for Frame Duration Study
Full details on the algorithm that calculates the stationarity index (ψ) are available here.
Demo 1: In Support of Figure 1.
Figure 1: Speech quality as a function of frame duration for the oracle magnitude recovery (OMR) case. Noise types are coffee shop (blue), saw (red), and white (gold). Noise at 0 dB SNR. Dashed and solid lines show lower and higher stationarity speech (ΨL and ΨH), respectively. Nominal POLQA MOS-LQO scale: 1 means “Bad”, 5 means “Excellent”.
Example Audio Files
Original speech files are low_original.wav and high_original.wav.
All 44 of these .wav files are available in FrameDurationDemo1.zip (51 MB)
Demo 2: In Support of Figure 2.
Figure 2: Speech quality as a function of frame duration for 0 dB SNR coffee shop noise. Oracle binary mask (OBM), oracle magnitude recovery (OMR), and convolutional noise model (CNM) are shown in blue, red, and black, respectively. Dashed and solid lines show lower and higher stationarity speech (ΨL and ΨH), respectively. Nominal POLQA MOS-LQO scale: 1 means “Bad”, 5 means “Excellent”. The simple CNM captures the vast majority of OBM and OMR quality effects.
Example Audio Files
Condition | 1 ms | 4 ms | 10 ms | 20 ms | 40 ms | 100 ms | 400 ms |
ΨL, Convolutional Noise Model eqn. (9) | low_CNM_1ms.wav | low_CNM_4ms.wav | low_CNM_10ms.wav | low_CNM_20ms.wav | low_CNM_40ms.wav | low_CNM_100ms.wav | low_CNM_400ms.wav |
ΨL, OBM | low_OBM_1ms.wav | low_OBM_4ms.wav | low_OBM_10ms.wav | low_OBM_20ms.wav | low_OBM_40ms.wav | low_OBM_100ms.wav | low_OBM_400ms.wav |
ΨL, OMR | low_OMR_1ms.wav | low_OMR_4ms.wav | low_OMR_10ms.wav | low_OMR_20ms.wav | low_OMR_40ms.wav | low_OMR_100ms.wav | low_OMR_400ms.wav |
ΨH, Convolutional Noise Model eqn. (9) | high_CNM_1ms.wav | high_CNM_4ms.wav | high_CNM_10ms.wav | high_CNM_20ms.wav | high_CNM_40ms.wav | high_CNM_100ms.wav | high_CNM_400ms.wav |
ΨH, OBM | high_OBM_1ms.wav | high_OBM_4ms.wav | high_OBM_10ms.wav | high_OBM_20ms.wav | high_OBM_40ms.wav | high_OBM_100ms.wav | high_OBM_400ms.wav |
ΨH, OMR | high_OMR_1ms.wav | high_OMR_4ms.wav | high_OMR_10ms.wav | high_OMR_20ms.wav | high_OMR_40ms.wav | high_OMR_100ms.wav | high_OMR_400ms.wav |
Original speech files are low_original.wav and high_original.wav.
All 44 of these .wav files are available in FrameDurationDemo2.zip (48 MB)
Demo 3: Greater Stationarity Reduces Perceptibility of Temporal Blurring
Excerpt | Original | Temporal Blurring (via CNM), 500 ms frame |
Less Stationary Piano Excerpt (Ψ=27 ms) | pianoLS_org.wav | pianoLS_cnm.wav |
More Stationary Piano Excerpt (Ψ=230 ms) | pianoMS_org.wav | pianoMS_cnm.wav |
Demo 4: Similarity of Artifacts
At shorter frame durations, the artifacts caused by random frame phase (RFP) sound similar to the artifacts of the convolutional noise model (CNM), oracle binary mask (OBM), and oracle magnitude recovery (OMR). Random frame phase simply means multiplying all samples of each time-domain frame by either +1 or -1, chosen at random.
Condition | 1 ms | 4 ms | 10 ms |
Random Frame Phase | low_RFP_1ms.wav | low_RFP_4ms.wav | low_RFP_10ms.wav |
Convolutional Noise Model of Eqn. (9) | low_CNM_1ms.wav | low_CNM_4ms.wav | low_CNM_10ms.wav |
OBM | low_OBM_1ms.wav | low_OBM_4ms.wav | low_OBM_10ms.wav |
OMR | low_OMR_1ms.wav | low_OMR_4ms.wav | low_OMR_10ms.wav |
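The random frame phase operation described in Demo 4 can be illustrated with a minimal Python sketch. It is not the code used to produce the demo files. It assumes non-overlapping rectangular frames and a known sample rate, neither of which is stated on this page, and the function name `random_frame_phase` and its parameters are hypothetical.

```python
import numpy as np

def random_frame_phase(signal, sample_rate, frame_ms, rng=None):
    """Multiply every sample of each frame by +1 or -1, chosen at random.

    Assumptions (not from the source page): frames are contiguous and
    non-overlapping, and no windowing is applied.
    """
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng() if rng is None else rng

    # Frame length in samples, at least one sample per frame.
    frame_len = max(1, int(round(sample_rate * frame_ms / 1000.0)))

    # Number of frames needed to cover the signal (ceiling division).
    n_frames = -(-len(signal) // frame_len)

    # One random sign per frame, repeated across that frame's samples.
    signs = rng.choice([-1.0, 1.0], size=n_frames)
    per_sample_signs = np.repeat(signs, frame_len)[:len(signal)]

    return signal * per_sample_signs
```

For example, `random_frame_phase(x, 16000, 4)` would flip the sign of a 16 kHz signal `x` in independent 64-sample (4 ms) blocks. The magnitude of every sample is unchanged; only the polarity varies from frame to frame.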
Demo 5: Some Musical Examples
Four musical excerpts (2 sec each) processed with convolutional noise model (CNM), oracle binary mask (OBM), and oracle magnitude recovery (OMR) for the case of coffee shop noise at 0 dB SNR.
Condition | 1 ms | 4 ms | 10 ms | 40 ms | 100 ms | 400 ms |
Convolutional Noise Model eqn. (9) | CNM_1ms.wav | CNM_4ms.wav | CNM_10ms.wav | CNM_40ms.wav | CNM_100ms.wav | CNM_400ms.wav |
OBM | OBM_1ms.wav | OBM_4ms.wav | OBM_10ms.wav | OBM_40ms.wav | OBM_100ms.wav | OBM_400ms.wav |
OMR | OMR_1ms.wav | OMR_4ms.wav | OMR_10ms.wav | OMR_40ms.wav | OMR_100ms.wav | OMR_400ms.wav |
Original music file is original.wav.
All 19 of these .wav files are available in FrameDurationDemo5.zip (12 MB)