Proceedings of the IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP 2021), Tampere, Finland, October 6-8, 2021
Optimal Frame Duration for Oracle Audio Signal Separation is Determined by Joint Minimization of Two Antagonistic Artifacts
Cite This Publication
Stephen D. Voran, “Optimal Frame Duration for Oracle Audio Signal Separation is Determined by Joint Minimization of Two Antagonistic Artifacts,” in Proceedings of the IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP 2021), Tampere, Finland, October 6-8, 2021.
Abstract: We demonstrate that the optimal audio signal processing frame duration in oracle binary masking and oracle magnitude restoration is determined by joint minimization of two antagonistic artifacts: temporal blurring (which increases with frame duration) and log-spectral-error change per unit time (which decreases with frame duration). This is novel: the factors underlying the empirical optimization of frame duration have not been previously identified. Signal stationarity alone cannot explain the existence of an optimal frame duration. Stationarity can explain why a frame duration is too long, but it cannot explain why a frame duration is too short. We introduce a method for measuring the stationarity of an audio signal. We then use this essential tool, along with measurements, modeling, and analysis, to identify the two underlying factors that cause there to be an optimal frame duration. In addition, we show that when recovering s from the mixture y = s + n with oracle binary masks or oracle magnitudes, the stationarity of s and the stationarity of n have opposite influences on the optimal frame duration. Increasing the stationarity of s increases optimal frame duration, but increasing the stationarity of n decreases optimal frame duration. Stationarity alone cannot explain these opposing influences, but our results do.
Keywords: speech enhancement; source separation; frame size; oracle binary mask; stationarity
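For readers unfamiliar with the setup named in the abstract, the following is a minimal sketch of oracle binary masking applied to a mixture y = s + n at a chosen frame length. It is an illustration only, not the paper's implementation: the square-root Hann window, the 50% overlap, the mask rule (keep a time-frequency bin when the magnitude of s exceeds that of n), and the function names `stft`, `istft`, and `oracle_binary_mask_separation` are all assumptions introduced here.

```python
import numpy as np


def _sqrt_hann(frame_len):
    # Periodic square-root Hann window (assumed choice). Used for both
    # analysis and synthesis, it overlap-adds to 1 at 50% overlap.
    return np.sqrt(np.hanning(frame_len + 1)[:-1])


def stft(x, frame_len):
    """Short-time Fourier transform with 50% overlap (frame_len must be even)."""
    hop = frame_len // 2
    win = _sqrt_hann(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * win for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(X, frame_len, length):
    """Inverse of stft() by windowed overlap-add; the signal edges are not fully rebuilt."""
    hop = frame_len // 2
    win = _sqrt_hann(frame_len)
    frames = np.fft.irfft(X, n=frame_len, axis=1) * win
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out


def oracle_binary_mask_separation(s, n, frame_len):
    """Estimate s from y = s + n using a mask computed from the true s and n."""
    S = stft(s, frame_len)
    N = stft(n, frame_len)
    Y = S + N                       # the STFT is linear, so this equals stft(s + n)
    mask = np.abs(S) > np.abs(N)    # assumed oracle rule: keep bins where s dominates
    return istft(Y * mask, frame_len, len(s))


if __name__ == "__main__":
    # Hypothetical test signals: an amplitude-modulated tone in white noise.
    fs = 16000
    t = np.arange(fs) / fs
    rng = np.random.default_rng(0)
    s = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
    n = rng.standard_normal(fs)
    for frame_len in (128, 512, 2048):   # frame lengths in samples (illustrative)
        s_hat = oracle_binary_mask_separation(s, n, frame_len)
        core = slice(frame_len, fs - 2 * frame_len)   # skip partially covered edges
        err = s[core] - s_hat[core]
        snr_db = 10 * np.log10(np.sum(s[core] ** 2) / np.sum(err ** 2))
        print(f"{1000 * frame_len / fs:.0f} ms frames: output SNR {snr_db:.1f} dB")
```

Sweeping `frame_len` in this way is one simple way to observe how a separation quality measure varies with frame duration. The signals and the SNR measure used in the demo are placeholders and do not reproduce the measurements reported in the paper.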
For technical information concerning this report, contact:
Stephen D. Voran
Institute for Telecommunication Sciences
(303) 497-3839
svoran@ntia.gov
Disclaimer: Certain commercial equipment, components, and software may be identified in this report to specify adequately the technical aspects of the reported results. In no case does such identification imply recommendation or endorsement by the National Telecommunications and Information Administration, nor does it imply that the equipment or software identified is necessarily the best available for the particular application or uses.
For questions or information on this or any other NTIA scientific publication, contact the ITS Publications Office at ITSinfo@ntia.gov or 303-497-3572.