Evaluation of Python Sound Modules

Background

For a project, I needed to evaluate sound processing modules for use in software written in Python. After finishing my evaluation, I decided to make the evaluation criteria and results publicly available. This way, readers can benefit from the results and help other people by sending me corrections and information about new Python sound modules.

2022 Oct 31 update

During the last 15 years, there have been big changes in the Python audio scene. If you wish to select a Python sound module for your project, I recommend that you head to the Real Python tutorial about playing and recording sound in Python. It was last updated in January 2021.

Another source of information is the wiki page on audio in Python.

The following information is obsolete.

2007 Sep 24 update

Elijah Rutschman brought to my attention an additional Python sound processing package, and I decided to make this Web page even more useful by adding information about it, even if that information is only partial.

Criteria

For my project, I needed the following evaluation criteria:

  1. Work with my Python scripts (duh).
  2. Multi-platform (at least Linux and MS-Windows XP).
  3. Real-time sound acquisition from the sound card, with the data made available for subsequent processing in real time.
  4. Support for 16KHz sampling rate of >8-bit sound.
  5. Process a sound file, not necessarily in real time.
  6. Efficient.
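
As an illustration of criteria 4 and 5, here is a minimal sketch that checks whether a WAV file has a 16KHz sampling rate and more than 8 bits per sample, and then reads its samples for offline processing. It uses only the standard library wave and struct modules. The function names are hypothetical, and treating the 16KHz criterion as an exact match is an assumption made for this example.

```python
import struct
import wave


def meets_format_criteria(path, required_rate=16000):
    """Return True if the WAV file is sampled at required_rate Hz
    with more than 8 bits per sample (assumed reading of criterion 4)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
    return rate == required_rate and bits > 8


def read_mono_16bit(path):
    """Read a mono, 16-bit PCM WAV file into a list of signed integers."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    # WAV stores 16-bit samples as little-endian signed shorts.
    return list(struct.unpack("<%dh" % (len(frames) // 2), frames))
```

This covers only file input. Real-time acquisition from the sound card (criterion 3) needs one of the packages evaluated below.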

In addition, I needed the following speech sound processing capabilities:

  1. Determine whether there is pitch and if yes, its frequency.
  2. Locate the lowest three formants and their bandwidths.
  3. Find the power of the sound.
  4. Perform a 256-point FFT of the sound after applying standard pre-emphasis and a Hamming window to it.
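
To make items 3 and 4 concrete, the following is a minimal pure-Python sketch of the processing chain for one 256-sample frame: pre-emphasis, Hamming window, FFT, and power. It is not taken from any of the evaluated packages. The pre-emphasis coefficient 0.97 is an assumption (this page does not specify one), power is taken here to mean the mean squared amplitude, and all function names are hypothetical.

```python
import cmath
import math

FRAME_SIZE = 256      # FFT length named in the requirements
PRE_EMPHASIS = 0.97   # assumed coefficient; not specified on this page


def pre_emphasize(samples, coeff=PRE_EMPHASIS):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[n] - coeff * samples[n - 1] for n in range(1, len(samples))
    ]


def hamming(n_points):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N-1))."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
        for n in range(n_points)
    ]


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def frame_power(samples):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in samples) / len(samples)


def frame_spectrum(frame):
    """Magnitude spectrum (bins 0..128) of one pre-emphasized,
    Hamming-windowed 256-sample frame."""
    if len(frame) != FRAME_SIZE:
        raise ValueError("frame must hold exactly %d samples" % FRAME_SIZE)
    emphasized = pre_emphasize(frame)
    windowed = [s * w for s, w in zip(emphasized, hamming(FRAME_SIZE))]
    return [abs(c) for c in fft(windowed)[: FRAME_SIZE // 2 + 1]]
```

At a 16KHz sampling rate, each of the 256 FFT bins spans 62.5 Hz, so a 1 kHz tone peaks at bin 16. A pure-Python FFT like this one is slow, so for real-time use a compiled implementation would be the more likely choice.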

Evaluated packages

Snack

Multi-platform
The same scripts are usable on Windows 95/98/NT/2K/XP, Linux, Macintosh, Sun Solaris, HP-UX, FreeBSD, NetBSD, and SGI IRIX.
Real-time sound acquisition
Yes.
Support for 16KHz sampling rate and >8-bit sound
Yes.
Sound file processing
Yes.
Efficiency
Inefficient: the Tcl part of the package converts the data into a string, and the Python part then converts the string back into data.
Pitch existence and frequency
Yes. Every 10 ms, using the ESPS method (the AMDF method is available, too).
Formants and their bandwidths
http://www.speech.kth.se/snack/man/snack2.2/tcl-man.html#sound - see the formant subcommand.
FFT with pre-emphasis and Hamming window
See above link - the powerSpectrum subcommand.
Power
See above link - the power subcommand.
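
For readers unfamiliar with the average magnitude difference function (AMDF), the alternative pitch method named in the Snack entry above, here is a minimal standalone sketch of the idea for a single frame. It is not Snack's implementation. The 60-400 Hz search range, the voicing threshold of 0.3, and the function name are all assumptions chosen for illustration.

```python
import math


def amdf_pitch(frame, sample_rate=16000, f_min=60.0, f_max=400.0,
               voicing_threshold=0.3):
    """Return the estimated pitch in Hz, or None if the frame looks unvoiced.

    The AMDF at a given lag is the mean of |x[n] - x[n + lag]|. For a
    periodic signal it dips toward zero at multiples of the pitch period.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    if len(frame) <= lag_max:
        raise ValueError("frame too short for the requested pitch range")

    lags = list(range(lag_min, lag_max + 1))
    amdf = []
    for lag in lags:
        count = len(frame) - lag
        total = sum(abs(frame[n] - frame[n + lag]) for n in range(count))
        amdf.append(total / count)

    mean_diff = sum(amdf) / len(amdf)
    if mean_diff == 0:
        return None  # silent (or constant) frame: no pitch

    # Take the first sufficiently deep local minimum; choosing the
    # shortest such lag avoids reporting a sub-multiple of the pitch.
    limit = voicing_threshold * mean_diff
    for i in range(1, len(amdf) - 1):
        if amdf[i] <= limit and amdf[i] <= amdf[i - 1] and amdf[i] <= amdf[i + 1]:
            return sample_rate / lags[i]
    return None
```

Calling this on successive frames spaced 10 ms apart would give one pitch value per 10 ms, similar in spirit to the update rate described above.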

ossaudiodev

Multi-platform
Not enough. Implemented in Linux and FreeBSD. Available for a wide range of open-source and commercial Unices, but apparently not for MS-Windows.
Real-time sound acquisition
Blocking reads by default. It can probably be set to non-blocking.
Support for 16KHz sampling rate and >8-bit sound
Seems to depend upon the sound card.
Sound file processing
Use another package for this.
Efficiency
Direct I/O access.
Pitch existence and frequency
Use another package for this.
Formants and their bandwidths
Use another package for this.
FFT with pre-emphasis and Hamming window
Use another package for this.
Power
Use another package for this.

winsound

Not relevant for our needs. This module can only play existing sound files.

MCI.py (from Arik Baratz) together with ctypes.py

Multi-platform
ctypes.py is supported on all 32-bit MS-Windows versions (95/98/NT/2000/XP), all BSD platforms (FreeBSD/NetBSD/OpenBSD/Apple Mac OS X), all POSIX systems (Linux/BSD/UNIX-like OSes), and WinCE.
MCI.py was designed to communicate with MS-Windows winmm.dll.
Real-time sound acquisition
Unknown.
Support for 16KHz sampling rate and >8-bit sound
Unknown.
Sound file processing
Seems to be able to record to a file.
Efficiency
Commands are sent as strings.
Pitch existence and frequency
Use another package for this.
Formants and their bandwidths
Use another package for this.
FFT with pre-emphasis and Hamming window
Use another package for this.
Power
Use another package for this.

PyMedia.py

The documentation is very sketchy.

Multi-platform
Package is compilable for MS-Windows, Linux and cygwin.
Real-time sound acquisition
Unknown.
Support for 16KHz sampling rate and >8-bit sound
Probably depends upon sound card.
Sound file processing
Yes.
Efficiency
Unknown.
Pitch existence and frequency
Use another package for this.
Formants and their bandwidths
Use another package for this.
FFT with pre-emphasis and Hamming window
Use another package for this.
Power
Use another package for this.

Additional Packages

PyAudio

The following only summarizes information from the PyAudio Web page.

PyAudio provides Python bindings for the PortAudio audio I/O library. The current version of PyAudio is V0.1.0, which is alpha quality.

Multi-platform
Package is compilable for MS-Windows, Apple Mac OS X, Linux and cygwin.
Real-time sound acquisition
Unknown.
Support for 16KHz sampling rate and >8-bit sound
Unknown.
Sound file processing
Unknown.
Efficiency
Unknown.
Pitch existence and frequency
Unknown.
Formants and their bandwidths
Unknown.
FFT with pre-emphasis and Hamming window
Unknown.
Power
Unknown.