AudioFlux is a Python library that provides deep-learning-oriented tools for audio and music analysis and feature extraction. It supports several time-frequency transform methods, which are techniques for analyzing audio signals in both the time and frequency domains. Examples of these transforms include the short-time Fourier transform (STFT), the constant-Q transform (CQT), and the wavelet transform.
In addition to time-frequency transforms, AudioFlux supports hundreds of combinations of corresponding time-domain and frequency-domain features. These features represent various characteristics of the audio signal, such as its spectral content, its temporal dynamics, and its rhythmic patterns. Once extracted, they can be used as input to deep learning networks for classification, source separation, music information retrieval (MIR), and automatic speech recognition (ASR) tasks.
For example, in music classification, AudioFlux might extract a set of features from a piece of music, such as its spectral centroid, its mel-frequency cepstral coefficients (MFCCs), and its zero-crossing rate. These features could then be fed to a deep learning network trained to classify music into genres such as rock, jazz, or hip-hop. AudioFlux thus provides a complete toolkit for analyzing and processing audio signals, making it a valuable asset for practitioners and researchers working on audio and music analysis.
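To make two of those descriptors concrete, here is a minimal textbook sketch in NumPy of the spectral centroid (the magnitude-weighted mean frequency) and the zero-crossing rate. This is an illustration of the underlying math, not audioFlux's own implementation:

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of one frame's spectrum (in Hz)."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    return float(np.mean(signs[:-1] != signs[1:]))

sr = 32000
t = np.arange(4096) / sr
tone = np.sin(2 * np.pi * 1000 * t)   # pure 1 kHz tone

print(spectral_centroid(tone, sr))    # close to 1000 Hz
print(zero_crossing_rate(tone))       # close to 2 * 1000 / sr = 0.0625
```

A pure tone makes the sanity check easy: its centroid sits at the tone's frequency, and a sinusoid crosses zero twice per period.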
The main function modules of audioFlux are transform, feature, and MIR.
- Transform: The transform module in audioFlux offers various time-frequency representations through transform algorithms such as BFT, NSGT, CWT, and PWT. These algorithms support several frequency scales, including linear, mel, bark, erb, octave, and logarithmic spectrograms. Other transforms, such as CQT, VQT, ST, FST, DWT, WPT, and SWT, do not support multiple frequency scales and can only be used as standalone transforms. AudioFlux provides detailed documentation on the purpose, parameters, and usage of each transform. Synchrosqueezing and reassignment techniques are also available to sharpen the time-frequency representations, using algorithms such as reassign, synsq, and wsst; users can refer to the documentation for more information on these techniques.
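The kind of time-frequency representation these transforms produce can be sketched with a plain NumPy STFT. This is a textbook formulation for illustration only; audioFlux's BFT and related transforms add the frequency-scale filter banks (mel, bark, erb, and so on) described above:

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Naive STFT: Hann-windowed, hopped FFT frames -> (frames, bins) magnitudes."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 32000
t = np.arange(4096) / sr
x = np.sin(2 * np.pi * 440 * t)       # 440 Hz test tone

spec = stft(x)
print(spec.shape)                     # (15, 257): 15 frames, 257 frequency bins
```

Each row is one analysis frame; the 440 Hz tone shows up as a peak near bin 440 / (sr / n_fft) = 7 in every frame.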
- Feature: The feature module in audioFlux offers several algorithms, including spectral, xxcc, deconv, and chroma. The spectral algorithm provides spectral features and supports all spectrum types; the xxcc algorithm provides cepstral coefficients, likewise for all spectrum types; and the deconv algorithm provides deconvolution of the spectrum, again for all spectrum types. Finally, the chroma algorithm provides chroma features, but only supports the CQT spectrum and BFT-based linear or octave scales.
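Cepstral analysis is the idea behind both cepstral coefficients and spectrum deconvolution: taking the inverse FFT of a log-magnitude spectrum separates the slowly varying spectral envelope (low quefrency) from the periodic excitation (high quefrency). A minimal NumPy sketch of the real cepstrum, independent of audioFlux's implementation:

```python
import numpy as np

def real_cepstrum(frame, eps=1e-6):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum. Low quefrencies
    describe the spectral envelope, high quefrencies the periodic excitation."""
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + eps)
    return np.fft.irfft(log_mag)

sr = 32000
t = np.arange(4096) / sr
# Harmonic signal with a 250 Hz fundamental (period = 128 samples at 32 kHz).
x = sum(np.sin(2 * np.pi * 250 * k * t) for k in range(1, 6))

ceps = real_cepstrum(x)
# The excitation appears as a cepstral peak at quefrency sr / f0 = 128 samples.
peak = int(np.argmax(ceps[64:192])) + 64
print(peak)                           # 128
```

Reading the fundamental off the quefrency axis like this is the classic cepstral pitch trick; the "deconvolution" view is the same computation interpreted as splitting the log spectrum into envelope and fine structure.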
- MIR: The mir module in audioFlux includes pitch-detection algorithms such as YIN and STFT-based methods; an onset algorithm that provides spectral flux, novelty, and related detection functions; and an hpss algorithm that offers median-filtering and NMF-based harmonic/percussive separation.
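To give a sense of what a pitch detector like YIN does, here is a heavily simplified NumPy sketch of its core idea, the cumulative-mean-normalized difference function. The real algorithm adds parabolic interpolation and other refinements, and this toy version is not audioFlux's implementation:

```python
import numpy as np

def yin_pitch(x, sr, fmin=80, fmax=1000, threshold=0.1):
    """Toy YIN: cumulative-mean-normalized difference function, first dip
    below the threshold, refined by walking down to the local minimum."""
    lag_min, lag_max = sr // fmax, sr // fmin
    lags = np.arange(1, lag_max + 1)
    diff = np.array([np.sum((x[:-lag] - x[lag:]) ** 2) for lag in lags])
    cmnd = diff * lags / np.cumsum(diff)      # cumulative-mean normalization
    below = np.where(cmnd[lag_min:] < threshold)[0]
    i = int(below[0]) + lag_min if below.size else int(np.argmin(cmnd[lag_min:])) + lag_min
    while i + 1 < len(cmnd) and cmnd[i + 1] < cmnd[i]:
        i += 1                                # descend to the local minimum
    return sr / (i + 1)                       # index i corresponds to lag i + 1

sr = 32000
t = np.arange(4096) / sr
x = np.sin(2 * np.pi * 400 * t)               # 400 Hz test tone

print(yin_pitch(x, sr))                       # 400.0
```

The difference function nearly vanishes at the lag equal to the signal's period (80 samples here), so the detector recovers 400 Hz.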
The library is compatible with various operating systems, including Linux, macOS, Windows, iOS, and Android. In a benchmark against other audio libraries, audioFlux had the shortest processing time. The test used sample data of 128 milliseconds per clip (a sample rate of 32,000 Hz and a data length of 4,096 samples), and the results were compared across multiple libraries. The following table shows the time each library takes to extract features for 1,000 data samples.
The documentation for the package can be found online: https://audioflux.top.
AudioFlux is open to collaboration and welcomes contributions from interested persons. Users must first fork the latest git repository and create a feature branch to contribute to. All submissions must pass continuous integration tests. Additionally, AudioFlux invites users to suggest improvements, including new algorithms, bug reports, feature requests, general inquiries, etc. Users can open an issue on the project page to start these discussions.
Check out the Project. All credit for this research goes to the researchers of this project. Also, don't forget to join our 16k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, cool AI projects, and more.
Niharika is a technical consulting intern at Marktechpost. She is a third-year student, currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic individual with a keen interest in machine learning, data science, and artificial intelligence, and an avid reader of the latest developments in these fields.