====================== Aligning MIDI to audio ====================== In order to receive audio-timed chord labels from a MIDI file, DECIBEL first finds an optimal alignment from the Midi file to the audio file, realigns the MIDI file using this alignment and then uses a MIDI chord recognizer to estimate the chord labels on the realigned MIDI file. The audio-midi aligner contains methods to re-align the MIDI file to the audio file. For alignment between MIDI files and audio recordings, DECIBEL uses a DTW algorithm by Raffel and Ellis ([raffel2016optimizing]_) Dynamic Time Warping (DTW) is a common technique to align two feature vectors, for example two representations of the same song. Let us have a look at the outline of the algorithm. First, all MIDI files are synthesized using the fluidsynth software synthesizer with the FluidR3_GM soundfont. Now we have a waveform representation for both the audio and the MIDI file, as shown below: **Audio waveform** .. image:: Audio-waveform.png **Synthesized MIDI waveform** .. image:: Synth-MIDI-waveform.png Note that our example MIDI file starts with silence, while in the audio recording the music starts immediately. Also, the MIDI file has a longer duration, as the MIDI file repeats the chorus an additional time, compared to the audio file. Then, the Constant-Q transform is calculated for both the audio and the synthesized MIDI waveform: **Audio CQT** .. image:: Audio-cqt.png **Synthesized MIDI CQT** .. image:: MIDI-cqt.png Features are found by aggregation over the Constant-Q transform vectors. Then, the optimal path between the audio file and the synthesized MIDI is calculated using DTW. This results in an optimal path and the alignment confidence score: **Alignment path** .. image:: Alignment.png In this figure, we see that the alignment path starts not in the coordinate (0, 0), but a bit to the right: the silence at the start of the MIDI file is not mapped to any position in the audio file. The same goes for the end of the MIDI file, which is a superfluous repetition of the chorus. Finally, this alignment path is used to remap the MIDI file to the audio recording: **MIDI re-alignment** .. image:: Alginment-mapping.png Decibel uses the unchanged parameter setting reported in the paper by [raffel2016optimizing]_: ======================= ======================================== Parameter Setting ======================= ======================================== Feature representation log-magnitude Constant-Q transform Time scale every 46 milliseconds Cost function cosine distance Penalty median distance of all pairs of frames Gully 0.96 Band path constraint none ======================= ======================================== Synthesize MIDI files --------------------- .. automodule:: decibel.audio_midi_aligner.synthesizer :members: :undoc-members: :show-inheritance: Aligning synthesized MIDI to audio ---------------------------------- .. automodule:: decibel.audio_midi_aligner.aligner :members: :undoc-members: :show-inheritance: .. [raffel2016optimizing] Raffel, Colin, and Daniel PW Ellis. "Optimizing DTW-based audio-to-MIDI alignment and matching." 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2016.