Scraping MIDI and tab files

DECIBEL uses a data set of audio files, MIDI files and tabs. This data set is based on a subset of the Isophonics Reference Annotations [mauch2009omras2]. The Isophonics data set contains chord annotations for 180 Beatles songs, 20 songs by Queen, 7 songs by Carole King and 18 songs by Zweieck. In my experiments, I only used the songs by the Beatles and Queen, because no MIDI files or tabs were available for Zweieck and the Carole King annotations contained some inconsistencies.

The decibel.file_scraper.midi_scraper and decibel.file_scraper.tab_scraper modules contain functions for automatically scraping a predefined list of MIDI and tab files from the Internet. Using these functions, you can either reproduce my experiments on the Isophonics data set or create your own data set of MIDI and tab files.

Scraping MIDI files

This module contains all the methods you need for scraping either a single MIDI file or a predefined set of MIDI files from the Internet.

decibel.file_scraper.midi_scraper.download_data_set_from_csv(csv_path: str, midi_directory: str)

Download a data set of MIDI files, as specified by the csv file in csv_path, and put them into midi_directory. If a MIDI file cannot be downloaded successfully, for example because the file already exists or because the Internet connection fails, the function continues with the remaining MIDI files. After trying to download all listed MIDI files, this function returns a message indicating the number of MIDI files that were downloaded successfully and the number of MIDI files for which the download failed.

Parameters
  • csv_path – Path to the csv file with lines in format [midi_name];[midi_url] (for example IndexMIDI.csv)

  • midi_directory – Local location for the downloaded files
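The batch behaviour described above (keep going after a failed download, then report the counts) can be sketched as follows. This is a minimal illustration rather than DECIBEL's actual implementation: `read_midi_index`, `download_all` and the injected `download` callable are hypothetical names, and the sketch assumes that every line of the csv file holds exactly `midi_name;midi_url` with no header row.

```python
import csv
from typing import Callable, List, Tuple

# Hypothetical type for a single-file downloader: (url, name) -> (success, message).
Downloader = Callable[[str, str], Tuple[bool, str]]


def read_midi_index(csv_path: str) -> List[Tuple[str, str]]:
    """Read (midi_name, midi_url) pairs from a semicolon-separated file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=";")
        # Blank or too-short lines are skipped; a header row is assumed absent.
        return [(row[0], row[1]) for row in reader if len(row) >= 2]


def download_all(entries: List[Tuple[str, str]], download: Downloader) -> str:
    """Try every entry, never stopping on a failure, and summarise the result."""
    succeeded = 0
    failed = 0
    for name, url in entries:
        try:
            ok, _message = download(url, name)
        except Exception:
            # Any unexpected error counts as a failed download.
            ok = False
        if ok:
            succeeded += 1
        else:
            failed += 1
    return f"{succeeded} downloaded, {failed} failed"
```

Passing the downloader in as an argument keeps the loop independent of the network, so the counting logic can be exercised with a stub function.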

decibel.file_scraper.midi_scraper.download_midi(midi_url: str, midi_directory: str, midi_name: str) -> (bool, str)

Download the MIDI file at midi_url from the Internet and save it in midi_directory under the file name midi_name. Return a Boolean and a message indicating success or failure.

Parameters
  • midi_url – Location of the MIDI file on the Internet

  • midi_directory – Local directory where the MIDI file should be placed on your machine

  • midi_name – File name of your MIDI file

Returns

Boolean and str message, indicating success or failure
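A single-file download that reports its outcome as a (Boolean, message) pair might look like the sketch below. It uses only the standard library and is not the DECIBEL source: `download_file_sketch` is a hypothetical name, the message wording is invented, and treating an already existing file as a failure is an assumption based on the description of the batch function.

```python
import os
import urllib.request
from typing import Tuple


def download_file_sketch(url: str, directory: str, file_name: str) -> Tuple[bool, str]:
    """Fetch url into directory/file_name and return (success, message)."""
    os.makedirs(directory, exist_ok=True)
    target = os.path.join(directory, file_name)

    # Assumption: an existing file is reported as a failure, not overwritten.
    if os.path.exists(target):
        return False, f"{file_name} already exists"

    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            data = response.read()
    except (OSError, ValueError) as error:
        # URLError is a subclass of OSError; ValueError covers malformed URLs.
        return False, f"Could not download {file_name}: {error}"

    with open(target, "wb") as handle:
        handle.write(data)
    return True, f"Downloaded {file_name}"
```

Returning a pair instead of raising an exception lets a batch loop count failures and carry on with the next file.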

Scraping tab files

This module contains all the methods you need for scraping either a single tab file or a predefined set of tab files from the Internet.

decibel.file_scraper.tab_scraper.download_data_set_from_csv(csv_path: str, tab_directory: str)

Download a data set of tab files, as specified by the csv file in csv_path, and put them into tab_directory. If a tab file cannot be downloaded successfully, for example because the file already exists or because the Internet connection fails, the function continues with the remaining tab files. After trying to download all listed tab files, this function returns a message indicating the number of tab files that were downloaded successfully and the number of tab files for which the download failed.

Parameters
  • csv_path – Path to the csv file with lines in format [url];[name];[key];[filename] (for example IndexTabs.csv)

  • tab_directory – Local location for the downloaded files
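The tab index has four fields per line instead of two. As a rough illustration of that format, the sketch below reads such a file into small records. `TabIndexEntry` and `read_tab_index` are hypothetical names that do not come from DECIBEL, the `key` field is kept as an opaque string because its meaning is not specified here, and a header row is assumed absent.

```python
import csv
from dataclasses import dataclass
from typing import List


@dataclass
class TabIndexEntry:
    """One line of the tab index: [url];[name];[key];[filename]."""
    url: str
    name: str
    key: str  # kept as an opaque string; its meaning is not assumed
    filename: str


def read_tab_index(csv_path: str) -> List[TabIndexEntry]:
    """Read all well-formed four-field lines from a semicolon-separated file."""
    entries = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter=";"):
            if len(row) != 4:
                continue  # skip blank or malformed lines
            entries.append(TabIndexEntry(*row))
    return entries
```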

decibel.file_scraper.tab_scraper.download_tab(tab_url: str, tab_directory: str, tab_name: str) -> (bool, str)

Download the tab file at tab_url from the Internet and save it in tab_directory under the file name tab_name. Return a Boolean and a message indicating success or failure.

Parameters
  • tab_url – Location of the tab file on the Internet

  • tab_directory – Local directory where the tab file should be placed on your machine

  • tab_name – File name of your tab file

Returns

Boolean and str message, indicating success or failure

mauch2009omras2

Mauch, Matthias, et al. “OMRAS2 metadata project 2009.” Proc. of 10th International Conference on Music Information Retrieval. 2009.