Scraping MIDI and tab files
DECIBEL uses a data set of audio files, MIDI files and tabs. This data set is based on a subset of the Isophonics Reference Annotations [mauch2009omras2]. The Isophonics data set contains chord annotations for 180 Beatles songs, 20 songs by Queen, 7 songs by Carole King and 18 songs by Zweieck. In my experiments, I used only the songs by the Beatles and Queen, because no MIDI files or tabs were available for Zweieck and the Carole King annotations contained some inconsistencies.
The decibel.file_scraper.midi_scraper and decibel.file_scraper.tab_scraper modules contain functions to automatically scrape a predefined list of MIDI and tab files from the Internet. Using these functions, you can either reproduce my experiments on the Isophonics data set or create your own data set of MIDI and tab files.
Scraping MIDI files
This module contains all the methods you need for scraping either a single MIDI file or a predefined set of MIDI files from the Internet.
- decibel.file_scraper.midi_scraper.download_data_set_from_csv(csv_path: str, midi_directory: str)
Download a data set of MIDI files, as specified by the csv file in csv_path, and put them into midi_directory. If a MIDI file cannot be downloaded successfully, for example because the file already existed or because the Internet connection broke down, then the function continues with downloading the other MIDI files. After trying to download all prescribed MIDI files, this function returns a message indicating the number of MIDI files that were downloaded successfully and the number of MIDI files for which the download failed.
- Parameters
csv_path – Path to the csv file with lines in format [midi_name];[midi_url] (for example IndexMIDI.csv)
midi_directory – Local location for the downloaded files
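The continue-on-failure behaviour described above can be sketched roughly as follows. This is not the DECIBEL implementation: download_all_from_csv and the download_one callback are hypothetical names introduced for illustration, and the exact wording of the returned message is an assumption. The callback stands in for a per-file function with the same shape as download_midi.

```python
import csv
from typing import Callable, Tuple

# Hypothetical callback type: (url, directory, name) -> (success, message).
Downloader = Callable[[str, str, str], Tuple[bool, str]]


def download_all_from_csv(csv_path: str, target_directory: str,
                          download_one: Downloader) -> str:
    """Try every [midi_name];[midi_url] row and summarise the outcome."""
    succeeded, failed = 0, 0
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        for row in csv.reader(csv_file, delimiter=";"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            name, url = row[0].strip(), row[1].strip()
            try:
                ok, _message = download_one(url, target_directory, name)
            except Exception:
                # One broken download must not stop the remaining ones.
                ok = False
            if ok:
                succeeded += 1
            else:
                failed += 1
    # Assumed message format; the real function may phrase this differently.
    return f"{succeeded} files downloaded successfully, {failed} downloads failed"
```

Passing the per-file downloader as an argument keeps the loop easy to try out with a stub before any network access is involved.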
- decibel.file_scraper.midi_scraper.download_midi(midi_url: str, midi_directory: str, midi_name: str) -> (bool, str)
Download a MIDI file from midi_url and save it in midi_directory under the file name midi_name. Return a Boolean and a message indicating success or failure.
- Parameters
midi_url – Location of the MIDI file on the Internet
midi_directory – Local directory where the MIDI file should be placed on your machine
midi_name – File name of your MIDI file
- Returns
Boolean and str message, indicating success or failure
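As an illustration of a function that returns a (bool, str) pair instead of raising, here is a minimal sketch built on the standard library. The name fetch_file, the message texts and the choice of urllib.request.urlretrieve are assumptions made for this example; the actual download_midi may work differently.

```python
import os
import urllib.request
from typing import Tuple


def fetch_file(url: str, directory: str, file_name: str) -> Tuple[bool, str]:
    """Download url to directory/file_name and report (success, message)."""
    target_path = os.path.join(directory, file_name)
    if os.path.exists(target_path):
        # Treat an existing file as a failed download, as described above.
        return False, f"{file_name} already exists"
    try:
        urllib.request.urlretrieve(url, target_path)
    except (OSError, ValueError) as error:
        # URLError and HTTPError are subclasses of OSError;
        # ValueError covers malformed URLs.
        return False, f"Could not download {file_name}: {error}"
    return True, f"Downloaded {file_name}"
```

Returning a status pair rather than raising lets a batch function count failures and carry on with the next file.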
Scraping tab files
This module contains all the methods you need for scraping either a single tab file or a predefined set of tab files from the Internet.
- decibel.file_scraper.tab_scraper.download_data_set_from_csv(csv_path: str, tab_directory: str)
Download a data set of tab files, as specified by the csv file in csv_path, and put them into tab_directory. If a tab file cannot be downloaded successfully, for example because the file already existed or because the Internet connection broke down, then the function continues with downloading the other tab files. After trying to download all prescribed tab files, this function returns a message indicating the number of tab files that were downloaded successfully and the number of tab files for which the download failed.
- Parameters
csv_path – Path to the csv file with lines in format [url];[name];[key];[filename] (for example IndexTabs.csv)
tab_directory – Local location for the downloaded files
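The tab index uses four semicolon-separated fields per line. A small sketch of reading such a file with the standard csv module is shown below. TabEntry and read_tab_index are hypothetical names, and the fields are kept as opaque strings because their precise meaning (for instance what key denotes) is not specified here.

```python
import csv
from typing import List, NamedTuple


class TabEntry(NamedTuple):
    """One line of a tab index in the format [url];[name];[key];[filename]."""
    url: str
    name: str
    key: str
    filename: str


def read_tab_index(csv_path: str) -> List[TabEntry]:
    """Return all well-formed rows of the index, skipping anything else."""
    entries = []
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        for row in csv.reader(csv_file, delimiter=";"):
            if len(row) != 4:
                continue  # blank line or unexpected number of fields
            entries.append(TabEntry(*(field.strip() for field in row)))
    return entries
```

Each entry could then be handed to a per-file download function, using entry.url as the source and entry.filename as the local file name.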
- decibel.file_scraper.tab_scraper.download_tab(tab_url: str, tab_directory: str, tab_name: str) -> (bool, str)
Download a tab file from tab_url and save it in tab_directory under the file name tab_name. Return a Boolean and a message indicating success or failure.
- Parameters
tab_url – Location of the tab file on the Internet
tab_directory – Local directory where the tab file should be placed on your machine
tab_name – File name of your tab file
- Returns
Boolean and str message, indicating success or failure
- mauch2009omras2
Mauch, Matthias, et al. “OMRAS2 metadata project 2009.” Proc. of 10th International Conference on Music Information Retrieval. 2009.