Extracting chords from tabs¶
DECIBEL parses the tabs in a similar way to the parser proposed by cite{macrae2012linking}. First, it classifies each line in the tab file to a line type. Then, it segments the tab by splitting on empty lines. As a next step, all systems in each segment are identified. We define a system as a set of subsequent lines that belong to each other. For example: a tab system is very common in guitar tablature files and consists of exactly six subsequent tab files. In chord sheets, a common system is the alteration between chord lines and lyrics lines. From these systems, DECIBEL can then extract the chord labels. Thereby, the system retains line information (i.e. the line of the chord in the text file), as this will be used in the tab-audio alignment step.
Line type classification¶
For each line in the tab file, DECIBEL estimates its line type using a set of heuristics. We distinguish the following line types: empty, chords, tuning definition, capo change, structural marker, chord definition, tablature, lyrics and combined chord and lyrics.
A line has the empty line type if it is empty or consists only of a space.
To determine if a line has the chords line type, the system first splits the line by spaces. Then it checks if each element of the line matches the chord pattern. An element is a chord if:
It consists of at most 10 characters;
The first character is a note letter (a, b, c, d, e, f or g);
The element does not contain sequences of four digits (as this would indicate a chord definition);
The element does not contain symbols; and
Any three letter sequence is in the following list: {min, add, aug, dim, maj, sus, flat}.
A line has the tuning line type if it contains the word “tuning”.
To find out if a line is a structural marker, our parser searches for words like “verse”, “chorus” or “bridge”.
A line is a chord definition if it contains a sequence of exactly 6 digits.
A line is a tablature line if it contains at least 10 characters which are a digit, hyphen, vertical bar, slash, letter “h”, “b” or “p” or a space and the number of hyphens is larger than the number of spaces.
A lyrics line fulfills each of the following conditions:
It does not contain square brackets, the equals sign or the at sign;
It contains at most 10 hyphens;
Either:
It consists of just one word, which has only letter characters and contains at least three of the same letters after each other, like “oooh” or “aaah”; or
It contains at least two words.
To find out if a line is a combined chords and lyrics line, DECIBEL first extracts all characters between square brackets ([ and ]). If these characters form a chord, it removes all characters between each pair of square brackets. If the remaining line is a lyrics line, we can conclude that our original line was a combined chords and lyrics file.
If a line does not satisfy any of these requirements, the parser assigns the line type undefined.
Segmentation¶
At this point, we have line types for each line in the tab. As a next step, our parser divides the tab file into segments. A segment is simply defined as the lines between lines of the empty line type. In a typical tab, verses and choruses are separated by empty lines and therefore are designated to their own segment.
System and chord extraction¶
Now we have our tab file divided into segments, and we know the line type of each of the lines in a segment. We want to extract the chord labels, but we only extract them from a suitable system. A system consists of all of lines that should be played or sung together at the same time. For example, in a typical chord sheet, we will find a lot of systems consisting of a chord line above a lyrics line, as the chords in the chord line should be played at the same time as the word in the lyrics line should be sung.
-
decibel.tab_chord_parser.tab_parser.
classify_all_tabs_of_song
(song: decibel.music_objects.song.Song) → None[source]¶ Classify all tabs of a song, by (1) LineType classification; (2) Segmenting lines; (3) System and Chord extraction.
- Parameters
song – A Song in our data set, for which we want to parse all tabs