Song lyrics depend heavily on rhythm, syllables and, in some genres such as pop, rhyme. Some poetry visualizations add white space between words and lines, which can then be filled with various visualization techniques, such as links between related words. If a lyric is instead treated as a staccato sequence of syllables, the layout is more akin to a set of tiles locked together. Then, instead of filling whitespace, the visualization is constrained to the tiles.
Simple tiles with English and phonetic syllables
To start, consider Billie Eilish’s Bad Guy. Similar sounds (e.g. rhymes) don’t visibly pop out in English text. Our goal is to encode those sounds to make them visible. A simple approach is to convert the English words to a phonetic alphabet, so that the same sounds have the same phonetic symbols:
You can visually scan the phonetic symbols, but you have to look closely at the letter shapes. Rhymes are driven by the vowel sound, which may or may not be at the end of the syllable. Furthermore, in the International Phonetic Alphabet, some vowel sounds are represented by a single symbol and some by two symbols, making it difficult to attend to the relevant symbols. With phonetic symbols, sounds are comparable, but they don’t visually pop out.
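The conversion step can be sketched in a few lines of Python. This is only an illustration: `PRONUNCIATIONS` and `to_tiles` are hypothetical names, and the handful of hand-written transcriptions stands in for a full pronouncing dictionary.

```python
# Hypothetical, hand-written lookup: English word -> list of phonetic syllables.
# A real version would draw on a complete pronouncing dictionary.
PRONUNCIATIONS = {
    "bad": ["bæd"],
    "guy": ["gaɪ"],
    "criminal": ["krɪ", "mə", "nəl"],
    "cynical": ["sɪ", "nɪ", "kəl"],
}


def to_tiles(line):
    """Return one phonetic syllable per tile; unknown words pass through unchanged."""
    tiles = []
    for word in line.lower().split():
        word = word.strip(".,!?\"'")  # drop surrounding punctuation
        tiles.extend(PRONUNCIATIONS.get(word, [word]))
    return tiles


print(to_tiles("Bad guy"))
print(to_tiles("Criminal, cynical"))
```

Each returned item corresponds to one tile, so a multi-syllable word such as criminal occupies three tiles.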
Color-coded vowel sounds
How can the sounds be made to visually pop out? Each syllable is a collection of phonemes for vowels and consonants, typically leading consonant(s), vowel(s), and trailing consonant(s). However, there are ~23 consonant phonemes and 16 vowel phonemes in English. Encodings such as brightness, font weight, etc., don’t scale well to 16 to 23 uniquely discernible categories. Color is a possibility, particularly given that some phonemes sound similar. Using a confusion matrix, colors can be chosen so that close-sounding phonemes have similar colors (although vowel frontness and a vowel origin matrix might be better).
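As a rough sketch of the idea that similar sounds should get similar colors, the Python below derives a color from a vowel's approximate position in the mouth rather than from a confusion matrix, so it is a stand-in technique and not the palette used for the figures. The `VOWEL_POSITION` values are illustrative guesses, and the function names are hypothetical.

```python
import colorsys

# Assumed, approximate positions: (frontness, openness), each from 0 to 1.
# frontness: 0 = front of the mouth, 1 = back; openness: 0 = close, 1 = open.
VOWEL_POSITION = {
    "i": (0.00, 0.00),
    "ɪ": (0.15, 0.15),
    "ɛ": (0.10, 0.60),
    "æ": (0.10, 0.85),
    "ə": (0.50, 0.50),
    "ʌ": (0.70, 0.65),
    "ɑ": (0.90, 1.00),
    "ɔ": (1.00, 0.65),
    "ʊ": (0.85, 0.15),
    "u": (1.00, 0.00),
}


def vowel_rgb(vowel):
    """Map a vowel to an (r, g, b) tuple of 0-255 integers."""
    frontness, openness = VOWEL_POSITION[vowel]
    hue = 0.75 * frontness             # front vowels toward red, back toward violet
    lightness = 0.35 + 0.30 * openness  # more open vowels are lighter
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 0.8)
    return tuple(round(c * 255) for c in (r, g, b))


def vowel_hex(vowel):
    """Same color as vowel_rgb, formatted as a CSS-style hex string."""
    return "#{:02x}{:02x}{:02x}".format(*vowel_rgb(vowel))


for v in ("i", "ɪ", "ɑ"):
    print(v, vowel_hex(v))
```

Because hue and lightness vary continuously with position, neighboring vowels such as i and ɪ land closer in color than distant ones such as i and ɑ.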
Here is a variation where each syllable is split into three parts:
– leading consonant in light italic serif font
– central vowel in heavyweight sans font, color coded to the vowel sound, with similar sounds in similar colors
– trailing consonant sounds in a heavyweight serif font
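The three-way split can be sketched as follows, assuming each syllable is already written in phonetic symbols. `split_syllable` and `VOWEL_CHARS` are hypothetical names, the vowel set is a partial list for illustration, and syllables without a vowel symbol are not handled specially.

```python
# Assumed (partial) set of phonetic vowel symbols; two-symbol vowels such as
# "aɪ" are treated as one run of consecutive vowel characters.
VOWEL_CHARS = set("aeiouæɑɒɔəɛɜɪʊʌ")


def split_syllable(syllable):
    """Split a phonetic syllable into (leading consonants, vowel, trailing consonants)."""
    syllable = syllable.lstrip("ˈˌ")  # drop stress marks
    # Index of the first vowel symbol (or the end, if there is none).
    start = next(
        (i for i, ch in enumerate(syllable) if ch in VOWEL_CHARS), len(syllable)
    )
    # Extend across the whole vowel run, including a length mark.
    end = start
    while end < len(syllable) and (syllable[end] in VOWEL_CHARS or syllable[end] == "ː"):
        end += 1
    return syllable[:start], syllable[start:end], syllable[end:]


for s in ("gaɪ", "taɪp", "ˈkrɪ", "nəl"):
    print(s, split_syllable(s))
```

Each of the three returned parts can then be styled separately, with the middle part driving the color.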
You can easily scan and notice similar vowel sounds in the final syllable of each line, plus the trailing consonant, i.e. the rhymes (e.g. gaɪ). You might also notice other phonetic techniques, such as the leading repetition in the chorus, meɪk / maɪt, or near rhymes such as ˈkrɪ–mə–nəl / ˈsɪ–nɪ–kəl.
On the other hand, the phonetic alphabet includes symbols unfamiliar to most native English speakers, e.g. ʌ for “uh” or ʃ for “sh”.
Instead, the tile background can be color-coded and the text switched to English spelling:
But the sound of the trailing consonant has been lost: guy and type have the same vowel sound, but don’t perfectly rhyme because type ends in a consonant and guy doesn’t. Worse, nose, toes and knows actually do rhyme but are spelled quite differently.
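The distinction can be made concrete with a small Python sketch that compares the vowel and trailing consonant of two words. `FINAL_SOUND` and `rhyme_kind` are hypothetical names, and the transcriptions are hand-written assumptions for these five words only.

```python
# Assumed (vowel, trailing consonant) of each word's final syllable.
FINAL_SOUND = {
    "guy": ("aɪ", ""),
    "type": ("aɪ", "p"),
    "nose": ("oʊ", "z"),
    "toes": ("oʊ", "z"),
    "knows": ("oʊ", "z"),
}


def rhyme_kind(word_a, word_b):
    """Classify a pair as 'perfect', 'near' (same vowel only), or 'none'."""
    vowel_a, coda_a = FINAL_SOUND[word_a]
    vowel_b, coda_b = FINAL_SOUND[word_b]
    if vowel_a != vowel_b:
        return "none"
    return "perfect" if coda_a == coda_b else "near"


print(rhyme_kind("guy", "type"))
print(rhyme_kind("nose", "knows"))
```

Coloring by vowel alone captures only the first comparison, which is why a second visual channel for the trailing consonant is needed.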
Fun with a polychromatic font
A polychromatic font is a font specifically designed for use with multiple colors. A few fonts support multiple colors by providing multiple versions of the font that align on top of each other. Most of these fonts must be purchased rather than being freely available. The example below uses the font Up up and away:
In the example below, the inside color is the vowel sound, and the outside color (and the gratuitous 3D) is the final consonant sound. If there is no final consonant, then the background color is used:
This is just for fun: “Hey, I’ve got this great font, let’s try it out and see what happens”. It has long been known that adjacent colors influence the perception of a color, so in practice this would never work perceptually for effective visualization, but it could make some viscerally exciting data-driven text. Also, some of the color combinations aren’t very legible. See Josef Albers’ Interaction of Color for awesome paintings of the effect:
Textures! (plus color and text)
Finally, we get to a version with a tile where:
– English text is used per tile
– Color indicates the vowel sound
– Texture indicates the final consonant sound (if no consonant, then no texture)
Since color is dominant, it can be seen that guy and type are the same color and thus have the same vowel sound. However, type, with its ending p sound, gains the p texture, differentiating it from guy. Tough, rough and e-nough all share the same color with puffed, but the texture change gives away the slightly different ending of puffed compared with the others.
Colors are chosen so that similar vowel sounds have similar colors. Likewise, similar consonant sounds are given similar textures where possible. Because rhyme is largely based on the vowel and trailing consonant, the color and texture per syllable create visible patterns across the tiles, visually showing the rhyme scheme as well as other phonetic devices. Note the similarities at the beginnings of lines too, e.g. Sleepin’ / Creepin’, or Own me / I’ll be / with me / If she / pity.
At a high level, sub-columns with the same color and the same (or similar) trailing consonant visually stand out, revealing some of the textual structure running through sections of the lyrics.
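A minimal sketch of the tile encoding, in Python, might look like this. The color and texture names are placeholders rather than the ones used in the figures, and `VOWEL_COLOR`, `CODA_TEXTURE` and `make_tile` are hypothetical names.

```python
# Placeholder palette and texture names, chosen only for illustration.
VOWEL_COLOR = {"aɪ": "purple", "ʌ": "orange"}
CODA_TEXTURE = {"p": "dots", "f": "crosshatch", "ft": "dense-crosshatch"}


def make_tile(text, vowel, coda):
    """Describe one tile: English text, vowel color, trailing-consonant texture."""
    return {
        "text": text,
        "color": VOWEL_COLOR[vowel],
        # No trailing consonant means no texture.
        "texture": CODA_TEXTURE[coda] if coda else None,
    }


for tile in (
    make_tile("guy", "aɪ", ""),
    make_tile("type", "aɪ", "p"),
    make_tile("tough", "ʌ", "f"),
    make_tile("puffed", "ʌ", "ft"),
):
    print(tile)
```

Under these assumptions, guy and type share a color but differ in texture, and tough and puffed share a color while their related textures signal the different endings.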
Brig really (really) likes Abba. What happens when we use this to visualize Dancing Queen?
Many rhyming pairs are immediately apparent: scene / queen; low / go; swing / king; guy / high. And near rhymes stand out too: queen / sweet / teen / beat / rine all share the long E vowel (purple), and flip between a trailing n or t (diamond hatch vs horizontal line). The near match is also apparent in jive / life (both purple but sawtooth vs x texture).
At a more meta level, Dancing Queen seems to have more of a blue/purple consistency compared to Bad Guy, which tends to be purple punctuated with other distinct colors such as cyan and chartreuse.
What about something that isn’t quite so pop, and less lyric-driven? Everything above is focused purely on words, i.e. poetry. Pitch, duration and the many other musical variables haven’t been considered, and there are certainly many other music visualization techniques (e.g. Ethan Hine, Brian Cort). A linguistic musician tells me some genres may use near rhymes rather than perfect rhymes, or may alter the inflection or pronunciation of words to get rhymes (thanks Craig). So, here’s grandson’s Dirty:
It is more difficult to define line length, and color appears more random as well. There’s no predominant color across the entire lyrics. Unlike Bad Guy and Dancing Queen, there are no columns of color, although there are some localized pockets of color. Perfect rhyming pairs exist, such as silence / violence; sunset / up yet; neighbor / nature; but they don’t prevail. There are some near rhymes too, such as so go / to go / do you or floorboard / forewarned. There’s a lot more repetition of single words such as time, you, love, for. And the tiles also help show near repetition of phrases such as: is it time / is it in / isn’t that; or do you love / do you have.
So perhaps the approach also works here, but in this case different aspects of the lyrics are creating different patterns, and potentially different or additional elements need to be visualized as well.
Note: A rough implementation of the above is available as an Observable notebook. I had a few challenges with fonts and leveraged Riccardo Scalco’s textures.js to create the many different textures.