August 2023: AI Swallowed Hard, Like It Understood
Music generation technology, plus reviews of The Mars Volta live and the new Squid album
Took July off for Reasons. Next month is an ArcTanGent special. Don’t miss out - subscribe below!
Post-Linear Algebra: Generative AI and Music
OpenAI's ChatGPT, a large language model (LLM) configured as a chatbot, has prompted a lot of coverage about AI and its uses. Whilst the technical capabilities of LLMs are not new, this is the first to be wrapped up in a product layer that truly conveys their command of language and the potential of applying it to a bunch of previously untapped domains. (For a more prosaic example, see AI Dungeon, a text-based adventure game powered by similar tech.)

Let's be clear here - at the current state of maturity, what we refer to as AI tools and models is probably synonymous with the term “applied statistics” - a fusion of mathematical/algorithmic approaches with immense computing power to produce results tailored to our expectations for particular types of question or problem. There's plenty of exciting research on computational neuroscience and psychology that draws parallels between these methods and the inner workings of the brain. But we're a long way from building a single system able to match the generalised abilities of the human mind across all domains - often referred to as Artificial General Intelligence or AGI. (Though you can do fun or perhaps inadvisable things chaining specific types of models or capabilities together.)
At least for now, most AI tools and applications best fit under the category of “potential force multiplier”. This isn't terribly different to the technological advances of the past few decades, and we can probably draw some parallels as such. Take the use of samples and programming for drum recording. These let recording artists circumvent the challenge of getting a good recorded sound from a live kit by layering triggered samples from an ostensibly optimally-recorded kit over the top. They even let hapless “bedroom guitarists” like me program their drums by drawing them into a DAW. You get speed, scale, and a democratisation of what sounds are achievable to artists without large budgets. What's the cost? Well, all drums sound the same now. You've even got drummers learning by ear to play at the velocity profiles of programmed drums (i.e. consistently maxed out), because those are now the foundational records that style is learned from. There's a homogenising effect.
Most of the currently available AI tools for music can have this same effect. Auto-mastering technology? Amazing stuff for an often impractical and opaque part of the release process, but if you want to sound like more than every other piece of music in the Spotify landfill you'll really want to (and should!) pay a mastering engineer. IK Multimedia’s TONEX to replicate guitar sounds? Now we can all sound like our favourite bands and not some garbage plugin, but as god-tier as that late 80s Mötley Crüe distortion is, do you want it everywhere?
I'm going to chicken out of opining on the socioeconomic impacts here, in favour of writers such as Jaime Brooks who have an excellent perspective on the macroscopic picture. But let's be a bit cynical for a moment and say - y'know, all post-rock bands are basically just repackaging Slint, all math rock bands the same with Cap'n Jazz, all modern metalcore bands just want to re-contextualise Meteora for the TikTok generation. All of this is probably fine? OK, with some limiting cases around perfect mimicry and I guess the threat of a commercial “music singularity” where all music everywhere is just an infinite mumblecore trap beat or something until the end of time, like an even shittier Anomalisa or that one movie where everyone is Rory Kinnear. But imitation is flattery, incremental musical development is just as valuable as huge genre steps, and so on.
Well, unless AI can provide not just the tools but infinite music generation too. Then those animatronic dolls start sounding even more ominous. So let's talk Generative AI and MusicLM.

ChatGPT’s appeal lies partly in the fact that you can enter basically any text prompt, and in a surprisingly broad number of cases get a sensible text result back. Google’s MusicLM, the first of a set of beta AI models released on their AI Test Kitchen, presents a similar entrypoint of “give any text prompt”, with the aim of returning a sensible piece of originally generated music. Unlike forebears such as OpenAI’s MuseNet, which would simply generate MIDI sequences with a whole bunch of restrictions to force “musicality” (e.g. a fixed tempo), MusicLM produces a literal sequence of sounds at 24kHz - which makes the musical consistency of its output impressive. There are a few restrictions - you can't get output if you name actual artists (unclear if this is enforced in the model training or in business logic to head off a repeat of the recent Drake/Oasis controversies); it's much better suited to ambience and electronics than natural instrumentation or most post- genres. Each request returns two examples and asks you to vote for the best one, giving a natural & sneaky feedback system for model improvement. I've put some example inputs/outputs on SoundCloud to show the good and the gnarly.
How does something like this get put together? MusicLM is built from three pre-existing components:
A neural audio codec (SoundStream), which uses a deep neural network to efficiently compress and reconstruct audio. The compressed format - a numerical vector known as an “embedding”, representing each 0.02-second slice of audio - is the interesting bit. It’s this representation that enables the audio to actually sound like something detailed and high-quality.
A sequence generator model known as w2v-BERT. This basically learns how to create convincing long-term sequences of embeddings - in the case of tech like ChatGPT, this is what makes text form a consistent narrative and gives each word some meaning in a wider context. For MusicLM, this works on sequences of audio slices (or rather, their embeddings), learning how these relate to each other across a piece of music. With that, you can then generate new sequences that follow the same patterns as the examples you presented - i.e. new music.
A joint text/audio model, MuLan, which learns a common embedding space for both text and music, enabling us to link the two domains. (This is similar to image models like Midjourney, which learn a joint representation of text and images to let you generate new images from text prompts, or auto-generate text captions from images.) MuLan’s secret sauce is 44 million videos (presumably scraped from YouTube) combined with their titles, descriptions and linked playlist titles.
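(For the code-curious, here's a minimal sketch in Python of what a “common embedding space” means in practice. It's a toy illustration rather than anything resembling the real MuLan - the encoders below are just random projections, and every dimension and variable name is invented - but the mechanic is the point: two different modalities get mapped to vectors of the same length, and training pushes matching text/audio pairs to point in the same direction.)

```python
# Toy sketch of a shared text/audio embedding space (the MuLan idea).
# Not the real model: both "encoders" here are stand-in random projections.
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 128  # size of the shared space (illustrative, not MuLan's actual size)

# Hypothetical stand-ins for the learned text and audio encoders.
text_proj = rng.normal(size=(300, EMBED_DIM))   # maps a 300-dim text feature
audio_proj = rng.normal(size=(512, EMBED_DIM))  # maps a 512-dim audio feature

def embed(features, projection):
    """Project features into the shared space and L2-normalise."""
    v = features @ projection
    return v / np.linalg.norm(v)

text_features = rng.normal(size=300)   # pretend output of a text encoder
audio_features = rng.normal(size=512)  # pretend output of an audio encoder

text_emb = embed(text_features, text_proj)
audio_emb = embed(audio_features, audio_proj)

# Cosine similarity between the two embeddings. Training nudges this towards 1
# for matching pairs (a video's audio and its title/description/playlist text)
# and away from 1 for mismatched pairs.
print(float(text_emb @ audio_emb))
```

In the real thing those projections are large learned networks, and the matching pairs come from those 44 million videos and their metadata.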
MusicLM then combines these by taking lengths of audio, calculating the sound, sequence & text/audio embeddings, and then using the MuLan (text/audio) representation to learn the semantic relationship between text and sequence. These are then used as inputs to produce sound, which is calibrated to reconstruct the original audio clip. Rinse and repeat with the Free Music Archive and plenty of burned hydrocarbons to power those GPUs. For users - take in some text, convert it to a text/audio embedding, generate a semantic sequence, generate audio.
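That last sentence is basically the whole user-facing pipeline, so here's a very rough sketch of it in Python. To be clear: every function body below is a placeholder I've invented for illustration - the real SoundStream, w2v-BERT and MuLan are large neural networks, not random numbers and sine waves - but the shape of the data flowing between the stages is roughly the point.

```python
# Conceptual sketch of the MusicLM generation flow: text prompt -> MuLan
# embedding -> semantic sequence -> decoded audio. All stand-ins, no real models.
import numpy as np

rng = np.random.default_rng(0)
SAMPLE_RATE = 24_000                                  # MusicLM outputs 24kHz audio
FRAME_SECONDS = 0.02                                  # one embedding per 0.02s slice
SAMPLES_PER_FRAME = int(SAMPLE_RATE * FRAME_SECONDS)  # = 480 samples per slice

def mulan_text_embedding(prompt: str) -> np.ndarray:
    """Stand-in for MuLan: the real model maps the prompt into the shared
    text/audio space; this placeholder just returns a random vector."""
    return rng.normal(size=128)

def generate_semantic_sequence(conditioning: np.ndarray, n_frames: int) -> np.ndarray:
    """Stand-in for the w2v-BERT-style sequence model: the real one emits a
    coherent token per audio slice conditioned on the embedding; this
    placeholder just emits random tokens."""
    return rng.integers(0, 1024, size=n_frames)

def soundstream_decode(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the SoundStream decoder, which turns tokens back into a
    waveform. Here each token just picks a sine-wave pitch for its 0.02s slice."""
    t = np.arange(SAMPLES_PER_FRAME) / SAMPLE_RATE
    return np.concatenate([np.sin(2 * np.pi * (220 + tok) * t) for tok in tokens])

prompt = "reverb-drenched clean harmonised tremolo guitars"  # any text you like
n_frames = int(20 / FRAME_SECONDS)                           # ~20 seconds of output

conditioning = mulan_text_embedding(prompt)
tokens = generate_semantic_sequence(conditioning, n_frames)
audio = soundstream_decode(tokens)
print(audio.shape)  # (480000,) -> 20 seconds of samples at 24kHz
```

Even in this cartoon form you can see where the leverage points are: a better codec means better-sounding slices, a better sequence model means longer coherent structure, and better text/audio pairing means prompts that actually steer the output.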
None of my experimentation yielded a euphoric moment of songwriting or sonic genius - nor should that be the expectation for what is an impressive early tech demo. Sequences are capped at 20 seconds (longer form example outputs are possible and demoed on the MusicLM website). They sound kind of like what they are: weird, lossy audio, though the chiptune tracks are admittedly bangers. Keep iterating on those underlying pieces though - better AI audio compression (or just more expensive computing), longer sequences, better text/audio data and models - it’s not beyond the realms of possibility that you could generate convincing identikit music shaped to whatever you can verbally articulate. Well, maybe that last part remains a challenge…
(See video above for a pretty rough demo made from those ChatGPT tabs)
I initially found the concept of MusicLM fascinating as it tries to drive the generation of music from language. This is mostly a choice of convenience - there was music and text together in a big dataset, most humans can articulate a sentence of language, and what else might you use to synthesise new music? (A simpler example would be existing music or sound; images would be interesting, but not the easiest to interpret or create. Poorly hummed riffs? Guitar tabs? Internet search history?) But it gets at a fundamental (and in the case of this newsletter, existential) question - how do we use language to articulate music?
My list of tropes from May's post was partly phrased for comedic effect, but partly because I don't know the technical name for reverb-drenched clean harmonised tremolo guitar lines, or if one exists. (Actually that probably is the correct term, but anyway.) Adam Neely's YouTube video on the key change in Celine Dion's version of All By Myself - a canonical example of “modal mixture common tone enharmonic double chromatic mediant modulation” - is a delightful unpacking of complex theoretical jargon. Neither of these has much bearing on the forms of text built into MusicLM.
It’s the third component, MuLan, and its vast YouTube-derived dataset that is doing all the linguistic heavy lifting here. Obvious jokes about clickbait prompts aside - this presents quite a specific type of link between music and text, namely text designed to articulate what the music is for a streaming platform that wants you to keep using it to generate revenue. And it misses so much of the detail - text is linked to 30-second clips of audio from each video, so there’s no way to calibrate to particular intra-music motifs like flat 9th dissonance, breakdowns or catharsis. Sections of my own prompts were often fully ignored when they became too specific (staccato flugelhorns not a thing, apparently).
The power of AI we observe across so many domains invariably comes back to the quality of the data we can put together to build it. ChatGPT works because as a species we write so much damn stuff online. Midjourney and DALL-E work because people made a ton of art and put it on the Internet, and they and others wrote a bunch of stuff about it, and copyright law hasn’t caught up. There are some scalable resources for music and text/audio out there, but it seems an equivalent “free data lunch” for music and text has yet to be tapped into (or at least made public). This leaves a need for new data generation to yield improvements (MusicLM was released with a curated, if slightly vanilla, set of annotations for 5,500 audio clips).
A lot of my initial experimentation with MusicLM failed because of my personal approach to language and music. I want to talk about technical elements and emotions in the same sentence, toy with the absurd, and maybe make myself chuckle and others roll their eyes at the annoying pretension. Power users of ChatGPT talk extensively about “prompt engineering” - the art of articulating your textual ask to invoke the best possible response by leaning into what the model expects. I could probably spend time learning how to get MusicLM and its inevitable descendants to bend to my whims, to lean into its strengths (which are definitely not math rock) by abandoning my voice in favour of what it best responds to, and become another identically-voiced animatronic doll. I guess that’s one way to make content.
Infinite music is still a pretty cool concept. Lots of folks doing exciting stuff with algorithmic music and procedural generation, and I’ll doubtless be writing more on this. For now - I leave you with 65daysofstatic and their Infinite Wreckage stream, which is generating fresh, original audio 24/7 using slightly simpler modelling approaches and a lot more artistic curation. It’s probably not this timeline’s future, but it is excellent.
From the Pit: The Mars Volta
Troxy, London, 18th June 2023
Amidst a busy (but not sold-out) crowd at the Troxy waiting for The Mars Volta, the person next to me was Instagramming the empty stage with the caption “Ready to party like it’s 2003”. Which - well, yeah, that’s why most of us were there, to be honest. But did Omar Rodriguez-Lopez and Cedric Bixler-Zavala have different ideas? Last year’s eponymous album, following a surprise stealth reunion, divided opinion with a series of short & sharp pop or pop-adjacent songs, a focus on hooks and tight structure in place of the 10+ minute odysseys of yesteryear. Would this also set a new blueprint for their live show approach?
Turns out - not so much. Omar and Cedric have probably earned the right to do what they want after releasing the seminal Relationship of Command with At The Drive In, followed by three stellar progressive, catchy, kitchen sink-included albums (OK, some of you probably don’t like Amputechture, but I maintain it is a masterpiece) - but the economic demands of such a tour (£62 a ticket!) mean you can’t skip the hits, including most of both De-Loused in the Comatorium and Frances the Mute.
This is big, heavy music with a large supporting cast of improvising players - and special mention must go to touring drummer Linda-Philomène Tsoungui, who is a masterful powerhouse behind the kit throughout, especially on the insane fills of Cygnus… Vismund Cygnus. Cedric still has the energy and dynamism of 20 years prior, leavened with gravitas - the voice is still unmistakeably strong and distinctive, though the highest-pitched wails definitely tail off in volume. Omar’s guitar work and soloing is joyous to behold - as much as a distorted wah-pedal guitar sound might feel dated in lesser hands, it remains a thing of beauty here.
But yes - the hits. There’s a standard formula for most of the set: play the main hooks, get to the quieter bits, do a big extended improv, let it run maybe a minute too long, finish the song. This is pretty effective - the chorus of Cicatriz ESP opens a sizeable mosh pit after a good 40 minutes of restraint; L’Via L’Viaquez leaves no hips unswayed during the Latin breakdowns; arms and voices reach for the ceiling through Roulette Dares. Solos taken throughout those improv sections are good, but this isn’t as well-formed or as gripping as, say, the live shows of Kamasi Washington or even the instrumental breaks of The War On Drugs. Amidst this, we get a couple of those short, sharp pop songs from the new album, delivered in that same style, which certainly breaks up - or perhaps even jars with - the broader formula. Graveyard Love is an objectively great song and translates well live, but Shore Story fails to make the same impact - and the less freeform old songs such as Televators, and even the unlikely set opener Vicarious Atonement, yield far more dividends.
Which leaves us with a set of extremes - tight, constrained delivery of the new, and big, overextended jamming of the old. It feels like an effective middle ground could exist here with a bit of cross-pollinated restraint and wildness. But that wouldn’t be The Mars Volta, would it? Cedric called out the new album haters between songs, stating they made an album that reflected the music they wanted to make, and people still showed up to their tours. Looking past the fallacies of that argument - I think we need The Mars Volta to be whatever they want to be, no matter how maddening. I mean, I do - and frankly I’m delighted to have finally seen a band so important to my own music journey, and for it to have been mostly pretty excellent.
Support was provided by Teri Gender Bender - which, whilst bearing the hallmarks of an Omar Rodriguez-Lopez collaboration act (cf. 14 EP releases in the last 18 months alone), is very much front-and-centre led by Teri herself. A mustachioed force of nature on stage (and off it, amongst the crowd), garbed in bright technicolour and an LED cape, gyrating, singing, yelping, screaming, reciting odd poetry on limes and carbonation between synthy garage songs. Not boring!
On Rotation: Squid - O Monolith
Released June 9th, 2023 on Warp Records
I used to live a ten-minute walk from the Brixton Windmill venue back in 2019. The fact that I've never been there is thus egregious, given I missed the “moment” when Dan Carey-produced South London post-punk was suddenly A Big Deal. The Speedy Wunderground label put out, in rapid succession, three incredible singles from black midi, Squid and Black Country, New Road, with the Windmill being both spiritual home and creative hotbed of this new movement of weird and “alternatively popular” music.
I might have missed the activity on my doorstep, but I did get to see both Black Country, New Road and Squid in early 2019 opening for shoegazers Our Girl, and shortly after discovered black midi's KEXP set, and I was as hooked as the alternative Rough Trade commentariat. So where are we nearly five years on? black midi have doubled down on the avant-garde and jazz fusion of their sound, throwing the kitchen sink into last year's third album Hellfire. Black Country, New Road looked set to conquer all before them until the abrupt departure of frontman Isaac Wood, and are on a journey of reinvention having retired their musical output to date.
Squid meanwhile have ploughed on, signing unexpectedly to electronic imprint Warp Records and releasing an exceptional debut album in Bright Green Field, but perhaps not quite receiving the same adulation as their peers. The five-piece have a unique post-punk sound built around a love of intricate syncopated rhythm, off-kilter, attention-grabbing synths and percussive instrumentation (oh, and a cornet), and the snarling commentary of drummer/vocalist Ollie Judge. New album O Monolith brings those same sensibilities, coupled with a looser, less propulsive momentum than their debut.
There are still plenty of analogues to Bright Green Field and earlier material here. After the Flash has hints of the big soaring synths we saw on Boy Racers, and even features another wailing cameo from Martha Skye Murphy à la Narrator (sadly nothing on the album reaches that pinnacle achievement). Green Light has some of the propulsive intensity of early single Houseplants, swapping out the latter's millennial social commentary for something more claustrophobic; more paranoid.
It's very much a consolidation (or perhaps the dreaded “maturity”) of a sound rather than a pushing of boundaries. That being said, there are few dull moments in the tight runtime - it's highly consistent, short on flab or unwarranted excess. There's nary a standard verse-chorus structure throughout, and quite right too. A solid and highly listenable sophomore effort.
Listen on Spotify | Listen on TIDAL
(I may have accidentally set a theme for album review choices, with the last two editions focusing on bands named for bodies of water and now for creatures that live in water? This will be hard to maintain…)
Have you heard… Stephen Taranto
Every now and then I have another go at trying to get into Allan Holdsworth and jazz fusion, but it never quite sticks. You go looking for inspiration in those rapid guitar and organ licks, but maybe come unstuck at the cheesy 80s vocals, or just the heavily dated New Age elements? Maybe I need the more comforting tropes of, say, modern progressive metal?
Step forward Stephen Taranto - virtuosic Australian guitarist whose credits include bands such as The Helix Nebula and I Built The Sky. Cited by progressive rock luminary Plini as one of his favourite musicians, Taranto released the 2019 EP Permanence, which shares the hallmarks of similar instrumental guitarists like Sithu Aye - lightning-fast scale runs and heavy riffs - but with some sparkling background instrumentation showing a depth to the songwriting beyond the fretboard heroics.
Delightfully comforting fare. Still, maybe I should give Road Games another go. Or if not that, then maybe the best bits of the Gran Turismo 3 soundtrack (no, not the Feeder songs - I obviously mean the menu track bangers).
Listen on Spotify | Listen on TIDAL