Synthesis of Sound and Art in Vocaloid

I panic when I get handed the aux cord. Every single time. And I’m not being pretentious when I say I have some niche taste: a big chunk of what I listen to is Vocaloid, and it doesn’t tend to go over very well in group settings. You hand me the reins in the car, and I’ll put on Kareshi no Jude by syudou or Bocca Della Verita by Hiiragi Kirai and then we’ll sit there in silence for a bit while you make a face at the stereo, and then I’ll start compulsively explaining, because I can see that you aren’t getting it. I mean, you likely literally aren’t getting it because it’s in Japanese, but even when I try to translate as much as I can from memory (of what the YouTube captions said) you still don’t get it. And it wasn’t until the last couple times I went through trying to explain the appeal of a certain Vocaloid song that I really noticed what I was doing: I would inevitably start describing the music video, or the album cover, or the concept behind the way Hatsune Miku was designed for the song. To try to explain the sound, I would describe the visuals.

The visuals I’m trying to describe (from Loveit by PinocchioP)

Vocaloid (and other vocal synth software) has a music scene unlike almost any other, because it’s a music scene born out of a visual medium.

What is Vocaloid?

“Vocaloid” is the name of a singing-voice synthesis software published by Yamaha, and also the umbrella term for music created using its voicebanks, as well as the character mascots of the voicebanks themselves. The actual software is the Vocaloid synthesizer engine and its many, many licensed voicebanks. A voicebank is a collection of vocal samples recorded from a real-life singer, and the software allows users to edit and arrange those noises into song.

The actual software’s pretty hard to use.

Development of the project started in 2000, and the product was intended for industry professionals—it would offer a sort of next-level autotune. But it was not nearly as popular as Yamaha hoped. The first two voicebanks, Leon and Lola, were released in 2004, and they sure didn’t sell well.

Leon and Lola’s rather underwhelming box design.

The first three Vocaloids, Leon, Lola and Miriam, were all English-language voicebanks. The next two, MEIKO and KAITO, were the first to speak Japanese. They were also the first voicebanks to be represented as anime characters.


They sold better, but not great; KAITO in particular was a bit of a failure. It wasn’t until 2007, when Yamaha released the new and improved Vocaloid 2 engine and Crypton Future Media released Hatsune Miku for it, that Vocaloid would start to leave its mark.

Hatsune Miku: you know her, you love her

Hatsune Miku was, and is, immensely popular. She’s become synonymous with Vocaloid, and synonymous with blue-haired anime girls as a whole. She’s iconic, she’s enduring, she’s in advertisements with Scarlett Johansson; Miku was the start and the focal point of the Vocaloid scene. After her success, the character-mascot model of Vocaloid proved itself the most popular, with MEIKO and KAITO retroactively gaining popularity alongside new Vocaloids like Kagamine Rin and Len.

Kagamine Rin (left) and Kagamine Len (right)

Content created with Vocaloid also occupies a unique legal loophole (one that would require its own article to fully explain), but essentially anyone can use it and make money off of what they make. This, coupled with the appeal of the character and voice of Hatsune Miku, made Vocaloid something weird otaku artists gravitated towards: a voice you could use for yourself if you couldn’t (or didn’t want to) sing.

Nowadays, in 2022, there are far too many Vocaloids to list, accompanying a thoroughly massive amount of merch, figures, art, and of course, music.

Ok, so what is “Vocaloid music”?

Music made using a Vocaloid, duh. It’s synthesized from a voicebank, but you’ll immediately draw the ire of any diehard Vocaloid fans if you call the music itself “synthetic”. The characters that represent the voicebanks are the fictitious creations of a company, but the music produced using the Vocaloid software is produced by individual, human creators, and it does them a disservice to conflate their work with the company that created the software they use. To say a song is “by Hatsune Miku” is kind of like saying that “Für Elise” is by “Piano”: it’s by Beethoven, using a piano.

As a whole, “Vocaloid” is kind of an impossible category because it classifies based on production method rather than by sound, which ends up grouping Patchwork Staccato by Toa right in with Ramen Shop “GROTESQUE” by Utsu-P. But there are general trends in Vocaloid music: it tends toward the aggressively fast, high-pitched, and electronic. These trends come about because Vocaloid is very much a shared scene, a community of wildly creative people who all latched on to this software and its characters as a way to express their visions. It was born primarily on Nico Nico Douga (now just known as Niconico), a Japanese video-sharing platform that was a hub for weird weeb culture. Thus, these early Vocaloid songs needed videos to go along with them.

Why illustration?

Visual-music culture is nothing new (it’s been decades since they invented MTV), but Vocaloid is unique in that at first it only lived on Niconico, and now it’s mostly found on YouTube. Video-sharing sites are where this music is uploaded first, so discovering a song or an artist almost automatically means watching the videos. Because these songs feature anime characters, live action music videos are kind of off the table, so the result has been a booming art and animation scene that evolved alongside the music. The relationship between 3D animation and Vocaloid is a fascinating one, but I’m going to be focusing on the 2D art, the art in the video, which is an integral part of the experience of listening to a Vocaloid song.

Illustration of Miku from 孤独毒毒 by syudou

A Vocaloid song is expressed in three parts: the music, the lyrics, and the art. It’s meant to be seen on YouTube or Niconico; it’s meant to be experienced as a whole piece. Rollin’ girl by wowaka is good, but Rollin’ girl by wowaka accompanied by its video is enough to make you cry. And once you’ve seen it with that video, you carry that image with you every time you listen to the song. Even a simpler video, even one that is just a still image of a character meant to go along with the song, adds something tangible to the way you interact with the song. Phony by Tsumiki feels different once you add the visual of the girl in the fox mask; Whatever Yama Says Goes by PinocchioP takes on a more desperate flavour with the garish colours and the background image of the tank rolling in.

Whatever Yama Says Goes

A lot of the time, Vocaloid music is meant to express a specific, weird vibe, something it achieves through a combination of sound and illustration. Hachi’s Matryoshka sounds “like sensory overload” according to one of my normal friends, but the creepy, doll-like designs of Miku and GUMI in the video give it an edge that makes the frantic outro of the song almost haunting.

GUMI in Matryoshka

In the end, whether it’s a fully animated story like in PinocchioP’s Reincarnation Apple or a single iconic image like in Deco*27’s The Vampire, Vocaloid videos are a synthesis of music and art. I think the best example to sum it up happened just a week ago: music producer Utsu-P posted on YouTube saying his new song would have to be delayed a few days because the art wasn’t finished for the music video. Sure, it would’ve been possible to just release the song on its own, but it would be incomplete like that: it was meant to be this intrinsically tied combination of sound and visuals.

And that’s how you end up with me in the front seat of your car trying to show you a picture of an anime girl while you try not to kill us both in a crash: for the sake of the multimedia art experience only possible through screechy, synthesized vocals and their anime mascots.

