Japanese Syllable Structure and the Past Tense of Verbs

Studying Japanese as a native speaker of English can seem so impossible that it's easy to lose sight of the fact that in some ways Japanese is actually simpler than a lot of languages. Verb tenses show this very well: there are hardly any that you really have to worry about. Maybe we forget this because the verb tenses that do exist seem to make up for it by being awfully complicated. The past tense ending is easy when you add it to a stem that ends in a vowel: it's just -ta. Taberu in the past tense becomes tabeta. What could be simpler? If only that were all there was to it. As you'll know if you've dabbled in verbs and tenses, you also have to deal with verbs like these, where a lot more wacky stuff is going on:

shinu – shinda
yomu – yonda
yobu – yonda
kaku – kaita
kagu – kaida
kasu – kashita

In fact, these patterns do make some sense, if you're enough of a language geek to get into the details. In this guide, you'll learn how consonants are produced and what rules restrict the shape of Japanese syllables, and how together these explain the past tense patterns that you thought were just crazy irregular junk you had to memorize.

Simple Syllables

Japanese borrows a ton of words from English. But as you've no doubt noticed, this isn't nearly as helpful as it sounds. A lot of words become unrecognizable because Japanese lacks certain sounds, and because of the number of vowels that seem to get added in for good measure. Believe it or not, those extra vowels are due to the simplicity of Japanese, not its complexity: the syllable structure of Japanese is much simpler than that of English.

English allows lots of consonants next to each other, at both the beginnings and ends of syllables. Japanese generally doesn't. So the sequence of consonants at the start of "story" gets broken up by a vowel: sutorii.
In Japanese, you generally don't even have one consonant at the end of a syllable, so you have to add a vowel there too: English "beer" becomes biiru.
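
If you like to see rules spelled out, here is a rough sketch of that vowel-adding repair in Python. The function name and the simplified spellings fed into it ("storii", "biir") are made up for illustration. The choice of "o" after t and d and "u" elsewhere is an assumption based on the usual loanword pattern, not something this guide covers, and the sketch ignores the exceptions discussed next.

```python
VOWELS = set("aeiou")

def break_up_clusters(sounds: str) -> str:
    """Insert a vowel after any consonant that is not followed by a vowel.

    `sounds` is a rough romanized spelling of the English word (hypothetical input).
    Assumption: the inserted vowel is "o" after t/d and "u" otherwise.
    """
    out = []
    for i, ch in enumerate(sounds):
        out.append(ch)
        next_ch = sounds[i + 1] if i + 1 < len(sounds) else ""
        if ch not in VOWELS and next_ch not in VOWELS:
            out.append("o" if ch in "td" else "u")
    return "".join(out)

print(break_up_clusters("storii"))  # sutorii
print(break_up_clusters("biir"))    # biiru
```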

There are exceptions to this though, and the main ones are always in the middle of words, including the past tenses we saw above and many others (we'll mark the syllable boundary with a period):

kam.pai
yon.da
gak.koo
kap.pa
yat.ta

However, despite the existence of zillions of words like these, you can't just end a syllable in m/n or p anywhere you want. Recombine those syllables randomly and you get a lot of words that wouldn't be pronounceable in Japanese: (* is what linguists use to indicate that something is impossible)

*kap.da
*kam.ta

The generalizations make sense once we look at what the consonants have in common. For this, we need to understand some basic phonetics.

Consonant Clusters and Phonetics

There are two ways consonants can come together in the middle of a word in Japanese. One is where the consonants are identical. Some consonants in Japanese can be long, or what linguists call geminate (as in Gemini, 'twins'). These are the ones written with a small tsu before them, in words like gak.koo, yat.ta, and kap.pa.

For now, we're just going to make a list of the ones that can't be doubled like this, and come back to it later: you can't double b, d, or g.

To understand the other type of consonant cluster, where two non-identical consonants can be adjacent, we need to understand three basic elements of consonant phonetics.

Nasal consonants are the sounds that you can hum, like n and m. There are three of these in Japanese: m, n, and a third one, a sound we don't have a single letter for in English but write with the sequence ng, as at the end of "sing."¹

All consonants have a place of articulation: this is where bits of your mouth touch to make them. Try to feel how you make these as you read the descriptions:

m is bilabial: your lips come together in the classic hum
n is alveolar: your tongue touches the alveolar ridge behind your teeth
(ng) is velar: the top back part of your tongue touches the velum, which is the back top part of the inside of your mouth.

Feel it? OK, now we need to see how that works with other consonants: Play along at home by feeling how these are made:

p & b are bilabial
t & d are alveolar
k & g are velar

We can lay these out in a table for those of you who like that sort of thing:

	Bilabial	Alveolar	Velar
Nasal	m	n	ng
(non-nasal)	p,b	t,d	k,g

Got it? Now I can explain why only certain combinations of nasal and following consonant are allowed: they have to share the same place of articulation:

kam.pai both are bilabials
yon.da both are alveolars
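
For the programmers in the audience, the same check can be written as a small lookup. This is only a sketch of the rule as stated here; the names PLACE, NASALS, and nasal_cluster_ok are invented for the example.

```python
# Place of articulation for the consonants in the table above.
PLACE = {
    "m": "bilabial", "p": "bilabial", "b": "bilabial",
    "n": "alveolar", "t": "alveolar", "d": "alveolar",
    "ng": "velar", "k": "velar", "g": "velar",
}
NASALS = {"m", "n", "ng"}

def nasal_cluster_ok(nasal: str, consonant: str) -> bool:
    """True if a nasal + consonant cluster shares one place of articulation."""
    if nasal not in NASALS:
        raise ValueError(f"{nasal!r} is not a nasal")
    return PLACE[nasal] == PLACE[consonant]

print(nasal_cluster_ok("m", "p"))  # kam.pai: True
print(nasal_cluster_ok("n", "d"))  # yon.da: True
print(nasal_cluster_ok("m", "t"))  # *kam.ta: False
```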

Now, to get back to the past tense – remember that we were talking about the past tense? – we need our third and last bit of phonetics, and a tricky thing about Japanese vocabulary.

This last phonetic property is easiest to feel with s and z. Make a zzzzzz sound and put your finger on your voicebox. You'll feel a little buzz, indicating that z is what we call voiced. Make ssssss and do the same and you won't feel anything. That's because s is voiceless.

This property is harder to feel in consonants like t and k which you can't extend that way, so if you can't feel it, you'll have to trust me:

t and k are voiceless
d and g are voiced

Let's add that detail to the table:

	Bilabial	Alveolar	Velar
Nasal	m	n	ng
Voiceless	p	t,s	k
Voiced	b	d,z	g

Now you know which consonants are allowed to be geminate (remember, this means twins, or two of the same consonants together): only the voiceless sounds p, t, and k, and not the voiced b, d, and g.
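
As a quick sketch, the doubling rule is a one-line test on voicing. The table below only covers the six consonants this guide has named, and the function name is made up for the example.

```python
# Voicing of the non-nasal consonants discussed so far.
VOICING = {
    "p": "voiceless", "t": "voiceless", "k": "voiceless",
    "b": "voiced", "d": "voiced", "g": "voiced",
}

def can_geminate(consonant: str) -> bool:
    """True if the consonant may be doubled, i.e. if it is voiceless."""
    return VOICING[consonant] == "voiceless"

# Print each doubled consonant, starred if it is not allowed.
for c in "ptkbdg":
    mark = "" if can_geminate(c) else "*"
    print(f"{mark}{c}{c}")
```

This prints pp, tt, and kk plain, and *bb, *dd, and *gg with the star that marks them as impossible.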

Cluster Restrictions and the Past Tense

Now you know in general what consonant clusters are allowed in Japanese. The last background detail you need is that there are slightly different restrictions in different categories of Japanese vocabulary.²

The consonant clusters allowed in verb conjugations are a little more restricted than in the language as a whole: the consonant following a nasal has to be voiced.

So these are OK in verbs: nd, mb, because the second consonant is voiced.

These are not: nt, mp, because the second consonant is voiceless.
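
Putting the two conditions together for verbs, a minimal sketch might look like this. It assumes the place-agreement rule from earlier still applies inside verbs, and the function name is hypothetical.

```python
PLACE = {
    "m": "bilabial", "p": "bilabial", "b": "bilabial",
    "n": "alveolar", "t": "alveolar", "d": "alveolar",
}
VOICED = {"b", "d"}

def ok_in_verb(cluster: str) -> bool:
    """Check a two-letter nasal + consonant cluster against the verb rules."""
    nasal, consonant = cluster  # e.g. "nd" -> "n", "d"
    same_place = PLACE[nasal] == PLACE[consonant]
    return same_place and consonant in VOICED

for cluster in ["nd", "mb", "nt", "mp"]:
    print(cluster, ok_in_verb(cluster))
```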

Now, if you're still hanging in there, we can go back to all those wacky past tense forms and see that they really make a little more sense than it might seem.

You've got a bunch of verbs you want to put into the past tense, and really, if life were fair, you would be able to just add -ta and be done with it. Unfortunately, sometimes the syllable structure of Japanese won't let you. What happens, then, is that various features of the consonants have to fight it out to see who wins. Depending on the consonants involved in the argument, they come up with different compromises.

Consonant fights that result in Nasal-Consonant Clusters

In three types of stems, the compromise comes out as one of those nasal-voiced consonant clusters that are allowed in verbs:

n-final stems
- shinu (die), which is what you want to do after reading this far in this article…
You want to take the stem shi(n) and add ta. But in verbs, that nasal-voiceless cluster isn't allowed. The solution in this case is easy-peasy. You just have to change one thing: make the t in ta into its voiced counterpart d, and you get shinda.
m-final stems
- yomu (read)
yom+ta has two problems: the voiceless "t" can't go after a nasal, and a nasal-consonant cluster is only allowed if it has the same place of articulation. So two things have to change. "t" becomes voiced d, like in the last example. And bilabial "m" changes to alveolar "n," so that it's the same place as alveolar "t," and you get yonda.
b-final stems
- yobu (call)
yob+ta. Seems like we're really in trouble here… But really, compared to the last type, only one additional thing needs to change.

As with m-final stems, change the place of the first consonant: bilabial changes to alveolar.
Make the first consonant nasal. Nasals are voiced, so at least we're keeping that characteristic of the original "b."
As with n-final stems, the t now comes after a nasal, so make it voiced.

So the changes are (order doesn't matter): yob+ta → (change the place) yod+ta → (change to a nasal) yon+ta → (voice the second consonant) yon+da = yonda
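
Here is a sketch of those three repairs as code, for stems ending in n, m, or b. The function takes the stem as written in this guide (shin, yom, yob). All the names are invented for the example, and the fixed order of steps is just one choice, since the order doesn't matter.

```python
# Step tables: each maps a stem-final consonant to its repaired form.
TO_ALVEOLAR = {"n": "n", "m": "n", "b": "d"}  # match the place of "t"
TO_NASAL = {"n": "n", "d": "n"}               # make the first consonant nasal

def nasal_cluster_past(stem: str) -> str:
    """Past tense for stems ending in n, m, or b (e.g. shin, yom, yob)."""
    base, final = stem[:-1], stem[-1]
    if final not in TO_ALVEOLAR:
        raise ValueError(f"stem {stem!r} does not end in n, m, or b")
    final = TO_ALVEOLAR[final]  # change the place
    final = TO_NASAL[final]     # change to a nasal
    ending = "da"               # voice the t of -ta after the nasal
    return base + final + ending

for stem in ["shin", "yom", "yob"]:
    print(stem, "->", nasal_cluster_past(stem))
```

All three stems come out with the same nd cluster: shinda, yonda, yonda.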

Consonant fights that don't result in clusters

Those last three cases are kind of interesting because the consonants really seem to compromise and get to keep bits of themselves, although they have to change other bits of themselves. Other types of consonants resolve the dispute differently, with what might seem like less skill at compromise:

S: In a stem that ends in "s," the solution is much easier for the language learner, because "s" is totally unwilling to compromise and change any part of itself. So when a stem ends in "s," we just do what Japanese does to fix up a consonant cluster when it borrows an English word: insert a vowel. For example:
- kasu: kas+ta → kashita
K: For some reason, velar consonants are almost total wimps when they get into this argument. They completely disappear and are replaced with a vowel, or maybe a vowel gets inserted like in the last case and the velar hates it so much that it walks out. Whatever it is, when the stem ends in the voiceless velar "k" you get:
- kaku: kak+ta → kaita
G: But here's one small and strange victory for the velars: before the voiced velar "g" walks out in the face of the insulting vowel, it manages to leave its voicing behind on the verb ending, which changes to "d":
- kagu: kag+ta → kaida
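
These three outcomes are simple enough to sketch as a lookup on the stem-final consonant. The function name is made up, and it covers only the s, k, and g stems shown here, not any exceptions.

```python
def no_cluster_past(stem: str) -> str:
    """Past tense for stems ending in s, k, or g (e.g. kas, kak, kag)."""
    base, final = stem[:-1], stem[-1]
    if final == "s":
        # Insert the vowel i; "s" is spelled "sh" before i in romanization.
        return base + "shi" + "ta"
    if final == "k":
        # The velar disappears, leaving a vowel.
        return base + "i" + "ta"
    if final == "g":
        # The velar disappears but leaves its voicing on the ending.
        return base + "i" + "da"
    raise ValueError(f"stem {stem!r} does not end in s, k, or g")

for stem in ["kas", "kak", "kag"]:
    print(stem, "->", no_cluster_past(stem))
```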

Consonant fights that end up with geminate "t"

Aside from nasal-consonant clusters that agree in place, as mentioned earlier, the other possible sequence of consonants that's allowed in Japanese is the voiceless geminate. "t" is a voiceless consonant, so it's no surprise that in some cases, the compromise solution is the sequence "tt".

R – This consonant is completely defeated in the argument: it loses all features of itself and becomes "t", so we're left with the other type of consonant sequence that's allowed, a voiceless geminate:
- hairu: hair+ta → haitta
- agaru: agar+ta → agatta
- karu: kar+ta → katta
The last two types of stems are a little harder to explain, because you have to bring in comparisons with other forms.
T – This is what I am calling the stems of verbs like matsu ("wait"). Basically you don't have to do anything here, because the sequence "tt" is already OK:
- matsu: mat+ta → matta
- katsu: kat+ta → katta
What's possibly confusing about these verbs is that the stem-final consonant shows up elsewhere as ts: matsu, katsu. If you prefer to think of that stem as ending in "ts", then you can call that another consonant³ like "r" that just gives up and loses to the "t."

For the final type of verb, you might just want to cry "uncle" and memorize, because I have to bring in a lot of other comparisons to support the analysis, but for what it's worth, here it is:
W – These are usually described as verb stems that end in vowels but take the ending -u instead of -ru:
- ka-u, tsuka-u, instead of tabe-ru
But you'll note that elsewhere in their conjugations, these stems show a mysterious "w," like when you add -anai:
- kawanai, tsukawanai
We can treat these as stems that actually end in "w", which disappears before the -u ending, and is another consonant that loses the battle to the past tense "t":
- kau: kaw+ta → katta
- tsukau: tsukaw+ta → tsukatta
but if you want to just consider these irregular, I won't blame you.
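
If you do go along with the analysis, all three stem types in this section boil down to one rule, sketched below. The stems are written the way this guide writes them (kar, kat, kaw), and the function name is invented for the example.

```python
def geminate_past(stem: str) -> str:
    """Past tense for stems ending in r, t, or w: the final consonant becomes t."""
    base, final = stem[:-1], stem[-1]
    if final not in ("r", "t", "w"):
        raise ValueError(f"stem {stem!r} does not end in r, t, or w")
    return base + "t" + "ta"

for stem in ["hair", "agar", "mat", "tsukaw", "kar", "kat", "kaw"]:
    print(stem, "->", geminate_past(stem))
```

Notice that kar, kat, and kaw all come out as katta: once the final consonant has lost to "t", nothing in the past tense form tells you which stem you started with.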

Conclusion

Hopefully at least some of the patterns in the Japanese past tense now make more sense to you. Aside from making the world a more interesting place, this may help if you're still at the point of having to memorize them, and the -te forms too, which show the same patterns.

And finally, there are actually a couple of totally irregular verbs: suru/shita and kuru/kita. At this point you're probably relieved there isn't an explanation for those. Did you ever think you'd be so happy to have to memorize an irregular verb? There's another thing in-depth linguistic analysis does for you!

¹ You can tell that ng in 'sing' is only one sound by comparing it with a word where there really is a separate g after it. We don't spell them differently, but if you are a native speaker of English, you should be able to tell the difference:
- singer – there's only one consonant in the middle
- finger – there are two consonants, the nasal and a separate g.
² There's a very interesting similarity between English and Japanese that's not true of most other major world languages and is based in similarities in our history. In English, we have two types of vocabulary: the basic Germanic words that have always been with us, and the fancier words that came in from Romance languages with the Norman conquest. So we have a word like "heart," but its related adjective is "cardiac" instead of something like "heartiac." "Heart" is the original Germanic word, and "cardiac" comes from the Romance vocabulary that came in later. Japanese is similar: there is a category of native words, which includes the verbs, and another category that came in later as borrowings from Chinese. This is why Japanese, like English, often has two related words for the same meaning, and why most kanji have two different readings.
³ Yes, "ts" represents what is a single consonant in Japanese. It's a type of consonant called an affricate. We have a different affricate in English, the sound that begins the word "child." Like "ng," this is a sound we don't have a single letter for. Since we don't even have a single letter for our own affricate, it's no surprise we don't have a way to write the Japanese one with a single letter in our alphabet either.