Kanji is the Japanese pronunciation of two Chinese characters, KAN and JI, traditionally written 漢 and 字. Taken together the meaning is "the characters (JI) of the Han (KAN) Chinese people."
Chinese speakers look at the two characters 漢 and 字 and pronounce them HAN and ZI. Koreans look at them and say HAN and JA. The two written characters, 漢 and 字, are the same (with stylistic variations) for all four languages, Chinese, Japanese, Korean, and Vietnamese, but they are read aloud differently depending on which language is spoken.
This pattern (same characters, same meanings, different pronunciations) applies to every one of the many thousands of Chinese characters of the classical written language. Those characters form the foundation of a great part of the traditional culture of Japan, Korea, Vietnam, and other places in East Asia, as well as of the different regions of China.
I use 'kanji' simply because it is the more widely known pronunciation of these two characters among English speakers. In fact, the word 'kanji' has been included in at least one English dictionary for more than twenty-four years, whereas hanzi, hanja, and chữ Hán do not appear in English dictionaries, as far as I know. In other words, the word 'kanji' has been imported into English.
My site is primarily for English readers, which is why in 1995 I chose kanji.com rather than hanzi.com or hanja.com. I could have chosen chinesecharacters.com, but it was too long.
You should have a recent browser to view the kanji on Kanjicom. Currently I use Firefox 1.5. You must set your browser to display UTF-8-encoded Unicode and have fonts for at least traditional Chinese.
If you can't see all four kanji in the following two lines, then your browser isn't set up correctly, or it is too old or too non-standard to display an up-to-date, standards-compliant website. Each line begins with a bitmap image of the kanji, which should display on any GUI browser. The same kanji then follows on the same line as a Unicode character.
Line 1: WEN: (bitmap) 文 (UTF-8-encoded Unicode character)
Line 2: ZI: (bitmap) 字 (UTF-8-encoded Unicode character)
Unlike in the old version of Kanjicom, I now use bitmaps for kanji sparingly and UTF-8-encoded Unicode extensively.
In 1995, when I started Kanjicom, it made sense to use bitmaps for all the kanji. There were half a dozen or more possible encodings. Had I used any one of them, most Net users with an English locale would not have been able to see any kanji. Those with a Japanese, simplified Chinese, or other locale would not have been able to see all the kanji on my site. Times have changed. Modern operating systems and Web browsers support Unicode, and Chinese and Japanese fonts are readily available. Therefore, in this revision of my website, I use the more efficient UTF-8-encoded Unicode method.
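Here is what "UTF-8-encoded Unicode" means for the two test kanji above. The minimal Python sketch below prints each kanji's Unicode code point and the UTF-8 bytes a browser must decode before a font can draw it. Python is our choice for illustration, and the function name `describe` is hypothetical.

```python
# Print the Unicode code point and UTF-8 byte sequence of each test kanji.
def describe(ch: str) -> str:
    code_point = f"U+{ord(ch):04X}"                       # e.g. U+6587
    utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    return f"{ch}  {code_point}  UTF-8: {utf8}"

for kanji in ("文", "字"):
    print(describe(kanji))
```

Both kanji lie in the CJK Unified Ideographs block, so each takes three bytes in UTF-8.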
After playing with kanji for fifty years, I thought it was high time to express in one essay my understanding of how kanji were made. I am always learning more about kanji and although fifty years may seem like a long time when you're young, it seems like a mere flash in the pan when you're old. And few of the two hundred seasons have passed without recollection of this observation from Zhuangzi, chapter three:
“My life has a limit. But science does not have a limit. Using what has a limit to follow after what does not have a limit is to become worn out.”
The subject of kanji is limitless. My life-span is not. Time to sum up and move on.
First, in this Preface, is an explanation of the name 'kanji' and why I use it for this website, a note on viewing kanji on Kanjicom, and this brief overview of the entire essay.
Second, I compare the Chinese script with the familiar Latin script used for English.
Third, the structure of kanji is described in terms of three basic structural patterns. The difference between indivisible and divisible kanji is introduced.
Fourth, I list each of the all-important 214 Kangxi classifiers along with a “scientific” name which, in a perfect world, would be applied by students of classical Chinese.
Fifth, I give several dozen examples of typical zi using the CHA transcription. This shows how CHA is used to keep track of newly-learned kanji in a way that gives the most important information about the new kanji in a small handful of letters, the CHA code.
Following Professor Peter A. Boodberg, let's compare the Chinese script with our familiar Latin script. What follows is a restatement of Boodberg's succinct one-page description entitled 'Some Basic Grammatonomic Characteristics of the Chinese Script'. It appeared as 015-541120 in his Cedules from a Berkeley Workshop in Asiatic Philology, which Boodberg mimeographed and distributed himself in the mid-1950s. It is still the best description of the script in comparison with the Latin script that I have seen.
Although the vertical alignment of the script first strikes the ordinary Westerner as most characteristic of Chinese writing, the isometry (See Metrics below) of the graphs would better fulfill this function. In fact, Chinese is flexible in its alignment. Text aligned horizontally is quite as common nowadays as the traditional vertical alignment. Vertically aligned columns of Chinese are read right-to-left. Most modern horizontally aligned Chinese is read, like English, left-to-right. But during the 20th century right-to-left horizontal alignment has also been used. (I recall places in Taipei as recently as a decade ago where all three alignments were represented on adjacent signs affixed to building facades.) English, by contrast, is quite fixed in its horizontal dextrorsal (left-to-right) alignment.
Here is an example of two lines of Chinese text laid out three ways:

- (A) in horizontal alignment, read left to right like English;
- (B) in vertical alignment, read from top to bottom;
- (C) in horizontal alignment, read right to left like Arabic.

Any literate Chinese reader can easily read all three alignments, although (C), horizontal alignment read right to left, is rarely used.
(A) Two lines of Chinese in horizontal alignment, left to right, like English.
L1: 道可道非常道
L2: 名可名非常名
(B) The same two lines of Chinese in vertical alignment, the traditional alignment.
L2 L1
名 道
可 可
名 道
非 非
常 常
名 道
(C) The same two lines of Chinese in horizontal alignment, right to left, like Arabic.
道常非道可道 :L1
名常非名可名 :L2
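The three layouts above differ only in the order in which the same graphs are placed. The minimal Python sketch below reproduces layouts (B) and (C) from the left-to-right text of (A). It assumes one graph per string character and lines of equal length, as in this example. The function names are hypothetical.

```python
def horizontal_rtl(lines):
    """Layout (C): each line reversed, to be read right to left."""
    return [line[::-1] for line in lines]

def vertical(lines):
    """Layout (B): one column per line, first line in the rightmost column.

    Assumes all lines have the same length, as in the example above.
    """
    columns = list(reversed(lines))  # the first line ends up on the right
    return [" ".join(col[i] for col in columns) for i in range(len(lines[0]))]

text = ["道可道非常道", "名可名非常名"]
for row in vertical(text):
    print(row)
print(horizontal_rtl(text))
```

The printed rows match the vertical layout in (B), and the reversed strings match the right-to-left lines in (C).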
As mentioned, the isometry of the graphs, rather than the alignment, is probably the most characteristic feature of the Chinese script. Each graph stakes out the same size square on the page. Any graph, from the single stroke c001 to the 29-stroke graph meaning 'rampant', and even more complex graphs, having fifty strokes or more (fortunately, these beasts are exceedingly rare), claims the same square area on the page. Furthermore, whereas the graphemes of Latin script concatenate along one dimension, Chinese graphemes may be added in two dimensions.
In practice, 29 strokes is about the maximum used. The middle-sized, four-volume Morohashi Dictionary is a typical example of a practical but large repertoire. Roughly 19,000 of its approximately 21,000 kanji are written using 8 to 26 strokes. About 1,100 are written with 7 strokes or fewer, and about the same number with 23 strokes or more. The most common stroke count, shared by the largest number of kanji, is 12. Smaller repertoires contain proportionally fewer kanji with high stroke counts. One example is a Japanese kanwa dictionary intended for pre-high-school students, with fewer than 4,500 kanji in total. In a beginner's dictionary such as this, the most common stroke count is 11.
Here is a table showing the isometry of Chinese graphs. It includes minimalist kanji of 1, 2, 3, 4, 5, or 6 strokes, kanji of the very common 11, 12, or 13 strokes, and complex kanji of 24, 25, 26, 27, 28, and 29 strokes. Each of them stakes out the same-sized piece of real estate on the page. In other words, every kanji is given a box within which to display itself, regardless of how many strokes it has. At the same font size, that box is the same size as the box given to any other kanji. The specific font design determines how much of each box is actually used. This resembles non-proportional (monospaced) fonts for alphabetic scripts such as the Latin script used for English, in which each letter occupies a box of the same size as any other letter. But quite unlike those scripts, the kanji in each box represents an entire syllable, and that syllable may very well be a word.



Table 2.1: The Isometry of Kanji
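Unicode records this isometry as a character property. Under the East Asian Width property (Unicode Standard Annex #11), CJK ideographs are classified as Wide: in fixed-width text each one fills a full double-width cell, whatever its stroke count. The short Python sketch below uses the standard library's `unicodedata.east_asian_width` to check three kanji of very different complexity. The stroke counts in the dictionary are given only for comparison.

```python
import unicodedata

# Kanji of very different complexity, with their stroke counts for comparison:
# 一 (1 stroke), 字 (6 strokes), 鬱 (29 strokes).
samples = {"一": 1, "字": 6, "鬱": 29}

for ch, strokes in samples.items():
    # 'W' (Wide) means the character occupies a double-width cell
    # in fixed-width layouts, regardless of how many strokes it has.
    print(ch, strokes, unicodedata.east_asian_width(ch))
```

All three lines report `W`. In monospaced text, a 1-stroke and a 29-stroke kanji claim the same space.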
Although the kai style is the basis for writing virtually all kanji nowadays, there are stylistic variations within that style that affect the stroke count. The kanji in the first row are the same as the kanji in the second row, but the second row uses a font style that adds one stroke to each kanji. That additional stroke is found in the classifier part of the kanji. In the first and third columns below, the classifier is on the left; in the center column it is on top of the character. More about classifiers shortly.
Table 2.2: Examples of Stylistic Variations in the Stroke Count of Classifiers
English can be written with as few as about 82 graphemes (26 x 2 letters + about 30 marks and figures). For Chinese, "the number of graphemes runs from 500 to 800, estimated on a purely graphic basis, and to over 2000, if reckoned on an organic-structural, historical, and phonosemantic basis. These form in bidimensional combinations a graphicon of some 50,000 graphs or lexigrams (of which only about 10,000 are in common use.)" [Boodberg's Cedules 015-541120]
Whereas English is phonetically atomistic (alphabetical), Chinese may be thought of as phonetically molecular (syllabic and lexigraphic). What the Chinese script lost in phonetic precision, it made up for in semantic enhancement, i.e., some meaning sailed along with each character in the form of aphonic semantic signals. These are the sematic (sematic, not semaNtic) hemigrams (signs, signals), about 300 in number, that are used as classifiers in the dictionaries. These aphonetic, sematic elements have allowed the Chinese "to apprehend at a glance the approximate semantic contour of most of their graphs" [Ibid]. Simply reading a graph aloud loses the information provided by these sematic flags. Add the surplus of homonyms and the difficulty of truly embedding the phonetic qualities of the tones. Together, these facts doomed any straightforward transliteration system based only on pronunciation to ultimate failure as a satisfactory written representation of Chinese graphs, both for Chinese and for Japanese.
In the English script, phthegmic (word) division is marked by intermittent blank spaces. In the Chinese script, on the other hand, nothing explicit marks the division between words. Recall that, in the words of Professor Boodberg, "each graph, occupies, irrespective of graphic complexity, equidimensional and equidistant celliform quadrates of space in which they remain ensconced even when associated in larger lexical units" such as words, proper names, etc. [Ibid]. Explicit punctuation was limited to a mark or two. Until recently, the job was done by parison (evenly balanced sentence structure, parallelism) and by commatic (marking short phrases) particles represented by full graphs. They did it rather less blatantly than the punctuation symbols used in modern Chinese and Japanese, which were mostly borrowed from the Latin script.
Individual calligraphic strokes have no meaning per se. Just as in Western calligraphy, neither of the two strokes that are used to compose an italic 'a', for example, means anything by itself. The two strokes could be combined into one stroke or separated into three or more strokes and the resulting letter, italic 'a', would still stand. That same italic 'a' could be composed of round or square dots arranged in the proper pattern. But any individual dot, isolated from the other dots, carries no meaning. And so it is for the strokes that are used to compose kanji; each stroke, by itself, means nothing. But knowing which strokes are used to write a kanji is more important than knowing which strokes are used to write a letter of the Latin or English alphabet because 'stroke count' plays a prominent role in the process of looking up kanji in dictionaries and other Chinese and Japanese reference works.
There are strict conventions for what these individual strokes are supposed to look like, which specific strokes are to be used to compose any given kanji element, how they are to be written, including the direction and pressure on the Chinese writing brush and, most important for looking up kanji in a dictionary, the 'stroke order' or sequence in which they are to be laid down upon the page. In other words, when we talk about strokes we are talking about calligraphy, how the meaningful parts of a kanji are drawn, not about the meaningful parts themselves.
Any given Chinese character (kanji) can be written in several different calligraphic styles. Around 221 BC, Qin Shihuangdi's Prime Minister Li Si carried out several drastic reforms. Along with burning almost all books, he enforced one standard writing style, called 'xiao zhuan' (小篆 J. shouten), across the new empire, the first of China's empires. But however practical it may have been for the ruler, it proved impractical for many of the ruled. The crooked lines were too cumbersome to write quickly. These were straightened out in practice, and the resulting calligraphic style became known as the 'official style', li shu (隸書 J. reisho). Even after more than two thousand years, many literate Chinese today can still read kanji written in the li shu style. Computer fonts in the li shu style, and even in the older xiao zhuan style, are now widely available.
By Han times, an even faster style of calligraphy was in widespread use, given impetus by Emperor Zhang, who loved it. This was the 'grass style' or 'cursive style' of li shu, called cao li (草隸). The cao style, in many variants, has been in continual use from the Han dynasty down to the present day. But the direct basis for virtually all written and especially printed kanji today is a style called kai shu (楷書 J. kaisho). It was a further straightening out, a rectification if you will, of the li shu. This process was well underway by the end of the Later Han, and by mid-Tang times the kai shu style had become the new de facto standard.
The style of calligraphy used in my Lessons in Classical Chinese is a minor variant of the kai shu style, which has been in continuous use throughout the realm of kanji culture for about 1,300 years. This style of calligraphy even survived the kanji “simplification” reforms of the 20th century. Would that we could say the same for the constituent elements, the meaningful parts, that make up some of the “simplified” kanji. In fact, from a philological point of view, many kanji were desiccated in the process of being “simplified”. Their connections to their etymological roots were either cut off entirely or obscured. It could be argued that this has been a more radical kanji reform than even Li Si's. We will have to wait another 2,200 years to judge whether it has been as beneficial.
Note: Before Li Si's standardization of the script, the script prevalent among the Zhou was called zhouwen (籀文 J. chuubun). In contrast to Li Si's new xiao ('lesser') zhuan style, the old Zhou (周) script was called the da ('greater') zhuan (大篆 J. daiten) style. Three hundred years later, Xu Shen's Shuowenjiezi, which could be called the first dictionary of classical Chinese, analyzed kanji written in the "new" standardized xiao zhuan style. Xu Shen died in CE 120, but his son presented the Shuowenjiezi Dictionary to the Han Emperor An Di in CE 121. The Shuowenjiezi analysis forms the basis of our CHA (Chinese Hemigram Annotation) system discussed below.
For more information on strokes see lesson 1 of my Lessons in Classical Chinese.
Part of my dissatisfaction with the way that Chinese graphs have been dealt with on computers can be traced to what I see as a case of mixing apples and oranges. The meaning of the term 'character' within the context of alphabetical scripts such as English, is simply not the same when applied to Chinese. Two different things are being called by the same name. While such imprecision may suffice on the chit-chat level of discourse, it has given rise to fundamental misunderstandings with far-reaching consequences.
The discussion of wen and zi, classifiers, hemigrams, and so forth still needs to be greatly amplified and refined, but one thing should have emerged: A limited number of graphemes (not strokes, see above) go to make up all Chinese graphs. It is these graphemes that more properly correspond to 'characters', not the fully composed graphs that result from their various combinations.
Each graph is confined in an identically sized box, and I think this is part of the reason graphs have been mistakenly equated with 'letters', i.e. 'characters'. Imagine for a moment that all English words were required to be the same length. It would then be easy to draw a parallel between a Chinese graph and a 'word', rather than mistaking a Chinese graph for a 'character'. Boodberg has suggested that this spatial restriction may be partly responsible for the underdeveloped phonetic aspect of Chinese writing. Egyptian hieroglyphics, for example, were not confined to boxes of the same size.
Thinking of each kanji as analogous to an English word is better than thinking of each kanji as analogous to a letter of an alphabet, a character, but it is still not precise enough. Individual kanji do not always neatly correspond to 'words' either. They can correspond to words, but often they do not. When they do not, kanji more closely resemble the roots of words, or even the suffixes and prefixes of words. This is because the vast majority of modern Chinese words consist of two adjacent syllables, represented in writing by two adjacent kanji. So, in context, most kanji correspond to parts of words, not the full words themselves.
But somehow these bisyllabic, two-graph units, called 'binoms' or 'kanji compounds', often seem to represent a bit more than a single word. This is evidenced by translators' propensity to use several English words to translate them. The percentage of words in the classical language that are binoms is lower than in the modern language, and this percentage declines the farther back you go in time. Nevertheless, bisyllabic words seem to be a characteristic of Chinese as far back as the earliest records. [Check references.XXXXX] Can binoms be found on the bronze inscriptions? [Check references. XXXXX]
In the long run, it is this limited set of Chinese graphemes, rather than the complex graphs they make up, that deserves a place in any character set claiming universality. Suppose, at the high end, that it were desirable to encode all 2000 graphemes. Approximately 19,000 code points would still have been freed from the Han repertoire in Unicode 2.0, for example, and any Chinese graph could have been written on a computer. Most of the additional 42,711 code points added in 2001 as Extension B in version 3.1 of the Unicode Standard would not have been needed. Nor would the new PRC restriction of given names to a small list have been needed, had the characters been represented on computers as combinations of a basic list of about 2000 graphemes. It may always prove practical to select a specific subset of Chinese graphs and provide a code for each "character" therein. But if the goal is to represent the greatest possible number of the world's scripts in one universal character set, the "grapheme-based" approach to kanji deserves serious consideration in my view. It might have been more fruitful to try to understand how Chinese graphs were composed than to remain fixated on merely listing the precomposed results.
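The figures in the preceding paragraph can be checked with simple arithmetic on code-point ranges. The Python sketch below makes two assumptions. First, the Unicode 2.0 Han repertoire is the CJK Unified Ideographs block, U+4E00 to U+9FA5. Second, Extension B occupies U+20000 to U+2A6D6. Compatibility ideographs are ignored. The constant and function names are ours.

```python
# Code-point ranges (first, last), inclusive.
URO_UNICODE_2_0 = (0x4E00, 0x9FA5)   # CJK Unified Ideographs in Unicode 2.0
EXTENSION_B = (0x20000, 0x2A6D6)     # CJK Unified Ideographs Extension B (Unicode 3.1)

GRAPHEMES = 2000  # the text's high-end estimate of distinct graphemes

def size(block):
    """Number of code points in an inclusive range."""
    first, last = block
    return last - first + 1

print(size(URO_UNICODE_2_0))              # ideographs in the Unicode 2.0 Han repertoire
print(size(URO_UNICODE_2_0) - GRAPHEMES)  # code points that would have been freed
print(size(EXTENSION_B))                  # ideographs added as Extension B
```

The first range holds 20,902 ideographs, so reserving 2,000 for graphemes leaves 18,902, or "approximately 19,000". Extension B holds 42,711.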
Nevertheless, what "could have been" and "would have been" are beside the point now, merely wishful thinking on my part. It is unlikely that Unicode will change its approach to the Han repertoire at this late date and it is nearly impossible to imagine any region in the realm of kanji culture giving up their mandated or de facto character sets of Chinese graphs in favor of a lean mean set of a couple thousand graphemes. But with the system of Chinese hemigram annotation (CHA) sketched out below I shall attempt to proceed along this windmill-slaying course anyway. The consolation prize is that CHA can be a great aid for teaching and discussing kanji because in understanding CHA one goes a very long way in understanding kanji.
Some kanji are indivisible; they cannot be dissected. Technically, these indivisible kanji are called wen 文. Most kanji are divisible; they can be dissected. Divisible kanji are called zi 字.
The most common type of kanji dissection is bisection: dividing a zi into two parts called 'hemigrams', which means 'half graphs'. On the first slice, you try to separate a wen from the rest of the kanji. This wen hemigram is aphonic: whatever sound is associated with it is not important to understanding the composition of the kanji. The wen hemigram was originally added as a hint to the meaning of the one-syllable word represented by the other hemigram. This is analogous to the hints given in 'charades', a popular 20th-century parlor game. One person imitates something through aphonic gestures, and the other people try to guess what it is. They try to narrow down the possibilities quickly by asking such questions as “Is it mineral?” “Is it a plant?” “Is it an animal?” “Is it part of a human being?” For kanji, the aphonic wen hemigram answers this type of question. Or it is like the sematic displays of birds or other animals, which signal something, usually danger. Sematic comes from a Greek word meaning 'sign'. This hemigram acts as a signal or sign pointing the reader toward the meaning of the other hemigram, and we call it the 'sematic hemigram'.
For several years before CE 121, an attempt was made to systematically organize all kanji by sematic hemigram. It produced a list of 540 sematic hemigrams, headings under which the 9,353 kanji in use at the time were distributed. This became the Shuowenjiezi Dictionary, a masterpiece still used today. The next major effort to organize kanji occurred in CE 1712, when an excellent dictionary was produced under the imperial command of the Kangxi Emperor. (I remember my teacher, Professor Helmut Wilhelm, saying in a Chinese Bibliography class at the University of Washington in the early 1960s that the Kangxi dictionary was "probably still the best dictionary of classical Chinese available today.") In the Kangxi dictionary, the number of sematic hemigrams used as classification headings was reduced to 214. These 214 class headings are still in use in dictionaries and reference works. Therefore, the purpose of kanji bisection is frequently to dissect out the classifier from the rest of the graph. Then, by counting the number of strokes that make up the other hemigram, you can look up the kanji in a dictionary. The use of sematic hemigrams was integral to the original composition of kanji, and the Shuowenjiezi list and later the Kangxi list of sematic hemigrams still reflect that. As a result, dissecting out the classifier also throws light on the etymology of the kanji in many cases.
In theory, after the sematic hemigram is removed, the other hemigram of the zi may be further dissected. And this process can be continued until the resulting hemigram is a non-dissectable wen or until the etymology is clear enough for the present purpose.
In short, the structure of kanji is simple. Either the kanji in front of you is a 文 wen and has no structure, or it is a 字 zi and can be divided into two hemigrams. One of the two is a sematic hemigram (S). The other is either also a sematic hemigram (SS structure) or a phonetic hemigram (P) (SP structure).
S=sematic hemigram
P=phonetic hemigram
Types of kanji:
Table 3.1: Examples of the Three Main Types of Kanji
For wen, the sematic and the phonetic are one and the same.
Some zi have an 'SS' structure, i.e., they are composed of two sematics and derive their meaning from the combination of these two sematics. In the table above 伏 fu is an example of this SS structure of kanji formation. Split the kanji vertically. On the left is a somewhat shortened form of 人 ren, 'human' and on the right is the full (automorphic) form of 犬 quan, 'dog'. The meaning of the kanji as a whole is 'face down, belly down' and it is pronounced fu. This SS combination itself was then used as the phonetic in other zi where an additional sematic was added to differentiate various meanings. E.g., 袱, 絥, 垘 all pronounced fu. This is the reflection in Chinese writing of the 'fu' word family. I call it 'fu' here, but the reconstruction of its ancient pronunciation requires considerable study and expertise and lies outside this introduction to kanji composition. To understand the way kanji were constructed, it is sufficient to know that 伏 is composed of two sematics, two wen, that it means something like 'down on the belly' and that this SS combination itself was later employed as the phonetic hemigram (P) of other kanji that represent words that belong to the same word family.
袱, 絥, and 垘 in the immediately preceding paragraph are examples of zi that have an 'SP' structure, i.e., they are composed of a single sematic hemigram and a single phonetic hemigram. The three sematics used are, respectively:

- 衤, the shortened form of 衣, 'vestimentary';
- 糹, the shortened form of 糸, 'bombycinous';
- 土, 'terrene-terrestrial-tellurian'.

As mentioned, 伏 fu performs the role of the phonetic hemigram.
伈 xin, 'timid', in the table is also an example of SP structure. In 伈 xin, the right-hand hemigram is 心, which also occurs widely as an independent kanji meaning 'heart, mind'. As a hemigram, 心 is as a rule a sematic. But here we have an exception to the rule. In 伈 xin, as the pronunciation of the kanji suggests, 心 xin functions as a phonetic.
字, the last example in the table, is the famous zi meaning 'divisible kanji' about which we have already said so much. Zi is itself a divisible kanji, a zi. It is bisected horizontally into an upper and a lower hemigram. The coronary (upper) hemigram is classifier c040 mian2 宀, 'hypostegoid'. The term means resembling (-oid) something beneath (hypo-) a roof, covering, or shelter (steg-). The lower hemigram is 子 zi, a drawing of a 'baby, infant, child'. In its role as phonetic hemigram, 子 gives us the sound of 字 zi.
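The SS/SP scheme is in effect a small recursive data structure. A kanji is either a leaf (a wen) or a node with two hemigrams, and one of those hemigrams may itself be a zi. Here is a minimal Python sketch of that model. The class and field names are hypothetical, chosen for this illustration. The example analyses are the ones given above for 犬, 伏, 袱, and 字.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

@dataclass
class Wen:
    """An indivisible kanji: its sematic and phonetic are one and the same."""
    glyph: str

@dataclass
class Zi:
    """A divisible kanji: a sematic hemigram plus a second hemigram.

    second_is_phonetic is True for SP structure, False for SS structure.
    """
    glyph: str
    sematic: Kanji
    other: Kanji
    second_is_phonetic: bool

Kanji = Union[Wen, Zi]

def structure(k: Kanji) -> str:
    """Return 'wen', 'SS', or 'SP' for a kanji in this model."""
    if isinstance(k, Wen):
        return "wen"
    return "SP" if k.second_is_phonetic else "SS"

ren = Wen("人")    # 'human' (亻 is its shortened form)
quan = Wen("犬")   # 'dog'
fu = Zi("伏", sematic=ren, other=quan, second_is_phonetic=False)              # SS
fu_cloth = Zi("袱", sematic=Wen("衣"), other=fu, second_is_phonetic=True)     # 伏 as phonetic
zi_char = Zi("字", sematic=Wen("宀"), other=Wen("子"), second_is_phonetic=True)

for k in (quan, fu, fu_cloth, zi_char):
    print(k.glyph, structure(k))
```

Nesting `fu` inside `fu_cloth` mirrors how the SS combination 伏 was later reused as the phonetic hemigram of other members of the 'fu' word family.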
We have seen how the divisible kanji, the zi, are always composed of two hemigrams, how both of these can be sematics (the SS structure) or one hemigram can be a sematic and the other a phonetic (the SP structure). Furthermore, sematics can also exist on their own as independent single-syllable words or as parts of bisyllabic words. Let us now look more closely at these sematics, these indivisible kanji or 文 wen.
Most wen are zographs, 'drawings from nature'. This kind of character is commonly called a 'pictograph', but because 'pictograph' sometimes has been applied incorrectly as a general term for all Chinese characters, I prefer zograph to specifically refer to only those kanji that are depictions of natural objects.
Here are twenty examples of zographs in three styles. Row 1 shows the xiao zhuan style of the Li Si script reform (around 221 BC). Row 2 shows the li style that became popular in the Han dynasty, and row 3 the kai style usually used today.
Table 3.2: Typical Zographs in 'small seal', 'li', and Modern 'kai' Styles
The xiao zhuan style shown in the first row of the table above came after at least two thousand years of continuous development of the Chinese script (kanji). By then the drawings had long since become very stylized. Nevertheless, in many of the graphs the picture is still recognizable.
Examples: 天 子 木. The first kanji is tian, 'sky, heaven, god'. The second kanji is zi, 'offspring'. The third kanji is mu, 'tree, bush', roots and all.
When we want to look up a kanji in the dictionary, only one bisection is necessary because we consciously make that bisection to isolate the classifier hemigram from the rest of the character. For this purpose, once we have the classifier, we have only to count the number of strokes in the other hemigram to look up the kanji in the dictionary.
Examples: 初 and 性. In 初 the classifier is the aristeric hemigram (the hemigram on the left), and the dextral hemigram has a stroke count of two. The classifier is C145, 'vestiary', so we would look in the list of two-stroke characters under the C145, 'vestiary', category in a dictionary. Likewise, for 性 the classifier is again the aristeric hemigram, brachymorphic (shortened) C061, 'intimate', with five strokes making up the dextral hemigram. So we would look in the list of five-stroke characters under the C061, 'intimate', category in a dictionary.
For the purpose of looking up a kanji in a dictionary, once we've identified the classifier we need not bisect the graph further. There are 540 classifiers based on the Shuowenjiezi Dictionary of CE 121, and 214 classifiers based on the Kangxi Dictionary of CE 1712. The Kangxi classifiers are still in use today. The Unicode "Han Repertoire", for example, is arranged by Kangxi classifier and stroke count, just like most traditional Chinese dictionaries of the past three hundred years. Once the classifier hemigram has been severed from the graph, probably only an etymologist or philologist will want to examine it further. Such a specialist may ask whether the classifier hemigram is really a zi or a wen, i.e., whether it brooks further dissection. For everyone else, it is enough to recognize that hemigram as one of the 214 Kangxi classifiers and to know what kind of semantic ambiance it adds to the zi.
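The two-step lookup described above amounts to a two-part key: first identify the classifier, then count the strokes of the remaining hemigram. The Python sketch below builds a toy index on that basis. The function names are hypothetical, and the two entries follow the analyses of 初 and 性 given in the text rather than any particular dictionary's data.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical index: (classifier number, residual stroke count) -> list of kanji.
index = defaultdict(list)

def add(kanji: str, classifier: int, residual_strokes: int) -> None:
    index[(classifier, residual_strokes)].append(kanji)

def look_up(classifier: int, residual_strokes: int) -> list[str]:
    # .get() avoids creating empty entries for keys that are not present.
    return index.get((classifier, residual_strokes), [])

# Entries follow the analyses in the text above.
add("初", 145, 2)   # classifier C145 'vestiary' + 2 strokes
add("性", 61, 5)    # classifier C061 'intimate' + 5 strokes

print(look_up(61, 5))
print(look_up(145, 2))

# Sorting the keys gives classifier-then-stroke-count order,
# the arrangement used by classifier-based dictionaries.
arranged = [k for key in sorted(index) for k in index[key]]
print(arranged)
```

Sorting on the pair `(classifier, residual_strokes)` is what places every entry under C061 before every entry under C145, regardless of stroke count.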
Examples: [XXXXX]
By definition, a zi can be dissected at least once, resulting in a classifier and the remaining hemigram. The classifier is taken as a wen regardless of its actual etymological history. But there is a limit to the complexity of graphs, and that limit, with rare exceptions, is five cuts, resulting in a maximum total of six elements, or graphemes. An individual kanji will virtually never be composed of more than six elements or graphemes, and one of those graphemes will always be one of the 214 Kangxi classifiers.
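Here is why five cuts yield at most six elements. Each bisection replaces one element with two, so a graph cut n times always consists of n + 1 elements. The minimal Python sketch below makes this count explicit. It represents each bisection as a pair (tuple) of hemigrams and each grapheme as a string; this encoding and the function names are hypothetical. The example uses the analysis of 袱 given earlier.

```python
def count_elements(node) -> int:
    """Count graphemes (leaves) in a nested bisection tree."""
    if isinstance(node, str):
        return 1
    left, right = node
    return count_elements(left) + count_elements(right)

def count_cuts(node) -> int:
    """Count bisections (internal pairs) in the same tree."""
    if isinstance(node, str):
        return 0
    left, right = node
    return 1 + count_cuts(left) + count_cuts(right)

# 袱 = 衤 + 伏, and 伏 = 亻 + 犬: two cuts, three graphemes.
fu_cloth = ("衤", ("亻", "犬"))
print(count_cuts(fu_cloth), count_elements(fu_cloth))
```

For any tree built this way, `count_elements` equals `count_cuts` plus one. The limit of five cuts is therefore the same statement as the limit of six graphemes.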
After the first bisection we have, on the one hand, the classifier and, on the other, the remaining hemigram. What is that other hemigram? It could be another indivisible wen, or it could be a zi capable of another bisection. In either case, its role is one of two things. It may convey the sound of the syllable, however imprecisely (the SP structure), or it may add directly to the meaning (the SS structure). When it conveys sound, it is called a phonetic hemigram. When it conveys meaning directly, it is called a sematic hemigram.
仔 zi, 伏 fu, 休 xiu, 伙 huo, 伈 xin, 像 xiang, 依 yi, 好 hao, 籽 zi
Examples of hemigrams that frequently appear as the phonetic hemigram
空府贊廷音肯將生王主一瞿意甘小
不步手丁星雍青章其子歲邦亭十斗
土羅狂斯叔千刑去華皇立竟甚予
走半行士共革矛從朝平寺焦壽新
柔正工寸熏牛蒦任全帝異難定夆付乎先卦金暴此是逢
When we discuss the structure of divisible kanji as either SS (sematic-sematic) or SP (sematic-phonetic) we are glossing over the various positions that the two hemigrams can assume with each other. Actually, there are eleven ways that the two hemigrams that comprise a single zi may be positioned. We now introduce a third aspect into our contemplation of any given kanji. We try to see if a kanji is divisible or not, i.e., is it a zi or a wen? If we see it as a zi, which hemigram is the sematic; is the other hemigram also a sematic or is it a phonetic? Now we want to add to the above a consideration of the relative positioning of the two hemigrams.
These three aspects differ in emphasis and little else. Compared to each other they are really just 'six of one, half a dozen of the other'. This is because focusing on any one of them leads to a dissection of the kanji. Failure to find a perspective from which dissection is possible just means we are dealing with a wen.
The mnemonic list below roughly follows the order of frequency of occurrence of the different kinds of positioning. By far the most common positioning arrangement is for the hemigrams to appear side by side, with the classifier on the left. Therefore 'a' is used to refer to this arrangement. Next in popularity is for the classifier to appear below the other hemigram. Therefore 'b' is used to refer to this arrangement. Next come 'c' for on top, 'd' for on the right side, 'e' for enclosed, and so forth.
'A' is for aristeric. (Either lower or upper case can be used here as elsewhere in this list.) A hemigram may be aristeric, forming the left side of the graph. See Lessons in Classical Chinese 3.1.3 初 (chū) and 3.1.4 性 (xìng) for typical examples of this most-used form of hemigram positioning. 'Aristeric', meaning 'on the left', derives from Greek [XXXXX].
'B' is for basic. A hemigram may be basic, forming the base of the graph. See 3.1.6 善 (shàn) and 4.2.1 名 (míng).
'C' is for coronary. A hemigram may be coronary, forming the crown of the graph. See 3.2.4 習 (xí), 16 定 (dìng) and 10.3.6 安 (ān).
'D' is for dextral. A hemigram may be dextral, forming the right side of the graph. See 3.2.5 相 (xiāng) and 9.3.2 親 (qīn). 'Dextral' derives from Latin dexter, 'right'; compare Greek XXXXX.
'E' is for endogram. This refers to a grapheme that is encompassed or enclosed by other graphemes. See 11.1.3 本 (běn), 11.1.4 末 (mò), and 11.2.1 事 (shì). 'Embedded within' would be another way to refer to this, because the endogram is not always totally surrounded by a kalyptogram. See 'K' below.
'F' is for flange. This refers to a reduplicated grapheme that flanks another on two sides.
'I' is for isogram. This refers to a grapheme reduplicated side by side.
'J' is for janiform. This refers to reduplicated graphemes placed 'back-to-back'. 'Janiform' derives from Janus, the two-faced Roman god who looks in two directions.
'K' is for kalyptogram. This refers to a grapheme that envelops another. See 13.3.7 國 (guó). 'Kalyptogram' derives from Greek [XXXXX].
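Five of these codes line up neatly with Unicode's Ideographic Description Characters, which describe how two parts of a character are arranged: ⿰ (U+2FF0, left to right), ⿱ (U+2FF1, above to below) and ⿴ (U+2FF4, full surround). Here is a small Python sketch of that mapping, assuming, as in the paragraph that opens this list, that the code names where the classifier hemigram sits. The codes e, f, i and j, and the combinations, are left out because no single two-part description character maps onto each of them cleanly; the function name `ids` is hypothetical.

```python
from typing import Dict, Tuple

# code -> (description character, does the classifier come first?)
# Assumption: the code gives the position of the classifier hemigram.
LAYOUTS: Dict[str, Tuple[str, bool]] = {
    "a": ("\u2ff0", True),   # aristeric: classifier on the left
    "d": ("\u2ff0", False),  # dextral: classifier on the right
    "c": ("\u2ff1", True),   # coronary: classifier forms the crown
    "b": ("\u2ff1", False),  # basic: classifier forms the base
    "k": ("\u2ff4", True),   # kalyptogram: classifier envelops the other part
}


def ids(code: str, classifier: str, other: str) -> str:
    """Build a two-part ideographic description sequence for a zi."""
    key = code.lower()  # either case is allowed, as in the list above
    if key not in LAYOUTS:
        raise ValueError(f"no single two-part IDC sketched for code {code!r}")
    idc, classifier_first = LAYOUTS[key]
    parts = classifier + other if classifier_first else other + classifier
    return idc + parts


if __name__ == "__main__":
    # Examples cited in the list above: (kanji, code, classifier, other part).
    for kanji, code, classifier, other in [
        ("性", "a", "忄", "生"), ("名", "b", "口", "夕"),
        ("安", "c", "宀", "女"), ("相", "d", "目", "木"),
        ("國", "k", "囗", "或"),
    ]:
        print(kanji, code, ids(code, classifier, other))
```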
In addition, there are combinations of the above. A coronary sometimes slopes to the left, part of it forming the left edge of the graph (Japanese tare), i.e., a coronary-aristeroclitic; see 10.4.6 慮 (lǜ). Or it may slope to the right, i.e., a coronary-dexioclitic. An aristeric may extend along the base of the graph, i.e., an aristero-basic. See 3.2.3 近 (jìn) and 3.2.6 遠 (yuǎn).
The term diplogram applies to graphs composed of two reduplicated graphemes. A triplogram usually has its three elements arranged in a triangle, whereas a tetraplogram has its four arranged in a square; the tetraplogram is the limit of reduplicating the same grapheme to form a new kanji. In other words, there are no pentagrams.
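Put as a rule, the name depends only on how many copies of a single grapheme make up the kanji, with four as the ceiling. Here is a minimal Python sketch, assuming the kanji has already been reduced to its list of graphemes. The function name is hypothetical, and the example inputs are 林 and 森, which this essay discusses later.

```python
from collections import Counter
from typing import Iterable, Optional

TERMS = {2: "diplogram", 3: "triplogram", 4: "tetraplogram"}


def reduplication_term(graphemes: Iterable[str]) -> Optional[str]:
    """Name a kanji built entirely from copies of one grapheme."""
    counts = Counter(graphemes)
    if len(counts) != 1:
        return None  # empty, or more than one distinct grapheme
    (copies,) = counts.values()
    # One copy is just the wen itself; five or more exceeds the stated limit.
    return TERMS.get(copies)


if __name__ == "__main__":
    print(reduplication_term(["木", "木"]))       # 林
    print(reduplication_term(["木", "木", "木"]))  # 森
    print(reduplication_term(["宀", "子"]))       # 字, not a reduplication
```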
Examples of kanji composed of reduplicated graphemes:
Diplograms:
Triplograms:
Tetraplograms:
Finally, many classifiers have both an automorphic and a brachymorphic (shortened) form. When they appear as independent characters they appear in their stand-alone automorphic form. When they appear as the classifier hemigram of a zi, they often appear in their brachymorphic form, although sometimes they appear in their automorphic form. By now, more than four thousand years into the project of developing the Chinese script, the use of automorphic or brachymorphic form is governed by strict conventions.
Examples of automorphic and brachymorphic forms of classifiers:
人 仇 仁 仲 仙
刀 刑 利 别 到
心 快 怪 性 悟
手 扣 把 招 指
水 池 没 河 法
火 灯 烤 煩 燈
火 烈 無 熊 热
犬 犯 狂 狗 獅
肉 肝 肥 肯 胃
這 連 通 道
心 忘 忠 思 悲
手 挈 拳 拿 攀
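When you meet a brachymorphic form, looking the kanji up means mapping that form back to its automorphic classifier and its Kangxi number. Here is a small Python lookup table covering the forms in the rows above. The Kangxi numbers come from the standard list, not from this essay. 辵 (c162) is my addition, since the row 這 連 通 道 shows only its short form 辶. The short form 月 of 肉 (c130) is left out on purpose, because the same shape is also the classifier 月 'moon' (c074), and the shape alone cannot decide between them. The function name `automorphic` is hypothetical.

```python
from typing import Dict, Tuple

# Brachymorphic (shortened) form -> automorphic (stand-alone) classifier.
BRACHYMORPHIC: Dict[str, str] = {
    "亻": "人", "刂": "刀", "忄": "心", "扌": "手",
    "氵": "水", "灬": "火", "犭": "犬", "辶": "辵",
}

# Kangxi numbers for the automorphic classifiers above (standard list).
KANGXI_NUMBER: Dict[str, int] = {
    "人": 9, "刀": 18, "心": 61, "手": 64, "水": 85,
    "火": 86, "犬": 94, "肉": 130, "辵": 162,
}


def automorphic(form: str) -> Tuple[str, int]:
    """Return (stand-alone classifier, Kangxi number) for a classifier form."""
    full = BRACHYMORPHIC.get(form, form)  # automorphic forms map to themselves
    if full not in KANGXI_NUMBER:
        raise KeyError(f"not covered by this sketch's table: {form!r}")
    return full, KANGXI_NUMBER[full]


if __name__ == "__main__":
    for form in ("氵", "亻", "灬", "辶", "心"):
        print(form, automorphic(form))
```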
Ideally, you would learn the entire list of 214 classifiers in order to use Chinese character dictionaries and other reference works. In practice, learning fewer than half of them will take you a long way toward being able to look up kanji. You will find a list of the most popular classifiers followed by a list of all 214 Kangxi classifiers below. But first, here are fifteen examples.
The first example is the classifier with the most strokes, 'panpipe', followed by three examples of the six one-stroke classifiers, three two-stroke classifiers, one three-stroke classifier, and then one of the most used four-stroke classifiers. The remaining six are typical examples that include the elements needed to write the famous kanji wen and zi which we have been discussing. See [XXXXX] above.
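Table 4.1: 龠 (c214), 一 (c001), 丨 (c002), 亅 (c006), 二 (c007), 人 (c009), 冖 (c014), 彡 (c059), 木 (c075), 缶 (c121), 臼 (c134), 鬯 (c192), 文 (c067), 宀 (c040), 子 (c039)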
The first example of a classifier is 龠, the last of the 214 Kangxi classifiers, appropriately referred to as c214. C214 is last in the Kangxi list because it is the classifier written with the most strokes, seventeen.
C214, 龠, looks like a segmentable graph, a zi, rather than a wen. It looks like we could easily dissect it into two hemigrams. For example, it seems that we could lop off the top, removing the wen for 'human', classifier 009, the sixth member of the list of examples of classifiers above. There are about a dozen cases where lopping off a 'human' c009 from the top would actually be the correct way to dissect the kanji, e.g., 介 今 令 会 企 侖 余 倉 傘 僉. In all these cases, by cutting 人 from the rest of the graph, we would have correctly bisected the kanji by identifying 人 in its role as classifier.
In fact, 龠 yue is not a zi; it cannot be divided. It is now written with strokes and parts that look as if they can be extracted, but they can't. Understanding this helps us to understand why Professor Peter Boodberg has said that for Chinese, "the number of graphemes runs from 500 to 800, estimated on a purely graphic basis, and to over 2000, if reckoned on an organic-structural, historical, and phonosemantic basis." [Boodberg's Cedules 015-541120]
When we identify graphemes on a purely graphic basis we are very much dependent on the script style. For c214, 龠, the topmost element looks exactly like one commonly encountered form of c009, 人, but this is simply how these strokes have become formalized in kai style script. Behind the current form of yue in kai script is the original pictograph of a 'panpipe', probably made from tubes of bamboo. Those three little 'boxes' that align horizontally across the middle of the graph appear to be three instances of the classifier 口 c030, 'oral', but they are not. They, too, are an integral part of the original pictograph for 'panpipe'.
C214, 龠, is pronounced 'yaku' (ON reading) or 'fue' (KUN reading) in Japanese and yue4 in current Chinese Mandarin, and means 'flute', 'pipes'.
Second in the list above is 一, yi, the first (c001) of the 214 classifiers. Like all wen, it is non-segmentable, and looks it, unlike c214 just discussed. It is the first of six single-stroke graphs that head the list of 214 classifiers. As a classifier, it adds the meaning 'to unite, single, unique, "to unit" (i.e., to make a unit)' to the zi in which it occurs as hemigram. Standing alone, it means 'monad-unit', 'one', etc. It also can be thought of as the 'horizontal grammic'.
Third in the list above is 丨, pronounced gun3 in Mandarin. It is classifier number two (c002) in the list of 214 classifiers. Since it is devoid of much discernible meaning, we can simply refer to it as the 'vertical grammic'.
Fourth in the list above is 亅, pronounced jue2 in Mandarin. It is classifier number six (c006) and last of the single-stroke classifiers. Traditionally, it is used to classify only a tiny handful of kanji. Note that the left hook at the bottom is what distinguishes c006 from c002 above. Like c002, there is not much in the way of meaning here, so we refer to it as the 'uncinate grammic', i.e. the "hooked; furnished with hooks; unciform" [OED] grammic. We could also call it the 'hooked grammic'.
Fifth in the list above is 二, er4, c007. 二 begins the group of two-stroke classifiers, twenty-three in all. It means 'dyad-dual' and is the 'parallel grammic'. Standing alone as a character in its own right, it means 'two', 'dual'.
Sixth in the list above is 人, ren2, c009. Unlike c002 丨 and c006 亅, c009 人 is one of the most popular classifiers. It classifies hundreds of kanji. It means 'human', 'humanoid'.
Seventh is 冖, c014, 'kalyptroid', i.e. having the form or likeness of (-oid) a cover, covering, sheath; Gk. ka^lupt-êr, êros, ho, XXXXXX covering, sheath [Liddell-Scott-Jones Lexicon of Classical Greek]; kaluptos, 'put round so as to cover, enfolding, enveloping, Soph.' [Liddell & Scott Intermediate Lexicon].
Eighth is 彡, c059, the thirtieth of thirty-one three-stroke classifiers, meaning 'piliform-polumeous', i.e., having the form of (-form) felt, or being full of (-ous) plumes, ornamental feathers.
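Because the Kangxi list is ordered by stroke count, a classifier's number alone tells you how many strokes it has and where it falls within its stroke group. Here is a minimal Python sketch using `bisect`. The group boundaries are taken from the commonly printed Kangxi table rather than from this essay, so check them against a dictionary before relying on them. They do agree with every figure given here: six one-stroke classifiers, twenty-three two-stroke classifiers starting at c007, c059 as the thirtieth of thirty-one three-stroke classifiers, and seventeen strokes for c214.

```python
from bisect import bisect_left
from typing import Tuple

# Last classifier number in each stroke group of the Kangxi list:
# 1 stroke ends at c006, 2 strokes at c029, 3 at c060, ... 17 at c214.
GROUP_ENDS = [6, 29, 60, 94, 117, 146, 166, 175, 186,
              194, 200, 204, 208, 210, 211, 213, 214]


def stroke_count(n: int) -> int:
    """Number of strokes in Kangxi classifier cN."""
    if not 1 <= n <= GROUP_ENDS[-1]:
        raise ValueError(f"classifier number out of range: {n}")
    return bisect_left(GROUP_ENDS, n) + 1


def position_in_group(n: int) -> Tuple[int, int, int]:
    """Return (strokes, rank within that stroke group, size of the group)."""
    strokes = stroke_count(n)
    i = strokes - 1
    start = GROUP_ENDS[i - 1] + 1 if i > 0 else 1
    return strokes, n - start + 1, GROUP_ENDS[i] - start + 1


if __name__ == "__main__":
    for n in (1, 6, 7, 14, 59, 75, 121, 192, 214):
        print(f"c{n:03d}", position_in_group(n))
```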
Ninth is 木 c075, a four-stroke classifier meaning 'dendrological'. As an independent graph it means 'woody bushes', 'wood', 'tree' or 'trees'. (The plural is not explicit in kanji.)
Tenth is 缶 c121, a six-stroke classifier meaning 'keramic', i.e., of or relating to (-ic) clay or potter's earth, pottery.
Eleventh is 臼 c134, a six-stroke classifier meaning 'quasi-bimanual' or 'quasi-myloid', i.e., somewhat like a mortar or a mill-stone.
Twelfth is 鬯 c192, a ten-stroke classifier meaning 'hyssopic', i.e., relating to hyssop, "a woody plant having spikes of small blue flowers and aromatic leaves." [AHD] This character is a fairly complicated zograph, like c214, and even though it looks like it can be dissected, it is properly seen as indivisible, a wen. C192 is almost never used except to classify a single fairly popular kanji which we will examine soon. You will need to consult a big dictionary to find more than one or two kanji classified under c192.
Thirteenth is 文 c067, a four-stroke classifier meaning 'depictive-delineative-dichroic-diapoecilic-diazogram-diagraphic-decorative-design' and "pied beauty". It is pronounced 'wen2' in Chinese Mandarin. And, yes, this is the same 'wen' that we have already made so much of, the 'wen' meaning 'indivisible graph', 'hologram', the wen of 'wen zi' that we introduced in the opening paragraph of this essay. See XXXXX above.
Fourteenth is 宀 c040, a three-stroke classifier meaning 'hypostegoid', i.e., resembling (-oid), beneath (hypo-) a roof, a covering, a shelter (steg-). The little vertical stroke on the top may depict the ridgepole of a roof. Distinguish it from the seventh example, c014, 冖, above. 宀 vs. 冖 in 'small seal' style script.
The fifteenth and last example in the list of classifiers above is 子 c039, a three-stroke classifier which, as one of the '12 branches', can be glossed as 'algate-dorsal', 'ramal-1', or 'murine', and which otherwise means 'apogenic' (offspring).
Drawing from the list of fifteen examples of wen above we can show how zi, the divisible graphs, are constructed.
We may begin by combining the last two wen examples, classifiers c040 宀 and c039 子.
Zi 字 is the result. There we see c040 on top, with c039 below. This composite character is none other than the 'zi' of 'wen zi' that we have been referring to. It means 'tmetagram', i.e., a divisible graph. (Cf. Gk. tem-, tom-, tm-, 'to cut'; see 'tome' in the OED. The root came into English in, for example, appendectomy, lobotomy, and tome. Perhaps 'schizogram' would be clearer.) There are additional meanings, of course.
Note that the graph zi itself is a zi, and wen (c067) itself is a wen.
彣 is composed with 'wen' on the left and c059, 'piliform-polumeous', on the right. This character means 'designs and coloring', a meaning arising from the respective independent meanings of the two hemigrams.
孖 is an example of a tmetagram (zi) formed by duplicating a single wen, in this case c039, 'apogenic', 'offspring'. The main meaning here is 'twins'. A kanji formed by duplicating a grapheme is called a 'diplogram', as discussed above. XXXXX
孨 is a rare kanji in which the duplication process is carried one step further. The early meaning is 'weak', a tripling of the 'weakness' inherent in each c039. This develops into the meaning of 'circumspect', 'humble'. A kanji formed by arranging three instances of a grapheme, usually in a triangle, is called a 'triplogram'. Ibid. XXXXXX
林 is another example of a tmetagram (zi) formed by duplicating a single wen, in this case c075, 'dendrological', 'dendritic'. The meaning of this zi is 'woods', 'grove'.
森 shows this process carried one step further. The meaning is 'forest', a tripling of the 'tree' of c075.
丁 in its current form seems to combine c001 on top with c006 below so we might mistake it for a zi. But originally it was a single, non-segmentable wen pictograph of a nail. It carries the meaning of 'nail down', 'secure', 'strong'.
丄 is the original graph for 'up', 'above', 'ascend'. It falls into a tiny minority of graphs called dactyliograms, i.e., those that point to an 'idea'. This handful of graphs may constitute the only true 'ideograms' or 'symbols' among the more than 50,000 graphs that make up the Chinese character repertoire. I think this is properly considered a wen, although for lexicographic and calligraphic purposes it has been cut into two halves, c002 above and c001 below.
丅, likewise, is the original graph for 'down', 'below', 'descend'. It, too, has been cut into two halves, c001 above and c002 below, for lexicographic and calligraphic purposes, but its original form can best be taken as a wen.
To end our introduction to zi, here is 鬱! All the elements that are used to compose this kanji are included in the list of fifteen Examples of Classifiers above. With 29 strokes, 鬱 is probably the most complex graph commonly used in current Japanese (and in traditional Chinese script, for that matter).
Which element is the classifier? Not an easy question. It turns out that the classifier is the almost never used ten-stroke classifier 'hyssopic' 鬯 (c192), occupying roughly the lower left quadrant of the graph (you will need to consult a big dictionary to find more than just this single kanji classified under it today). To the right of 'hyssopic' (c192) is 'piliform-polumeous' 彡 (c059). Both of these lower elements are under the cover of 'kalyptroid' 冖 (c014). On top is 'keramic' 缶 (c121) flanked by 'dendrological' 木 (c075), left and right, i.e., 'dendrological' doubled, yielding 'woods' 林, but split apart for calligraphic purposes. 鬱 means, among other things, 'dense growth', 'over-grown', 'rampant'.
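The dissection just described can be written down as a tree of cuts, each labeled with one of the mnemonic positioning codes. The Python sketch below is one possible bracketing of that paragraph, and it is an assumption rather than the only reading: 冖 could arguably be coded as a kalyptogram ('k') instead of a coronary ('c'). Here each code describes the first part's position relative to the second, not necessarily the classifier's. The names `Cut`, `graphemes`, `cut_count` and `show` are hypothetical. The tree comes out at exactly five cuts and six graphemes, the limit stated earlier, and the classifier 鬯 is one of the six.

```python
from dataclasses import dataclass
from typing import List, Union

POSITION_NAMES = {
    "a": "aristeric", "b": "basic", "c": "coronary", "d": "dextral",
    "e": "endogram", "f": "flange", "i": "isogram", "j": "janiform",
    "k": "kalyptogram",
}


@dataclass(frozen=True)
class Cut:
    """One bisection; `code` is the position of `first` relative to `second`."""
    code: str
    first: "Part"
    second: "Part"


Part = Union[str, Cut]  # a plain str is a single grapheme (a wen)


def graphemes(p: Part) -> List[str]:
    """List the graphemes at the leaves, top to bottom and left to right."""
    if isinstance(p, str):
        return [p]
    return graphemes(p.first) + graphemes(p.second)


def cut_count(p: Part) -> int:
    """Count the bisections in the tree."""
    if isinstance(p, str):
        return 0
    return 1 + cut_count(p.first) + cut_count(p.second)


def show(p: Part, depth: int = 0) -> List[str]:
    """Render the tree as indented lines."""
    pad = "  " * depth
    if isinstance(p, str):
        return [pad + p]
    lines = [f"{pad}{p.code} ({POSITION_NAMES[p.code]})"]
    return lines + show(p.first, depth + 1) + show(p.second, depth + 1)


# Assumed bracketing: [林 flanking 缶] crowning [冖 over (鬯 left of 彡)].
UTSU = Cut(
    "c",
    Cut("f", Cut("i", "木", "木"), "缶"),
    Cut("c", "冖", Cut("a", "鬯", "彡")),
)

if __name__ == "__main__":
    print("\n".join(show(UTSU)))
    print(graphemes(UTSU), cut_count(UTSU), "鬯" in graphemes(UTSU))
```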
鬱 is pronounced yu4 in Mandarin, UTSU in Japanese.