
The Cangjie Input Method (倉頡輸入法)

Introduction

Cangjie is a method to input Chinese characters into a computer. It is based on a relatively simple geometric decomposition of characters, where each key on the keyboard (which can be a perfectly ordinary western computer keyboard) represents a certain shape. All you need to know is the shape of the character, and the rules of Cangjie decomposition. This has the big advantage that you do not need to know the pronunciation of the character in some language, nor do you need to know its stroke order. Furthermore, since Cangjie codes can be up to five letters long, each code typically represents only one Chinese character, as opposed to some other input methods where codes are shorter and you need to select from a list of many possible characters.

Now, some reasons why you would not want to use Cangjie. First of all, it requires some learning. If you already know e.g. Mandarin or Japanese and want to get typing right away, you should use a phonetic input method. Second, if you forget how to write a character you can't type it, even if you know the pronunciation. That said, it's usually enough to remember only parts of the character near the corners, since in complex characters the middle parts are normally skipped. I use Cangjie because my main interest is classical Chinese, which is quite remote from modern spoken languages. I should mention that Cangjie was created in Taiwan for traditional/complex characters, and I have no personal experience of variants using simplified characters.

This Wikipedia article has descriptions of each shape and a picture of the keyboard layout. I recommend drawing/printing this and putting it next to your computer, so that you can learn to touch type Chinese. There is also a summary of the decomposition rules.

Getting started is simple, but there are many shapes that do not obviously correspond to the character listed. Some common examples of characters that I find non-obvious based on the Wikipedia information are 之 (戈弓人) and 作 (人人尸). The painful way of learning by trial-and-error, or looking up the Cangjie codes, can probably never be avoided entirely. However, I hope that this document will make your training less painful than mine was.

Shapes

First of all, you should learn which character each letter on your keyboard represents. After all, you will want to type several of these frequently. This is the easy part: at the end of your first day you should be able to type these at a reasonable speed.
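As a memory aid, the letter-to-shape assignment can be written down as a small lookup table. The sketch below assumes the standard Cangjie layout on a QWERTY keyboard; the helper names `to_keys` and `to_shapes` are made up for this illustration, and the x and z keys, which are not among the 24 basic shapes listed on this page, are left out.

```python
# Standard Cangjie assignment of the 24 basic shapes to QWERTY letters.
# (The x and z keys are deliberately omitted in this sketch.)
KEY_TO_SHAPE = {
    "a": "日", "b": "月", "c": "金", "d": "木", "e": "水", "f": "火",
    "g": "土", "h": "竹", "i": "戈", "j": "十", "k": "大", "l": "中",
    "m": "一", "n": "弓", "o": "人", "p": "心", "q": "手", "r": "口",
    "s": "尸", "t": "廿", "u": "山", "v": "女", "w": "田", "y": "卜",
}
SHAPE_TO_KEY = {shape: key for key, shape in KEY_TO_SHAPE.items()}


def to_keys(code: str) -> str:
    """Turn a code written with shape names, e.g. '戈弓人', into keystrokes."""
    return "".join(SHAPE_TO_KEY[shape] for shape in code)


def to_shapes(keys: str) -> str:
    """Turn keystrokes, e.g. 'ino', back into shape names."""
    return "".join(KEY_TO_SHAPE[key] for key in keys.lower())


print(to_keys("戈弓人"))   # the code given for 之 on this page
print(to_shapes("qbbe"))   # the keystrokes for 授
```

The codes on this page are written with shape names, so `to_keys` gives the letters you would actually press.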

Now starts the interesting part, where you memorize (and get an intuitive feeling for) the decomposition rules, and learn the various shapes a given letter can represent. I will list each of these here, along with a summary of its shapes and some examples.

A small disclaimer: most of what I write below is based on guesswork, so some of it may be wrong or incomplete. Corrections are most welcome.

手 (hand)

Used as the hand radical, like in 授 (手月月水). Generally used for the class of shapes with a vertical line crossed by two horizontal lines, such as lower part of 年 (人手), the middle of 用 (月手) and the upper parts of 夫 (手人) and 馬 (尸手尸火). Combined with 大 it forms the upper part of 春 (手大日), and combined with 一 it forms the upper part of 青.

田 (field)

Used for the field radical, like in 謂 (卜口田月) and 當 (火月口田), as well as similar shapes, like the right part of 神, the center of 東 (木田) and 會 (人一田日). A more important use is that of the square enclosure, like in 四 (田金) and 國 (田戈口一). Used also for the net radical, like in 罪 (田中中一卜).

水 (water)

Used for the water radical, like in 泰 (手大水), 永 (戈弓水) and 冰 (戈一水), including its compressed form, like in 江 (水一). A common use is like in the lower-right part of 反 (竹水), commonly used together with 竹 to form e.g. the upper right part of 路 (口一竹水口).

口 (mouth)

Used for the mouth radical, like in 品 (口口口), and other empty squares, like in 官 (十口中口).

廿 (twenty)

Used for twenty itself, and several vaguely similar shapes, the most common probably being the grass radical, like in 花 (廿人心). Other examples include 華 (廿一廿十) and characters with added horizontal lines such as 其 (廿一一金) and 甘 (廿一). Vertical lines can also be added, like in 無 (人廿火) and 卅 (廿十). Less obvious are some other mesh-like patterns, such as 井 (廿廿) and 並 (廿廿金). Finally, it is commonly used for a horizontal line with two smaller strokes on top of it, like the lower part of 立 (卜廿).

卜 (divination)

Used for the divination radical, and various rotations of a longer line with a small stroke at the middle, like the upper part of 立 (卜廿), 方 (卜竹尸), 言 (卜一一口) and 上 (卜一), or the bottom part of 下 (一卜), or the middle of 走 (土卜人). Also used for pairs of down-right strokes, like the ice radical (bottom part) of 冬 (竹水卜), and the shape in the upper middle part of 與 (竹難卜金) and the upper right part of 龍 (卜月卜尸心). Also used for the compressed form of the walk radical, like in 道 (卜廿竹山).

山 (mountain)

Used for the mountain radical, like in 崇 (山十一火), and for the box radical, like in 出 (山山) and 凶 (山大). Used for the down-then-left shape seen in the lower right part of 元 (一一山), 己 (尸山), 孔 (弓木山) and 七 (十山). Also used with 月 to form 目 (月山).

戈 (spear)

Used for the spear radical like in 我 (竹手戈). The most important use is for the simple dot, like in 氷 (戈水), 之 (戈弓人) and 為 (戈大弓火). Sometimes the dot is attached to something else, like in the bottom right of 玄 (卜女戈). Also used for the shape in the bottom part of 公 (金戈). Can also represent the shape used in the upper/left part of 床 (戈木).

人 (person)

Used for the person radical, like in 从 (人人) and 以 (女戈人) and its compressed form, like in 仁 (人一一). Also used for the upper part of 無 (人廿火) and 年 (人手), which can also be directly connected to other parts as in the upper left part of 知 (人大口). Used with 竹 to form 入 (人竹) and 八 (竹人). However, note that 八 is most often represented by 金, like in 谷 (金人口). 人 can represent the bottom part of a more complex shape, like in 夫 (手人) and 漢 (水廿中人). Another use of 人 is the bottom stroke of 之 (戈弓人) and the bottom right stroke of 家 (十一尸人); these are the same type of stroke as the right stroke of 人 itself. It is also used in 丘 (人一), whatever the reasoning behind this might be.

心 (heart)

Used for the heart radical, like in 志 (土心) and 必 (心竹), and its compressed form, like in 惟 (心人土). Also used for both the components of 比 (心心), another example of this is 北 (中一心). Used for the two strokes in the lower right part of 代 (人戈心) and of 民 (口女心). Used in the right/bottom parts of 也 (心木) and 世 (心廿). Used for the first two strokes of 勿 (心竹竹). Used for the bottom right part of 龍 (卜月卜尸心).

日 (sun)

Used for the sun (日) and say (曰) radicals, like in 書 (中土日, this is the say radical) and 晝 (中土日一, this is the sun radical). Used to form some other shapes, like 巴 (日山), 門 (日弓) and 艮 (日女), including its compressed form 既 (日戈一女山).

尸 (corpse)

Used for the corpse radical, like in 屋 (尸一戈土). Also used to form the door radical 戶 (竹尸), the pig radical 豕 (一尸竹人) and for the common shape 尹 (尸大). Used for general right-then-down strokes, like in the lower right part of 方 (卜竹尸) and the upper/right part of 刀 (尸竹) and 司 (尸一口). Note however that the compressed form of the knife radical is coded 中弓, as in 刖 (月中弓). Also used for the upper right corner shape, like in 馬 (尸手尸火) and 臣 (尸中尸中). In both these cases 尸 occurs twice, representing these last two different shapes. Used to form 耳 (尸十). Also used to form the vertical line with horizontal lines on its right side seen in the lower right part of 作 (人人尸) and the upper part of 長 (尸一女). While 非 (中一尸) itself is encoded using 尸, when it forms a compound the right part is coded using 一 and 卜, like in 罪 (田中中一卜). These details seem to vary between different Cangjie versions.

木 (tree)

Used for the tree radical, like in 林 (木木) or 余 (人一木). Other things are frequently added to the basic tree form, for instance 禾 (竹木), 未 (十木), 末 (木十), 東 (木田) and 來 (木人人). Also used for a vertical stroke with a hook at the bottom, which is crossed by a horizontal line, like in 于 (一木), 乎 (竹火木), 爭 (月尸木), 五 (一木一) and 子 (弓木). A common case of this is the radical 寸 (木戈). Also used in 也 (心木).

火 (fire)

Used for the fire radical, like in 炎 (火火), and its compressed form, like in the bottom dots of 為 (戈大弓火). Although 小 (弓金) is encoded differently, when used in compounds 火 is used, for instance 少 (火竹) and 糸 (女戈火). The latter's compressed form is the left side of 細 (女火田), where the bottom left part is also represented by 火. The common reveal radical is encoded as 戈火, but this may not be obvious from its common compressed form like in the left side of 神 (戈火中田中). Also used for the general shape of a vertical line with dots on the side, as in 平 (一火十).

土 (earth)

Used for the earth radical, and (in compound characters) for the similar scholar radical 士 (十一). For instance, 至 (一戈土, earth radical), 地 (土心木, compressed earth radical) and 吉 (土口, scholar radical). Also used as a part of various shapes, such as 王 (一土), 書 (中土日), 先 (竹土竹山) and 羔 (廿土火). It forms the radical 隹 (人土) in an unexpected way.

竹 (bamboo)

Used for the bamboo radical and its common compressed form, seen at the top of characters such as 笑 (竹竹大). Also used for down-and-left strokes, such as the top of 重 (竹十田土), the upper left part of 行 (竹人一一弓), the upper part of 白 (竹日) and 千 (竹十), the lower left stroke of 及 (弓竹水), 刀 (尸竹) and 見 (月山竹山), etc. It is part of 臼 (竹難), which appears in several other shapes, like the upper sides of 與 (竹難卜金), 鼠 (竹難女卜女) and 學 (竹月弓木). Also used for 身 (竹難竹).

十 (ten)

Used for the ten radical, like the lower part of 千 (竹十). Also used for cross shapes in the uppermost part of 者 (十大日) and 事 (十中中弓), the upper right part of 使 (人十中大) and in two places each in 軍 (月十田十) and 南 (十月廿十). Used for the roof radical, at the top of 官 (十口中口) and 字 (十弓木). While it's used in 士 (十一), when this radical is used in compounds it is encoded as 土 (see above). The horizontal stroke may be more tilted, like in 七 (十山).

大 (big)

Used for the big radical, like in 天 (一大) and 太 (大戈). May be used to construct more complex forms with a 大-like shape at the bottom, such as the bottom of the upper part of 春 (手大日) and the bottom left part of 知 (人大口). Also used for various X-like shapes, like in 凶 (山大), 者 (十大日), 有 (大月), 使 (人十中大), 故 (十口人大), 文 (卜大), 又 (弓大) and 九 (大弓). Less obvious examples include 為 (戈大弓火), 君 (尸大口), the upper part of the compressed dog radical in 犯 (大竹尸山), and the lower left part of 建 (弓大中手).

中 (middle)

Used for the 中 shape, like in 虫 (中一戈), 史 (中大), 事 (十中中弓) and the center right part of 漢 (水廿中人). Also used for vertical lines, like in the lower part of 甲 (田中), the two bottom middle lines of 而 (一月中中), the lines in 州 (戈中戈中), the connecting lines in 臣 (尸中尸中) and the line connecting the boxes in 官 (十口中口). Used for the hand shape in the upper part of 書 (中土日), and the bottom of 事 (十中中弓). Also used for the compressed clothes radical, as in 複 (中人日水), not to be confused with the compressed reveal radical which is encoded as 戈火, like in 神 (戈火中田中).

金 (metal)

Used for the metal radical, like in 銀 (金日女). Most commonly used for a pair of short strokes, like the ones at the top of 谷 (金人口) and 公 (金戈), or the bottom of 與 (竹難卜金) and 其 (廿一一金). Also used for the more elaborate strokes inside 四 (田金). Used when these strokes are located on different sides of another shape as well, like in 亦 (卜中弓金) and 並 (廿廿金).

女 (woman)

Used for the woman radical, like in 如 (女口) and 好 (女弓木). Also used for the vertical stroke with an up-and-right hook at the bottom, like in 以 (女戈人) and 民 (口女心). In some cases there may be extra strokes included in 女, like in 艮 (日女) and 長 (尸一女). In the compressed version, where the two extra strokes are replaced by a dot, 戈 is used in the encoding: 飲 (人戈弓人). Used for the first stroke of 女, Japanese く, like in 幺 (女戈). Recall that 厶 (女戈) is encoded as 戈 in compounds.

月 (moon)

Used for the moon radical, like in 朋 (月月), and the compressed meat radical, like in 刖 (月中弓). Also used for the upper part of 愛 (月月心水) and 受 (月月水), or for the upper left part of 然 (月大火) and 祭 (月人一一火). Note that the similar 夕 (弓戈) is coded differently. Used in general for shapes similar to the outer part (first two strokes) of 月, such as 用 (月手), 市 (卜中月), 同 (月一口), 周 (月土口) and 丹 (月卜). This includes the roof without a dot on the top (the roof with a dot is encoded with 十), like in the top of 軍 (月十田十). Combined with 山 it generates the eye radical 目 (月山) and related shapes such as the shell radical 貝 (月山金). With 一 it generates 且 (月一). With 廿 it generates 皿 (月廿).

弓 (bow)

Used for the bow radical, like in 引 (弓中). More commonly used for a vertical line with a hook at the bottom (and rotations of this), like in the bottom of 事 (十中中弓) and 丁 (一弓), or the right part of the compressed knife radical, like in 則 (月金中弓). Also used for the first stroke of 子 (弓木) and 又 (弓大). 予 (弓戈弓弓) demonstrates both these functions. Also used for the right stroke of 几 (竹弓) and 九 (大弓).

一 (one)

Used for the one radical, like in 三 (一一一), and for horizontal lines in general, like in 丁 (一弓) and 示 (一一火). Also used for the 工 (一中一) element in compounds, like 江 (水一). Sometimes there may be a small stroke below the horizontal line that is included in 一, like in 而 (一月中中) and 百 (一日).
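When practising, it can help to collect the examples for one shape in one place. The following sketch indexes a handful of character/code pairs copied from the sections above by the shapes they contain. The table `EXAMPLES` is only a small sample and `index_by_shape` is a name invented for this illustration; extend the table with whatever characters you are drilling.

```python
from collections import defaultdict

# A few character -> code pairs copied from the examples on this page.
EXAMPLES = {
    "年": "人手", "用": "月手", "夫": "手人", "春": "手大日",
    "四": "田金", "江": "水一", "品": "口口口", "花": "廿人心",
    "立": "卜廿", "出": "山山", "公": "金戈", "天": "一大",
}


def index_by_shape(examples):
    """Map each shape to the list of characters whose code contains it."""
    index = defaultdict(list)
    for char, code in examples.items():
        # dict.fromkeys removes repeated shapes while keeping their order,
        # so 品 (口口口) is listed only once under 口.
        for shape in dict.fromkeys(code):
            index[shape].append(char)
    return dict(index)


index = index_by_shape(EXAMPLES)
print(index["手"])  # every sample character whose code uses 手
```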

Installing Cangjie

This obviously depends on your system, so I will just briefly describe how I use Cangjie on my Debian system. The software I use is SCIM, the Smart Common Input Method. The packages you need in Debian are scim, scim-gtk2-immodule, and scim-tables-zh. After installing these, you should create the file /etc/X11/Xsession.d/95xinput with the contents:

export XMODIFIERS="@im=SCIM"
export GTK_IM_MODULE="scim"
export QT_IM_MODULE="scim"

Now you can run scim-setup and configure SCIM. Personally, I use Control+space as trigger, and Control+n to switch input method. In the Global Setup section, enable Cangjie 5. Apply these changes, and you should be able to press Control+space to enable SCIM (and possibly Control+n one or a few times to enable Cangjie).
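If Control+space does nothing, one thing worth checking is whether the three variables from 95xinput actually reached your session (they are only picked up when the X session starts, so you may need to log out and in again). The sketch below compares the current environment with the values given above; `check_im_environment` is a hypothetical helper written for this page, not part of SCIM.

```python
import os

# The values that the 95xinput file above is supposed to export.
EXPECTED = {
    "XMODIFIERS": "@im=SCIM",
    "GTK_IM_MODULE": "scim",
    "QT_IM_MODULE": "scim",
}


def check_im_environment(environ=None):
    """Return {name: (expected, actual)} for each variable that is unset or differs."""
    env = os.environ if environ is None else environ
    return {
        name: (want, env.get(name))
        for name, want in EXPECTED.items()
        if env.get(name) != want
    }


if __name__ == "__main__":
    problems = check_im_environment()
    if not problems:
        print("All three input method variables are set as expected.")
    for name, (want, got) in problems.items():
        print(f"{name}: expected {want!r}, found {got!r}")
```

Run it from a terminal inside the X session; an empty result only says the variables are set, not that SCIM itself is running.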

While we're at it, I recommend xfce4-terminal to work with Chinese, since it's relatively lightweight yet supports the nice, large antialiased fonts you likely want to use.