
CJK

In software and communications internationalization, CJK is a collective term for the Chinese, Japanese, and Korean languages.

These languages share writing systems based partly on Han (Chinese) characters (Hanzi in Chinese, Kanji in Japanese, and Hanja in Korean), which range from about 4,000 characters for a basic vocabulary to about 40,000 characters for reasonably complete coverage. That many characters cannot fit in the 256-character code space of an 8-bit encoding, so they require either a fixed-width encoding of at least 16 bits or a variable-length multi-byte encoding.
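As a rough illustration of the size argument above, the following Python sketch encodes a single Han character under several encodings and reports how many bytes each one uses. The sample character (U+4E2D) and the helper names `encoded_lengths` and `fits_in_single_byte` are choices made for this example only; the codec names are those shipped with Python's standard library.

```python
# Sketch: how many bytes one Han character needs under different encodings.
SAMPLE = "\u4e2d"  # 中, chosen because it is present in every set listed below

# Codec names as spelled by Python's standard library.
CODECS = ["utf-8", "utf-16-be", "gb2312", "big5", "shift_jis", "euc_kr"]


def encoded_lengths(text, codecs=CODECS):
    """Return {codec name: number of bytes used to encode text}."""
    return {name: len(text.encode(name)) for name in codecs}


def fits_in_single_byte(text):
    """True if every character of text exists in the 8-bit Latin-1 set."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for name, size in encoded_lengths(SAMPLE).items():
        print(f"{name:10s} {size} bytes")
    print("fits in Latin-1:", fits_in_single_byte(SAMPLE))
```

Every encoding in the list needs at least two bytes for the character, while the attempt to encode it in an 8-bit set fails outright.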

A CJK character encoding should contain, at a minimum, Han characters plus the language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana, and hangeul.
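To make the distinction between Han characters and the phonetic scripts concrete, this small Python sketch looks up the Unicode name of one sample character per script. The sample characters and the helper `script_prefix` are illustrative assumptions, not part of any standard; the first word of a Unicode character name is used here only as a convenient label, not as a formal script classification.

```python
import unicodedata

# One sample character per script; the samples are illustrative choices.
SAMPLES = {
    "Han": "\u4e2d",       # 中
    "hiragana": "\u3042",  # あ
    "katakana": "\u30a2",  # ア
    "hangeul": "\ud55c",   # 한
    "bopomofo": "\u3105",  # ㄅ
    "pinyin": "\u0101",    # ā, a Latin letter carrying a tone mark
}


def script_prefix(ch):
    """Return the first word of the character's Unicode name."""
    return unicodedata.name(ch).split()[0]


if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        print(f"{label:9s} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Pinyin stands apart from the others in that it is written with Latin letters plus tone marks, so it draws on the Latin blocks rather than a dedicated CJK block.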

CJK character encodings include:

The CJK character sets take up the bulk of the assigned Unicode code space. There is much controversy among Chinese language specialists about the desirability and technical merit of the "Han unification" process used to map multiple Chinese and Japanese character sets into a single set of unified characters.
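The practical effect of Han unification can be sketched in Python: the same character has different byte values in each national encoding, yet all of them decode to one Unicode code point. The sample character (U+4E2D) and the helper names `legacy_bytes` and `unified_code_points` are assumptions made for this example; it shows only the resulting mapping and says nothing about the merits of the process.

```python
# Sketch: several national encodings of one character map to one code point.
LEGACY_CODECS = ["gb2312", "big5", "shift_jis", "euc_kr"]


def legacy_bytes(ch, codecs=LEGACY_CODECS):
    """Return {codec name: byte sequence for ch in that codec}."""
    return {name: ch.encode(name) for name in codecs}


def unified_code_points(ch, codecs=LEGACY_CODECS):
    """Decode each legacy byte sequence and collect the code points seen."""
    encoded = legacy_bytes(ch, codecs)
    return {ord(data.decode(name)) for name, data in encoded.items()}


if __name__ == "__main__":
    sample = "\u4e2d"  # 中, present in all four national sets
    for name, data in legacy_bytes(sample).items():
        print(f"{name:10s} {data.hex()}")
    print("code points:", [f"U+{cp:04X}" for cp in unified_code_points(sample)])
```

A character that exists in only some of the national sets would raise `UnicodeEncodeError` in `legacy_bytes`, so the sketch is limited to characters common to all the listed codecs.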

The term CJKV means CJK plus Vietnamese, which was written with Chinese characters before it adopted a writing system based solely on romanization.


This article was originally based on material from FOLDOC, used with permission.

wikipedia.org dumped 2003-03-17 with terodump