Note:[1][2] The range was initially part of the Private Use Area in Unicode 1.0.0,[3] and was removed from it in Unicode 1.0.1.
CJK Compatibility Ideographs is a Unicode block created to contain mostly Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings. However, it also contains 12 unified ideographs sourced from Japanese character sets from IBM.
The block has dozens of ideographic variation sequences registered in the Unicode Ideographic Variation Database (IVD).[4][5]
These sequences specify the desired glyph variant for a given Unicode character.
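As a rough illustration of how such a sequence is formed, the sketch below appends a variation selector to a base character. It assumes the ideographic variation selectors VS17 to VS256 occupy U+E0100 to U+E01EF (the Variation Selectors Supplement block); the helper names `make_variation_sequence` and `split_variation_sequence` are hypothetical, and the sample base/selector pair is chosen only for demonstration, not because it is known to be registered in the IVD.

```python
# Assumed range of the ideographic variation selectors VS17..VS256.
IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF


def make_variation_sequence(base: str, selector_number: int) -> str:
    """Return base followed by variation selector VS<selector_number> (17..256)."""
    if len(base) != 1:
        raise ValueError("base must be a single code point")
    if not 17 <= selector_number <= 256:
        raise ValueError("selector_number must be in 17..256")
    return base + chr(IVS_FIRST + selector_number - 17)


def split_variation_sequence(text: str):
    """Return (base, selector_number) if text is base + VS17..VS256, else None."""
    if len(text) == 2 and IVS_FIRST <= ord(text[1]) <= IVS_LAST:
        return text[0], ord(text[1]) - IVS_FIRST + 17
    return None


# Hypothetical pairing, for demonstration only: U+8612 followed by VS17.
sequence = make_variation_sequence("\u8612", 17)
print([f"U+{ord(c):04X}" for c in sequence])
```

Whether a renderer honours a given sequence depends on the font and on the sequence actually being registered; the code only shows the code point layout.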
Character sources
Sources for the original collection of CJK Compatibility Ideographs include:
South Korean KS X 1001 (U+F900–U+FA0B, 268 characters)
19 are unifiable with characters in the Unified Repertoire and Ordering (URO), and are therefore compatibility ideographs in the strict sense.
One (U+FA20 蘒 CJK COMPATIBILITY IDEOGRAPH-FA20) is a kyūjitai form of a kokuji whose extended shinjitai form exists in the URO (U+8612 蘒 CJK UNIFIED IDEOGRAPH-8612). Both are hyōgai kanji, and are variants of the jinmeiyō kanji U+8429 萩 CJK UNIFIED IDEOGRAPH-8429 (i.e. Kummerowia). U+FA20 was assigned a normalisation to U+8612, even though the 龜 and 亀 components, while both forms of radical 213, are not usually considered unifiable.[6]
The remaining 12 are kokuji characters which are actually unified ideographs (with the Unified_Ideograph property, and which do not change upon normalisation). In spite of their inclusion in the CJK Compatibility Ideographs block and their algorithmically generated character names beginning with "CJK COMPATIBILITY IDEOGRAPH", they are not duplicates of characters in the original CJK Unified Ideographs block in any respect;[7][8] 11 of these 12 are completely non-duplicate, while U+FA23 﨣 CJK COMPATIBILITY IDEOGRAPH-FA23 was later unintentionally duplicated in CJK Unified Ideographs Extension B as U+27EAF 𧺯 CJK UNIFIED IDEOGRAPH-27EAF. They are as follows:
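The normalisation behaviour described above can be checked with Python's standard `unicodedata` module. This is a sketch, not a definitive test: it assumes the block spans U+F900 to U+FAFF (a range taken from the Unicode code charts, not from this text), and the result depends on the Unicode version bundled with the interpreter. The names `unchanged_by_normalisation` and `unified_in_block` are illustrative.

```python
import unicodedata

# Assumed block range for CJK Compatibility Ideographs.
BLOCK_START, BLOCK_END = 0xF900, 0xFAFF


def unchanged_by_normalisation(code_point: int) -> bool:
    """True for an assigned character that NFC normalisation leaves as is."""
    ch = chr(code_point)
    if unicodedata.category(ch) == "Cn":  # unassigned code point
        return False
    return unicodedata.normalize("NFC", ch) == ch


# Characters in the block that behave like unified ideographs under NFC.
unified_in_block = [
    cp for cp in range(BLOCK_START, BLOCK_END + 1)
    if unchanged_by_normalisation(cp)
]

# U+FA20, by contrast, is replaced by its assigned normalisation target.
fa20_normalised = unicodedata.normalize("NFC", "\uFA20")

print([f"U+{cp:04X}" for cp in unified_in_block])
print(f"U+FA20 -> U+{ord(fa20_normalised):04X}")
```

Comparing against NFC is only a proxy for the Unified_Ideograph property, which `unicodedata` does not expose directly; it works here because every true compatibility ideograph in the block has a canonical decomposition.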
Sato, T. K.; Kobayashi, Tatsuo; Pak, Tong Gi (2002-05-22), Proposal to add 122 compatibility Hanja code table of the D P R of Korea into the CJK Compatibility Ideographs of ISO/IEC 10646-1:2000
Suignard, Michel (2002-12-12), "USA T.5 e, USA T.8", Proposed disposition of comments on SC2 N 3624 (FPDAM text for Amendment 2 to ISO/IEC 10646-1:2000)
Freytag, Asmus; McGowan, Rick; Whistler, Ken (2021-06-14). "Known Anomalies in Unicode Character Names". Unicode Consortium. Unicode Technical Note #27. "These 12 characters are unified CJK ideographs, not compatibility ideographs, despite their names."