To provide a common technical basis for the processing of electronic information in various languages, the International Organization for Standardization (ISO) has developed an international coding standard called ISO 10646. ISO 10646 provides a unified standard for coding the characters of all major languages in the world, including traditional and simplified Chinese characters.

The ISO 10646 standard serves as a common character-coding standard for the communication and exchange of electronic information. By adopting it, computer systems in different parts of the world can store, process, transmit and display electronic information in different languages more accurately, which facilitates the flow of electronic information and the conduct of electronic transactions across geographical areas.

The ISO released the first version of the ISO 10646 standard, ISO/IEC 10646-1:1993, in 1993. In 2000, the ISO released ISO/IEC 10646-1:2000, an updated version of ISO/IEC 10646-1:1993. ISO/IEC 10646-1:2000 contains 27,484 ideographic characters: the 20,902 ideographic characters of ISO/IEC 10646-1:1993 plus 6,582 newly defined ideographic characters in Extension A. In November 2001, the ISO released ISO/IEC 10646-2:2001 as a supplement to ISO/IEC 10646-1:2000. ISO/IEC 10646-2:2001 contains 42,711 newly defined ideographic characters in Extension B, bringing the total number of ideographic characters in the ISO 10646 standard to more than 70,000.
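The character counts above can be cross-checked against the code point ranges of the three ideograph blocks. The following is a minimal sketch in Python. The ranges are not stated in this document; they are assumed here to be the block boundaries as assigned at the time of these releases, and the names `IDEOGRAPH_BLOCKS` and `block_sizes` are illustrative only.

```python
# Inclusive code point ranges of the ideograph blocks.
# Assumption: these are the ranges as assigned at the time of the
# releases discussed above; they are not taken from this document.
IDEOGRAPH_BLOCKS = {
    "CJK Unified Ideographs (ISO/IEC 10646-1:1993)": (0x4E00, 0x9FA5),
    "Extension A (ISO/IEC 10646-1:2000)": (0x3400, 0x4DB5),
    "Extension B (ISO/IEC 10646-2:2001)": (0x20000, 0x2A6D6),
}


def block_sizes(blocks):
    """Return the number of code points in each inclusive range."""
    return {name: last - first + 1 for name, (first, last) in blocks.items()}


sizes = block_sizes(IDEOGRAPH_BLOCKS)
total = sum(sizes.values())

for name, count in sizes.items():
    print(f"{name}: {count:,}")
print(f"Total ideographic characters: {total:,}")
```

Under these assumptions the three blocks hold 20,902, 6,582 and 42,711 code points respectively, which gives a total of 70,195 and agrees with the figure of more than 70,000 quoted above.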

In April 2004, the ISO published ISO/IEC 10646:2003, a single publication that merges the previous two releases of the ISO 10646 standard: ISO/IEC 10646-1:2000 and its supplement ISO/IEC 10646-2:2001. The ideographic characters in ISO/IEC 10646:2003 are therefore the same as those in ISO/IEC 10646-1:2000 and ISO/IEC 10646-2:2001 combined.

In November 2005, the ISO released ISO/IEC 10646:2003/Amd 1:2005. This release defines a subset of ideographic characters named the International Ideographs Core (IICORE). The subset makes day-to-day electronic communication in Chinese more practical on resource-limited devices, and it facilitates global electronic communication in Chinese by providing a shared set of ideographs. The IICORE contains 9,810 characters and can be implemented in devices with limited memory or input/output capability, and in applications where the use of the complete ISO 10646 ideograph repertoire is not feasible.

The information related to the development of the IICORE is available for reference at http://www.cse.cuhk.edu.hk/~irg/irg/IICore/IICore.htm

For standardisation purposes, some ISO documents (including the ISO 10646 international coding standard and its amendments) are made freely available by the ISO at the following website: http://standards.iso.org/ittf/PubliclyAvailableStandards/

Unicode and its relationship with the ISO 10646 standard

Unicode is a character coding system designed by the Unicode Consortium to support the interchange, processing and display of the written texts of all major languages in the world. Members of the Unicode Consortium are mainly hardware and software vendors.

In 1991, the ISO and the Unicode Consortium decided to cooperate in defining a universal coding standard for multilingual texts. Since then, the two organizations have worked closely to extend the ISO 10646 standard and Unicode and to keep them synchronized. The ISO publishes the characters and code points of the ISO 10646 standard, while the Unicode Consortium supplements them with implementation algorithms and semantic information. The ISO 10646 standard and Unicode are code-to-code identical, that is, each character has the same code point in both. Unicode can be regarded as the implementation version of the ISO 10646 standard. Therefore, products supporting Unicode also support the ISO 10646 standard.
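Because the two standards assign the same code point to each character, a code point obtained from any Unicode-aware tool can also be read as the ISO 10646 code point. The following is a small sketch in Python, whose strings are Unicode and whose standard `unicodedata` module exposes character names. The helper name `describe` and the sample character are illustrative choices, not part of either standard.

```python
import unicodedata


def describe(ch):
    """Return the code point (in U+XXXX notation) and the name of one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


# Sample ideograph chosen for illustration only.
code_point, name = describe("中")
print(code_point, name)
```

The same call works for ideographs in the supplement, for example `describe("\U00020000")`, where the code point needs five hexadecimal digits.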

Unicode 3.0 was officially released by the Unicode Consortium in February 2000. It contains 49,194 characters of different languages, of which 27,484 are East Asian (Han) ideographic characters. Unicode 3.0 is synchronized with ISO/IEC 10646-1:2000.

Unicode 3.1 was released in March 2001. Its main feature is the addition of 44,946 new characters, of which 42,711 are ideographic characters. Together with the existing characters in Unicode 3.0, Unicode 3.1 has 94,140 characters, of which more than 70,000 are ideographic characters.
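The Unicode 3.1 totals follow directly from the Unicode 3.0 figures and the additions quoted above. A short Python check, using only numbers from this document (the variable names are illustrative):

```python
# Figures quoted in this document for Unicode 3.0 and the 3.1 additions.
UNICODE_3_0 = {"characters": 49_194, "ideographs": 27_484}
ADDED_IN_3_1 = {"characters": 44_946, "ideographs": 42_711}

# Unicode 3.1 totals are the 3.0 counts plus the additions.
unicode_3_1 = {key: UNICODE_3_0[key] + ADDED_IN_3_1[key] for key in UNICODE_3_0}

print(f"Unicode 3.1 characters: {unicode_3_1['characters']:,}")
print(f"Unicode 3.1 ideographs: {unicode_3_1['ideographs']:,}")
```

This gives 94,140 characters in total, of which 70,195 are ideographic characters.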

Unicode 4.0 was released in April 2003. It adds 1,226 new characters, but the ideographic characters it includes are the same as those in Unicode 3.1. Unicode 4.0 is synchronized with ISO/IEC 10646:2003.

Unicode 4.1 was released by the Unicode Consortium in March 2005 and corresponds to ISO/IEC 10646:2003 and its Amendments. The latest version of the Unicode Standard is version 5.0, released in July 2006, in which the ideographic characters are the same as those in Unicode 4.1.

Hong Kong Supplementary Character Set (HKSCS)

The HKSCS-2008 is an updated version of the Hong Kong Supplementary Character Set-2004 (HKSCS-2004), which was published in May 2005. The HKSCS-2008 is aligned technically with ISO/IEC 10646:2003, published in April 2004 by the International Organization for Standardization (ISO), and its Amendment 1.

The current version of the ISO 10646 standard includes all 5,009 characters of the HKSCS. (Notice: Revised Principle for the Inclusion of Characters in the HKSCS)


