Universal Coded Character Set

The Universal Coded Character Set is a standard set of characters defined by the International Standard ISO/IEC 10646, Information technology — Universal Coded Character Set , which is the basis of many character encodings. The latest version contains over 136,000 abstract characters, each identified by an unambiguous name and an integer number called its code point. This ISO/IEC 10646 standard is maintained in conjunction with The Unicode Standard, and they are code-for-code identical.
Characters from the many languages, scripts, and traditions of the world are represented in the UCS with unique code points. The inclusiveness of the UCS is continually improving as characters from previously unrepresented writing systems are added.
The UCS has over 1.1 million possible code points available for use/allocation, but only the first 65,536 had entered into common use before 2000. This situation began changing when the People's Republic of China ruled in 2006 that all software sold in its jurisdiction would have to support GB 18030. This required software intended for sale in the PRC to move beyond the BMP.
The system deliberately leaves many code points not assigned to characters, even in the BMP. It does this to allow for future expansion or to minimize conflicts with other encoding forms.

Encoding forms

ISO/IEC 10646 defines several character encoding forms for the Universal Coded Character Set. The simplest, UCS-2, uses a single code value between 0 and 65,535 for each character, and allows exactly two bytes to represent that value. UCS-2 thereby permits a binary representation of every code point in the BMP that represents a character. UCS-2 cannot represent code points outside the BMP.
The first amendment to the original edition of the UCS defined UTF-16, an extension of UCS-2, to represent code points outside the BMP. A range of code points in the S Zone of the BMP remains unassigned to characters. UCS-2 disallows use of code values for these code points, but UTF-16 allows their use in pairs. Unicode also adopted UTF-16, but in Unicode terminology, the high-half zone elements become "high surrogates" and the low-half zone elements become "low surrogates".
Another encoding, UCS-4, uses four bytes. UCS-4 allows representation of each value as exactly four bytes. UCS-4 thereby permits a binary representation of every code point in the UCS, including those outside the BMP. As in UCS-2, every encoded character has a fixed length in bytes, which makes it simple to manipulate, but of course it requires twice as much storage as UCS-2.
Currently, the dominant UCS encoding is UTF-8, which is a variable-width encoding designed for backward compatibility with ASCII, and for avoiding the complications of endianness and byte-order marks in UTF-16 and UTF-32. More than 93% of all Web pages are encoded in UTF-8. The Internet Engineering Task Force requires all Internet protocols to identify the encoding used for character data, and the supported character encodings must include UTF-8. The Internet Mail Consortium recommends that all e-mail programs be able to display and create mail using UTF-8. It is also increasingly being used as the default character encoding in operating systems, programming languages, APIs, and software applications.
See also Comparison of Unicode encodings.

History

The International Organization for Standardization set out to compose the universal character set in 1989, and published the draft of ISO 10646 in 1990. Hugh McGregor Ross was one of its principal architects. That standard differed markedly from the current one. It defined:

128 groups of
256 planes of
256 rows of
256 cells,

for an apparent total of 2,147,483,648 characters, but actually the standard could code only 679,477,248 characters, as the policy forbade byte values of C0 and C1 control codes in any one of the four bytes specifying a group, plane, row and cell. The Latin capital letter A, for example, had a location in group 0x20, plane 0x20, row 0x20, cell 0x41.
One could code the characters of this primordial ISO 10646 standard in one of three ways:

UCS-4, four bytes for every character, enabling the simple encoding of all characters;
UCS-2, two bytes for every character, enabling the encoding of the first plane, 0x20, the Basic Multilingual Plane, containing the first 36,864 codepoints, straightforwardly, and other planes and groups by switching to them with ISO 2022 escape sequences;
UTF-1, which encodes all the characters in sequences of bytes of varying length.

In 1990, therefore, two initiatives for a universal character set existed: Unicode, with 16 bits for every character, and ISO 10646. The software companies refused to accept the complexity and size requirement of the ISO standard and were able to convince a number of ISO National Bodies to vote against it. The ISO standardizers realized they could not continue to support the standard in its current state and negotiated the unification of their standard with Unicode. Two changes took place: the lifting of the limitation upon characters, thus opening code points like 0x0000101F for allocation; and the synchronization of the repertoire of the Basic Multilingual Plane with that of Unicode.
Meanwhile, in the passage of time, the situation changed in the Unicode standard itself: 65,536 characters came to appear insufficient, and the standard from version 2.0 and onwards supports encoding of 1,112,064 code points from 17 planes by means of the UTF-16 surrogate mechanism. For that reason, ISO 10646 was limited to contain as many characters as could be encoded by UTF-16 and no more, that is, a little over a million characters instead of over 679 million. The UCS-4 encoding of ISO 10646 was incorporated into the Unicode standard with the limitation to the UTF-16 range and under the name UTF-32, although it has almost no use outside programs' internal data.
Rob Pike and Ken Thompson, the designers of the Plan 9 operating system, devised a new, fast and well-designed mixed-width encoding, which came to be called UTF-8,
currently the most popular UCS encoding.

Differences from Unicode

ISO 10646 and Unicode have an identical repertoire and numbers—the same characters with the same numbers exist on both standards, although Unicode releases new versions and adds new characters more often. Unicode has rules and specifications outside the scope of ISO 10646. ISO 10646 is a simple character map, an extension of previous standards like ISO 8859. In contrast, Unicode adds rules for collation, normalization of forms, and the bidirectional algorithm for right-to-left scripts such as Arabic and Hebrew. For interoperability between platforms, especially if bidirectional scripts are used, it is not enough to support ISO 10646; Unicode must be implemented.
To support these rules and algorithms, Unicode adds many properties to each character in the set such as properties determining a character's default bidirectional class and properties to determine how the character combines with other characters. If the character represents a numeric value such as the European number ‘8’, or the vulgar fraction ‘¼’, that numeric value is also added as a property of the character. Unicode intends these properties to support interoperable text handling with a mixture of languages.
Some applications support ISO 10646 characters but do not fully support Unicode. One such application, Xterm, can properly display all ISO 10646 characters that have a one-to-one character-to-glyph mapping and a single directionality. It can handle some combining marks by simple overstriking methods, but cannot display Hebrew, Devanagari or Arabic. Most GUI applications use standard OS text drawing routines which handle such scripts, although the applications themselves still do not always handle them correctly.

Citing the Universal Coded Character Set

ISO 10646, a general, informal citation for the ISO/IEC 10646 family of standards, is acceptable in most prose. And even though it is a separate standard, the term Unicode is used just as often, informally, when discussing the UCS. However, any normative references to the UCS as a publication should cite the year of the edition in the form ISO/IEC 10646:, for example: ISO/IEC 10646:2014.

Relationship with Unicode

Since 1991, the Unicode Consortium and the ISO have developed The Unicode Standard and ISO/IEC 10646 in tandem. The repertoire, character names, and code points of Unicode Version 2.0 exactly match those of ISO/IEC 10646-1:1993 with its first seven published amendments. After Unicode 3.0 was published in February 2000, corresponding new and updated characters entered the UCS via ISO/IEC 10646-1:2000. In 2003, parts 1 and 2 of ISO/IEC 10646 were combined into a single part, which has since had a number of amendments adding characters to the standard in approximate synchrony with the Unicode standard.

ISO/IEC 10646-1:1993 = Unicode 1.1
ISO/IEC 10646-1:1993 plus Amendments 5 to 7 = Unicode 2.0
ISO/IEC 10646-1:1993 plus Amendments 5 to 7 = Unicode 2.1 excluding Euro Sign and Object Replacement Character, which are included in Amendment 18
ISO/IEC 10646-1:2000 = Unicode 3.0
ISO/IEC 10646-1:2000 and ISO/IEC 10646-2:2001 = Unicode 3.1
ISO/IEC 10646-1:2000 plus Amendment 1 and ISO/IEC 10646-2:2001 = Unicode 3.2
ISO/IEC 10646:2003 = Unicode 4.0
ISO/IEC 10646:2003 plus Amendment 1 = Unicode 4.1
ISO/IEC 10646:2003 plus Amendments 1 to 2 = Unicode 5.0 excluding Devanagari Letters GGA, JJA, DDDA and BBA, which are included in Amendment 3
ISO/IEC 10646:2003 plus Amendments 1 to 4 = Unicode 5.1
ISO/IEC 10646:2003 plus Amendments 1 to 6 = Unicode 5.2
ISO/IEC 10646:2003 plus Amendments 1 to 8 = ISO/IEC 10646:2011 = Unicode 6.0 excluding Indian Rupee Sign
ISO/IEC 10646:2012 = Unicode 6.1
ISO/IEC 10646:2012 = Unicode 6.2 excluding Turkish Lira Sign, which is included in Amendment 1
ISO/IEC 10646:2012 = Unicode 6.3 excluding Turkish Lira Sign, which is included in Amendment 1, and five bidirectional control characters, which are included in Amendment 2
ISO/IEC 10646:2012 plus Amendments 1 and 2 = Unicode 7.0 excluding the Ruble sign
ISO/IEC 10646:2014 plus Amendment 1 = Unicode 8.0 excluding the Lari sign, nine CJK unified ideographs, and 41 emoji characters
ISO/IEC 10646:2014 plus Amendments 1 and 2 = Unicode 9.0 excluding Adlam, Newa, Japanese TV symbols, and 74 emoji and symbols
ISO/IEC 10646:2017 = Unicode 10.0 excluding 285 Hentaigana characters, 3 Zanabazar Square characters, and 56 emoji symbols

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...