Text - What are UTF-8, UTF-16 and UTF-32?

They are character encodings for Unicode.

What does UTF stand for?

The acronym UTF is short for Unicode Transformation Format.
A Unicode Transformation Format is a character encoding for the Unicode character set.
A character encoding describes how to transform a Unicode code point (the number assigned to a character) into a sequence of bytes that can be stored in a document.
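
As a rough illustration of that transformation, the sketch below encodes a single code point with each of the three encodings, using Python's built-in str.encode. The choice of U+20AC (the euro sign) and the variable names are only assumptions for the example, and the big-endian variants are chosen so the bytes read in the same order as the code point.

```python
# One code point: U+20AC EURO SIGN (an arbitrary sample for this sketch).
ch = "\u20ac"

# Transform the same code point into bytes with each encoding.
# The "-be" (big-endian) codecs are used so no byte order mark is added.
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")
utf32 = ch.encode("utf-32-be")

print(f"U+{ord(ch):04X}")   # U+20AC
print(utf8.hex(" "))        # e2 82 ac
print(utf16.hex(" "))       # 20 ac
print(utf32.hex(" "))       # 00 00 20 ac
```

The code point is the same in all three cases; only the byte sequence that represents it changes.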

UTF-8, UTF-16 and UTF-32 are the three most popular character encodings for Unicode.

UTF-8 can be read as Unicode Transformation Format - 8.
UTF-16 can be read as Unicode Transformation Format - 16.
UTF-32 can be read as Unicode Transformation Format - 32.
The number is the size, in bits, of the code unit that each encoding uses.

UTF-8 is variable-width: each code point is encoded in 1, 2, 3 or 4 bytes. The 1-byte encodings cover the ASCII characters, so UTF-8 is backwards compatible with ASCII.
UTF-16 is variable-width: each code point is encoded in 2 or 4 bytes.
UTF-32 is fixed-width: every code point is encoded in 4 bytes.
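
The widths above can be checked with a short sketch. The four sample characters and the helper name encoded_widths are assumptions made for this example, not part of any standard. The little-endian codecs are used so that Python does not prepend a byte order mark, which would inflate the counts.

```python
# Sample code points, one for each UTF-8 width (chosen for illustration only).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def encoded_widths(ch):
    """Return the byte length of ch in UTF-8, UTF-16 and UTF-32 (hypothetical helper)."""
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_widths(ch))
# U+0041 (1, 2, 4)
# U+00E9 (2, 2, 4)
# U+20AC (3, 2, 4)
# U+1F600 (4, 4, 4)

# ASCII compatibility: an ASCII character has the same single byte in UTF-8.
ascii_compatible = "A".encode("utf-8") == "A".encode("ascii")
print(ascii_compatible)  # True
```

Only UTF-32 gives the same length for every sample; UTF-8 and UTF-16 grow as the code point value gets larger.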






© Richard McGrath