Documentation of class 'CharacterEncoderImplementations::ISO10646_to_UTF8':


Class: ISO10646_to_UTF8 (in CharacterEncoderImplementations)


Inheritance:

   Object
   |
   +--CharacterEncoder
      |
      +--CharacterEncoderImplementations::VariableBytesEncoder
         |
         +--CharacterEncoderImplementations::ISO10646_to_UTF8
            |
            +--CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
            |
            +--CharacterEncoderImplementations::ISO10646_to_XMLUTF8

Package:
stx:libbasic
Category:
Collections-Text-Encodings
Version:
rev: 1.31 date: 2018/01/19 13:43:21
user: stefan
file: CharacterEncoderImplementations__ISO10646_to_UTF8.st directory: libbasic
module: stx stc-classLibrary: libbasic

Description:


I can encode Unicode characters into UTF-8 and
decode UTF-8 byte sequences into Unicode.

Notice the naming (many are confused):
    Unicode is the set of number-to-character assignments (codePoints)
whereas:
    UTF-8 is a concrete way of transmitting Unicode codePoints (numbers).
    UTF-16 is another concrete encoding, for example.

ST/X NEVER uses UTF-8 internally - all characters are full 24bit characters.
Only when exchanging data are these converted into UTF-8 (or other) byte sequences.
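
To make the distinction between a codePoint and its UTF-8 byte sequence concrete, here is a minimal hand-written sketch of the UTF-8 bit layout for codePoints below 16r800 (one or two bytes). It is not the encoder's implementation; the block name utf8Bytes is hypothetical and larger codePoints are deliberately not covered.

```smalltalk
"Illustration only: build the UTF-8 bytes of a codePoint below 16r800 by hand."
|utf8Bytes|

utf8Bytes := [:codePoint |
    codePoint < 16r80
        ifTrue:[
            "7-bit codePoints are transmitted as a single byte"
            ByteArray with:codePoint ]
        ifFalse:[
            "11-bit codePoints: 110xxxxx 10xxxxxx"
            ByteArray
                with:(16rC0 bitOr:(codePoint bitShift:-6))
                with:(16r80 bitOr:(codePoint bitAnd:16r3F)) ] ].

utf8Bytes value:16r68.     "h (U+0068): one byte, 16r68"
utf8Bytes value:16r153.    "œ (U+0153): two bytes, 16rC5 16r93"
```

The same codePoint 16r153 would be transmitted differently in UTF-16, which is why the codePoint and its encoding must be kept apart.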


Class protocol:

instance creation
o  flushSingleton
flushes the cached singleton

usage example(s):

     self flushSingleton

o  new
returns a singleton

o  theOneAndOnlyInstance
returns a singleton

queries
o  bytesToReadFor: firstByte
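no comment is given for this query. Assuming it relates the first byte of a UTF-8 sequence to a byte count, the standard lead-byte rule can be sketched in plain Smalltalk as follows. The block name sequenceLength is hypothetical, only sequences of one to four bytes are covered, and whether the method itself answers the total or the remaining number of bytes is not stated here.

```smalltalk
"Illustration only: total length of a UTF-8 sequence, derived from its first byte."
|sequenceLength|

sequenceLength := [:firstByte |
    firstByte < 16r80
        ifTrue:[1]                                  "0xxxxxxx"
        ifFalse:[
            (firstByte bitAnd:16rE0) = 16rC0
                ifTrue:[2]                          "110xxxxx"
                ifFalse:[
                    (firstByte bitAnd:16rF0) = 16rE0
                        ifTrue:[3]                  "1110xxxx"
                        ifFalse:[
                            (firstByte bitAnd:16rF8) = 16rF0
                                ifTrue:[4]          "11110xxx"
                                ifFalse:[nil] ] ] ] ].   "continuation byte or longer form: not handled in this sketch"

sequenceLength value:16r68.    "1"
sequenceLength value:16rC5.    "2"
sequenceLength value:16rE2.    "3"
```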


Instance protocol:

encoding & decoding
o  decodeString: aStringOrByteCollection
given a string in UTF-8 encoding,
return a new string containing the same characters, in Unicode encoding.
Returns either a normal String, a Unicode16String or a Unicode32String instance.
This is only useful when reading from external sources or communicating with
other systems
(ST/X never uses UTF-8 internally, but always uses strings of fully decoded Unicode characters).
This only handles characters of up to 30 bits.

o  encodeString: aUnicodeString
return the UTF-8 representation of a Unicode string.
The resulting string is only useful for storing in an external file,
not for use inside ST/X.

queries
o  characterSize: charOrCodePoint
return the number of bytes required to encode charOrCodePoint in UTF-8
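
usage example(s), as a hedged sketch: the selector is the one documented above, and the argument may be a character or a codePoint as the parameter name suggests. The comments state the sizes that UTF-8 prescribes for these codePoints, not output captured from a running system.

```smalltalk
|encoder|

encoder := ISO10646_to_UTF8 new.

encoder characterSize:$a.         "U+0061: UTF-8 uses 1 byte for codePoints below 16r80"
encoder characterSize:16r153.     "U+0153: UTF-8 uses 2 bytes for codePoints below 16r800"
encoder characterSize:16r20AC.    "U+20AC: UTF-8 uses 3 bytes for codePoints below 16r10000"
```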

o  nameOfEncoding

stream support
o  encodeCharacter: aUnicodeCharacter on: aStream
given a character in unicode, encode it onto aStream.

o  encodeString: aUnicodeString on: aStream
given a string in unicode, encode it onto aStream.

o  readNext: charactersToReadArg charactersFrom: aStream
decode the next charactersToReadArg characters on aStream from UTF-8 to Unicode

o  readNextCharacterFrom: aStream
decode the next character or byte on aStream from utf-8 to unicode
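
usage example(s) for the stream support, as a sketch under stated assumptions: that a WriteStream on a String is an acceptable target for the encoded data, and that the result can be read back through a ReadStream. Neither is stated above; only the selectors are taken from this page. The variable names out and in are hypothetical.

```smalltalk
|encoder out in|

encoder := ISO10646_to_UTF8 new.

"Assumption: the encoder accepts a WriteStream on a String as its target."
out := WriteStream on:(String new).
encoder encodeString:'Hello' on:out.
encoder encodeCharacter:(Character value:16r153) on:out.    "œ, U+0153"

"Assumption: the encoded contents can be decoded again from a ReadStream."
in := ReadStream on:(out contents).
encoder readNextCharacterFrom:in.    "decodes the first character of the encoded data"
```

Encoding directly onto a stream avoids building an intermediate encoded string when the data is written to a file or socket anyway.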


Examples:


Encoding (unicode to utf8):

     ISO10646_to_UTF8 encodeString:'hello'.

Decoding (utf8 to unicode):

     |t|

     t := ISO10646_to_UTF8 encodeString:'Helloœ'.
     ISO10646_to_UTF8 decodeString:t.

ST/X 7.2.0.0; WebServer 1.670 at bd0aa1f87cdd.unknown:8081; Fri, 26 Apr 2024 12:07:34 GMT