Documentation of class 'CharacterEncoderImplementations::ISO10646_to_UTF16BE':

Class: ISO10646_to_UTF16BE (in CharacterEncoderImplementations)


Inheritance:

   Object
   |
   +--CharacterEncoder
      |
      +--CharacterEncoderImplementations::VariableBytesEncoder
         |
         +--CharacterEncoderImplementations::ISO10646_to_UTF16BE
            |
            +--CharacterEncoderImplementations::ISO10646_to_UTF16LE

Package:
stx:libbasic
Category:
Collections-Text-Encodings
Version:
rev: 1.12 date: 2019/05/28 12:48:39
user: stefan
file: CharacterEncoderImplementations__ISO10646_to_UTF16BE.st directory: libbasic
module: stx stc-classLibrary: libbasic

Description:


Encodes/decodes UTF-16 big-endian (most significant byte first).

Notice the naming (many are confused):
    Unicode is the set of number-to-glyph assignments
whereas:
    UTF8, UTF16 etc. are concrete ways of transmitting Unicode codePoints (numbers).

ST/X NEVER uses UTF8 or UTF16 internally - all characters are full 24bit characters.
Only when exchanging data are these converted into UTF8 (or other) byte sequences.


Instance protocol:

encoding & decoding
o  decodeString: aStringOrByteCollection
given a byteArray (2 bytes per character) or an unsignedShortArray in UTF16 encoding,
return a new string containing the same characters, in 8, 16bit (or more) encoding.
Returns either a normal String, a TwoByte- or a FourByte-String instance.
Only useful when reading from external sources.
This only handles up to 30bit characters.

usage example(s):

     self new decodeString:#[ 16r00 16r42 ]            
     self new decodeString:#[ 16r01 16r42 ]            
     self new decodeString:#[ 16r00 16r48
                              16r00 16r69  
                              16rD8 16r00  
                              16rDC 16r00  
                              16r00 16r21  
                              16r00 16r21  
                            ]            

     self new decodeString:#( 16r0048
                              16r0069  
                              16rD800  
                              16rDC00  
                              16r0021  
                              16r0021  
                            )
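
The 16rD800 16rDC00 pair in the examples above is a surrogate pair. As a minimal sketch of the standard UTF-16 arithmetic (not necessarily how decodeString: is implemented; the variable names are illustrative only), the two big-endian units combine into one codePoint like this:

```smalltalk
| high low codePoint |

"assemble each 16-bit unit from two bytes, most significant byte first"
high := (16rD8 bitShift:8) bitOr:16r00.    "16rD800, high surrogate (16rD800..16rDBFF)"
low  := (16rDC bitShift:8) bitOr:16r00.    "16rDC00, low surrogate (16rDC00..16rDFFF)"

"each surrogate carries 10 bits; the combined value is offset by 16r10000"
codePoint := 16r10000
             + ((high - 16rD800) bitShift:10)
             + (low - 16rDC00).

codePoint = 16r10000    "evaluates to true"
```

Under this arithmetic, the six 16-bit units in the longer examples describe five characters: 'H', 'i', the character with codePoint 16r10000, and two '!'.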

o  encode: aCode

o  encodeString: aUnicodeString
return the UTF-16 representation of aUnicodeString.
The result is only useful for storing in an external file,
not for use inside ST/X.

usage example(s):

     (self encodeString:'hello')                                         #[0 104 0 101 0 108 0 108 0 111]
     (self encodeString:(Character value:16r40) asString)                #[0 64]
     (self encodeString:(Character value:16rFF) asString)                #[0 255]
     (self encodeString:(Character value:16r100) asString)               #[1 0]
     (self encodeString:(Character value:16r1000) asString)              #[16 0]
     (self encodeString:(Character value:16r2000) asString)              #[32 0]
     (self encodeString:(Character value:16r4000) asString)              #[64 0]
     (self encodeString:(Character value:16r8000) asString)              #[128 0]
     (self encodeString:(Character value:16rD7FF) asString)              #[215 255]
     (self encodeString:(Character value:16rE000) asString)              #[224 0]
     (self encodeString:(Character value:16rFFFF) asString)              #[255 255]
     (self encodeString:(Character value:16r10000) asString)             #[216 0 220 0]
     (self encodeString:(Character value:16r10FFF) asString)             #[216 3 223 255]
     (self encodeString:(Character value:16r1FFFF) asString)             #[216 63 223 255]
     (self encodeString:(Character value:16r10FFFF) asString)            #[219 255 223 255]             
    error cases:
     (self encodeString:(Character value:16rD800) asString) 
     (self encodeString:(Character value:16rD801) asString) 
     (self encodeString:(Character value:16rDFFF) asString) 
     (self encodeString:(Character value:16r110000) asString)   
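
The 4-byte results above follow from the standard UTF-16 surrogate arithmetic. The following is a minimal sketch of that calculation for codePoint 16r10FFFF, written with plain integer operations; it is an illustration, not the code of encodeString:, and the variable names are made up for this example:

```smalltalk
| codePoint offset high low bytes |

codePoint := 16r10FFFF.

"codePoints above 16rFFFF are first reduced by 16r10000, leaving 20 bits"
offset := codePoint - 16r10000.                  "16rFFFFF"

"upper 10 bits go into the high surrogate, lower 10 bits into the low one"
high := 16rD800 + (offset bitShift:-10).         "16rDBFF"
low  := 16rDC00 + (offset bitAnd:16r3FF).        "16rDFFF"

"big-endian: the most significant byte of each unit comes first"
bytes := ByteArray
            with:(high bitShift:-8) with:(high bitAnd:16rFF)
            with:(low bitShift:-8)  with:(low bitAnd:16rFF).

bytes = #[219 255 223 255]    "evaluates to true"
```

This also suggests why the error cases fail: 16rD800..16rDFFF are reserved for the surrogates themselves, and 16r110000 would need more than the 20 bits a surrogate pair can carry.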

private
o  nextTwoByteValueFrom: aStream

queries
o  characterSize: charOrCodePoint
return the number of bytes required to encode charOrCodePoint
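
As a rough sketch of what such a query computes for UTF-16 (an assumption based on the encoding rules, not the method's actual source; utf16Size is a hypothetical name), a codePoint up to 16rFFFF needs one 16-bit unit and anything above needs a surrogate pair:

```smalltalk
| utf16Size |

"hypothetical helper block: 2 bytes for one unit, 4 bytes for a surrogate pair"
utf16Size := [:codePoint |
    codePoint <= 16rFFFF ifTrue:[2] ifFalse:[4]
].

(utf16Size value:16r41) = 2.       "true: a single 16-bit unit"
(utf16Size value:16rFFFF) = 2.     "true: still a single unit"
(utf16Size value:16r10000) = 4     "true: a surrogate pair"
```

The sketch does not reject invalid arguments such as surrogate codePoints or values above 16r10FFFF.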

o  nameOfEncoding

stream support
o  encodeCharacter: aUnicodeCharacter on: aStream
given a character in unicode, encode it onto aStream.

o  encodeString: aUnicodeString on: aStream
given a string in unicode, encode it onto aStream.

o  readNextCharacterFrom: aStream


Examples:


Encoding (unicode to utf16BE):
    ISO10646_to_UTF16BE encodeString:'hello'.

Decoding (utf16BE to unicode):
    |t|

    t := ISO10646_to_UTF16BE encodeString:'ÄÖÜß'.
    ISO10646_to_UTF16BE decodeString:t.

Decoding (utf16LE-Bytes to unicode):
    ISO10646_to_UTF16LE decodeString:#[ 16r40 0 16r41 0 16r42 0 16r43 0 16r44 0 ].
    ISO10646_to_UTF16BE decodeString:#[ 16r40 0 16r41 0 16r42 0 16r43 0 16r44 0 ] copy swapBytes.

ST/X 7.2.0.0; WebServer 1.670 at bd0aa1f87cdd.unknown:8081; Thu, 28 Mar 2024 09:53:42 GMT