Documentation of class 'CharacterEncoderImplementations::ISO10646_to_UTF16BE':

Class: ISO10646_to_UTF16BE (in CharacterEncoderImplementations)


Inheritance:

   Object
   |
   +--CharacterEncoder
      |
      +--CharacterEncoderImplementations::VariableBytesEncoder
         |
         +--CharacterEncoderImplementations::ISO10646_to_UTF16BE
            |
            +--CharacterEncoderImplementations::ISO10646_to_UTF16LE

Package:
stx:libbasic
Category:
Collections-Text-Encodings
Version:
rev: 1.16 date: 2024/01/29 16:07:44
user: stefan
file: CharacterEncoderImplementations__ISO10646_to_UTF16BE.st directory: libbasic
module: stx stc-classLibrary: libbasic

Description:


Encodes/decodes UTF-16 big-endian (most significant byte first).

Notice the naming (many are confused):
    Unicode is the set of number-to-character assignments
whereas:
    UTF8, UTF16 etc. are concrete ways of transmitting Unicode codePoints (numbers).

ST/X NEVER uses UTF8 or UTF16 internally - all characters are full 24bit characters.
Only when exchanging data are they converted into UTF8 (or other) byte sequences.
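
To illustrate the distinction between a code point and its transfer encoding, the following workspace sketch encodes one character in both byte orders. It assumes that the class-side encodeString: used in the Examples section is also available on the ISO10646_to_UTF16LE subclass; the byte values in the comments follow from the UTF-16 rules and were not taken from an actual run.

```smalltalk
|ch|

"one Unicode code point: EURO SIGN, number 16r20AC"
ch := Character value:16r20AC.

"the same number, transmitted as two different byte sequences"
ISO10646_to_UTF16BE encodeString:ch asString.   "big-endian: 16r20 first, then 16rAC"
ISO10646_to_UTF16LE encodeString:ch asString.   "little-endian: 16rAC first, then 16r20"
```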

copyright

COPYRIGHT (c) 2005 by eXept Software AG All Rights Reserved This software is furnished under a license and may be used only in accordance with the terms of that license and with the inclusion of the above copyright notice. This software may not be provided or otherwise made available to, or used by, any other person. No title to or ownership of the software is hereby transferred.

Class protocol:

queries
o  bomBytes
(comment from inherited method)
return the BOM (byte order mark) bytes or nil.
Only applicable for UTF encoders.


Instance protocol:

encoding & decoding
o  decodeString: aStringOrByteCollection
given a byteArray (2 bytes per character) or an unsignedShortArray in UTF-16 encoding,
return a new string containing the same characters, in 8-bit, 16-bit (or wider) encoding.
Returns either a normal String, a TwoByte- or a FourByte-String instance.
Only useful when reading from external sources.
Note that UTF-16 itself can only represent code points up to 16r10FFFF (21 bits).

Usage example(s):

     self new decodeString:#[ 16r00 16r42 ]            
     self new decodeString:#[ 16r01 16r42 ]            
     self new decodeString:#[ 16r00 16r48
                              16r00 16r69  
                              16rD8 16r00  
                              16rDC 16r00  
                              16r00 16r21  
                              16r00 16r21  
                            ]            

     self new decodeString:#( 16r0048
                              16r0069  
                              16rD800  
                              16rDC00  
                              16r0021  
                              16r0021  
                            )
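
The last two examples above contain the surrogate pair 16rD800 16rDC00. As a minimal sketch of how such a pair maps to a single code point under the UTF-16 rules, the arithmetic can be written with plain integer operations. This is an illustration only and is not taken from the class's source; the variable names are made up for the example.

```smalltalk
|high low codePoint|

high := 16rD800.    "high (leading) surrogate, valid range 16rD800..16rDBFF"
low  := 16rDC00.    "low (trailing) surrogate, valid range 16rDC00..16rDFFF"

"each surrogate carries 10 payload bits; the result is offset by 16r10000"
codePoint := 16r10000
             + ((high bitAnd:16r3FF) bitShift:10)
             + (low bitAnd:16r3FF).

codePoint = 16r10000    "true"
```

A decoder following this rule would therefore return a three-character string for the byte sequence 'H' 'i' pair '!' '!' minus the two trailing characters, that is: each surrogate pair counts as one character, not two.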

o  encode: aCode

o  encodeString: aUnicodeString
return the UTF-16 representation of aUnicodeString.
The resulting string is only useful to be stored on some external file,
not for being used inside ST/X.

Usage example(s):

     (self encodeString:'hello')                                         #[0 104 0 101 0 108 0 108 0 111]
     (self encodeString:(Character value:16r40) asString)                #[0 64]
     (self encodeString:(Character value:16rFF) asString)                #[0 255]
     (self encodeString:(Character value:16r100) asString)               #[1 0]
     (self encodeString:(Character value:16r1000) asString)              #[16 0]
     (self encodeString:(Character value:16r2000) asString)              #[32 0]
     (self encodeString:(Character value:16r4000) asString)              #[64 0]
     (self encodeString:(Character value:16r8000) asString)              #[128 0]
     (self encodeString:(Character value:16rD7FF) asString)              #[215 255]
     (self encodeString:(Character value:16rE000) asString)              #[224 0]
     (self encodeString:(Character value:16rFFFF) asString)              #[255 255]
     (self encodeString:(Character value:16r10000) asString)             #[216 0 220 0]
     (self encodeString:(Character value:16r10FFF) asString)             #[216 3 223 255]
     (self encodeString:(Character value:16r1FFFF) asString)             #[216 63 223 255]
     (self encodeString:(Character value:16r10FFFF) asString)            #[219 255 223 255]             
    error cases:
     (self encodeString:(Character value:16rD800) asString) 
     (self encodeString:(Character value:16rD801) asString) 
     (self encodeString:(Character value:16rDFFF) asString) 
     (self encodeString:(Character value:16r110000) asString)   
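
Code points above 16rFFFF are written as a surrogate pair. The following is a sketch of the standard UTF-16 arithmetic using plain integer operations, shown for 16r10FFFF; it is an illustration under that assumption, not the method's actual source. It also shows why the error cases exist: 16rD800..16rDFFF are reserved for the surrogates themselves, and anything above 16r10FFFF does not fit into the 20 payload bits.

```smalltalk
|codePoint offset high low|

codePoint := 16r10FFFF.
offset := codePoint - 16r10000.               "20-bit value, 0..16rFFFFF"
high := 16rD800 + (offset bitShift:-10).      "upper 10 bits -> 16rDBFF"
low  := 16rDC00 + (offset bitAnd:16r3FF).     "lower 10 bits -> 16rDFFF"

"big-endian: most significant byte of each 16-bit unit first"
ByteArray
    with:(high bitShift:-8) with:(high bitAnd:16rFF)
    with:(low bitShift:-8)  with:(low bitAnd:16rFF)
"#[219 255 223 255]"
```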

private
o  nextTwoByteValueFrom: aStream

queries
o  characterSize: charOrCodePoint
return the number of bytes required to encode codePoint
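
As a rough sketch of the rule this query presumably follows (an assumption based on UTF-16 itself, not on the method's source): code points up to 16rFFFF need one 16-bit unit, everything above needs a surrogate pair. The block name sizeOf is hypothetical.

```smalltalk
|sizeOf|

"2 bytes for a single 16-bit unit, 4 bytes for a surrogate pair"
sizeOf := [:codePoint | codePoint <= 16rFFFF ifTrue:[2] ifFalse:[4]].

sizeOf value:16r41.       "2"
sizeOf value:16rFFFF.     "2"
sizeOf value:16r10000.    "4"
```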

o  nameOfEncoding

stream support
o  encodeCharacter: aUnicodeCharacter on: aStream
given a character in unicode, encode it onto aStream.

o  encodeString: aUnicodeString on: aStream
given a string in unicode, encode it onto aStream.

o  readNextCharacterFrom: aStream

testing
o  isUtf16Encoder
answer true, if this encodes from/to UTF-16 (regardless of byte-order)

o  isUtfEncoder
answer true, if this encodes from/to any UTF (regardless of how many bytes and byte-order).
In other words: does it make sense to prepend a BOM?
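
A possible way to prepend the BOM when producing data for an external file is sketched below. It assumes that the class-side bomBytes query and encodeString: both answer byte collections that can be concatenated with the comma operator, and it handles the documented nil case; the actual return types were not verified.

```smalltalk
|bom payload|

bom := ISO10646_to_UTF16BE bomBytes.                   "may be nil according to its comment"
payload := ISO10646_to_UTF16BE encodeString:'hello'.

"prepend the byte order mark only if the encoder provides one"
bom isNil
    ifTrue:[payload]
    ifFalse:[bom , payload]
```

For UTF-16 big-endian, the Unicode standard defines the BOM as the byte pair 16rFE 16rFF.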


Examples:


Encoding (unicode to utf16BE):
   ISO10646_to_UTF16BE encodeString:'hello'.


Decoding (utf16BE to unicode):
   |t|

   t := ISO10646_to_UTF16BE encodeString:'ÄÖÜß'.
   ISO10646_to_UTF16BE decodeString:t.

Decoding (utf16LE-Bytes to unicode):
   ISO10646_to_UTF16LE decodeString:#[ 16r40 0 16r41 0 16r42 0 16r43 0 16r44 0 ].
   ISO10646_to_UTF16BE decodeString:#[ 16r40 0 16r41 0 16r42 0 16r43 0 16r44 0 ] copy swapBytes.


ST/X 7.7.0.0; WebServer 1.702 at 20f6060372b9.unknown:8081; Wed, 22 Jan 2025 08:57:59 GMT