Class: ISO10646_to_UTF16BE (in CharacterEncoderImplementations)
Object
|
+--CharacterEncoder
   |
   +--CharacterEncoderImplementations::VariableBytesEncoder
      |
      +--CharacterEncoderImplementations::ISO10646_to_UTF16BE
         |
         +--CharacterEncoderImplementations::ISO10646_to_UTF16LE
- Package: stx:libbasic
- Category: Collections-Text-Encodings
- Version:
  - rev: 1.12, date: 2019/05/28 12:48:39
  - user: stefan
  - file: CharacterEncoderImplementations__ISO10646_to_UTF16BE.st, directory: libbasic
  - module: stx, stc-classLibrary: libbasic
Encodes and decodes UTF-16 big-endian (most significant byte first).
A note on the naming, which confuses many:
Unicode is the set of number-to-glyph assignments,
whereas
UTF-8, UTF-16 etc. are concrete ways of transmitting Unicode code points (numbers).
ST/X NEVER uses UTF-8 or UTF-16 internally; all characters are full 24-bit characters.
They are converted into UTF-8 (or other) byte sequences only when data is exchanged.
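The distinction between a code point and its byte sequence can be made concrete with a small workspace sketch. It uses only `Character value:`, `asString` and the class-side `encodeString:` that appear elsewhere in this documentation. `codePoint` is assumed to be understood by Character, and the byte values named in the last comment follow the pattern of the documented 16rFF example rather than a verified run.

```smalltalk
| ch codePoint bytes |

"an abstract Unicode character (A with diaeresis)"
ch := Character value: 16rC4.

"its Unicode number; assumes Character responds to #codePoint"
codePoint := ch codePoint.

"the UTF-16BE byte sequence used to transmit that number;
 expected to hold the two bytes 0 and 196"
bytes := ISO10646_to_UTF16BE encodeString: ch asString.
```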
encoding & decoding
-
decodeString: aStringOrByteCollection
-
Given a ByteArray (2 bytes per 16-bit unit) or an unsigned short array in UTF-16 encoding,
return a new string containing the same characters, in 8-bit, 16-bit (or wider) encoding.
Returns either a normal String, a TwoByte- or a FourByte-String instance.
Only useful when reading from external sources.
UTF-16 surrogate pairs can only represent code points up to 16r10FFFF (21 bits), so larger values cannot occur in the result.
usage example(s):
self new decodeString:#[ 16r00 16r42 ]
self new decodeString:#[ 16r01 16r42 ]
self new decodeString:#[ 16r00 16r48
                         16r00 16r69
                         16rD8 16r00
                         16rDC 16r00
                         16r00 16r21
                         16r00 16r21
                       ]
self new decodeString:#( 16r0048
                         16r0069
                         16rD800
                         16rDC00
                         16r0021
                         16r0021
                       )
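The pair 16rD800 16rDC00 in the examples above is a surrogate pair: two 16-bit units that together stand for one character outside the 16-bit range. The following is a minimal sketch of the standard UTF-16 arithmetic using plain integer messages. It illustrates the encoding scheme and is not the code of this class.

```smalltalk
| high low codePoint |

high := 16rD800.    "first 16-bit unit (high surrogate, 16rD800..16rDBFF)"
low  := 16rDC00.    "second 16-bit unit (low surrogate, 16rDC00..16rDFFF)"

"each surrogate carries 10 payload bits; the result is offset by 16r10000"
codePoint := 16r10000
             + ((high - 16rD800) bitShift: 10)
             + (low - 16rDC00).

codePoint = 16r10000    "true"
```

Under this scheme the six 16-bit units in the examples decode to five characters: 'H', 'i', the character with code point 16r10000, and two '!'.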
-
encode: aCode
-
-
encodeString: aUnicodeString
-
Return the UTF-16 representation of aUnicodeString.
The result is only useful for storing in an external file (or other data exchange),
not for use inside ST/X.
usage example(s):
(self encodeString:'hello') -> #[0 104 0 101 0 108 0 108 0 111]
(self encodeString:(Character value:16r40) asString) -> #[0 64]
(self encodeString:(Character value:16rFF) asString) -> #[0 255]
(self encodeString:(Character value:16r100) asString) -> #[1 0]
(self encodeString:(Character value:16r1000) asString) -> #[16 0]
(self encodeString:(Character value:16r2000) asString) -> #[32 0]
(self encodeString:(Character value:16r4000) asString) -> #[64 0]
(self encodeString:(Character value:16r8000) asString) -> #[128 0]
(self encodeString:(Character value:16rD7FF) asString) -> #[215 255]
(self encodeString:(Character value:16rE000) asString) -> #[224 0]
(self encodeString:(Character value:16rFFFF) asString) -> #[255 255]
(self encodeString:(Character value:16r10000) asString) -> #[216 0 220 0]
(self encodeString:(Character value:16r10FFF) asString) -> #[216 3 223 255]
(self encodeString:(Character value:16r1FFFF) asString) -> #[216 63 223 255]
(self encodeString:(Character value:16r10FFFF) asString) -> #[219 255 223 255]
error cases:
(self encodeString:(Character value:16rD800) asString)
(self encodeString:(Character value:16rD801) asString)
(self encodeString:(Character value:16rDFFF) asString)
(self encodeString:(Character value:16r110000) asString)
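The byte values and the error cases above follow from the UTF-16 rules: code points up to 16rFFFF take one 16-bit unit, larger ones take a surrogate pair, and the surrogate range 16rD800..16rDFFF as well as anything above 16r10FFFF cannot be encoded. A minimal sketch of that arithmetic for a single code point follows. The block `encode` and its use of nil for the error cases are choices made for this illustration only; how the class itself reports these errors is not stated here.

```smalltalk
| encode |

encode := [:codePoint |
    | offset high low |
    ((codePoint between: 16rD800 and: 16rDFFF) or: [ codePoint > 16r10FFFF ])
        ifTrue: [ nil ]    "not encodable; nil is a placeholder chosen for this sketch"
        ifFalse: [
            codePoint < 16r10000
                ifTrue: [
                    "one 16-bit unit, high byte first"
                    ByteArray
                        with: (codePoint bitShift: -8)
                        with: (codePoint bitAnd: 16rFF) ]
                ifFalse: [
                    "two 16-bit units (surrogate pair), each high byte first"
                    offset := codePoint - 16r10000.
                    high := 16rD800 + (offset bitShift: -10).
                    low := 16rDC00 + (offset bitAnd: 16r3FF).
                    ByteArray
                        with: (high bitShift: -8)
                        with: (high bitAnd: 16rFF)
                        with: (low bitShift: -8)
                        with: (low bitAnd: 16rFF) ] ] ].

encode value: 16r40.        "#[0 64]"
encode value: 16r10FFFF.    "#[219 255 223 255]"
encode value: 16rD800.      "nil"
```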
private
-
nextTwoByteValueFrom: aStream
-
queries
-
characterSize: charOrCodePoint
-
return the number of bytes required to encode charOrCodePoint
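Since UTF-16 uses one 16-bit unit for code points up to 16rFFFF and a surrogate pair above that, the byte count can be sketched as follows. The block name `utf16Size` is hypothetical, and the sketch does not cover how the actual method treats values that cannot be encoded.

```smalltalk
| utf16Size |

"2 bytes for a single 16-bit unit, 4 bytes for a surrogate pair"
utf16Size := [:codePoint |
    codePoint <= 16rFFFF ifTrue: [ 2 ] ifFalse: [ 4 ] ].

utf16Size value: 16r41.       "2"
utf16Size value: 16rFFFF.     "2"
utf16Size value: 16r10000.    "4"
```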
-
nameOfEncoding
-
stream support
-
encodeCharacter: aUnicodeCharacter on: aStream
-
given a character in unicode, encode it onto aStream.
-
encodeString: aUnicodeString on: aStream
-
given a string in unicode, encode it onto aStream.
-
readNextCharacterFrom: aStream
-
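A rough sketch of what reading UTF-16BE from a byte stream involves: two bytes are combined into one 16-bit unit, and a high surrogate is merged with the unit that follows it. The sketch uses only generic stream and collection messages (`ReadStream on:`, `next`, `atEnd`, `OrderedCollection`) and collects code points rather than characters. It is not the implementation of `readNextCharacterFrom:` or `nextTwoByteValueFrom:`, and the block name `nextUnit` is made up for the example.

```smalltalk
| in codePoints nextUnit unit |

in := ReadStream on: #[ 16r00 16r48  16rD8 16r00  16rDC 16r00 ].
codePoints := OrderedCollection new.

"combine two bytes, most significant byte first"
nextUnit := [ (in next bitShift: 8) + in next ].

[ in atEnd ] whileFalse: [
    unit := nextUnit value.
    (unit between: 16rD800 and: 16rDBFF)
        ifTrue: [
            "high surrogate: merge with the low surrogate that follows"
            unit := 16r10000
                    + ((unit - 16rD800) bitShift: 10)
                    + (nextUnit value - 16rDC00) ].
    codePoints add: unit ].

codePoints asArray    "#(72 65536)"
```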
Encoding (unicode to utf16BE):
ISO10646_to_UTF16BE encodeString:'hello'.
Decoding (utf16BE to unicode):
|t|
t := ISO10646_to_UTF16BE encodeString:'ÄÖÜß'.
ISO10646_to_UTF16BE decodeString:t.
Decoding (utf16LE-Bytes to unicode):
ISO10646_to_UTF16LE decodeString:#[ 16r40 0 16r41 0 16r42 0 16r43 0 16r44 0 ].
ISO10646_to_UTF16BE decodeString:#[ 16r40 0 16r41 0 16r42 0 16r43 0 16r44 0 ] copy swapBytes.
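The last example relies on UTF-16LE and UTF-16BE differing only in the order of the two bytes within each 16-bit unit. A hand-written version of that swap, as a sketch with plain `at:` and `at:put:` (it assumes an even number of bytes and copies into a new ByteArray instead of modifying the input):

```smalltalk
| le be |

le := #[ 16r40 0 16r41 0 ].    "UTF-16LE: low byte first"
be := ByteArray new: le size.

"exchange the two bytes of every 16-bit unit"
1 to: le size by: 2 do: [:i |
    be at: i put: (le at: i + 1).
    be at: i + 1 put: (le at: i) ].

be    "#[0 64 0 65], the same text in UTF-16BE"
```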