Smalltalk/X Webserver

Documentation of class 'Character':

Class: Character

Inheritance
Description
Class protocol
Instance protocol

Inheritance:

   Object
   |
   +--Magnitude
      |
      +--Character

Package: stx:libbasic

Category: Magnitude-General

Version: rev: 1.265 date: 2024/02/21 13:26:34; user: stefan; file: Character.st directory: libbasic; module: stx stc-classLibrary: libbasic

Description:

This class represents characters.

Note that actual character objects are not used when characters
are stored in strings, symbols etc.
These only store a character's codePoint for a more compact representation.

The word 'asciiValue' is a historic leftover: any integer code is allowed
and actually used (i.e. characters are not limited to 8 bits).
The encoding is Unicode, of which ASCII is a subset; the first 128 codePoints
(0 to 127) have the same meaning in both.

Some heavily used Characters are kept as singletons; i.e. for every asciiValue (0..N),
there exists exactly one instance of Character, which is shared and immutable.
Character codePoint:xxx checks for this, and returns a reference to an existing instance.
For N<=255, this is guaranteed; i.e. in all Smalltalks, the single byte characters are always
handled like this, and you can therefore safely compare them using == (identity compare).

Other characters (i.e. codepoint > N) are not guaranteed to be shared;
i.e. these may or may not be created as required.
Actually, do NOT depend on which characters are and which are not shared.
Always compare using #= if there is any chance of a non-ascii character being involved.

Once again (because beginners sometimes make this mistake):
you may compare characters using #== ONLY IF you are certain
that all characters involved are in the range 0..255.
Otherwise, you HAVE TO compare using #= (if in doubt, always compare using #=).
Sorry for this inconvenience, but it is (practically) impossible to keep
the possible maximum of 2^32 characters (Unicode) around, for that convenience alone.

In ST/X, N is (currently) 1024. This means that all the Latin characters and some others are
kept as singletons in the CharacterTable class variable (which is also used by the VM when characters
are instantiated).
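
The comparison rules above can be illustrated with a short workspace sketch. The code point 16r3455 is an arbitrary value above the shared range, chosen only for illustration; whether the two instances are identical is deliberately left open.

```smalltalk
| a b |
"two characters created independently, with a code point above the shared range"
a := Character codePoint:16r3455.
b := Character codePoint:16r3455.

"equality compares code points and is always safe"
self assert:(a = b).

"identity is only guaranteed for code points 0..255"
self assert:($a == (Character codePoint:97)).

"(a == b) may answer true or false here; never rely on it"
```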

Methods marked as (JS) come from the Manchester Character goody
(CharacterComparing) by Jan Steinman, which allow Characters to be used as
Interval elements (i.e. ($a to:$z) do:[...] );
They are not a big deal, but convenient add-ons.
Some of these have been modified a bit.

WARNING: characters are known by compiler and runtime system -
do not change the instance layout.

Also, although you can create subclasses of Character, the compiler always
creates instances of Character for literals ...
... and other classes are hard-wired to always return instances of characters
in some cases (i.e. String>>at:, Symbol>>at: etc.).
Therefore, it may not make sense to create a character-subclass.

Case Mapping in Unicode:
There are a number of complications to case mappings that occur once the repertoire
of characters is expanded beyond ASCII.

* Because of the inclusion of certain composite characters for compatibility,
such as U+01F1 'DZ' capital dz, there is a third case, called titlecase,
which is used where the first letter of a word is to be capitalized
(e.g. Titlecase, vs. UPPERCASE, or lowercase).
For example, the title case of the example character is U+01F2 'Dz' capital d with small z.

* Case mappings may produce strings of different length than the original.
For example, the German character U+00DF small letter sharp s expands when uppercased to
the sequence of two characters 'SS'.
This also occurs where there is no precomposed character corresponding to a case mapping.
*** This is not yet implemented (in 5.2) ***

* Characters may also have different case mappings, depending on the context.
For example, U+03A3 capital sigma lowercases to U+03C3 small sigma if it is followed
by another letter, but lowercases to U+03C2 small final sigma if it is not.
*** This is not yet implemented (in 5.2) ***

* Characters may have case mappings that depend on the locale.
For example, in Turkish the letter U+0049 'I' capital letter i lowercases to U+0131 small dotless i.
*** This is not yet implemented (in 5.2) ***

* Case mappings are not, in general, reversible.
For example, once the string 'McGowan' has been uppercased, lowercased or titlecased,
the original cannot be recovered by applying another uppercase, lowercase, or titlecase operation.
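
As a small illustration of the last point, the following sketch assumes that strings understand asUppercase and asLowercase (the usual Smalltalk string protocol, which is not documented in this class).

```smalltalk
| original roundTrip |
original := 'McGowan'.

"uppercase, then lowercase again: the inner capital G is lost"
roundTrip := original asUppercase asLowercase.

self assert:(roundTrip = 'mcgowan').
self assert:(roundTrip = original) not.
```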

Collation Sequence:
*** This is not yet implemented (in 5.2) ***

[instance variables:]
asciivalue obvious { InstanceVariable: asciivalue Class: SmallInteger }

This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice. This software may not
be provided or otherwise made available to, or used by, any
other person. No title to or ownership of the software is
hereby transferred.

Class protocol:

Compatibility-Squeak

escape
( an extension from the stx:libcompat package )

return the escape character (Squeak compatibility)

accessing untypeable characters

controlCharacter: char

Answer the Character representing ctrl-char.
ctrl-a -> 1; ctrl-@ -> 0

Usage example(s):

     self controlCharacter:$@ -> 0
     self controlCharacter:$a -> 1
     self controlCharacter:$d -> 4
     self controlCharacter:$z -> 26
     self controlCharacter:$[ -> 27
     self controlCharacter:$\ -> 28
     self controlCharacter:$] -> 29
     self controlCharacter:$_ -> 31

endOfInput

Answer the Character representing ctrl-d (Unix-EOF).

leftParenthesis

Answer the Character representing a left parenthesis.

period

Answer the Character representing a period character.

poundSign

Answer the Character representing a pound sign (hash).

rightParenthesis

Answer the Character representing a right parenthesis.

constants

backspace

return the backspace character

bell

return the bell character

cr

return the lineEnd character
- actually (in unix) this is a newline character

del

return the delete character

doubleQuote

return the double-quote character

esc

return the escape character

etx

return the end-of-text character

euro

The Euro currency sign (notice: not all fonts support it).
The Unicode encoding is U+20AC

Usage example(s):

     Transcript font:(Font family:'courier' size:12 encoding:'iso10646-1').
     Transcript showCR:Character euro

excla

return the exclamation-mark character

ff

return the form-feed character

lf

return the newline/linefeed character

linefeed

squeak compatibility: return the newline/linefeed character

maxImmediateCodePoint

return the maximum codePoint until which the characters are shared

Usage example(s):

      self maxImmediateCodePoint

maxValue

return the maximum codePoint a character may have

nbsp

return the non-breaking space character

newPage

return the form-feed (newPage) character

nl

return the newline character

null

return the null character;
Notice, that in ST/X strings have an invisible (and w.r.t the string's size uncounted)
terminating NULL character, to make it easier to pass strings to C-functions.
However, this is ONLY true for single-byte strings.

pageUp

return the pageUp control character

quote

return the single-quote character

return

return the (carriage) return character.
In ST/X, this is different from cr - for Unix reasons.
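
The difference between the line-end constants is easiest to see from their code points. This is a sketch using only the constants listed in this section; the value of cr is deliberately not asserted, only that it differs from return.

```smalltalk
"carriage return is code point 13, newline/linefeed is code point 10"
self assert:Character return codePoint = 13.
self assert:Character nl codePoint = 10.
self assert:Character lf = Character nl.

"cr is documented as different from return"
self assert:(Character cr = Character return) not.
```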

space

return the blank character

tab

return the tabulator character

constants non-printable

byteOrderMark

the Unicode UTF BOM character as a singleton.
The UTF-8 encoder converts this character to the byte sequence #[16rEF 16rBB 16rBF].

Usage example(s):

     self byteOrderMark
     self byteOrderMark utf8Encoded asByteArray hexPrintString
     ((CharacterEncoder encoderFor:#utf16le) encodeCharacter:self byteOrderMark) asByteArray hexPrintString
     ((CharacterEncoder encoderFor:#utf16be) encodeCharacter:self byteOrderMark) asByteArray hexPrintString
     (self codePoint:16rFEFF) == self byteOrderMark

popDirectionalFormatting

the unicode popDirectionalFormatting pops previously set directional formatting.
See https://en.wikipedia.org/wiki/Bidirectional_text#Table_of_possible_BiDi_character_types

rightToLeftMark

the unicode rightToLeftMark marks the preceding character as right-to-left (Hebrew/Arabic).
See https://en.wikipedia.org/wiki/Bidirectional_text#Table_of_possible_BiDi_character_types

Usage example(s):

     self rightToLeftMark

rightToLeftOverride

the unicode rightToLeftOverride marks the following text as right-to-left (Hebrew/Arabic).
See https://en.wikipedia.org/wiki/Bidirectional_text#Table_of_possible_BiDi_character_types

Usage example(s):

     self rightToLeftOverride

instance creation

basicNew

catch new - Characters cannot be created with new

codePoint: anInteger

return a character with codePoint anInteger.
Codepoints from 0 to CharacterTable size (1024) are mapped to singletons.

Usage example(s):

      Character codePoint:16r34.     -> $4
      Character codePoint:16r3455.   -> (Character codePoint:16r3455)
      (Character codePoint:16rFEFF)  -> (Character codePoint:16rFEFF)
      Character codePoint:16rFFFFFFFFFFFFFFFFFFF.  -> error

digitValue: anInteger

return a character that corresponds to anInteger.
0-9 map to $0-$9, 10-35 map to $A-$Z
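
A minimal sketch of the mapping described above, written in the assertion style used elsewhere on this page. The last line assumes that the instance-side digitValue is the inverse of this class-side method.

```smalltalk
"0..9 map to the decimal digits"
self assert:(Character digitValue:7) = $7.

"10..35 map to the uppercase letters"
self assert:(Character digitValue:10) = $A.
self assert:(Character digitValue:35) = $Z.

"round trip through the instance-side digitValue"
self assert:(Character digitValue:11) digitValue = 11.
```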

readFrom: aStringOrStream onError: exceptionBlock

return a new Character, reading a printed representation from aStringOrStream.

Usage example(s):

        self readFrom:'$a'.
        self readFrom:'$ß'.     
        self readFrom:'(Character backspace)'.     
        self readFrom:'(Character codePoint:16r444)'.

utf8DecodeFrom: aStream

read and return a single Unicode character from a UTF-8 encoded stream.
Answer nil, if Stream>>#next answers nil.

Usage example(s):

      Character utf8DecodeFrom:'a' readStream
      Character utf8DecodeFrom:#[195 188] asString readStream

value: anInteger

return a character with codePoint anInteger - backward compatibility

primitive input

fromUser

return a character from the keyboard (C's standard input stream)
- this should only be used for emergency evaluators and the like.

queries

allCharacters

added for Squeak compatibility: return a collection of all singleton chars.
Notice: for memory efficiency reasons, only some of the low-codepoint characters
are actually kept as singletons. Less frequently used character instances are created on the fly,
as wide string elements are accessed (and hopefully garbage collected sooner or later).

Usage example(s):

     Character allCharacters

hasSharedInstances

return true if this class can share instances when stored binary,
that is, instances with the same value can be stored by reference.
Although not all characters are actually shared (e.g. two-byte codePoint characters),
they should be treated as if they were, so that the result does not depend on
how many characters the underlying implementation shares.

isBuiltInClass

return true if this class is known by the run-time-system.
Here, true is returned for myself, false for subclasses.

isLegalUnicodeCodePoint: anInteger

answer true, if anInteger is a valid unicode code point

maxVal

separators

return a collection of separator chars.
Added for squeak compatibility

Usage example(s):

     Character separators

Instance protocol:

Compatibility-Dolphin

isAlphaNumeric

Compatibility method for ANSI.
Return true, if I am a letter or a digit
Same as isLetterOrDigit.

isAlphabetic

Compatibility method - do not use in new code.
Return true, if I am a letter.
Please use isLetter for compatibility reasons (which is ANSI).

isControl

Compatibility method - do not use in new code.
Return true if I am a control character (i.e. ascii value < 32)

isPunctuation

Compatibility method - do not use in new code.
The code below is not unicode aware.

Q: are digits really punctuation?

Usage example(s):

	(1 to:255) collect:[:i| i asCharacter] thenSelect:[:c| c isPunctuation].

Compatibility-Squeak

asUnicode
( an extension from the stx:libcompat package )

the same as #codePoint

charCode
( an extension from the stx:libcompat package )

(self asInteger bitAnd: 16r3FFFFF).

accessing

codePoint

return the codePoint of myself.
Traditionally, this was named 'asciiValue';
however, characters are not limited to 8bit characters.

instVarAt: index put: anObject

catch instvar access - asciivalue may not be changed

arithmetic

+ aMagnitude

Return the Character that is <aMagnitude> higher than the receiver.
Wrap if the resulting value is not a legal Character value. (JS)

Usage example(s):

     $A + 5

- aMagnitude

Return the Character that is <aMagnitude> lower than the receiver.
Wrap if the resulting value is not a legal Character value. (JS)
claus:
return the difference as integer, if the argument is another character.
If the argument is a number, a character is returned.

Usage example(s):

     $z - $a
     $d - 3

// aMagnitude

Return the Character whose value is the receiver divided by <aMagnitude>.
Wrap if the resulting value is not a legal Character value. (JS)

\\ aMagnitude

Return the Character whose value is the receiver modulo <aMagnitude>.
Wrap if the resulting value is not a legal Character value. (JS)

comparing

< aMagnitude

return true, if the argument's asciiValue is greater than the receiver's

<= aMagnitude

return true, if the argument's asciiValue is greater than or equal to the receiver's

= aCharacter

return true, if the argument, aCharacter is the same character
Redefined to take care of character sizes > 8bit.

Usage example(s):

	$A = (Character value:65)
	$A = (Character codePoint:65)
	$A = ($B-1)
	$A = 65

> aMagnitude

return true, if the argument's asciiValue is less than the receiver's

>= aMagnitude

return true, if the argument's asciiValue is less than or equal to the receiver's

hash

return an integer useful for hashing

identityHash

return an integer useful for hashing on identity

Usage example(s):

      $a identityHash.
      (Character value:1234) identityHash

sameAs: aCharacter

return true, if the argument, aCharacter is the same character,
ignoring case differences.

Usage example(s):

      $x sameAs:$x  
      $x sameAs:$X  
      (Character value:345) sameAs:(Character value:345)
      (Character value:345) asUppercase sameAs:(Character value:345) asLowercase
      $Ж sameAs:$ж     -- u0416 - u0436
      $ж sameAs:$Ж     -- u0436 - u0416

~= aCharacter

return true, if the argument, aCharacter is not the same character
Redefined to take care of character sizes > 8bit.

converting

asCharacter

usually sent to integers, but redefined here to allow integers
and characters to be used commonly without a need for a test.

Usage example(s):

     32 asCharacter

asInteger

the same as #codePoint.
Use #asInteger, if you need protocol compatibility with Numbers etc.
Use #codePoint in any other case for better stc optimization.

asLowercase

return a character with same letter as the receiver, but in lowercase.
Returns the receiver if it is already lowercase or if there is no lowercase equivalent.
CAVEAT:
for now, this method is only correct for unicode characters up to u+1d6ff (Unicode3.1).
(which is more than mozilla does, btw. ;-)

Usage example(s):

     $A asLowercase                                     => $a 
     $a asLowercase                                     => $a 
     $ß asLowercase                                     => (Character codePoint:16rDF) 
     $ß asUppercase asLowercase                         => (Character codePoint:16rDF)
     $ÿ asUppercase asLowercase                         => (Character codePoint:16rFF)
     $Þ asLowercase                                     => (Character codePoint:16rFE) 
     $Ý asLowercase                                     => (Character codePoint:16rFD)
     $À asLowercase                                     => (Character codePoint:16rE0) 
     (Character value:16r01F5) asUppercase asLowercase  => (Character codePoint:16r1F5) 
     (Character value:16r0205) asUppercase asLowercase  => (Character codePoint:16r205) 
     (Character value:16r03B1) asUppercase asLowercase  => (Character codePoint:16r3B1) 
     (Character value:16r1E00) asLowercase              => (Character codePoint:16r1E01)

asString

return a string of len 1 with myself as contents

Usage example(s):

     (Character value:16rB5) asString
     (Character value:16r1B5) asString

asSymbol

Return a unique symbol with the name taken from the receiver's characters.
Here, a single character symbol is returned.

asTitlecase

return a character with same letter as the receiver, but in titlecase.
Returns the receiver if it is already titlecase or if there is no titlecase equivalent.

Usage example(s):

     $A asTitlecase
     $a asTitlecase
     (Character value:16r01F1) asTitlecase
     (Character value:16r01F2) asTitlecase

asUnicodeString

return a unicode string of len 1 with myself as contents.
This will vanish, as we now (rel5.2.x) use Unicode as default.

asUppercase

return a character with same letter as the receiver, but in uppercase.
Returns the receiver if it is already uppercase or if there is no uppercase equivalent.
CAVEAT:
for now, this method is only correct for unicode characters up to u+1d6ff (Unicode3.1).
(which is more than mozilla does, btw. ;-)

Usage example(s):

     $A asUppercase                            => $A 
     $a asUppercase                            => $A 
     $ß asUppercase                            => (Character codePoint:16r1E9E)
     $ß isUppercase                            => false
     $ß isLowercase                            => true
     $ß asUppercase isUppercase                => true
     $ß asUppercase isLowercase                => false
     (Character value:16r01F5) asUppercase
     (Character value:16r0205) asUppercase
     (Character value:16r03B1) asUppercase

digitValue

return my digitValue for any base (up to 37).
The digitValue is the value of me interpreted as a digit in a number-string.
Notice: in case of an invalid character,
ST/X does not conform to X3J20:
ST/X raises an error,
X3J20 returns -1

digitValueRadix: base

return my digitValue for base.
Return nil, if it is not a valid character for that base

Usage example(s):

     self assert:($0 digitValueRadix:10) == 0.
     self assert:($9 digitValueRadix:10) == 9.
     self assert:($a digitValueRadix:10) == nil.
     self assert:($a digitValueRadix:11) == 10.
     self assert:($A digitValueRadix:11) == 10.
     self assert:($a digitValueRadix:16) == 10.
     self assert:($A digitValueRadix:16) == 10.
     self assert:($f digitValueRadix:16) == 15.
     self assert:($F digitValueRadix:16) == 15.
     self assert:($g digitValueRadix:16) == nil.
     self assert:($G digitValueRadix:16) == nil.
     self assert:($g digitValueRadix:17) == 16.
     self assert:($G digitValueRadix:17) == 16.

literalArrayEncoding

encode myself as an array literal, from which a copy of the receiver
can be reconstructed with #decodeAsLiteralArray.

to: aMagnitude

Return an Interval over the characters from the receiver to <aMagnitude>.
Wrap <aMagnitude> if it is not a legal Character value. (JS)
CG: why wrap - is this a good idea?
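
A minimal sketch of iterating over such an Interval. It assumes that the elements answered by the Interval are Characters (as suggested by the class description) and uses a plain WriteStream, which is standard Smalltalk but not part of this class.

```smalltalk
| letters |
letters := WriteStream on:String new.

"enumerate the characters from $a up to and including $e"
($a to:$e) do:[:c | letters nextPut:c].

self assert:letters contents = 'abcde'.
```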

to: aMagnitude by: inc

Return an Interval over the characters from the receiver to <aMagnitude>.
Wrap <aMagnitude> if it is not a legal Character value. (JS)
CG: why wrap - is this a good idea?

utf8Encoded

convert a character to its UTF-8 encoding.
This returns an 8-bit String

Usage example(s):

     $¤ utf8Encoded asByteArray = #[16rC2 16rA4]
     $a utf8Encoded = 'a'
     (Character value:16rFF) utf8Encoded

withoutDiacritics

return a character with the same letter as the receiver, but without diacritic modifiers
(mapping e.g. Ä to A).
Returns the receiver if it has no diacritic modifiers.
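
A short sketch of the behaviour described above. The mapping of A with diaeresis to plain A is taken from the method description; it has not been checked against a particular ST/X version.

```smalltalk
"a letter with a diacritic is mapped to its base letter"
self assert:$Ä withoutDiacritics = $A.

"a letter without diacritics is returned unchanged"
self assert:$e withoutDiacritics = $e.
```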

copying

, aStringOrCharacter

return a string containing the concatenation of the receiver character
and the argument, a string or character.
Added for symmetry: since string,char is allowed, char,string should be allowed as well.

Usage example(s):

     $. , $:
     $. , 'abc' , $.

      Time millisecondsToRun:[ 10000000 timesRepeat:[ $a , $b ]]
      Time millisecondsToRun:[ 10000000 timesRepeat:[ $a , 'b' ]]
      Time millisecondsToRun:[ 10000000 timesRepeat:[ 'a' , 'b' ]]
      Time millisecondsToRun:[ 10000000 timesRepeat:[ 'a' , $b ]]

,* n

return a string formed from concatenating the receiver n times,
with 0 returning an empty collection, 1 returning a single char string, etc.

Usage example(s):

     $a ,* 5
     $a ,* 0
     $a ,* 1

     'a' ,* 5
     'a' ,* 0
     'a' ,* 1

     'abc' ,* 5
     'abc' ,* 0
     'abc' ,* 1

copy

return a copy of myself
reimplemented since characters are unique

deepCopyUsing: aDictionary postCopySelector: postCopySelector

return a deep copy of myself
reimplemented since characters are immutable

shallowCopy

return a shallow copy of myself
reimplemented since characters are immutable

simpleDeepCopy

return a deep copy of myself
reimplemented since characters are immutable

dependents access

addDependent: someOne

It doesn't make sense to add dependents to a shared instance.
Silently ignore ...

onChangeSend: selector to: someOne

It doesn't make sense to add dependents to a constant; will never change.
Silently ignore ...

encoding

rot13

[Usenet: from 'rotate alphabet 13 places']
The simple Caesar-cypher encryption that replaces each English
letter with the one 13 places forward or back along the alphabet,
so that 'The butler did it!' becomes 'Gur ohgyre qvq vg!'
Most Usenet news reading and posting programs include a rot13 feature.
It is used to enclose the text in a sealed wrapper that the reader must choose
to open -- e.g., for posting things that might offend some readers, or spoilers.
A major advantage of rot13 over rot(N) for other N is that it
is self-inverse, so the same code can be used for encoding and decoding.

Usage example(s):

     $h rot13
     $h rot13 rot13
     'The butler did it!' rot13             -> 'Gur ohgyre qvq vg!'
     'The butler did it!' rot13 rot13       -> 'The butler did it!'

rot: n

[Usenet: from 'rotate alphabet N places']
The simple Caesar-cypher encryption that replaces each English
letter with the one N places forward or back along the alphabet,
so that 'The butler did it!' becomes 'Gur ohgyre qvq vg!' by rot:13
Most Usenet news reading and posting programs include a rot13 feature.
It is used to enclose the text in a sealed wrapper that the reader must choose
to open -- e.g., for posting things that might offend some readers, or spoilers.
A major advantage of rot13 over rot(N) for other N is that it
is self-inverse, so the same code can be used for encoding and decoding.

Usage example(s):

     'The butler did it!' rot:13                -> 'Gur ohgyre qvq vg!'
     ('The butler did it!' rot:13) rot:13       -> 'The butler did it!'

inspecting

inspector2TabCharSet
( an extension from the stx:libtool package )

Usage example(s):

     (Character value:2045) inspect
     (Character value:16r3C0) inspect

inspectorExtraAttributes
( an extension from the stx:libtool package )

extra (pseudo instvar) entries to be shown in an inspector.

inspectorExtraMenuOperations
( an extension from the stx:libtool package )

extra (pseudo instvar) entries to be shown in an inspector.

inspectorValueListIconFor: anInspector
( an extension from the stx:libtool package )

returns the icon to be shown alongside the value list of an inspector

inspectorValueStringInListFor: anInspector
( an extension from the stx:libtool package )

returns a string to be shown in the inspector's list

obsolete

asciiValue

return the asciivalue of myself.
The name 'asciiValue' is a historic leftover:
characters are not limited to 8bit characters.
So the actual value returned is a codePoint (i.e. full potential for 31bit encoding).
PP removed this method with 4.1 and provides asInteger instead.
ANSI defines #codePoint; please use that method instead.

** This is an obsolete interface - do not use it (it may vanish in future versions) **

printing & storing

displayOn: aGCOrStream

Compatibility:
append a printed description on some stream (Dolphin, Squeak)
OR:
display the receiver in a graphicsContext at 0@0 (ST80).
This method allows for any object to be displayed in some view
(although the fallback is to display its printString ...)

isLiteral

return true, if the receiver can be used as a literal constant in ST syntax
(i.e. can be used in constant arrays)

print

print myself on stdout.
If Stdout is nil, this method does NOT (on purpose) use the stream classes and
will therefore work even in case of emergency or very early startup (but only if Stdout is nil).

printOn: aStream

print myself on aStream

printString

return a string to print me

storeOn: aStream

store myself on aStream

private-accessing

setCodePoint: anInteger

very private - set the codePoint.
- use this only for newly created characters with codes > MAX_IMMEDIATE_CHARACTER.
DANGER alert:
funny things happen, if this is applied to
one of the shared characters with codePoints 0..MAX_IMMEDIATE_CHARACTER.

queries

bitsPerCharacter

return the number of bits I require for storage
(i.e. am I an Ascii/ISO8859-1 Character, or will I need more
bits for storage?).

bytesPerCharacter

return the number of bytes I require for storage

characterSize

return the number of bits I require for storage.
Protocol compatibility with CharacterArray.

isSafeForHTTP
( an extension from the stx:libcompat package )

whether a character is 'safe', or needs to be escaped when used, eg, in a URL

stringSpecies

return the type of string that is needed to store me

unicodeBlock

return the name of the unicode block in which this character is.
incomplete

Usage example(s):

     (Character value:16r200) unicodeBlock

utf8BytesPerCharacter

return the number of bytes I require for storage in utf-8 encoding
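
A brief sketch of what this query answers, using only methods listed on this page. The Euro sign (U+20AC) is used because its UTF-8 encoding is known to be three bytes long.

```smalltalk
"ASCII characters need a single byte in UTF-8"
self assert:$a utf8BytesPerCharacter = 1.

"U+20AC is encoded as three bytes"
self assert:Character euro utf8BytesPerCharacter = 3.

"the answer agrees with the size of the encoded string"
self assert:Character euro utf8Encoded size = 3.
```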

testing

isCharacter

return true, if the receiver is some kind of character

isControlCharacter

return true if I am a control character (i.e. ascii value < 32 or == 16rFF)

Usage example(s):

     (Character value:1) isControlCharacter
     $a isControlCharacter

isDigit

return true, if I am a digit (i.e. $0 .. $9)

isDigitRadix: r

return true, if I am a digit of a base r number

Usage example(s):

     $0 isDigitRadix:2   
     $1 isDigitRadix:2   
     $2 isDigitRadix:2   

     $7 isDigitRadix:8   
     $8 isDigitRadix:8   
     $8 isDigitRadix:10   
     $8 isDigitRadix:16  
     $8 isDigitRadix:30  
     $a isDigitRadix:8   
     $a isDigitRadix:10  
     $a isDigitRadix:11   
     $A isDigitRadix:11
     $F isDigitRadix:16
     $g isDigitRadix:16

isEndOfLineCharacter

return true if I am a line delimiting character

isHexDigit

return true, if I am a digit of a hex number (i.e. $0 .. $9, $a .. $f, $A .. $F)

Usage example(s):

     '0123456789abcdefABCDEF' allSatisfy:[:c| c isHexDigit].
     '0123456789abcdefABCDEF' allSatisfy:#isHexDigit.
     '0123456789@abcdefABCDEF' allSatisfy:#isHexDigit.
     '0123456789xabcdefABCDEF' allSatisfy:[:c| c isHexDigit].
     'gG' noneSatisfy:[:c| c isHexDigit].

isImmediate

return true if I am an immediate object
i.e. I am represented in the pointer itself and
no real object header/storage is used by me.
For VW compatibility, shared characters (i.e. in the range 0..MAX_IMMEDIATE_CHARACTER)
also return true here

Usage example(s):

	$a isImmediate.
	(Character value:255) isImmediate.
	(Character value:256) isImmediate.
	(Character value:1566) isImmediate.

isLetter

return true, if I am a letter (a..z or A..Z).
Use isNationalLetter, if you are interested in national letters.

isLetterOrDigit

return true, if I am a letter (a..z or A..Z) or a digit (0..9).
Use isNationalAlphaNumeric, if you are interested in national characters.

isLetterOrDigitOrUnderline

return true, if I am a letter (a..z or A..Z), a digit (0..9) or an underline ($_).
Use isNationalAlphaNumeric, if you are interested in national characters.

isLetterOrUnderline

return true, if I am a letter or $_

isLowercase

return true, if I am a lower-case letter.
This one does care for national characters.
Caveat:
only returns the correct value for codes up to u+1d6ff (Unicode3.1).
(which is more than mozilla does, btw. ;-)

Usage example(s):

     $A isLowercase                          => false
     $a isLowercase                          => true
     $ß isLowercase                          => true
     $ß asUppercase isLowercase              => false
     $ä isLowercase                          => true
     $Ä isLowercase                          => false
     (Character value:16r01F5) isLowercase   => true   (g with accent)
     (Character value:16r0205) isLowercase   => true   (e with backw. diareses)
     (Character value:16r03B1) isLowercase   => true   (greek alpha)

isPrintable

return true, if the receiver is a useful printable character
(see fileBrowser's showFile:-method on how it can be used)

isSeparator

return true if I am a space, cr, tab, nl, or newPage
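
A minimal sketch of the characters named above, using the class-side constants from this page:

```smalltalk
"each of the listed whitespace characters is a separator"
self assert:Character space isSeparator.
self assert:Character tab isSeparator.
self assert:Character nl isSeparator.
self assert:Character newPage isSeparator.

"ordinary letters are not"
self assert:$a isSeparator not.
```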

isSpace

return true if I am the space character

isUppercase

return true, if I am an upper-case letter.
This one does care for national characters.
Caveat:
only returns the correct value for codes up to u+1d6ff (Unicode3.1).
(which is more than mozilla does, btw. ;-)

Usage example(s):

     $A isUppercase                          => true
     $a isUppercase                          => false
     $ß isUppercase                          => false
     $ß asUppercase isUppercase              => true
     $ä isUppercase                          => false
     $Ä isUppercase                          => true
     (Character value:16r01F5) isUppercase   => false   (g with accent)
     (Character value:16r0205) isUppercase   => false   (e with backw. diareses)
     (Character value:16r03B1) isUppercase   => false   (greek alpha)
     (Character value:16r03B1) asUppercase isUppercase => true   (greek alpha)

isVowel

return true, if I am a vowel (lower- or uppercase)

isWhitespace

same as isSeparator:
return true if I am a space, cr, tab, nl, or newPage

testing - national

asNonDiacritical

return a new character which represents the receiver without diacritics.
This is used with string search and when lists are to be ordered/sorted by base character order.
CAVEAT:
for now, this method is only correct for unicode characters up to u+2FF,
i.e. latin languages

Usage example(s):

     $e asNonDiacritical
     $é asNonDiacritical
     $ä asNonDiacritical
     $å asNonDiacritical

isGreekLetter

return true, if the receiver is a greek letter (alpha, beta,...).

Usage example(s):

     $µ isGreekLetter  -- in latin page
     $a isGreekLetter
     $π isGreekLetter  -- pi
     $Ω isGreekLetter  -- omega

isNationalAlphaNumeric

return true, if the receiver is a letter or digit.
This assumes unicode encoding.

isNationalDigit

return true, if the receiver is a digit.
This assumes unicode encoding.
WARNING: this method is not complete.

isNationalLetter

return true, if the receiver is a letter.
CAVEAT:
for now, this method is only correct for unicode characters up to u+1d6ff (Unicode3.1).
(which is more than mozilla does, btw. ;-)

tracing

traceInto: aRequestor level: level from: referrer

double dispatch into tracer, passing my type implicitly in the selector

visiting

acceptVisitor: aVisitor with: aParameter

dispatch for visitor pattern; send #visitCharacter:with: to aVisitor

ST/X 7.7.0.0; WebServer 1.702 at 20f6060372b9.unknown:8081; Sat, 26 Jul 2025 11:04:50 GMT