Class: Character
Object
|
+--Magnitude
   |
   +--Character
- Package: stx:libbasic
- Category: Magnitude-General
- Version: rev: 1.265 date: 2024/02/21 13:26:34
- user: stefan
- file: Character.st directory: libbasic
- module: stx stc-classLibrary: libbasic
This class represents characters.
Note that actual character objects are not used when characters
are stored in strings, symbols etc.;
these store only each character's codePoint, for a more compact representation.
The word 'asciiValue' is a historic leftover: any integer
code is allowed and actually used (i.e. characters are not limited to 8 bits).
Also, the encoding is actually Unicode, of which ASCII is a subset:
the first 128 characters (codePoints 0 to 127) have the same encoding value in both.
Some heavily used Characters are kept as singletons; i.e. for every codePoint in 0..N,
there exists exactly one instance of Character, which is shared and immutable.
Character codePoint:xxx checks for this, and returns a reference to the existing instance.
For codePoints up to 255, this is guaranteed; i.e. in all Smalltalks, the single-byte characters are always
handled like this, and you can therefore safely compare them using #== (identity compare).
Other characters (i.e. codePoint > N) are not guaranteed to be shared;
i.e. these may or may not be created as required.
Do NOT depend on which characters are shared and which are not.
Always compare using #= if there is any chance of a non-ASCII character being involved.
Once again (because beginners sometimes make this mistake):
you may compare characters using #== ONLY if you are certain
that the characters are in the range 0..255.
Otherwise, you HAVE TO compare using #= (if in doubt, always compare using #=).
Sorry for this inconvenience, but it is (practically) impossible to keep
the possible maximum of 2^32 characters (Unicode) around for that convenience alone.
In ST/X, N is (currently) 1024. This means that all the Latin characters and some others are
kept as singletons in the CharacterTable class variable (which is also used by the VM when characters
are instantiated).
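The comparison rule above can be illustrated with a minimal sketch, assuming it is evaluated in an ST/X workspace. The variable names are chosen for illustration only; the code point 16r3455 is taken from the codePoint: examples further down and lies above the shared range.

```smalltalk
| low wideA wideB |

"code points up to 255 are guaranteed singletons, so identity compare is safe"
low := Character codePoint:97.
Transcript showCR:(low == $a) printString.        "true"

"two requests for the same code point above the shared range"
wideA := Character codePoint:16r3455.
wideB := Character codePoint:16r3455.

"equality compares the code points and is always reliable"
Transcript showCR:(wideA = wideB) printString.    "true"

"identity may answer true or false here; do not depend on it"
Transcript showCR:(wideA == wideB) printString.
```

The last line deliberately carries no expected result, because whether such characters are shared is an implementation detail.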
Methods marked as (JS) come from the Manchester Character goody
(CharacterComparing) by Jan Steinman, which allows Characters to be used as
Interval elements (i.e. ($a to:$z) do:[...] ).
They are not a big deal, but convenient add-ons.
Some of these have been modified a bit.
WARNING: characters are known by the compiler and the runtime system;
do not change the instance layout.
Also, although you can create subclasses of Character, the compiler always
creates instances of Character for literals,
and other classes are hard-wired to always return instances of Character
in some cases (e.g. String>>at:, Symbol>>at: etc.).
Therefore, it may not make sense to create a Character subclass.
Case Mapping in Unicode:
There are a number of complications to case mappings that occur once the repertoire
of characters is expanded beyond ASCII.
* Because of the inclusion of certain composite characters for compatibility,
such as U+01F1 'DZ' capital dz, there is a third case, called titlecase,
which is used where the first letter of a word is to be capitalized
(e.g. Titlecase, vs. UPPERCASE, or lowercase).
For example, the title case of the example character is U+01F2 'Dz' capital d with small z.
* Case mappings may produce strings of different length than the original.
For example, the German character U+00DF small letter sharp s expands when uppercased to
the sequence of two characters 'SS'.
This also occurs where there is no precomposed character corresponding to a case mapping.
*** This is not yet implemented (in 5.2) ***
* Characters may also have different case mappings, depending on the context.
For example, U+03A3 capital sigma lowercases to U+03C3 small sigma if it is followed
by another letter, but lowercases to U+03C2 small final sigma if it is not.
*** This is not yet implemented (in 5.2) ***
* Characters may have case mappings that depend on the locale.
For example, in Turkish the letter U+0049 'I' capital letter i lowercases to U+0131 small dotless i.
*** This is not yet implemented (in 5.2) ***
* Case mappings are not, in general, reversible.
For example, once the string 'McGowan' has been uppercased, lowercased or titlecased,
the original cannot be recovered by applying another uppercase, lowercase, or titlecase operation.
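The titlecase and length-changing cases above can be probed with a short sketch, assuming an ST/X workspace. The variable names are illustrative; the expected result for the sharp s is the one stated in the asUppercase examples of this page, and no result is claimed for the titlecase comparison.

```smalltalk
| dz sharpS |

"U+01F1 is the capital DZ digraph; Unicode defines U+01F2 as its titlecase form"
dz := Character codePoint:16r01F1.
Transcript showCR:(dz asTitlecase = (Character codePoint:16r01F2)) printString.

"a single Character cannot expand to the two-character string 'SS';
 asUppercase answers the capital sharp s (U+1E9E) instead"
sharpS := Character codePoint:16rDF.
Transcript showCR:(sharpS asUppercase = (Character codePoint:16r1E9E)) printString.   "true"
```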
Collation Sequence:
*** This is not yet implemented (in 5.2) ***
[instance variables:]
asciivalue obvious { InstanceVariable: asciivalue Class: SmallInteger }
copyright
COPYRIGHT (c) 1988 by Claus Gittinger
All Rights Reserved
This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice. This software may not
be provided or otherwise made available to, or used by, any
other person. No title to or ownership of the software is
hereby transferred.
Compatibility-Squeak
-
escape
( an extension from the stx:libcompat package )
-
return the escape character (squeak compatibility)
accessing untypeable characters
-
controlCharacter: char
-
Answer the Character representing ctrl-char.
ctrl-a -> 1; ctrl-@ -> 0
Usage example(s):
self controlCharacter:$@ -> 0
self controlCharacter:$a -> 1
self controlCharacter:$d -> 4
self controlCharacter:$z -> 26
self controlCharacter:$[ -> 27
self controlCharacter:$\ -> 28
self controlCharacter:$] -> 29
self controlCharacter:$_ -> 31
|
-
endOfInput
-
Answer the Character representing ctrl-d (Unix-EOF).
-
leftParenthesis
-
Answer the Character representing a left parenthesis.
-
period
-
Answer the Character representing a period character.
-
poundSign
-
Answer the Character representing a pound sign (hash).
-
rightParenthesis
-
Answer the Character representing a right parenthesis.
constants
-
backspace
-
return the backspace character
-
bell
-
return the bell character
-
cr
-
return the lineEnd character
- actually (in unix) this is a newline character
-
del
-
return the delete character
-
doubleQuote
-
return the double-quote character
-
esc
-
return the escape character
-
etx
-
return the end-of-text character
-
euro
-
The Euro currency sign (notice: not all fonts support it).
The Unicode encoding is U+20AC
Usage example(s):
Transcript font:(Font family:'courier' size:12 encoding:'iso10646-1').
Transcript showCR:Character euro
|
-
excla
-
return the exclamation-mark character
-
ff
-
return the form-feed character
-
lf
-
return the newline/linefeed character
-
linefeed
-
squeak compatibility: return the newline/linefeed character
-
maxImmediateCodePoint
-
return the maximum codePoint until which the characters are shared
Usage example(s):
self maxImmediateCodePoint
|
-
maxValue
-
return the maximum codePoint a character may have
-
nbsp
-
return the non-breaking space character
-
newPage
-
return the form-feed (newPage) character
-
nl
-
return the newline character
-
null
-
return the null character;
Note that in ST/X, strings have an invisible terminating NULL character
(not counted in the string's size), to make it easier to pass strings to C functions.
However, this is ONLY true for single-byte strings.
-
pageUp
-
return the pageUp control character
-
quote
-
return the single-quote character
-
return
-
return the (carriage) return character.
In ST/X, this is different from cr - for Unix reasons.
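The difference between cr and return can be checked with a minimal sketch, assuming an ST/X workspace on a Unix system. The first two results follow from the cr and return descriptions on this page; 13 is the ASCII code of carriage return.

```smalltalk
"cr is the line end character, which on Unix is the newline"
Transcript showCR:(Character cr = Character nl) printString.       "true on Unix"

"return is the carriage return and differs from cr"
Transcript showCR:(Character cr = Character return) printString.   "false"
Transcript showCR:Character return codePoint printString.          "13"
```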
-
space
-
return the blank character
-
tab
-
return the tabulator character
constants non-printable
-
byteOrderMark
-
the unicode UTF BOM character as a singleton.
The UTF-8 Encoder converts this character to the byte sequence #[16rEF 16rBB 16rBF].
Usage example(s):
self byteOrderMark
self byteOrderMark utf8Encoded asByteArray hexPrintString
((CharacterEncoder encoderFor:#utf16le) encodeCharacter:self byteOrderMark) asByteArray hexPrintString
((CharacterEncoder encoderFor:#utf16be) encodeCharacter:self byteOrderMark) asByteArray hexPrintString
(self codePoint:16rFEFF) == self byteOrderMark
|
-
popDirectionalFormatting
-
the unicode popDirectionalFormatting pops previously set directional formatting.
See https://en.wikipedia.org/wiki/Bidirectional_text#Table_of_possible_BiDi_character_types
-
rightToLeftMark
-
the unicode rightToLeftMark marks the preceding character as right-to-left (Hebrew/Arabic).
See https://en.wikipedia.org/wiki/Bidirectional_text#Table_of_possible_BiDi_character_types
-
rightToLeftOverride
-
the unicode rightToLeftOverride marks the following text as right-to-left (Hebrew/Arabic).
See https://en.wikipedia.org/wiki/Bidirectional_text#Table_of_possible_BiDi_character_types
instance creation
-
basicNew
-
catch new - Characters cannot be created with new
-
codePoint: anInteger
-
return a character with codePoint anInteger,
Codepoints from 0 to CharacterTable size (1024) are mapped to singletons.
Usage example(s):
Character codePoint:16r34. -> $4
Character codePoint:16r3455. -> (Character codePoint:16r3455)
(Character codePoint:16rFEFF) -> (Character codePoint:16rFEFF)
Character codePoint:16rFFFFFFFFFFFFFFFFFFF. -> error
|
-
digitValue: anInteger
-
return a character that corresponds to anInteger.
0-9 map to $0-$9, 10-35 map to $A-$Z
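The mapping just described can be sketched as follows, assuming an ST/X workspace; the three sample values are picked from the two documented ranges.

```smalltalk
"0..9 map to the decimal digit characters"
Transcript showCR:((Character digitValue:7) = $7) printString.     "true"

"10..35 map to the uppercase letters"
Transcript showCR:((Character digitValue:10) = $A) printString.    "true"
Transcript showCR:((Character digitValue:35) = $Z) printString.    "true"
```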
-
readFrom: aStringOrStream onError: exceptionBlock
-
return a new Character, reading a printed representation from aStringOrStream.
Usage example(s):
self readFrom:'$a'.
self readFrom:'$ß'.
self readFrom:'(Character backspace)'.
self readFrom:'(Character codePoint:16r444)'.
|
-
utf8DecodeFrom: aStream
-
read and return a single unicode character from an UTF8 encoded stream.
Answer nil, if Stream>>#next answers nil.
Usage example(s):
Character utf8DecodeFrom:'a' readStream
Character utf8DecodeFrom:#[195 188] asString readStream
|
-
value: anInteger
-
return a character with codePoint anInteger - backward compatibility
primitive input
-
fromUser
-
return a character from the keyboard (C's standard input stream)
- this should only be used for emergency evaluators and the like.
queries
-
allCharacters
-
added for squeak compatibility: return a collection of all singleton chars.
Notice that, for memory efficiency reasons, only some of the low-codepoint characters
are actually kept as singletons. Less frequently used character instances are created on the fly,
as wide string elements are accessed (and hopefully garbage collected sooner or later).
-
hasSharedInstances
-
return true if this class can share instances when stored binary,
that is, instances with the same value can be stored by reference.
Although not always shared (TwoByte CodePoint-Characters), these should be treated
as such, to be independent of the number of shared characters in the underlying implementation.
-
isBuiltInClass
-
return true if this class is known by the run-time-system.
Here, true is returned for myself, false for subclasses.
-
isLegalUnicodeCodePoint: anInteger
-
answer true, if anInteger is a valid unicode code point
-
maxVal
-
-
separators
-
return a collection of separator chars.
Added for squeak compatibility
Compatibility-Dolphin
-
isAlphaNumeric
-
Compatibility method for ANSI.
Return true, if I am a letter or a digit
Same as isLetterOrDigit.
-
isAlphabetic
-
Compatibility method - do not use in new code.
Return true, if I am a letter.
Please use isLetter for compatibility reasons (which is ANSI).
-
isControl
-
Compatibility method - do not use in new code.
Return true if I am a control character (i.e. ascii value < 32)
-
isPunctuation
-
Compatibility method - do not use in new code.
The code below is not unicode aware.
Q: are digits really punctuation?
Usage example(s):
(1 to:255) collect:[:i| i asCharacter] thenSelect:[:c| c isPunctuation].
|
Compatibility-Squeak
-
asUnicode
( an extension from the stx:libcompat package )
-
the same as #codePoint
-
charCode
( an extension from the stx:libcompat package )
-
(self asInteger bitAnd: 16r3FFFFF).
accessing
-
codePoint
-
return the codePoint of myself.
Traditionally, this was named 'asciiValue';
however, characters are not limited to 8bit characters.
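A minimal sketch of reading code points, assuming an ST/X workspace; U+20AC is the Euro sign mentioned under euro above.

```smalltalk
"an ASCII letter"
Transcript showCR:$a codePoint printString.                               "97"

"a code point well above 8 bits"
Transcript showCR:(Character codePoint:16r20AC) codePoint printString.    "8364"

"asInteger is documented as the same as codePoint"
Transcript showCR:($a codePoint = $a asInteger) printString.              "true"
```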
-
instVarAt: index put: anObject
-
catch instvar access - asciivalue may not be changed
arithmetic
-
+ aMagnitude
-
Return the Character that is <aMagnitude> higher than the receiver.
Wrap if the resulting value is not a legal Character value. (JS)
Usage example(s):
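A minimal sketch, assuming an ST/X workspace; it stays well inside the legal range, so the wrap behaviour is not exercised.

```smalltalk
"adding a number answers the Character that many code points higher"
Transcript showCR:(($a + 1) = $b) printString.     "true"
Transcript showCR:(($A + 25) = $Z) printString.    "true"
```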
-
- aMagnitude
-
Return the Character that is <aMagnitude> lower than the receiver.
Wrap if the resulting value is not a legal Character value. (JS)
claus:
return the difference as integer, if the argument is another character.
If the argument is a number, a character is returned.
Usage example(s):
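A minimal sketch of the two argument kinds described above, assuming an ST/X workspace.

```smalltalk
"a number argument answers a Character"
Transcript showCR:(($B - 1) = $A) printString.    "true"

"a character argument answers the difference as an Integer"
Transcript showCR:($z - $a) printString.          "25"
```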
-
// aMagnitude
-
Return the Character whose value is the receiver divided by <aMagnitude>.
Wrap if the resulting value is not a legal Character value. (JS)
-
\\ aMagnitude
-
Return the Character whose value is the receiver modulo <aMagnitude>.
Wrap if the resulting value is not a legal Character value. (JS)
comparing
-
< aMagnitude
-
return true, if the argument's asciiValue is greater than the receiver's
-
<= aMagnitude
-
return true, if the argument's asciiValue is greater than or equal to the receiver's
-
= aCharacter
-
return true, if the argument, aCharacter is the same character
Redefined to take care of character sizes > 8bit.
Usage example(s):
$A = (Character value:65)
$A = (Character codePoint:65)
$A = ($B-1)
$A = 65
|
-
> aMagnitude
-
return true, if the argument's asciiValue is less than the receiver's
-
>= aMagnitude
-
return true, if the argument's asciiValue is less than or equal to the receiver's
-
hash
-
return an integer useful for hashing
-
identityHash
-
return an integer useful for hashing on identity
Usage example(s):
$a identityHash.
(Character value:1234) identityHash
|
-
sameAs: aCharacter
-
return true, if the argument, aCharacter is the same character,
ignoring case differences.
Usage example(s):
$x sameAs:$x
$x sameAs:$X
(Character value:345) sameAs:(Character value:345)
(Character value:345) asUppercase sameAs:(Character value:345) asLowercase
$Ж sameAs:$ж -- u0416 - u0436
$ж sameAs:$Ж -- u0436 - u0416
|
-
~= aCharacter
-
return true, if the argument, aCharacter is not the same character
Redefined to take care of character sizes > 8bit.
converting
-
asCharacter
-
usually sent to integers, but redefined here to allow integers
and characters to be used commonly without a need for a test.
-
asInteger
-
the same as #codePoint.
Use #asInteger, if you need protocol compatibility with Numbers etc.
Use #codePoint in any other case for better stc optimization
-
asLowercase
-
return a character with same letter as the receiver, but in lowercase.
Returns the receiver if it is already lowercase or if there is no lowercase equivalent.
CAVEAT:
for now, this method is only correct for unicode characters up to u+1d6ff (Unicode3.1).
(which is more than mozilla does, btw. ;-)
Usage example(s):
$A asLowercase => $a
$a asLowercase => $a
$ß asLowercase => (Character codePoint:16rDF)
$ß asUppercase asLowercase => (Character codePoint:16rDF)
$ÿ asUppercase asLowercase => (Character codePoint:16rFF)
$Þ asLowercase => (Character codePoint:16rFE)
$Ý asLowercase => (Character codePoint:16rFD)
$À asLowercase => (Character codePoint:16rE0)
(Character value:16r01F5) asUppercase asLowercase => (Character codePoint:16r1F5)
(Character value:16r0205) asUppercase asLowercase => (Character codePoint:16r205)
(Character value:16r03B1) asUppercase asLowercase => (Character codePoint:16r3B1)
(Character value:16r1E00) asLowercase => (Character codePoint:16r1E01)
|
-
asString
-
return a string of len 1 with myself as contents
Usage example(s):
(Character value:16rB5) asString
(Character value:16r1B5) asString
|
-
asSymbol
-
Return a unique symbol with the name taken from the receiver's characters.
Here, a single character symbol is returned.
-
asTitlecase
-
return a character with same letter as the receiver, but in titlecase.
Returns the receiver if it is already titlecase or if there is no titlecase equivalent.
Usage example(s):
$A asTitlecase
$a asTitlecase
(Character value:16r01F1) asTitlecase
(Character value:16r01F2) asTitlecase
|
-
asUnicodeString
-
return a unicode string of len 1 with myself as contents.
This will vanish, as we now (rel5.2.x) use Unicode as default.
-
asUppercase
-
return a character with same letter as the receiver, but in uppercase.
Returns the receiver if it is already uppercase or if there is no uppercase equivalent.
CAVEAT:
for now, this method is only correct for unicode characters up to u+1d6ff (Unicode3.1).
(which is more than mozilla does, btw. ;-)
Usage example(s):
$A asLowercase => $a
$a asUppercase => $A
$ß asUppercase => (Character codePoint:16r1E9E)
$ß isUppercase => false
$ß isLowercase => true
$ß asUppercase isUppercase => true
$ß asUppercase isLowercase => false
(Character value:16r01F5) asUppercase
(Character value:16r0205) asUppercase
(Character value:16r03B1) asUppercase
|
-
digitValue
-
return my digitValue for any base (up to 37).
The digitValue is the value of me interpreted as a digit in a number-string.
Notice: in case of an invalid character,
ST/X is not X3J20 conform:
ST/X raises an error,
X3J20 returns -1
-
digitValueRadix: base
-
return my digitValue for base.
Return nil, if it is not a valid character for that base
Usage example(s):
self assert:($0 digitValueRadix:10) == 0.
self assert:($9 digitValueRadix:10) == 9.
self assert:($a digitValueRadix:10) == nil.
self assert:($a digitValueRadix:11) == 10.
self assert:($A digitValueRadix:11) == 10.
self assert:($a digitValueRadix:16) == 10.
self assert:($A digitValueRadix:16) == 10.
self assert:($f digitValueRadix:16) == 15.
self assert:($F digitValueRadix:16) == 15.
self assert:($g digitValueRadix:16) == nil.
self assert:($G digitValueRadix:16) == nil.
self assert:($g digitValueRadix:17) == 16.
self assert:($G digitValueRadix:17) == 16.
|
-
literalArrayEncoding
-
encode myself as an array literal, from which a copy of the receiver
can be reconstructed with #decodeAsLiteralArray.
-
to: aMagnitude
-
Return an Interval over the characters from the receiver to <aMagnitude>.
Wrap <aMagnitude> if it is not a legal Character value. (JS)
CG: why wrap - is this a good idea?
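A minimal sketch of iterating over a character interval, assuming an ST/X workspace and that the interval enumerates Character elements as the class description states. The variable name is illustrative.

```smalltalk
| letters |

"collect the characters of the interval into a string"
letters := WriteStream on:String new.
($a to:$e) do:[:ch | letters nextPut:ch].

"intended to show the five letters a through e"
Transcript showCR:letters contents.
```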
-
to: aMagnitude by: inc
-
Return an Interval over the characters from the receiver to <aMagnitude>.
Wrap <aMagnitude> if it is not a legal Character value. (JS)
CG: why wrap - is this a good idea?
-
utf8Encoded
-
convert a character to its UTF-8 encoding.
This returns an 8-bit String
Usage example(s):
$¤ utf8Encoded asByteArray = #[16rC2 16rA4]
$a utf8Encoded = 'a'
(Character value:0xFF) utf8Encoded
|
-
withoutDiacritics
-
return a character with the same letter as the receiver, but without diacritic modifiers
(mapping e.g. Ã to A).
Returns the receiver if it has no diacritics modifiers.
copying
-
, aStringOrCharacter
-
return a string containing the concatenation of the receiver character
and the argument, a string or character.
Added for symmetry, as we allow string,char also char,string should be allowed
Usage example(s):
$. , $:
$. , 'abc' , $.
Time millisecondsToRun:[ 10000000 timesRepeat:[ $a , $b ]]
Time millisecondsToRun:[ 10000000 timesRepeat:[ $a , 'b' ]]
Time millisecondsToRun:[ 10000000 timesRepeat:[ 'a' , 'b' ]]
Time millisecondsToRun:[ 10000000 timesRepeat:[ 'a' , $b ]]
|
-
,* n
-
return a string formed from concatenating the receiver n times,
with 0 returning an empty collection, 1 returning a single char string, etc.
Usage example(s):
$a ,* 5
$a ,* 0
$a ,* 1
'a' ,* 5
'a' ,* 0
'a' ,* 1
'abc' ,* 5
'abc' ,* 0
'abc' ,* 1
|
-
copy
-
return a copy of myself
reimplemented since characters are unique
-
deepCopyUsing: aDictionary postCopySelector: postCopySelector
-
return a deep copy of myself
reimplemented since characters are immutable
-
shallowCopy
-
return a shallow copy of myself
reimplemented since characters are immutable
-
simpleDeepCopy
-
return a deep copy of myself
reimplemented since characters are immutable
dependents access
-
addDependent: someOne
-
It doesn't make sense to add dependents to a shared instance.
Silently ignore ...
-
onChangeSend: selector to: someOne
-
It doesn't make sense to add dependents to a constant; will never change.
Silently ignore ...
encoding
-
rot13
-
[Usenet: from `rotate alphabet 13 places']
The simple Caesar-cypher encryption that replaces each English
letter with the one 13 places forward or back along the alphabet,
so that 'The butler did it!' becomes 'Gur ohgyre qvq vg!'
Most Usenet news reading and posting programs include a rot13 feature.
It is used to enclose the text in a sealed wrapper that the reader must choose
to open -- e.g., for posting things that might offend some readers, or spoilers.
A major advantage of rot13 over rot(N) for other N is that it
is self-inverse, so the same code can be used for encoding and decoding.
Usage example(s):
$h rot13
$h rot13 rot13
'The butler did it!' rot13 -> 'Gur ohgyre qvq vg!'
'The butler did it!' rot13 rot13 -> 'The butler did it!'
|
-
rot: n
-
[Usenet: from `rotate alphabet N places']
The simple Caesar-cypher encryption that replaces each English
letter with the one N places forward or back along the alphabet,
so that 'The butler did it!' becomes 'Gur ohgyre qvq vg!' by rot:13
Most Usenet news reading and posting programs include a rot13 feature.
It is used to enclose the text in a sealed wrapper that the reader must choose
to open -- e.g., for posting things that might offend some readers, or spoilers.
A major advantage of rot13 over rot(N) for other N is that it
is self-inverse, so the same code can be used for encoding and decoding.
Usage example(s):
'The butler did it!' rot:13 -> 'Gur ohgyre qvq vg!'
('The butler did it!' rot:13) rot:13 -> 'The butler did it!'
|
inspecting
-
inspector2TabCharSet
( an extension from the stx:libtool package )
-
(Character value:2045) inspect
(Character value:0x3C0) inspect
-
inspectorExtraAttributes
( an extension from the stx:libtool package )
-
extra (pseudo instvar) entries to be shown in an inspector.
-
inspectorExtraMenuOperations
( an extension from the stx:libtool package )
-
extra (pseudo instvar) entries to be shown in an inspector.
-
inspectorValueListIconFor: anInspector
( an extension from the stx:libtool package )
-
returns the icon to be shown alongside the value list of an inspector
-
inspectorValueStringInListFor: anInspector
( an extension from the stx:libtool package )
-
returns a string to be shown in the inspector's list
obsolete
-
asciiValue
-
return the asciivalue of myself.
The name 'asciiValue' is a historic leftover:
characters are not limited to 8bit characters.
So the actual value returned is a codePoint (i.e. full potential for 31bit encoding).
PP removed this method in 4.1 and provides asInteger instead.
ANSI defines #codePoint, please use this method
** This is an obsolete interface - do not use it (it may vanish in future versions) **
printing & storing
-
displayOn: aGCOrStream
-
Compatibility
append a printed description on some stream (Dolphin, Squeak)
OR:
display the receiver in a graphicsContext at 0@0 (ST80).
This method allows for any object to be displayed in some view
(although the fallBack is to display its printString ...)
-
isLiteral
-
return true, if the receiver can be used as a literal constant in ST syntax
(i.e. can be used in constant arrays)
-
print
-
print myself on stdout.
If Stdout is nil, this method does NOT (on purpose) use the stream classes and
will therefore work even in case of emergency or very early startup (but only if Stdout is nil).
-
printOn: aStream
-
print myself on aStream
-
printString
-
return a string to print me
-
storeOn: aStream
-
store myself on aStream
private-accessing
-
setCodePoint: anInteger
-
very private - set the codePoint.
- use this only for newly created characters with codes > MAX_IMMEDIATE_CHARACTER.
DANGER alert:
funny things happen, if this is applied to
one of the shared characters with codePoints 0..MAX_IMMEDIATE_CHARACTER.
queries
-
bitsPerCharacter
-
return the number of bits I require for storage.
(i.e. am I an Ascii/ISO8859-1 Character or will I need more
bits for storage).
-
bytesPerCharacter
-
return the number of bytes I require for storage
-
characterSize
-
return the number of bits I require for storage.
Protocol compatibility with CharacterArray.
-
isSafeForHTTP
( an extension from the stx:libcompat package )
-
whether a character is 'safe', or needs to be escaped when used, eg, in a URL
-
stringSpecies
-
return the type of string that is needed to store me
-
unicodeBlock
-
return the name of the unicode block in which this character is.
incomplete
Usage example(s):
(Character value:16r200) unicodeBlock
|
-
utf8BytesPerCharacter
-
return the number of bytes I require for storage in utf-8 encoding
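A minimal sketch, assuming an ST/X workspace. The byte counts follow from the UTF-8 encoding rules (one byte up to U+007F, two up to U+07FF, three up to U+FFFF); U+00A4 is the currency sign used in the utf8Encoded examples.

```smalltalk
"ASCII needs a single byte"
Transcript showCR:$a utf8BytesPerCharacter printString.                            "1"

"U+00A4 needs two bytes"
Transcript showCR:(Character codePoint:16rA4) utf8BytesPerCharacter printString.   "2"

"the Euro sign, U+20AC, needs three bytes"
Transcript showCR:Character euro utf8BytesPerCharacter printString.                "3"
```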
testing
-
isCharacter
-
return true, if the receiver is some kind of character
-
isControlCharacter
-
return true if I am a control character (i.e. ascii value < 32 or == 16rFF)
Usage example(s):
(Character value:1) isControlCharacter
$a isControlCharacter
|
-
isDigit
-
return true, if I am a digit (i.e. $0 .. $9)
-
isDigitRadix: r
-
return true, if I am a digit of a base r number
Usage example(s):
$0 isDigitRadix:2
$1 isDigitRadix:2
$2 isDigitRadix:2
$7 isDigitRadix:8
$8 isDigitRadix:8
$8 isDigitRadix:10
$8 isDigitRadix:16
$8 isDigitRadix:30
$a isDigitRadix:8
$a isDigitRadix:10
$a isDigitRadix:11
$A isDigitRadix:11
$F isDigitRadix:16
$g isDigitRadix:16
|
-
isEndOfLineCharacter
-
return true if I am a line delimiting character
-
isHexDigit
-
return true, if I am a digit of a hex number (i.e. $0 .. $9, $a .. $f, $A .. $F)
Usage example(s):
'0123456789abcdefABCDEF' allSatisfy:[:c| c isHexDigit].
'0123456789abcdefABCDEF' allSatisfy:#isHexDigit.
'0123456789@abcdefABCDEF' allSatisfy:#isHexDigit.
'0123456789xabcdefABCDEF' allSatisfy:[:c| c isHexDigit].
'gG' noneSatisfy:[:c| c isHexDigit].
|
-
isImmediate
-
return true if I am an immediate object
i.e. I am represented in the pointer itself and
no real object header/storage is used by me.
For VW compatibility, shared characters (i.e. in the range 0..MAX_IMMEDIATE_CHARACTER)
also return true here
Usage example(s):
$a isImmediate.
(Character value:255) isImmediate.
(Character value:256) isImmediate.
(Character value:1566) isImmediate.
|
-
isLetter
-
return true, if I am a letter in the 'a'..'z' range.
Use isNationalLetter, if you are interested in those.
-
isLetterOrDigit
-
return true, if I am a letter (a..z or A..Z) or a digit (0..9)
Use isNationalAlphaNumeric, if you are interested in those.
-
isLetterOrDigitOrUnderline
-
return true, if I am a letter (a..z or A..Z), a digit (0..9) or $_
Use isNationalAlphaNumeric, if you are interested in those.
-
isLetterOrUnderline
-
return true, if I am a letter or $_
-
isLowercase
-
return true, if I am a lower-case letter.
This one does care for national characters.
Caveat:
only returns the correct value for codes up to u+1d6ff (Unicode3.1).
(which is more than mozilla does, btw. ;-)
Usage example(s):
$A isLowercase => false
$a isLowercase => true
$ß isLowercase => true
$ß asUppercase isLowercase => false
$ä isLowercase => true
$Ä isLowercase => false
(Character value:16r01F5) isLowercase => true (g with accent)
(Character value:16r0205) isLowercase => true (e with backw. diaeresis)
(Character value:16r03B1) isLowercase => true (greek alpha)
|
-
isPrintable
-
return true, if the receiver is a useful printable character
(see fileBrowser's showFile:-method on how it can be used)
-
isSeparator
-
return true if I am a space, cr, tab, nl, or newPage
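A minimal sketch of the separator test, assuming an ST/X workspace; the constants used are the ones listed in the constants category above.

```smalltalk
"the documented separators answer true"
Transcript showCR:Character space isSeparator printString.   "true"
Transcript showCR:Character tab isSeparator printString.     "true"
Transcript showCR:Character nl isSeparator printString.      "true"

"an ordinary letter does not"
Transcript showCR:$a isSeparator printString.                "false"
```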
-
isSpace
-
return true if I am the space character
-
isUppercase
-
return true, if I am an upper-case letter.
This one does care for national characters.
Caveat:
only returns the correct value for codes up to u+1d6ff (Unicode3.1).
(which is more than mozilla does, btw. ;-)
Usage example(s):
$A isUppercase => true
$a isUppercase => false
$ß isUppercase => false
$ß asUppercase isUppercase => true
$ä isUppercase => false
$Ä isUppercase => true
(Character value:16r01F5) isUppercase => false (g with accent)
(Character value:16r0205) isUppercase => false (e with backw. diaeresis)
(Character value:16r03B1) isUppercase => false (greek alpha)
(Character value:16r03B1) asUppercase isUppercase => true (greek alpha)
|
-
isVowel
-
return true, if I am a vowel (lower- or uppercase)
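A minimal sketch, assuming an ST/X workspace; the sample word is arbitrary.

```smalltalk
"both cases are recognized"
Transcript showCR:$a isVowel printString.    "true"
Transcript showCR:$E isVowel printString.    "true"
Transcript showCR:$b isVowel printString.    "false"

"filter the vowels out of a string"
Transcript showCR:('Smalltalk' select:[:ch | ch isVowel]).   "aa"
```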
-
isWhitespace
-
same as isSeparator:
return true if I am a space, cr, tab, nl, or newPage
testing - national
-
asNonDiacritical
-
return a new character which represents the receiver without diacritics.
This is used with string search and when lists are to be ordered/sorted by base character order.
CAVEAT:
for now, this method is only correct for unicode characters up to u+2FF,
i.e. latin languages
Usage example(s):
$e asNonDiacritical
$é asNonDiacritical
$ä asNonDiacritical
$å asNonDiacritical
|
-
isGreekLetter
-
return true, if the receiver is a greek letter (alpha, beta,...).
Usage example(s):
$µ isGreekLetter -- in latin page
$a isGreekLetter
$π isGreekLetter -- pi
$Ω isGreekLetter -- omega
|
-
isNationalAlphaNumeric
-
return true, if the receiver is a letter or digit.
This assumes unicode encoding.
-
isNationalDigit
-
return true, if the receiver is a digit.
This assumes unicode encoding.
WARNING: this method is not complete.
-
isNationalLetter
-
return true, if the receiver is a letter.
CAVEAT:
for now, this method is only correct for unicode characters up to u+1d6ff (Unicode3.1).
(which is more than mozilla does, btw. ;-)
tracing
-
traceInto: aRequestor level: level from: referrer
-
double dispatch into tracer, passing my type implicitly in the selector
visiting
-
acceptVisitor: aVisitor with: aParameter
-
dispatch for visitor pattern; send #visitCharacter:with: to aVisitor
|