Class: String
Object
|
+--Collection
   |
   +--SequenceableCollection
      |
      +--ArrayedCollection
         |
         +--UninterpretedBytes
            |
            +--CharacterArray
               |
               +--String
                  |
                  +--ISO8859L1String
                  |
                  +--ImmutableString
                  |
                  +--JavaScriptEnvironment::String
                  |
                  +--Symbol
- Package: stx:libbasic
- Category: Collections-Text
- Version:
  - rev: 1.452 date: 2019/08/10 15:12:42
  - user: cg
  - file: String.st directory: libbasic
  - module: stx stc-classLibrary: libbasic
- Author: Claus Gittinger
Strings are ByteArrays storing Characters.
Strings are kind of kludgy: to allow for easy handling by C functions,
there is always one 0-byte added at the end, which is not counted
in the string's size, and is not accessible from the Smalltalk level.
This guarantees that a Smalltalk string can always be passed to a
C or system API function without danger
(of course, this does not prevent nonsensical contents...)
You cannot add any instvars to String, since the run time system & compiler
create literal strings and know that strings have no named instvars.
If you really need strings with instVars, you have to create a subclass
of String (the access functions defined here can handle this).
A little warning though: not all Smalltalk systems allow subclassing String,
so your program may become unportable if you do so.
Strings have an implicit (assumed) encoding of ISO-8859-1.
For strings with other encodings, either keep the encoding separately,
or use instances of encodedString.
Be careful when using the 0-byte in a String. This is not prohibited, but
the implementations of some String methods use C functions and may
therefore yield unexpected results (e.g. compareWith:collating:) when
processing a String containing the 0-byte.
Text
StringCollection
TwoByteString
JISEncodedString
Symbol
Compatibility-Dolphin
-
lineDelimiter
-
Dolphin compatibility: answer CR LF
Compatibility-Squeak
-
cr
-
return a string consisting of the cr-Character
usage example(s):
and all cr's are really returns (instead of nl's).
|
-
crlf
-
return a string consisting of the cr-lf Characters
-
lf
-
return a string consisting of the lf Character
-
return
-
return a string consisting of the cr-Character
-
space
-
return a string consisting of a single space Character
-
stringHash: aString initialHash: speciesHash
-
for squeak compatibility only; this is NOT the same hash as my instances use
-
tab
-
return a string consisting of the tab-Character
Javascript support
-
fromCharCode: code ( an extension from the stx:libjavascript package )
-
return a string consisting of a single character, given its code
-
js_new: argument ( an extension from the stx:libjavascript package )
-
redefinable JS-new:
instance creation
-
basicNew: anInteger
-
return a new empty string with anInteger characters.
In contrast to other smalltalks, this returns a string filled
with spaces (instead of a string filled with 0-bytes).
This makes much more sense, in that a freshly created string
can be directly used as separator or for formatting.
-
new: n
-
return a new empty string with n characters.
In contrast to other smalltalks, this returns a string filled
with spaces (instead of a string filled with 0-bytes).
This makes much more sense, in that a freshly created string
can be directly used as separator or for formatting.
Redefined here with exactly the same code as in Behavior for
better performance.
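The space-filling behaviour described above can be checked with a short workspace snippet. This is a minimal sketch that uses only messages documented on this page (new:, size, atAllPut:); the expected values in the comments follow from the description above.

```smalltalk
| s |
s := String new: 5.
s size.             "5; the trailing 0-byte is not counted"
s = '     '.        "true in ST/X: the new string is filled with spaces"
s atAllPut: $*.     "overwrite every element in place"
s                   "'*****'"
```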
-
readFrom: aStreamOrString onError: exceptionBlock
-
read & return the next String from the (character-)stream aStreamOrString;
skipping all whitespace first; return the value of exceptionBlock,
if no string can be read. The sequence of characters as read from the
stream must be one as stored via storeOn: or storeString.
usage example(s):
String readFrom:('''hello world''' readStream)
String readFrom:('''hello '''' world''' readStream)
String readFrom:('1 ''hello'' ' readStream)
String readFrom:('1 ''hello'' ' readStream) onError:['foobar']
|
-
uninitializedNew: anInteger
-
return a new string with anInteger characters but undefined contents.
Use this, if the string is filled anyway with new data, for example, if
used as a stream buffer.
usage example(s):
String uninitializedNew:100
|
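As a sketch of the buffer use case mentioned above, the following allocates a string with undefined contents and then overwrites all of it. It assumes only messages documented on this page (uninitializedNew: and replaceFrom:to:with:startingAt:).

```smalltalk
| buffer |
buffer := String uninitializedNew: 5.       "contents are undefined at this point"
"fill all 5 positions from 'hello world', starting at its 7th character"
buffer replaceFrom: 1 to: 5 with: 'hello world' startingAt: 7.
buffer                                       "'world'"
```

Reading the buffer before it has been filled completely would expose whatever bytes happened to be in memory, so this is only appropriate when every position is written first.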
queries
-
defaultPlatformClass
-
dummy for ST-80 compatibility
-
isBuiltInClass
-
return true if this class is known by the run-time-system.
Here, true is returned for myself, false for subclasses.
Compatibility - Squeak
-
asSwikiLink ( an extension from the stx:goodies/webServer/comanche package )
-
-
skipDelimiters: delimiters startingAt: start ( an extension from the stx:goodies/webServer/comanche package )
-
Answer the index of the character within the receiver, starting at start, that does NOT match one of the delimiters. If the receiver does not contain any of the delimiters, answer size + 1. Assumes the delimiters to be a non-empty string.
-
squeakAsInteger ( an extension from the stx:goodies/webServer/comanche package )
-
Answer the Integer created by interpreting the receiver as the string representation of an integer. Answer nil if no digits, else find the first digit and then all consecutive digits after that
-
translateWith: aTable ( an extension from the stx:goodies/webServer/comanche package )
-
'Hallo' translateWith:(String withAll: (Character allCharacters collect: [:c | c asLowercase]))
-
trimNullAndStar ( an extension from the stx:goodies/webServer/comanche package )
-
' * string *** ' -------> 'string'
-
unescapePercents ( an extension from the stx:goodies/webServer/comanche package )
-
change each %XY substring to the character with ASCII value XY in hex. This is the opposite of #encodeForHTTP
Compatibility-Squeak
-
asEnglishPlural ( an extension from the stx:libcompat package )
-
Answer the plural of the receiver. Assumes the receiver is an English noun.
For a more comprehensive algorithm please refer to ''An Algorithmic Approach
to English Pluralization'' by Damian Conway.
-
deepFlattenInto: stream ( an extension from the stx:libcompat package )
-
-
piecesCutWhere: aBlock ( an extension from the stx:libcompat package )
-
Evaluate aBlock for successive pairs of the receiver elements,
breaking the receiver into pieces between elements where
the block evaluated to true, and return an OrderedCollection of
those pieces.
usage example(s):
'A sentence. Another sentence... Yet another sentence.'
piecesCutWhere: [:each :next | each = $. and: [next = Character space]]
|
-
piecesCutWhereCamelCase ( an extension from the stx:libcompat package )
-
Breaks apart words written in camel case.
It's not simply using piecesCutWhere: because we want
to also deal with abbreviations and thus we need to
decide based on three characters, not just on two:
('FOOBar') piecesCutWhereCamelCase asArray = #('FOO' 'Bar').
('FOOBar12AndSomething') piecesCutWhereCamelCase asArray = #('FOO' 'Bar' '12' 'And' 'Something')
-
replaceSuffix: suffix with: replacement ( an extension from the stx:libcompat package )
-
-
withInternetLineEndings ( an extension from the stx:libcompat package )
-
generate a copy with all cr's replaced by cr-lf
Compatibility-VW5.4
-
asByteString ( an extension from the stx:libcompat package )
-
-
asGUID ( an extension from the stx:libcompat package )
-
return self as a GUID (or UUID if not present)
usage example(s):
'{EAB22AC0-30C1-11CF-A7EB-0000C05BAE0B}' asGUID
|
accessing
-
at: index
-
return the character at position index, an Integer.
Reimplemented here to avoid the additional at:->basicAt: send
(which we can do here, since at: is obviously not redefined in a subclass).
This method is the same as basicAt:.
-
at: index put: aCharacter
-
store the argument, aCharacter at position index, an Integer.
Return aCharacter (sigh).
Reimplemented here to avoid the additional at:put:->basicAt:put: send
(but only for Strings, since subclasses may redefine basicAt:put:).
This method is the same as basicAt:put:.
-
basicAt: index
-
return the character at position index, an Integer
- reimplemented here since we return characters
-
basicAt: index put: aCharacter
-
store the argument, aCharacter at position index, an Integer.
Returns aCharacter (sigh).
- reimplemented here since we store characters
-
first
-
return the first character.
Reimplemented here for speed
character searching
-
identityIndexOf: aCharacter
-
return the index of the first occurrence of the argument, aCharacter
in the receiver or 0 if not found - reimplemented here for speed.
usage example(s):
'hello world' identityIndexOf:(Character space)
'hello world' identityIndexOf:$d
'hello world' identityIndexOf:1
#[0 0 1 0 0] asString identityIndexOf:(Character value:1)
#[0 0 1 0 0] asString identityIndexOf:(Character value:0)
|
-
identityIndexOf: aCharacter startingAt: index
-
return the index of the first occurrence of the argument, aCharacter
in the receiver or 0 if not found - reimplemented here for speed.
usage example(s):
'hello world' identityIndexOf:(Character space)
'hello world' identityIndexOf:$d
'hello world' identityIndexOf:1
#[0 0 1 0 0] asString identityIndexOf:(Character value:1)
#[0 0 1 0 0] asString identityIndexOf:(Character value:0)
|
-
includes: aCharacter
-
return true, if the receiver includes aCharacter.
- redefined here for speed
usage example(s):
'hello world' includes:$l
'hello world' includes:$W
|s|
s := String new:1024.
s atAllPut:$a.
s at:512 put:(Character space).
Time millisecondsToRun:[
1000000 timesRepeat:[ s includes:(Character space) ]
]
timing (ms):
bcc OSX(2007 powerbook)
110
|
-
includesAny: aCollection
-
return true, if the receiver includes any of the characters in the
argument, aCollection.
- redefined for speed if the argument is a String; especially optimized,
if the searched collection has less than 6 characters.
usage example(s):
'hello world' includesAny:'abcd'
'hello world' includesAny:'xyz'
'hello world' includesAny:'xz'
'hello world' includesAny:'od'
'hello world' includesAny:'xd'
'hello world' includesAny:'dx'
'hello world' includesAny:(Array with:$a with:$b with:$d)
'hello world' includesAny:(Array with:$x with:$y)
'hello world' includesAny:(Array with:1 with:2)
|s|
s := String new:1000 withAll:$a.
Time millisecondsToRun:[
1000000 timesRepeat:[
s includesAny:'12'
]
].540 680 550 850 890 850
|s|
s := String new:2000 withAll:$a.
Time millisecondsToRun:[
1000000 timesRepeat:[
s includesAny:'12'
]
]. 1030 1060 1650 1690
|s|
s := 'hello world'.
Time millisecondsToRun:[
1000000 timesRepeat:[
s includesAny:'12'
]
].70 60
|
-
indexOf: aCharacter startingAt: start
-
return the index of the first occurrence of the argument, aCharacter
in myself starting at start, anInteger or 0 if not found;
- reimplemented here for speed
usage example(s):
'hello world' indexOf:$0 startingAt:1
'hello world' indexOf:$l startingAt:1
'hello world' indexOf:$l startingAt:5
'hello world' indexOf:$d startingAt:5
#[0 0 1 0 0] asString indexOf:(Character value:1) startingAt:1
#[0 0 1 0 0] asString indexOf:(Character value:0) startingAt:3
'1234567890123456a' indexOf:$a
'1234567890123456a' indexOf:$b
|s|
s := '12345678901234b'.
self assert:(s indexOf:$x) == 0.
self assert:(s indexOf:$1) == 1.
self assert:(s indexOf:$2) == 2.
self assert:(s indexOf:$3) == 3.
self assert:(s indexOf:$4) == 4.
self assert:(s indexOf:$5) == 5.
self assert:(s indexOf:$0) == 10.
self assert:(s indexOf:$b) == 15.
|s|
s := ''.
self assert:(s indexOf:$1) == 0.
s := '1'.
self assert:(s indexOf:$1) == 1.
self assert:(s indexOf:$2) == 0.
s := '12'.
self assert:(s indexOf:$1) == 1.
self assert:(s indexOf:$2) == 2.
self assert:(s indexOf:$3) == 0.
s := '123'.
self assert:(s indexOf:$1) == 1.
self assert:(s indexOf:$2) == 2.
self assert:(s indexOf:$3) == 3.
self assert:(s indexOf:$4) == 0.
s := '1234'.
self assert:(s indexOf:$1) == 1.
self assert:(s indexOf:$2) == 2.
self assert:(s indexOf:$3) == 3.
self assert:(s indexOf:$4) == 4.
self assert:(s indexOf:$5) == 0.
s := '12345'.
self assert:(s indexOf:$1) == 1.
self assert:(s indexOf:$2) == 2.
self assert:(s indexOf:$3) == 3.
self assert:(s indexOf:$4) == 4.
self assert:(s indexOf:$5) == 5.
self assert:(s indexOf:$6) == 0.
s := '123456'.
self assert:(s indexOf:$1) == 1.
self assert:(s indexOf:$2) == 2.
self assert:(s indexOf:$3) == 3.
self assert:(s indexOf:$4) == 4.
self assert:(s indexOf:$5) == 5.
self assert:(s indexOf:$6) == 6.
self assert:(s indexOf:$7) == 0.
s := '1234567'.
self assert:(s indexOf:$1) == 1.
self assert:(s indexOf:$2) == 2.
self assert:(s indexOf:$3) == 3.
self assert:(s indexOf:$4) == 4.
self assert:(s indexOf:$5) == 5.
self assert:(s indexOf:$6) == 6.
self assert:(s indexOf:$7) == 7.
self assert:(s indexOf:$8) == 0.
s := '12345678'.
self assert:(s indexOf:$1) == 1.
self assert:(s indexOf:$2) == 2.
self assert:(s indexOf:$3) == 3.
self assert:(s indexOf:$4) == 4.
self assert:(s indexOf:$5) == 5.
self assert:(s indexOf:$6) == 6.
self assert:(s indexOf:$7) == 7.
self assert:(s indexOf:$8) == 8.
self assert:(s indexOf:$9) == 0.
s := '123456789'.
self assert:(s indexOf:$1) == 1.
self assert:(s indexOf:$2) == 2.
self assert:(s indexOf:$3) == 3.
self assert:(s indexOf:$4) == 4.
self assert:(s indexOf:$5) == 5.
self assert:(s indexOf:$6) == 6.
self assert:(s indexOf:$7) == 7.
self assert:(s indexOf:$8) == 8.
self assert:(s indexOf:$9) == 9.
self assert:(s indexOf:$0) == 0.
self assert:(s indexOf:$b) == 0.
|s|
s := String new:1024.
s atAllPut:$a.
s at:512 put:(Character space).
Time millisecondsToRun:[
1000000 timesRepeat:[ s indexOf:(Character space) ]
]
timing (ms):
bcc OSX(2007 powerbook)
v1: 1763 normal
2340 +unroll
3308 memsrch ! 90
v2: 1045 150
|
-
indexOfAny: aCollectionOfCharacters startingAt: start
-
return the index of the first occurrence of any character in aCollectionOfCharacters,
in myself starting at start, anInteger or 0 if not found;
- reimplemented here for speed if aCollectionOfCharacters is a string.
usage example(s):
'hello world' indexOfAny:'eoa' startingAt:1
'hello world' indexOfAny:'eoa' startingAt:6
'hello world' indexOfAny:'AOE' startingAt:1
'hello world' indexOfAny:'o' startingAt:6
'hello world' indexOfAny:'o' startingAt:6
'hello world§' indexOfAny:'#§$' startingAt:6
|
-
indexOfControlCharacterStartingAt: start
-
return the index of the next control character;
that is a character with asciiValue < 32.
Return 0 if none is found.
usage example(s):
'hello world' indexOfControlCharacterStartingAt:1
'hello world\foo' withCRs indexOfControlCharacterStartingAt:1
'1\' withCRs indexOfControlCharacterStartingAt:1
'1\' withCRs indexOfControlCharacterStartingAt:2
|
-
indexOfNonSeparatorStartingAt: start
-
return the index of the next non-whiteSpace character, 0 if none found
-
indexOfSeparatorStartingAt: start
-
return the index of the next separator (whitespace) character; 0 if none found
usage example(s):
123456789012
'hello world ' indexOfSeparatorStartingAt:1 -> 6
'hello world ' indexOfSeparatorStartingAt:3 -> 6
'hello world ' indexOfSeparatorStartingAt:7 -> 12
'hello world' indexOfSeparatorStartingAt:7 -> 0
'helloworld' indexOfSeparatorStartingAt:1 -> 0
|
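The two separator searches can be combined to split a string into words. The loop below is only an illustrative sketch (the variable names are made up for this example, and it is not how any ST/X method is implemented); it relies on both searches answering 0 when nothing is found, as described above.

```smalltalk
| s start stop words |
s := 'hello big world'.
words := OrderedCollection new.
start := s indexOfNonSeparatorStartingAt: 1.
[start ~= 0] whileTrue: [
    stop := s indexOfSeparatorStartingAt: start.
    "no further separator: the word runs to the end of the string"
    stop = 0 ifTrue: [stop := s size + 1].
    words add: (s copyFrom: start to: stop - 1).
    start := stop > s size
        ifTrue: [0]
        ifFalse: [s indexOfNonSeparatorStartingAt: stop]].
words asArray     "#('hello' 'big' 'world')"
```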
-
occurrencesOf: aCharacter
-
count the occurrences of the argument, aCharacter in myself
- reimplemented here for speed
usage example(s):
'hello world' occurrencesOf:$a
'hello world' occurrencesOf:$w
'hello world' occurrencesOf:$l
'hello world' occurrencesOf:$x
'hello world' occurrencesOf:1
Time millisecondsToRun:[
1000000 timesRepeat:[ 'abcdefghijklmn' occurrencesOf:$x ]
]. 219 203 156 203 204 204 219 172 187 187 141
|
comparing
-
< aString
-
Compare the receiver with the argument and return true if the
receiver is less than the argument. Otherwise return false.
No national variants are honored; use after: for this.
In contrast to ST-80, case differences are NOT ignored, thus
'foo' < 'Foo' will return false.
This may change.
-
= aString
-
Compare the receiver with the argument and return true if the
receiver is equal to the argument. Otherwise return false.
This compare is case-sensitive (i.e. 'Foo' is NOT = 'foo').
Use sameAs: to compare with case ignored.
usage example(s):
'foo' = 'Foo'
'foo' sameAs: 'Foo'
#[0 0 1 0 0] asString = #[0 0 1 0 0] asString
|
-
> aString
-
Compare the receiver with the argument and return true if the
receiver is greater than the argument. Otherwise return false.
No national variants are honored; use after: for this.
In contrast to ST-80, case differences are NOT ignored, thus
'foo' > 'Foo' will return true.
This may change.
-
compareCaselessWith: aString
-
Compare the receiver against the argument, ignoring case.
Return 1 if the receiver is greater, 0 if equal and -1 if less than the argument.
usage example(s):
'aaa' compareCaselessWith:'aaaa' -1
'aaaa' compareCaselessWith:'aaa' 1
'aaaa' compareCaselessWith:'aaaA' 0
'aaaA' compareCaselessWith:'aaaa' 0
'aaaAB' compareCaselessWith:'aaaa' 1
'aaaaB' compareCaselessWith:'aaaA' 1
'aaaa' compareCaselessWith:'aaaAB' -1
'aaaA' compareCaselessWith:'aaaaB' -1
'aaaa' compareCaselessWith:'aaax' -1
'aaaa' compareCaselessWith:'aaaX' -1
|
-
compareCollatingWith: aString
-
Compare the receiver with the argument and return 1 if the receiver is
greater, 0 if equal and -1 if less than the argument in a sorted list.
The comparison is language specific, depending on the value of
LC_COLLATE, which is in the shell environment.
usage example(s):
'hallo' compareWith:'hällo'
'hbllo' compareWith:'hällo'
'hallo' compareCollatingWith:'hällo'
'hbllo' compareCollatingWith:'hällo'
|
-
compareWith: aString
-
Compare the receiver with the argument and return 1 if the receiver is
greater, 0 if equal and -1 if less than the argument.
This comparison is based on the elements' codepoints -
i.e. upper/lowercase & national characters are NOT treated specially.
'foo' compareWith: 'Foo' will return 1.
while 'foo' sameAs:'Foo' will return true
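The case-sensitive and case-insensitive comparisons documented in this section can be put side by side. The snippet is a small sketch; the results in the comments are the ones stated in the method descriptions above.

```smalltalk
'foo' = 'Foo'.                        "false: = is case-sensitive"
'foo' sameAs: 'Foo'.                  "true: case differences are ignored"
'foo' < 'Foo'.                        "false: $f has a higher code point than $F"
'foo' > 'Foo'.                        "true"
'foo' compareWith: 'Foo'.             "1: the receiver sorts after the argument"
'foo' compareCaselessWith: 'Foo'      "0: equal when case is ignored"
```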
-
compareWith: aString collating: collatingBoolean
-
Compare the receiver with the argument and return 1 if the receiver is
greater, 0 if equal and -1 if less than the argument.
If the collatingBoolean is true, the comparison will be based on the
current setting of LC_COLLATE in the locale (which is set in the shell environment);
otherwise, it will be a simple string-compare.
Without collating, the comparison is based on the elements' codepoints -
i.e. upper/lowercase & national characters are NOT treated specially.
'foo' compareWith: 'Foo' will return 1.
while 'foo' sameAs:'Foo' will return true
-
endsWith: aStringOrChar
-
return true, if the receiver ends with something, aStringOrChar.
If aStringOrChar is an empty string, true is returned
usage example(s):
'hello world' endsWith:'world'
'hello world' endsWith:'earth'
'hello world' endsWith:$d
'hello world' endsWith:$e
'' endsWith:$d
'hello world' endsWith:#($r $l $d)
'hello world' endsWith:''
|
-
hash
-
return an integer useful as a hash-key.
This default method uses whichever hash algorithm is
used in the ST/X VM (which is actually fnv-1a)
usage example(s):
'a' hash
'ab' hash = 'ab' asUnicode16String hash
|
-
hash_dragonBook
-
return an integer useful as a hash-key.
This method implements the dragon-book algorithm (aho, ullman).
-
hash_fnv1a
-
return an integer useful as a hash-key.
This method uses the fnv-1a algorithm
(which is actually a pretty good one).
Notice: this returns a 31-bit value;
even on 64-bit CPUs, only small 4-byte hash values are returned
(so hash values are independent of the architecture)
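For readers unfamiliar with FNV-1a, the following is a sketch of the textbook 32-bit variant written as a block. It is not the code behind hash_fnv1a: that method answers a 31-bit value, so its results may differ from what this block computes. The block name fnv1a32 is made up for this example.

```smalltalk
| fnv1a32 |
fnv1a32 := [:aString |
    | h |
    h := 16r811C9DC5.                 "32-bit FNV offset basis"
    aString do: [:ch |
        "xor the byte in first, then multiply by the FNV prime, keeping 32 bits"
        h := ((h bitXor: ch codePoint) * 16r01000193) bitAnd: 16rFFFFFFFF].
    h].
(fnv1a32 value: '') = 16r811C9DC5.    "true: the empty string hashes to the offset basis"
(fnv1a32 value: 'a') = 16rE40C292C    "true: published FNV-1a test value"
```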
-
hash_fnv1a_64
-
return an integer useful as a hash-key.
This method uses the fnv-1a algorithm
(which is actually a pretty good one).
Notice: this returns 64 bit hashvalues
usage example(s):
'' hash_fnv1a_64
'a' hash_fnv1a_64
'77kepQFQ8Kl' hash_fnv1a_64
|
-
hash_java
-
return an integer useful as a hash-key.
This method uses the same algorithm as used in
the java virtual machine (which is actually a bad one).
-
hash_sdbm
-
return an integer useful as a hash-key.
This method implements the sdbm algorithm.
-
levenshteinTo: aString s: substWeight k: kbdTypoWeight c: caseWeight i: insrtWeight d: deleteWeight
-
parametrized levenshtein. arguments are the costs for
substitution, keyboard typo, case-change, insertion and deletion of a character.
usage example(s):
'ocmprt' levenshteinTo:'computer'
'computer' levenshteinTo:'computer'
'ocmputer' levenshteinTo:'computer'
'cmputer' levenshteinTo:'computer'
'computer' levenshteinTo:'cmputer'
'computer' levenshteinTo:'vomputer'
'computer' levenshteinTo:'bomputer'
'Computer' levenshteinTo:'computer'
|
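To illustrate what the weights parametrize, here is a sketch of the plain Levenshtein distance in which every substitution, insertion and deletion costs 1. It is not the ST/X implementation, and its results will generally differ from levenshteinTo:, which applies its own weights. The block name editDistance is made up for this example.

```smalltalk
| editDistance |
editDistance := [:a :b |
    | previous current |
    "previous holds the distances from the first i-1 characters of a to every prefix of b"
    previous := (0 to: b size) asArray.
    1 to: a size do: [:i |
        current := Array new: b size + 1.
        current at: 1 put: i.
        1 to: b size do: [:j |
            | substitution |
            substitution := (previous at: j)
                + ((a at: i) = (b at: j) ifTrue: [0] ifFalse: [1]).
            current at: j + 1 put:
                ((substitution
                    min: (previous at: j + 1) + 1)      "deletion"
                    min: (current at: j) + 1)].          "insertion"
        previous := current].
    previous at: b size + 1].
editDistance value: 'cmputer' value: 'computer'.     "1: one insertion"
editDistance value: 'kitten' value: 'sitting'        "3"
```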
-
sameAs: aString
-
Compare the receiver with the argument like =, but ignore case differences.
Return true or false.
usage example(s):
'hello' sameAs:'hello'
'hello' sameAs:'Hello'
'hello' sameAs:''
'' sameAs:'Hello'
'hello' sameAs:'hellO'
'hello' sameAs:'Hellx'
|
-
startsWith: aStringOrChar
-
return true, if the receiver starts with something, aStringOrChar.
If the argument is empty, true is returned.
Notice, that this is similar to, but slightly different from VW's and Squeak's beginsWith:,
which are both inconsistent w.r.t. an empty argument.
usage example(s):
'hello world' startsWith:'hello'
'hello world' startsWith:'hella'
'hello world' startsWith:'hi'
'hello world' startsWith:$h
'hello world' startsWith:$H
'hello world' startsWith:(Character value:16rFF00)
'hello world' startsWith:60
'hello world' startsWith:#($h $e $l)
'hello world' startsWith:''
|
-
~= aString
-
Compare the receiver with the argument and return true if the
receiver is not equal to the argument. Otherwise return false.
This compare is case-sensitive (i.e. 'Foo' is NOT = 'foo').
Actually, there is no need to redefine that method here,
the default (= not as inherited) works ok.
However, this may be heavily used and the redefinition saves an
extra message send.
converting
-
asAsciiZ
-
if the receiver does not end with a 0-valued character, return a copy of it,
with an additional 0-character. Otherwise return the receiver. This is sometimes
needed when a string has to be passed to C, which needs 0-terminated strings.
Notice, that all singleByte strings are already 0-terminated in ST/X, whereas wide
strings are not.
usage example(s):
'abc' asAsciiZ
'abc' asWideString asAsciiZ
|
-
asByteArray
-
return a new ByteArray with the receiver's elements.
This redefined method is faster than Collection>>#asByteArray
-
asDenseUnicodeString
-
return the receiver as single-byte, double byte or 4-byte unicode string,
depending on the number of bits required to hold all characters in myself.
Use this to extract non-wide parts from a wide string,
i.e. after a substring has been copied out of a wide string
-
asExternalBytes
-
return a 0-terminated externalBytes collection containing
my characters.
The returned collection is safe from being garbage collected;
i.e. it can be handed to a C-function, and must
(either there or here) be freed explicitly or unprotectedFromGC
-
asExternalBytesUnprotected
-
Like asExternalBytes, but does not register the bytes so
bytes are GARBAGE-COLLECTED!
-
asHttpResponseTo: request ( an extension from the stx:goodies/webServer/comanche package )
-
-
asImmutableCollection
-
return a write-protected copy of myself
-
asImmutableString
-
return a write-protected copy of myself
-
asLowercase
-
a tuned version for Strings with size < 255. Some apps call this very heavily.
We can do this for 8-bit strings, since the mapping is well known and lowercase chars
fit in one byte also.
usage example(s):
'Hello WORLD' asLowercase
(String new:300) asLowercase
#utf8 asLowercase
|
-
asSingleByteString
-
I am a string
-
asSingleByteStringIfPossible
-
I am a single-byte string
-
asSingleByteStringReplaceInvalidWith: replacementCharacter
-
return the receiver converted to a 'normal' string,
with invalid characters replaced by replacementCharacter.
Can be used to convert from 16-bit strings to 8-bit strings
and replace characters above code-255 with some replacement.
Dummy here, because I am already a single byte string.
-
asSymbol
-
Return a unique symbol with the name taken from the receiver's characters.
-
asSymbolIfInterned
-
If a symbol with the receiver's characters is already known, return it. Otherwise, return nil.
This can be used to query for an existing symbol and is the same as:
self knownAsSymbol ifTrue:[self asSymbol] ifFalse:[nil]
but slightly faster, since the symbol lookup operation is only
performed once.
usage example(s):
'hello' asSymbolIfInterned
'fooBarBaz' asSymbolIfInterned
|
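The relationship between asSymbol and asSymbolIfInterned can be sketched as follows. The example assumes only the behaviour described above: asSymbol answers a unique symbol, so two strings with the same characters yield the identical object, and asSymbolIfInterned finds that symbol once it exists.

```smalltalk
| a b |
a := 'hello world' asSymbol.              "interns the symbol if it is not yet known"
b := ('hello ' , 'world') asSymbol.       "a different string with the same characters"
a == b.                                   "true: symbols are unique"
('hello ' , 'world') asSymbolIfInterned == a    "true: the symbol is already interned"
```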
-
beImmutable
-
make myself write-protected
-
utf16Encoded
-
UTF-16 encoding is the same as UCS-2 (Unicode16String)
-
withTabsExpanded: numSpaces
-
return a string with the characters of the receiver where all tabulator characters
are expanded into spaces (assuming numSpaces-col tabs).
Notice: if the receiver does not contain any tabs, it is returned unchanged;
otherwise a new string is returned.
This does handle multiline strings.
Rewritten for speed - because this is very heavily used when reading
big files in the FileBrowser (and therefore speeds up fileReading considerably).
copying
-
, aStringOrCharacter
-
return the concatenation of myself and the argument, aStringOrCharacter as a String.
- reimplemented here for speed
usage example(s):
'hello' , ' world' asImmutableString
'hello ' , #world
'hello ' , $w
#[0 0 0 1] asString, #[0 0 0 2 0] asString
|
-
concatenate: string1 and: string2
-
return the concatenation of myself and the arguments, string1 and string2.
This is equivalent to self , string1 , string2
- generated by compiler when such a construct is detected
-
concatenate: string1 and: string2 and: string3
-
return the concatenation of myself and the string arguments.
This is equivalent to self , string1 , string2 , string3
- generated by compiler when such a construct is detected
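The equivalence stated above can be checked directly. This is a small sketch; user code would normally write the comma form and leave the concatenate: variants to the compiler.

```smalltalk
| a b |
a := 'hello' , ' ' , 'world'.
b := 'hello' concatenate: ' ' and: 'world'.
a = b      "true: both are 'hello world'"
```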
-
copy
-
return a copy of the receiver
-
copyFrom: start
-
return a new collection consisting of receiver's elements from start to the end of the collection.
This method will always return a string, even if the receiver
is a subclass-instance. This might change if there is a need.
- reimplemented here for speed
usage example(s):
'12345' copyFrom:3
'12345678' copyFrom:9 -> empty string
'12345678' copyFrom:0 -> error
|
-
copyFrom: start to: stop
-
return the substring starting at index start, anInteger and ending
at stop, anInteger. This method will always return a string, even
if the receiver is a subclass-instance. This might change if there is a need.
- reimplemented here for speed
usage example(s):
'12345678' copyFrom:3 to:7
'12345678' copyFrom:3 to:3
'12345678' copyFrom:3 to:2 -> empty string
'12345678' copyFrom:9 to:9 -> error
'12345678' copyFrom:3 to:9 -> error
'12345678' copyFrom:0 to:8 -> error
(Unicode16String with:(Character value:16r220) with:$a with:$b with:(Character value:16r221) with:(Character value:16r222))
copyFrom:2 to:3
((Unicode16String with:(Character value:16r220) with:$a with:$b with:(Character value:16r221) with:(Character value:16r222))
copyFrom:2 to:3) asSingleByteString
|
-
copyWith: aCharacter
-
return a new string containing the receiver's characters
and the single new character, aCharacter.
This is different from concatenation, which expects another string
as argument, but equivalent to copy-and-addLast.
Reimplemented here for more speed
usage example(s):
'1234567' copyWith:$8
'1234567' copyWith:(Character value:16r220)
|
-
deepCopy
-
return a copy of the receiver
usage example(s):
could be an instance of a subclass which needs deepCopy
of its named instvars ...
|
-
deepCopyUsing: aDictionary postCopySelector: postCopySelector
-
return a deep copy of the receiver - reimplemented to be a bit faster
-
shallowCopy
-
return a copy of the receiver
-
simpleDeepCopy
-
return a copy of the receiver
filling & replacing
-
atAllPut: aCharacter
-
replace all elements with aCharacter
- reimplemented here for speed
usage example(s):
(String new:10) atAllPut:$*
String new:10 withAll:$*
|
-
from: start to: stop put: aCharacter
-
fill part of the receiver with aCharacter.
- reimplemented here for speed
usage example(s):
(String new:10) from:1 to:10 put:$a
(String new:20) from:10 to:20 put:$b
(String new:20) from:1 to:10 put:$c
(String new:20) from:1 to:10 put:$c
(String new:100) from:2 to:99 put:$c
|
-
replaceAll: oldCharacter with: newCharacter
-
replace all oldCharacters by newCharacter in the receiver.
Notice: This operation modifies the receiver, NOT a copy;
therefore the change may affect all others referencing the receiver.
usage example(s):
'helloWorld' copy replaceAll:$o with:$O
'helloWorld' copy replaceAll:$d with:$*
'helloWorld' copy replaceAll:$h with:$*
|
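The note about modifying the receiver matters whenever a string is shared. The following sketch (the variable names are made up for this example) shows that a second reference sees the change, while a copy taken beforehand does not.

```smalltalk
| original alias snapshot |
original := 'helloWorld' copy.
alias := original.                    "second reference to the same object"
snapshot := original copy.            "independent copy"
original replaceAll: $o with: $0.
alias.                                "'hell0W0rld': changed along with original"
snapshot                              "'helloWorld': unaffected"
```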
-
replaceFrom: start to: stop with: aString startingAt: repStart
-
replace the characters starting at index start, anInteger and ending
at stop, anInteger with characters from aString starting at repStart.
Return the receiver.
- reimplemented here for speed
-
withoutSeparators
-
return a string containing the chars of myself
without leading and trailing whitespace.
If there is no whitespace, the receiver is returned.
Notice, this is different from String>>withoutSpaces.
usage example(s):
'hello' withoutSeparators
' hello' withoutSeparators
' hello ' withoutSeparators
' hello ' withoutSeparators
' hello ' withoutSeparators
' hello ' withoutSeparators
' ' withoutSeparators
|
-
withoutSpaces
-
return a string containing the characters of myself
without leading and trailing spaces.
If there are no spaces, the receiver is returned unchanged.
Notice, this is different from String>>withoutSeparators.
usage example(s):
' hello' withoutSpaces
' hello ' withoutSpaces
' hello ' withoutSpaces
' hello ' withoutSpaces
' hello ' withoutSpaces
' ' withoutSpaces
|
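The difference between withoutSpaces and withoutSeparators shows up as soon as the surrounding whitespace is not a plain space. The sketch below builds a string with a leading space and a trailing tab; it assumes, as in other Smalltalk dialects, that Character tab answers the tab character.

```smalltalk
| s |
s := ' hello' copyWith: Character tab.    "leading space, trailing tab"
s withoutSeparators.    "'hello': both the space and the tab are whitespace"
s withoutSpaces         "'hello' followed by the tab: only the leading space is removed"
```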
printing & storing
-
_errorPrint
-
Do not use this in user code.
Print the receiver on standard error.
This method does NOT (on purpose) use the stream classes and
will therefore work even in case of emergency during early startup
or in a crash situation (MiniDebugger).
-
_errorPrintCR
-
Do not use this in user code.
Print the receiver on standard error.
This method does NOT (on purpose) use the stream classes and
will therefore work even in case of emergency during early startup
or in a crash situation (MiniDebugger).
-
_print
-
Do not use this in user code.
Print the receiver on standard output.
This method does NOT (on purpose) use the stream classes and
will therefore work even in case of emergency during early startup
or in a crash situation (MiniDebugger).
-
_printCR
-
Do not use this in user code.
Print the receiver on standard output.
This method does NOT (on purpose) use the stream classes and
will therefore work even in case of emergency during early startup
or in a crash situation (MiniDebugger).
-
errorPrint
-
print the receiver on standard error, if the global Stderr is nil;
otherwise, fall back to the inherited errorPrint, which sends the string to
the Stderr stream or to a logger.
Redefined to be able to print during early startup,
when the stream classes have not yet been initialized (i.e. Stderr is nil).
usage example(s):
'hello world' asUnicode16String errorPrint
(Character value:356) asString errorPrint
'Bönnigheim' errorPrint
'Bönnigheim' asUnicodeString errorPrint
|
-
errorPrintCR
-
print the receiver on standard error, followed by a cr,
if the global Stderr is nil; otherwise, fall back to the inherited errorPrintCR,
which sends the string to the Stderr stream or to a logger.
Redefined to be able to print during early startup,
when the stream classes have not yet been initialized (i.e. Stderr is nil).
-
lowLevelErrorPrint
-
Do not use this in user code.
Print the receiver on standard error.
This method does NOT (on purpose) use the stream classes and
will therefore work even in case of emergency during early startup
or in a crash situation (MiniDebugger).
-
lowLevelErrorPrintCR
-
Do not use this in user code.
Print the receiver on standard error.
This method does NOT (on purpose) use the stream classes and
will therefore work even in case of emergency during early startup
or in a crash situation (MiniDebugger).
-
lowLevelPrint
-
Do not use this in user code.
Print the receiver on standard output.
This method does NOT (on purpose) use the stream classes and
will therefore work even in case of emergency during early startup
or in a crash situation (MiniDebugger).
-
lowLevelPrintCR
-
Do not use this in user code.
Print the receiver on standard output, followed by a cr.
This method does NOT (on purpose) use the stream classes and
will therefore work even in case of emergency during early startup
or in a crash situation (MiniDebugger).
-
print
-
print the receiver on standard output, if the global Stdout is nil;
otherwise, fall back to the inherited print,
which sends the string to the Stdout stream.
Redefined to be able to print during early startup,
when the stream classes have not yet been initialized (i.e. Stdout is nil).
-
printCR
-
print the receiver on standard output, followed by a cr,
if the global Stdout is nil; otherwise, fall back to the inherited printCR,
which sends the string to the Stdout stream.
Redefined to be able to print during early startup,
when the stream classes have not yet been initialized (i.e. Stdout is nil).
-
printfPrintString: formatString
-
non-standard but sometimes useful.
Return a printed representation of the receiver as specified by formatString,
which is defined by printf.
If you use this, be aware that the format string must be correct and use a string conversion such as %s.
This method is NONSTANDARD and may be removed without notice.
WARNING: this goes directly to the C-printf function and may therefore be inherently unsafe.
Please use the printf: method, which is both safe
and completely implemented in Smalltalk.
usage example(s):
'hello' printfPrintString:'%%s -> %s'
(String new:900) printfPrintString:'%%s -> %s'
'hello' printfPrintString:'%%10s -> %10s'
'hello' printfPrintString:'%%-10s -> %-10s'
'hello' printfPrintString:'%%900s -> %900s'
'hello' printfPrintString:'%%-900s -> %-900s'
|
-
storeOn: aStream
-
put the storeString of myself onto aStream
-
storeString
-
return a String for storing myself
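A minimal sketch of what storing a string means, assuming the usual Smalltalk convention that embedded single quotes are doubled in the stored form (the literals are purely illustrative):

```smalltalk
"the receiver holds the four characters  it's ;
 its storeString should be the quoted literal form with the quote doubled"
'it''s' storeString.

"the stored form is longer than the receiver (surrounding and doubled quotes)"
'it''s' size.
'it''s' storeString size.
```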
queries
-
basicSize
-
return the number of characters in myself.
Redefined here to exclude the 0-byte at the end.
-
bitsPerCharacter
-
return the number of bits each character has.
Here, 8 is returned (storing single byte characters).
-
bytesPerCharacter
-
return the number of bytes each character has.
Here, 1 is returned (storing single byte characters).
-
bytesPerCharacterNeeded
-
return the actual underlying string's required bytesPerCharacter
(i.e. checks if all characters really need that depth)
-
characterSize
-
answer the size in bits of my largest character (actually only 7 or 8)
usage example(s):
'hello world' characterSize
'hello world' asUnicode16String characterSize
('hello world' , (Character value:16r88) asString) characterSize
|
-
containsNon7BitAscii
-
return true, if the underlying string contains 8-bit characters (or wider)
(i.e. if it is non-ascii)
usage example(s):
'hello world' containsNon7BitAscii
'hello world' asTwoByteString containsNon7BitAscii
('hello world' , (Character value:16r88) asString) containsNon7BitAscii
|
-
containsNon8BitElements
-
return true, if the underlying string contains elements larger than a single byte
-
isBlank
-
return true, if the receiver's size is 0 or if it contains only spaces.
Q: should we care for whiteSpace in general here ?
-
isEmpty
-
return true if the receiver is empty (i.e. if size == 0)
Redefined here for performance
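The difference between isBlank and isEmpty, as described above, can be sketched with a few illustrative literals (the comments restate the descriptions; they are not captured output):

```smalltalk
'' isEmpty.        "size is 0, so this should answer true"
'   ' isEmpty.     "size is 3, so this should answer false"
'' isBlank.        "size is 0, so this should answer true"
'   ' isBlank.     "only spaces, so this should answer true"
' a ' isBlank.     "contains a non-space character, so this should answer false"
```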
-
isWideString
-
true if I require more than one byte per character
-
knownAsSymbol
-
return true, if there is a symbol with same characters in the
system.
Can be used to check for existence of a symbol without creating one.
usage example(s):
'hello' knownAsSymbol
'fooBarBaz' knownAsSymbol
|
-
notEmpty
-
return true if the receiver is not empty (i.e. if size ~~ 0)
Redefined here for performance
-
size
-
return the number of characters in myself.
Reimplemented here to avoid the additional size->basicSize send
(which we can do here, since size is obviously not redefined in a subclass).
This method is the same as basicSize.
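As a small illustration of the statement that size and basicSize agree and that the trailing 0-byte is not counted (the receiver literal is arbitrary):

```smalltalk
'hello' size.                          "five characters; the hidden 0-byte is not counted"
'hello' basicSize.                     "same value as size"
'hello' size = 'hello' basicSize.      "should answer true"
```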
-
stringSpecies
-
-
utf8DecodedMaxBytes
-
return the number of characters needed when this string is
decoded from UTF-8.
** This is an obsolete interface - do not use it (it may vanish in future versions) **
-
utf8DecodedSize
-
return the number of characters needed when this string is
decoded from UTF-8.
usage example(s):
'hello world' utf8DecodedSize
'ä' utf8Encoded utf8DecodedSize
'äΣΔΨӕἤῴ' utf8Encoded utf8DecodedSize
|
sorting & reordering
-
reverseFrom: startIndex to: endIndex
-
in-place reverse the characters of the string.
WARNING: this is a destructive operation, which modifies the receiver.
Please use reversed (with a d) for a functional version.
usage example(s):
'1234567890' copy reverseFrom:2 to:5
'1234567890' copy reverse
'1234567890' copy reversed
|t|
t := '1234567890abcdefghijk' copy.
t reverseFrom:1 to:10.
t reverseFrom:11 to:t size.
t reverseFrom:1 to:t size.
t
|t|
t := '1234567890abcdefghijk' copy.
t reverseFrom:1 to:2.
t reverseFrom:3 to:t size.
t reverseFrom:1 to:t size.
t
|
substring searching
-
caseInsensitiveIndexOfSubCollection: aSubString startingAt: startIndex ifAbsent: exceptionValue
-
naive search fallback (non-BM, i.e. not Boyer-Moore).
Private method to speed up case-insensitive searches
usage example(s):
'abcdefg' caseInsensitiveIndexOfSubCollection:'abc' startingAt:1 ifAbsent:nil
'abcdefg' caseInsensitiveIndexOfSubCollection:'bcd' startingAt:1 ifAbsent:nil
'abcdefg' caseInsensitiveIndexOfSubCollection:'cde' startingAt:1 ifAbsent:nil
'abcabcg' caseInsensitiveIndexOfSubCollection:'abc' startingAt:2 ifAbsent:nil
'ABCDEFG' caseInsensitiveIndexOfSubCollection:'abc' startingAt:1 ifAbsent:nil
'ABCDEFG' caseInsensitiveIndexOfSubCollection:'Abc' startingAt:1 ifAbsent:nil
'ABCDEFG' caseInsensitiveIndexOfSubCollection:'aBC' startingAt:1 ifAbsent:nil
'ABCDEFG' caseInsensitiveIndexOfSubCollection:'ABC' startingAt:1 ifAbsent:nil
'ABCDEFG' caseInsensitiveIndexOfSubCollection:'a' startingAt:1 ifAbsent:nil
'ABCDEFG' caseInsensitiveIndexOfSubCollection:'A' startingAt:1 ifAbsent:nil
'ABCDEFG' caseInsensitiveIndexOfSubCollection:'bcd' startingAt:1 ifAbsent:nil
'ABCDEFG' caseInsensitiveIndexOfSubCollection:'cde' startingAt:1 ifAbsent:nil
'ABCABCG' caseInsensitiveIndexOfSubCollection:'abc' startingAt:2 ifAbsent:nil
'1234567890' caseInsensitiveIndexOfSubCollection:'abc' startingAt:1 ifAbsent:nil
'1234567890' caseInsensitiveIndexOfSubCollection:'123' startingAt:1 ifAbsent:nil
|
-
indexOfSubCollection: aSubString startingAt: startIndex ifAbsent: exceptionValue caseSensitive: caseSensitive
-
redefined as primitive for maximum speed (BM, i.e. Boyer-Moore).
Compared to the strstr libc function, on my machine,
BM is faster for case-sensitive compares above around 8.5 searched characters.
For much longer searched strings, BM is much faster; 5 times as fast for 20 chars.
For case-insensitive compares, strstr was found to be slower than caseInsensitiveIndexOf.
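A hedged usage sketch for this selector, modelled on the examples given for caseInsensitiveIndexOfSubCollection:startingAt:ifAbsent:. The receiver, the search strings and the 0 passed as exceptionValue are illustrative choices:

```smalltalk
"case-sensitive search; 'World' starts at index 7 of the receiver"
'Hello World' indexOfSubCollection:'World' startingAt:1 ifAbsent:0 caseSensitive:true.

"the same search with different case should still find the substring
 when caseSensitive is false"
'Hello World' indexOfSubCollection:'world' startingAt:1 ifAbsent:0 caseSensitive:false.

"with caseSensitive true there is no match, so the ifAbsent: value is expected"
'Hello World' indexOfSubCollection:'world' startingAt:1 ifAbsent:0 caseSensitive:true.
```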
testing
-
isLiteral
-
return true, if the receiver can be used as a literal constant in ST syntax
(i.e. can be used in constant arrays)
-
isSingleByteString
-
returns true only for strings and immutable strings.
Use this instead of foo isMemberOf:String or foo class == String.
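A short sketch of the intended use, assuming foo is any object that may or may not be a plain single-byte string (foo is a hypothetical variable; asUnicode16String is taken from the examples elsewhere in this class):

```smalltalk
|foo|

foo := 'hello'.
foo isSingleByteString.                        "a plain String, so this should answer true"

foo := 'hello' asUnicode16String.
foo isSingleByteString.                        "a wide string, so this should answer false"

"preferred over the class-identity checks"
foo isSingleByteString ifTrue:[ foo printCR ].
```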
tracing
-
traceInto: aRequestor level: level from: referrer
-
double dispatch into tracer, passing my type implicitly in the selector
|