Class: HTMLUtilities
Object
|
+--HTMLUtilities
- Package: stx:libbasic2
- Category: Net-Communication-Support
- Version: rev: 1.42, date: 2019/07/26 13:30:30
- user: stefan
- file: HTMLUtilities.st, directory: libbasic2
- module: stx, stc-classLibrary: libbasic2
A collection of support functions for dealing with HTML.
Used by HTML generators (DocGenerator), HTMLParsers and the webServer.
Because all of these need it, it has been put into libbasic2.
common actions
-
openLauncherOnDisplay: displayName
-
obsolete - do not use
** This is an obsolete interface - do not use it (it may vanish in future versions) **
constants
-
ampersandEscapes
-
non-breakable space - do something magic...
-
htmlEntityToCharacter
-
-
mathAmpersandEscapes
-
these are obsolete now, as HTML4 has since added the missing entities.
helpers
-
characterFromHtmlEntityNamed: anHtmlEntityName
-
where to get the mapping???
-
controlCharacters
-
EscapeControlCharacters at:$' put:'''.
-
copyReplaceCharactersWithHtmlEntitiesIn: aString
-
-
escapeCharacterEntities: aString
-
helper to escape invalid/dangerous characters in html strings.
These are:
control characters, '<', '&' and space -> %XX ascii as hex digits
% -> %%
usage example(s):
self escapeCharacterEntities:'a
-
escapeCharacterEntities: aString andControlCharacters: controlCharacters
-
helper to escape invalid/dangerous characters in html strings.
These are:
control characters, '<', '>', '&' and space -> %XX ascii as hex digits
% -> %%
usage example(s):
self escapeCharacterEntities:'a
-
escapeCharacterEntities: aString andControlCharacters: controlCharacters on: aWriteStream
-
helper to escape invalid/dangerous characters in html strings.
These are:
control characters, '<', '>', '&' and space -> %XX ascii as hex digits
% -> %%
usage example(s):
self escapeCharacterEntities:'a
-
escapeCharacterEntities: aString on: aStream
-
helper to escape invalid/dangerous characters in html strings.
These are:
control characters, '<', '&' and space -> %XX ascii as hex digits
% -> %%
usage example(s):
self escapeCharacterEntities:'a
-
extractCharSetEncodingFromContentType: contentTypeLine
-
self extractCharSetEncodingFromContentType:'text/html; charset=ascii'
self extractCharSetEncodingFromContentType:'text/html; charset='
self extractCharSetEncodingFromContentType:'text/html; fooBar=bla'
self extractCharSetEncodingFromContentType:'text/xml; charset=utf-8'
self extractCharSetEncodingFromContentType:'text/xml; charset=utf-8; bla=fasel'
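A minimal Python sketch of the idea behind this helper, for illustration only: it is not the Smalltalk implementation. The function name `extract_charset` is hypothetical, and returning `None` for a missing or empty charset parameter is an assumption, since the result for those inputs is not stated here.

```python
def extract_charset(content_type_line):
    """Return the charset parameter of a Content-Type value, or None."""
    # Parameters follow the media type and are separated by ';'.
    for param in content_type_line.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            value = value.strip().strip('"')
            # Assumption: an empty value counts as "no charset given".
            return value or None
    return None


print(extract_charset("text/xml; charset=utf-8; bla=fasel"))
```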
-
extractMimeTypeFromContentType: contentTypeLine
-
self extractMimeTypeFromContentType:'text/html; charset=ascii'
self extractMimeTypeFromContentType:'text/html; '
self extractMimeTypeFromContentType:'text/html'
self extractMimeTypeFromContentType:'text/xml; charset=utf-8'
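The same inputs can be traced with a short Python sketch. This is an illustration under the assumption that the MIME type is everything before the first ';', with surrounding whitespace removed; the name `extract_mime_type` is hypothetical and not part of this class.

```python
def extract_mime_type(content_type_line):
    """Return the media type portion of a Content-Type value."""
    # Everything up to the first ';' is the type/subtype pair.
    return content_type_line.split(";", 1)[0].strip()


for line in ("text/html; charset=ascii", "text/html; ", "text/html"):
    print(extract_mime_type(line))
```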
-
htmlEntityForCharacter: aCharacter
-
-
unEscape: aString
-
Convert escaped characters in a URL's arguments or post fields back to their proper characters.
Undoes the effect of #urlEncoded: and #urlEncoded2:.
These are:
+ -> space
%XX -> the character with that ASCII code (two hex digits)
%uXXXX -> the Unicode character with that code (four hex digits). NOTE: %u is non-standard, but is implemented in MS IIS
%% -> %
usage example(s):
self unEscape:'a%20b'
self unEscape:'a%%b'
self unEscape:'a+b'
self unEscape:'a%+b'
self unEscape:'a%'
self unEscape:'a%2'
self unEscape:'/Home/a%C3%A4%C3%B6%C3%BCa'
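The decoding rules listed above can be sketched in Python as follows. This is a hedged illustration, not the Smalltalk code: the name `url_unescape` is hypothetical, and two behaviours are assumptions that this page does not confirm: malformed sequences such as a trailing '%' are copied through unchanged, and the decoded bytes are interpreted as UTF-8 (which the last example above suggests).

```python
_HEX_DIGITS = set("0123456789abcdefABCDEF")


def _is_hex(text, length):
    return len(text) == length and all(c in _HEX_DIGITS for c in text)


def url_unescape(text):
    """Decode '+', %XX, %uXXXX and %% as described above."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "+":
            out += b" "
            i += 1
        elif ch != "%":
            out += ch.encode("utf-8")
            i += 1
        elif text.startswith("%%", i):
            out += b"%"
            i += 2
        elif text[i + 1:i + 2] in ("u", "U") and _is_hex(text[i + 2:i + 6], 4):
            # Non-standard %uXXXX form: one Unicode code point.
            code_point = int(text[i + 2:i + 6], 16)
            out += chr(code_point).encode("utf-8", errors="replace")
            i += 6
        elif _is_hex(text[i + 1:i + 3], 2):
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            # Assumption: a malformed escape is kept as a literal '%'.
            out += b"%"
            i += 1
    # Assumption: the resulting byte sequence is UTF-8.
    return out.decode("utf-8", errors="replace")


print(url_unescape("/Home/a%C3%A4%C3%B6%C3%BCa"))
```

Collecting bytes first and decoding once at the end lets a multi-byte UTF-8 sequence that is spread over several %XX escapes come out as a single character.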
-
unescapeCharacterEntities: aString
-
helper to unescape character entities in a string.
Normally, this is done by the HTMLParser when it scans text,
but it also seems to be needed for post-data fields which contain non-ASCII characters
(for example: the login postdata of expeccALM).
Sequences are:
&<specialName>;
&#<decimal>;
&#x<hex>;
From Reference:
http://wiki.selfhtml.org/wiki/Referenz:HTML/Zeichenreferenz#HTML-eigene_Zeichen
usage example(s):
self unescapeCharacterEntities:'&;'
self unescapeCharacterEntities:'&16368;'
self unescapeCharacterEntities:'&16368;&16368'
self unescapeCharacterEntities:'&16368;<'
self unescapeCharacterEntities:'&16368;<'
self unescapeCharacterEntities:'꿾'
self unescapeCharacterEntities:'"<foo'
self unescapeCharacterEntities:'&funny;<foo'
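For illustration, the three sequence forms can be handled in Python roughly as below. This is a sketch, not the Smalltalk implementation: the name `unescape_entities` is hypothetical, the name table comes from Python's standard `html.entities` module rather than from this class, and leaving unknown or malformed entities untouched is an assumption.

```python
import re
from html.entities import name2codepoint

# Matches &name;  &#decimal;  and  &#xhex;
_ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_entities(text):
    """Replace character entities in text by the characters they name."""

    def replace(match):
        body = match.group(1)
        try:
            if body[:2] in ("#x", "#X"):
                return chr(int(body[2:], 16))
            if body.startswith("#"):
                return chr(int(body[1:]))
            return chr(name2codepoint[body])
        except (KeyError, ValueError, OverflowError):
            # Assumption: unknown names and out-of-range codes stay as written.
            return match.group(0)

    return _ENTITY.sub(replace, text)


print(unescape_entities("&quot;&lt;foo"))
```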
-
urlDecoded: aString
-
Convert escaped characters in a URL's arguments or post fields back to their proper characters.
Undoes the effect of #urlEncoded: and #urlEncoded2:.
These are:
+ -> space
%XX -> the character with that ASCII code (two hex digits)
%uXXXX -> the Unicode character with that code (four hex digits). NOTE: %u is non-standard, but is implemented in MS IIS
%% -> %
usage example(s):
self urlDecoded:'a%20b'
self urlDecoded:'a%%b'
self urlDecoded:'a+b'
self urlDecoded:'a%+b'
self urlDecoded:'a%'
self urlDecoded:'a%2'
self urlDecoded:'/Home/a%C3%A4%C3%B6%C3%BCa'
-
urlEncode2: aStringOrStream on: ws
-
helper to escape invalid/dangerous characters in a URL's arguments.
Similar to urlEncode, but treats '*', '~' and spaces differently.
(some clients, such as BitTorrent, seem to require this - time will tell...)
Any byte not in the set 0-9, a-z, A-Z, '.', '-', '_' is encoded using
the '%nn' format, where nn is the hexadecimal value of the byte.
see: RFC1738
** This is an obsolete interface - do not use it (it may vanish in future versions) **
-
urlEncode: aStringOrStream on: ws
-
helper to escape invalid/dangerous characters in a URL's arguments or post-fields.
Any byte not in the set 0-9, a-z, A-Z, '.', '-', '_' and '~',
is encoded using the '%nn' format, where nn is the hexadecimal value of the byte.
Characters outside the ASCII range are encoded into utf8 first.
Spaces are encoded as '+'.
see: application/x-www-form-urlencoded
see: https://tools.ietf.org/html/rfc3986 (obsoletes RFC1738)
-
urlEncoded2: aString
-
helper to escape invalid/dangerous characters in a URL's arguments or post-fields.
Similar to urlEncoded, but treats '*', '~' and spaces differently.
(some clients, such as BitTorrent, seem to require this - time will tell...)
Any byte not in the set 0-9, a-z, A-Z, '.', '-', '_' and '~' is encoded using
the '%nn' format, where nn is the hexadecimal value of the byte.
see: application/x-www-form-urlencoded
see: RFC1738
** This is an obsolete interface - do not use it (it may vanish in future versions) **
-
urlEncoded: aString
-
helper to escape invalid/dangerous characters in a URL's arguments or post-fields.
Any byte not in the set 0-9, a-z, A-Z, '.', '-', '_' and '~', is encoded using
the '%nn' format, where nn is the hexadecimal value of the byte.
Characters outside the ASCII range are encoded into utf8 first.
Spaces are encoded as '+'.
see: application/x-www-form-urlencoded
see: https://tools.ietf.org/html/rfc3986 (obsoletes RFC1738)
usage example(s):
self unEscape:(self urlEncoded:'_-.*Frankfurt(Main) Hbf')
self urlEncoded:'_-.*Frankfurt(Main) Hbf'
self unEscape:(self urlEncoded:'-_.*%exept;')
self urlEncoded:'-_.*%exept;'
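The encoding rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Smalltalk code: the name `url_encoded` is hypothetical, and the use of upper-case hex digits is an assumption, since the description does not say which case is produced.

```python
import string

# Bytes that pass through unchanged: 0-9, a-z, A-Z, '.', '-', '_' and '~'.
_UNRESERVED = frozenset(
    (string.ascii_letters + string.digits + ".-_~").encode("ascii")
)


def url_encoded(text):
    """Form-style URL encoding of text, following the rules listed above."""
    parts = []
    # Characters outside the ASCII range are turned into UTF-8 bytes first.
    for byte in text.encode("utf-8"):
        if byte in _UNRESERVED:
            parts.append(chr(byte))
        elif byte == 0x20:
            parts.append("+")  # spaces are encoded as '+'
        else:
            parts.append(f"%{byte:02X}")  # assumption: upper-case hex
    return "".join(parts)


print(url_encoded("_-.*Frankfurt(Main) Hbf"))
```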
-
withAllSpecialHTMLCharactersEscaped: aStringOrCharacter
-
replace ampersand, less-than, greater-than and quotes by html-character escapes
usage example(s):
self withAllSpecialHTMLCharactersEscaped:'<>#&'
self withAllSpecialHTMLCharactersEscaped:$<
self withAllSpecialHTMLCharactersEscaped:$#
-
withSpecialHTMLCharactersEscaped: aStringOrCharacter
-
replace ampersand, less-than and greater-than by html-character escapes
usage example(s):
self withSpecialHTMLCharactersEscaped:'<>#&'
self withSpecialHTMLCharactersEscaped:$<
self withSpecialHTMLCharactersEscaped:$#
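The difference between the two escaping helpers can be illustrated with Python's standard `html.escape`, which offers the same two levels. This is only an analogy: the function names below are hypothetical, and which entities the Smalltalk methods emit for quote characters is not stated here, so the quote handling shown is Python's.

```python
import html


def with_special_html_characters_escaped(text):
    """Escape only '&', '<' and '>'."""
    return html.escape(text, quote=False)


def with_all_special_html_characters_escaped(text):
    """Escape '&', '<', '>' and, in addition, quote characters."""
    return html.escape(text, quote=True)


print(with_special_html_characters_escaped("<>#&"))
print(with_all_special_html_characters_escaped('say "hi"'))
```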
queries
-
isUtilityClass
-
serving-helpers
-
escape: aString
-
helper to escape invalid/dangerous characters in a URL's arguments or post-fields.
These are:
control characters, dQuote, '+', ';', '?', '&' and space -> %XX ascii as hex digits
% -> %%
usage example(s):
self escape:'a b'
self escape:'a%b'
self escape:'a b'
self escape:'a+b'
self escape:'aäüöb'
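A Python sketch of the rules listed above, for illustration only. The name `escape` here is a stand-in for the Smalltalk method, upper-case hex digits are an assumption, and so is the handling of non-ASCII characters, which this sketch passes through unchanged because the description does not say what happens to them.

```python
# Characters escaped in addition to control characters:
# double quote, '+', ';', '?', '&' and space.
_ESCAPED = set('"+;?& ')


def escape(text):
    """Escape text using the %XX / %% scheme described above."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "%":
            out.append("%%")
        elif code < 0x20 or code == 0x7F or ch in _ESCAPED:
            out.append(f"%{code:02X}")
        else:
            # Assumption: everything else, including non-ASCII, is kept.
            out.append(ch)
    return "".join(out)


print(escape("a b"), escape("a%b"), escape("a+b"))
```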
text processing helpers
-
plainTextOfHTML: htmlString
-
given some HTML, extract the raw text.
Can be used to search for strings in some html text.
usage example(s):
self plainTextOfHTML:'
bla1 bla2 bla3 bla5bla6'
self plainTextOfHTML:'Hello World'
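One way to picture what this helper does is the following Python sketch, built on the standard `html.parser` module. It is an assumption-laden illustration rather than the Smalltalk implementation: the names `_TextCollector` and `plain_text_of_html` are hypothetical, and how the real method treats whitespace, scripts or style blocks is not stated here.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the character data found between tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text outside of tags (also for script/style content).
        self.chunks.append(data)


def plain_text_of_html(html_string):
    """Return the raw text of html_string with all tags removed."""
    collector = _TextCollector()
    collector.feed(html_string)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.chunks)


print(plain_text_of_html("<p>Hello <b>World</b></p>"))
```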