Class: HTMLUtilities
Object
|
+--HTMLUtilities
- Package: stx:libbasic2
- Category: Net-Communication-Support
- Version: rev: 1.69 date: 2024/04/22 17:41:45
- user: stefan
- file: HTMLUtilities.st directory: libbasic2
- module: stx stc-classLibrary: libbasic2
Collected support functions to deal with HTML.
Used by HTML generators (DocGenerator), HTML parsers and the web server,
which is why it has been put into libbasic2.
copyright
COPYRIGHT (c) 2007 by eXept Software AG
All Rights Reserved
This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice. This software may not
be provided or otherwise made available to, or used by, any
other person. No title to or ownership of the software is
hereby transferred.
common actions
-
openLauncherOnDisplay: displayName
-
obsolete - do not use
** This is an obsolete interface - do not use it (it may vanish in future versions) **
constants
-
ampersandEscapes
-
Usage example(s):
AmpersandEscapes := nil.
self ampersandEscapes at:#nbsp
self ampersandEscapes at:#ordf
-
htmlEntityToCharacter
-
-
mathAmpersandEscapes
-
These are obsolete now, as HTML 4 has since added the missing entities.
helpers
-
characterFromHtmlEntityNamed: anHtmlEntityName
-
where to get the mapping???
-
combine: previousChar withDiacriticalMark: markCharacter
-
In HTML, you may write a letter followed by a combining character (for example, a followed by U+0300, displayed as à) to combine the a with a diacritical mark.
This combines a mark with some previous character and returns a string.
Incomplete: only the most common combinations are defined here; maybe someone will complete it.
(see https://en.wikipedia.org/wiki/Combining_Diacritical_Marks)
Usage example(s):
self combine:$a withDiacriticalMark:(Character value:0x300).
self combine:$A withDiacriticalMark:(Character value:0x300).
self combine:$A withDiacriticalMark:(Character value:0x308).
self combine:$u withDiacriticalMark:(Character value:0x308).
self combine:$1 withDiacriticalMark:(Character value:0x308).
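The Smalltalk method relies on a hand-written table of common combinations. For comparison, here is a Python sketch (a swapped-in technique, not the ST/X implementation; the name `combine` is hypothetical) that covers every combination Unicode defines by NFC-normalizing the two-character sequence. When no precomposed character exists, as for $1 with U+0308, the base character and the mark come back as a two-character string.

```python
import unicodedata

def combine(previous_char: str, mark: str) -> str:
    """Combine a base character with a combining diacritical mark.

    NFC normalization replaces base + mark by the precomposed character
    when Unicode defines one ('a' + U+0300 -> 'à'); otherwise the
    two-character sequence is returned unchanged.
    """
    return unicodedata.normalize("NFC", previous_char + mark)

print(combine("a", "\u0300"))        # à
print(combine("A", "\u0308"))        # Ä
print(len(combine("1", "\u0308")))   # 2 (no precomposed form exists)
```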
-
controlCharacters
-
Modified (comment): / 06-05-2015 / 16:17:31 / sr
-
copyReplaceCharactersWithHtmlEntitiesIn: aString
-
-
escapeCharacterEntities: aString
-
helper to escape invalid/dangerous characters in html strings.
These are:
control characters,
characters above 0x7F
'<', '&' and space -> %XX ascii as hex digits
% -> %%
Usage example(s):
self escapeCharacterEntities:'a<b'
self escapeCharacterEntities:'aöb'
-
escapeCharacterEntities: aString andControlCharacters: controlCharacters
-
helper to escape invalid/dangerous characters in html strings.
These are:
control characters,
characters above 0x7F
'<', '>', '&' and space -> %XX ascii as hex digits
% -> %%
Usage example(s):
self escapeCharacterEntities:'a
-
escapeCharacterEntities: aString andControlCharacters: controlCharacters on: aWriteStream
-
helper to escape invalid/dangerous characters in html strings.
These are:
control characters,
characters above 0x7F,
'<', '>', '&' and space -> %XX ascii as hex digits
% -> %%
Usage example(s):
self escapeCharacterEntities:'a
-
escapeCharacterEntities: aString on: aStream
-
helper to escape invalid/dangerous characters in html strings.
These are:
control characters, '<', '&' and space -> %XX ascii as hex digits
% -> %%
Usage example(s):
self escapeCharacterEntities:'a
-
extractCharSetEncodingFromContentType: contentTypeLine
-
Usage example(s):
self extractCharSetEncodingFromContentType:'text/html; charset=ascii'
self extractCharSetEncodingFromContentType:'text/html; charset='
self extractCharSetEncodingFromContentType:'text/html; fooBar=bla'
self extractCharSetEncodingFromContentType:'text/xml; charset=utf-8'
self extractCharSetEncodingFromContentType:'text/xml; charset=utf-8; bla=fasel'
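The usage examples above show inputs only. A Python sketch of pulling the charset parameter out of a Content-Type line (the name `extract_charset` is hypothetical). Returning None when the parameter is absent or empty, and returning the value without changing its case, are assumptions, because the expected results are not documented.

```python
def extract_charset(content_type: str):
    """Return the 'charset' parameter of a Content-Type line.

    Parameters follow the media type and are separated by ';'.
    Returns None if there is no charset parameter or its value is empty
    (assumed behaviour).
    """
    for param in content_type.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            value = value.strip().strip('"')   # tolerate charset="utf-8"
            return value or None
    return None

print(extract_charset("text/xml; charset=utf-8; bla=fasel"))  # utf-8
```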
-
extractMimeTypeFromContentType: contentTypeLine
-
Usage example(s):
self extractMimeTypeFromContentType:'text/html; charset=ascii'
self extractMimeTypeFromContentType:'text/html; '
self extractMimeTypeFromContentType:'text/html'
self extractMimeTypeFromContentType:'text/xml; charset=utf-8'
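Likewise, a Python sketch of extracting the media type (the name `extract_mime_type` is hypothetical): everything before the first ';', with surrounding whitespace removed. Stripping the whitespace is an assumption; the result for inputs such as 'text/html; ' is not documented.

```python
def extract_mime_type(content_type: str) -> str:
    """Return the media type of a Content-Type line (the part before ';')."""
    return content_type.split(";", 1)[0].strip()

print(extract_mime_type("text/xml; charset=utf-8"))  # text/xml
```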
-
htmlEntityForCharacter: aCharacter
-
-
unEscape: aString
-
Convert escaped characters in a URL's arguments or post fields back to their proper characters.
Undoes the effect of #urlEncoded: and #urlEncoded2:.
These are:
+ -> space
%XX ascii as hex digits
%uXXXX unicode as hex digits (NOTE: %u is non-standard, but implemented in MS IIS)
%% -> %
Usage example(s):
self unEscape:'a%20b'
self unEscape:'a%%b'
self unEscape:'a+b'
self unEscape:'a%+b'
self unEscape:'a%'
self unEscape:'a%2'
self unEscape:'/Home/a%C3%A4%C3%B6%C3%BCa'
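A Python sketch of the decoding rules listed above (the name `un_escape` is hypothetical); urlDecoded:, whose description is identical, would behave the same way. Two points are assumptions not stated in the description: consecutive %XX bytes are collected and decoded as UTF-8 (which the '/Home/a%C3%A4...' example suggests), and malformed sequences such as a trailing '%' or '%2' are copied through unchanged.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def un_escape(text: str) -> str:
    out = []                 # decoded pieces of the result
    pending = bytearray()    # bytes from consecutive %XX escapes, decoded together

    def flush():
        if pending:
            out.append(pending.decode("utf-8", errors="replace"))
            pending.clear()

    def hex_at(start, length):
        # value of exactly `length` hex digits at text[start:], else None
        chunk = text[start:start + length]
        if len(chunk) == length and all(ch in HEX_DIGITS for ch in chunk):
            return int(chunk, 16)
        return None

    i = 0
    while i < len(text):
        c = text[i]
        if c == "%":
            if text[i + 1:i + 2] == "%":           # %% -> %
                flush()
                out.append("%")
                i += 2
                continue
            if text[i + 1:i + 2] == "u":           # %uXXXX (MS IIS extension)
                code = hex_at(i + 2, 4)
                if code is not None:
                    flush()
                    out.append(chr(code))
                    i += 6
                    continue
            byte = hex_at(i + 1, 2)                # %XX
            if byte is not None:
                pending.append(byte)
                i += 3
                continue
        flush()
        out.append(" " if c == "+" else c)         # '+' -> space; malformed '%' kept
        i += 1
    flush()
    return "".join(out)

print(un_escape("/Home/a%C3%A4%C3%B6%C3%BCa"))     # /Home/aäöüa
```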
-
unescapeCharacterEntities: aString
-
helper to unescape character entities in a string.
Normally, this is done by the HTMLParser when it scans text,
but it is apparently also needed for post-data fields that contain non-ASCII characters
(for example, the login post data of expeccALM).
Sequences are:
&<specialName>;
&#<decimal>;
&#x<hex>;
From Reference:
http://wiki.selfhtml.org/wiki/Referenz:HTML/Zeichenreferenz#HTML-eigene_Zeichen
Usage example(s):
self unescapeCharacterEntities:'&;'
self unescapeCharacterEntities:'&16368;'
self unescapeCharacterEntities:'&16368;&16368'
self unescapeCharacterEntities:'&16368;<'
self unescapeCharacterEntities:'&16368;<'
self unescapeCharacterEntities:'Ϩ'
self unescapeCharacterEntities:'က'
self unescapeCharacterEntities:'꿾'
self unescapeCharacterEntities:'"<foo'
self unescapeCharacterEntities:'&funny;<foo'
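A Python sketch of the three sequence forms listed above (the name `unescape_character_entities` is hypothetical). It takes entity names from Python's `html.entities.name2codepoint` table (HTML 4 names) rather than from the class's own ampersandEscapes table, requires the closing ';', and keeps unknown names such as '&funny;' as written. All three choices are assumptions about how the Smalltalk method behaves.

```python
import re
from html.entities import name2codepoint

# &name;   &#decimal;   &#xhex;   (this sketch requires the closing ';')
ENTITY = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")

def unescape_character_entities(text: str) -> str:
    def replace(match):
        ref = match.group(1)
        if ref[:2] in ("#x", "#X"):
            code = int(ref[2:], 16)
        elif ref.startswith("#"):
            code = int(ref[1:])
        else:
            code = name2codepoint.get(ref)   # HTML 4 entity names
        if code is None or code > 0x10FFFF:
            return match.group(0)            # unknown or invalid: keep as written
        return chr(code)
    return ENTITY.sub(replace, text)

print(unescape_character_entities("&quot;&lt;foo"))   # "<foo
```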
-
urlDecoded: aString
-
Convert escaped characters in a URL's arguments or post fields back to their proper characters.
Undoes the effect of #urlEncoded: and #urlEncoded2:.
These are:
+ -> space
%XX ascii as hex digits
%uXXXX unicode as hex digits (NOTE: %u is non-standard, but implemented in MS IIS)
%% -> %
Usage example(s):
self urlDecoded:'a%20b'
self urlDecoded:'a%%b'
self urlDecoded:'a+b'
self urlDecoded:'a%+b'
self urlDecoded:'a%'
self urlDecoded:'a%2'
self urlDecoded:'/Home/a%C3%A4%C3%B6%C3%BCa'
-
urlEncode2: aStringOrStream on: ws
-
helper to escape invalid/dangerous characters in a URL's arguments.
Similar to urlEncode, but treats '*', '~' and spaces differently
(some clients, such as BitTorrent, seem to require this; time will tell).
Any byte not in the set 0-9, a-z, A-Z, '.', '-', '_' is encoded using
the '%nn' format, where nn is the hexadecimal value of the byte.
see: RFC1738
** This is an obsolete interface - do not use it (it may vanish in future versions) **
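A Python sketch of this obsolete RFC 1738-style variant, writing to a stream as the Smalltalk method writes to ws (the name `url_encode2_on` is hypothetical). Following the byte set given above, spaces become %20 rather than '+', and '~' and '*' are percent-encoded. Encoding non-ASCII characters as UTF-8 first and using uppercase hex digits are assumptions.

```python
import io

# bytes copied unchanged: 0-9, a-z, A-Z, '.', '-', '_'
SAFE_RFC1738 = set(b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_")

def url_encode2_on(text: str, ws) -> None:
    """Write text to the text stream ws, %nn-encoding every other byte."""
    for byte in text.encode("utf-8"):
        if byte in SAFE_RFC1738:
            ws.write(chr(byte))
        else:
            ws.write("%%%02X" % byte)   # '%' followed by two hex digits

ws = io.StringIO()
url_encode2_on("a b~*", ws)
print(ws.getvalue())   # a%20b%7E%2A
```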
-
urlEncode: aStringOrStream on: ws
-
helper to escape invalid/dangerous characters in a URL's argument or post-fields.
Any byte not in the set 0-9, a-z, A-Z, '.', '-', '_' and '~',
is encoded using the '%nn' format, where nn is the hexadecimal value of the byte.
Characters outside the ASCII range are encoded into utf8 first.
Spaces are encoded as '+'.
see: application/x-www-form-urlencoded
see: https://tools.ietf.org/html/rfc3986 (obsoletes RFC1738)
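A Python sketch of the application/x-www-form-urlencoded rules above, again writing to a stream (the name `url_encode_on` is hypothetical): the text is encoded as UTF-8, a space byte becomes '+', bytes in the unreserved set are copied, and every other byte is written as '%nn'. Uppercase hex digits are an assumption; the description does not specify the case.

```python
import io

# unreserved bytes: 0-9, a-z, A-Z, '.', '-', '_', '~'
UNRESERVED = set(b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_~")

def url_encode_on(text: str, ws) -> None:
    for byte in text.encode("utf-8"):      # non-ASCII -> UTF-8 bytes first
        if byte == 0x20:
            ws.write("+")                  # space -> '+'
        elif byte in UNRESERVED:
            ws.write(chr(byte))
        else:
            ws.write("%%%02X" % byte)      # '%nn'

ws = io.StringIO()
url_encode_on("_-.*Frankfurt(Main) Hbf", ws)
print(ws.getvalue())   # _-.%2AFrankfurt%28Main%29+Hbf
```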
-
urlEncoded2: aString
-
helper to escape invalid/dangerous characters in a URL's arguments or post-fields.
Similar to urlEncoded, but treats '*', '~' and spaces differently
(some clients, such as BitTorrent, seem to require this; time will tell).
Any byte not in the set 0-9, a-z, A-Z, '.', '-', '_' and '~', is encoded using
the '%nn' format, where nn is the hexadecimal value of the byte.
see: application/x-www-form-urlencoded
see: RFC1738
** This is an obsolete interface - do not use it (it may vanish in future versions) **
-
urlEncoded: aString
-
helper to escape invalid/dangerous characters in a URL's arguments or post-fields.
Any byte not in the set 0-9, a-z, A-Z, '.', '-', '_' and '~', is encoded using
the '%nn' format, where nn is the hexadecimal value of the byte.
Characters outside the ASCII range are encoded into utf8 first.
Spaces are encoded as '+'.
see: application/x-www-form-urlencoded
see: https://tools.ietf.org/html/rfc3986 (obsoletes RFC1738)
Usage example(s):
self unEscape:(self urlEncoded:'_-.*Frankfurt(Main) Hbf')
self urlEncoded:'_-.*Frankfurt(Main) Hbf'
self unEscape:(self urlEncoded:'-_.*%exept;')
self urlEncoded:'-_.*%exept;'
self urlEncoded:'Не только в сервере, но и в ComSpec, чтобы дочерние КОНСОЛЬНЫЕ процессы могли пользоваться редиректами'
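Python's standard `urllib.parse.quote_plus` follows the rules listed for urlEncoded: (unreserved set 0-9, a-z, A-Z, '.', '-', '_', '~'; UTF-8 first; space as '+'), and `unquote_plus` reverses it. The pair can therefore cross-check the round-trip usage examples above from a Python test harness. Note that `quote_plus` emits uppercase hex digits, which the description does not specify for the Smalltalk method.

```python
from urllib.parse import quote_plus, unquote_plus

for s in ["_-.*Frankfurt(Main) Hbf", "-_.*%exept;"]:
    encoded = quote_plus(s, safe="")
    print(encoded)
    assert unquote_plus(encoded) == s   # round trip, like unEscape:(urlEncoded:...)
# prints _-.%2AFrankfurt%28Main%29+Hbf
# then   -_.%2A%25exept%3B
```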
-
withAllSpecialHTMLCharactersEscaped: aStringOrCharacter
-
replace ampersand, less, greater and quotes by html-character escapes.
This DOES escape quote and doubleQuote characters.
Usage example(s):
self withAllSpecialHTMLCharactersEscaped:'<>#&'
self withAllSpecialHTMLCharactersEscaped:$<
self withAllSpecialHTMLCharactersEscaped:$#
-
withSpecialHTMLCharactersEscaped: aStringOrCharacter
-
replace ampersand, less and greater by html-character escapes.
Does NOT escape percent and control characters.
Does NOT escape quote and doubleQuote characters.
Usage example(s):
self withSpecialHTMLCharactersEscaped:'<>#&'
self withSpecialHTMLCharactersEscaped:$<
self withSpecialHTMLCharactersEscaped:$#
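A Python sketch of the two escaping variants above as one function with a flag (the name `escape_html` is hypothetical). Without the flag, only '&', '<' and '>' are replaced, as in withSpecialHTMLCharactersEscaped:. With the flag, quotes are replaced as well, as in withAllSpecialHTMLCharactersEscaped:. The entity spellings used for the quote characters (&quot; and &#39;) are an assumption, since the descriptions do not name them. Mapping each character independently means an '&' is never escaped twice.

```python
BASIC = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
QUOTES = {'"': "&quot;", "'": "&#39;"}   # assumed spellings for quote escapes

def escape_html(text: str, escape_quotes: bool = False) -> str:
    """Replace special HTML characters; '%', '#' and control chars are kept."""
    table = {**BASIC, **QUOTES} if escape_quotes else BASIC
    return "".join(table.get(ch, ch) for ch in text)

print(escape_html("<>#&"))   # &lt;&gt;#&amp;
```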
queries
-
isUtilityClass
-
(comment from inherited method)
a utility class is one which is not to be instantiated,
but only provides a number of utility functions on the class side.
It is usually also abstract.
serving-helpers
-
escape: aString
-
helper to escape invalid/dangerous characters in a URL's arguments or post-fields.
These are:
control characters, dQuote, '+', ';', '?', '&' and space -> %XX ascii as hex digits
% -> %%
Usage example(s):
self escape:'a b'
self escape:'a%b'
self escape:'a b'
self escape:'a+b'
self escape:'aäüöb'
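A Python sketch of the escape: rules listed above (the name `escape` is reused here for illustration only). Several points are assumptions: control characters are taken to be code points below 0x20, the %XX form uses uppercase hex digits, and non-ASCII characters are copied unchanged, because the description does not say how they are treated and the result of the 'aäüöb' example is not given.

```python
ESCAPED = set('"+;?& ')   # dQuote, '+', ';', '?', '&' and space, per the list above

def escape(text: str) -> str:
    out = []
    for ch in text:
        if ch == "%":
            out.append("%%")                      # % -> %%
        elif ch in ESCAPED or ord(ch) < 0x20:     # listed chars and control chars
            out.append("%%%02X" % ord(ch))        # -> %XX
        else:
            out.append(ch)
    return "".join(out)

print(escape("a b"))   # a%20b
```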
text processing helpers
-
convertFromMarkDown: markDownString
-
given some MarkDown (Wiki), convert to html.
-
convertFromMarkDown: markDownString bodyTag: writeBodyTag
-
given some MarkDown, convert to html.
Usage example(s):
|mdString|
mdString := '
# To Do
## At Home
* Wash dishes
* Install winter tires
## At Work
* Finish Report
* Book Team **101** meeting'.
self convertFromMarkDown:mdString bodyTag:true.
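To show what such a conversion involves, here is a Python sketch that handles only the constructs used in the example: '#' headings, '* ' bullet lists and **bold**. Any other non-empty line becomes a paragraph. The function name `markdown_to_html` and the exact HTML it produces are hypothetical, not the output of convertFromMarkDown:bodyTag:. The `body_tag` flag assumes that writeBodyTag controls whether the result is wrapped in a body element.

```python
import html
import re

HEADING = re.compile(r"(#{1,6})\s+(.*)")   # '# ' .. '###### '
BOLD = re.compile(r"\*\*(.+?)\*\*")         # **text**

def markdown_to_html(md: str, body_tag: bool = False) -> str:
    out = []
    in_list = False
    for line in md.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                          # blank lines are skipped
        heading = HEADING.match(stripped)
        is_item = stripped.startswith("* ")
        if in_list and not is_item:           # a non-bullet line closes the list
            out.append("</ul>")
            in_list = False
        if heading:
            raw = heading.group(2)
        elif is_item:
            raw = stripped[2:]
        else:
            raw = stripped
        text = BOLD.sub(r"<b>\1</b>", html.escape(raw))
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{text}</h{level}>")
        elif is_item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{text}</li>")
        else:
            out.append(f"<p>{text}</p>")
    if in_list:
        out.append("</ul>")
    body = "\n".join(out)
    return f"<body>\n{body}\n</body>" if body_tag else body
```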
-
convertFromWikiStyle: wikiStyleString
-
given some wiki text, convert to html.
-
convertFromWikiStyle: wikiStyleString bodyTag: writeBodyTag
-
given some wiki text, convert to html.
Usage example(s):
|wikiString|
wikiString := '== headline2
=== headline3
=== headline3b ===
* bullet1
* bullet2
line1
line2
line3
line4
line5
-'.
self convertFromWikiStyle:wikiString bodyTag:true.
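In the example, one headline closes with '=' characters ('=== headline3b ===') and the others do not, which suggests that the closing run is optional. A Python sketch of recognizing such headline lines, where the level is the number of leading '=' characters as in MediaWiki (the function name `wiki_heading` and this level rule are assumptions about the wiki dialect):

```python
import re

# leading run of '=' gives the level; the closing '=' run is optional
HEADING = re.compile(r"^(={1,6})\s*(.*?)\s*=*\s*$")

def wiki_heading(line: str):
    """Return (level, text) for a headline line, or None for any other line."""
    m = HEADING.match(line)
    if m is None or not m.group(2):
        return None
    return len(m.group(1)), m.group(2)

print(wiki_heading("=== headline3b ==="))   # (3, 'headline3b')
```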
-
plainTextOfHTML: htmlString
-
given some HTML, extract the raw text.
Can be used to search for strings in some html text.
Usage example(s):
self plainTextOfHTML:'
bla1 bla2 bla3 bla5bla6'
self plainTextOfHTML:'Hello World'
self plainTextOfHTML:nil
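A Python sketch of extracting raw text from HTML with the standard `html.parser.HTMLParser`, which collects character data and drops the tags (the names `plain_text_of_html` and `_TextCollector` are hypothetical). Returning an empty string for None is an assumption; the result for nil is not documented. Unlike a real text extractor, this sketch also keeps the contents of script and style elements.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)   # '&amp;' etc. decoded in text
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)                   # text between tags

def plain_text_of_html(html_string):
    if html_string is None:
        return ""
    collector = _TextCollector()
    collector.feed(html_string)
    collector.close()                             # flush buffered text
    return "".join(collector.parts)

print(plain_text_of_html("<b>Hello</b> <i>World</i>"))   # Hello World
```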