Smalltalk/X Webserver

Documentation of class 'HTMLUtilities':

Class: HTMLUtilities


Inheritance:

   Object
   |
   +--HTMLUtilities

Package:
stx:libbasic2
Category:
Net-Communication-Support
Version:
rev: 1.69 date: 2024/04/22 17:41:45
user: stefan
file: HTMLUtilities.st directory: libbasic2
module: stx stc-classLibrary: libbasic2

Description:


Collected support functions to deal with HTML.
Used both by HTML generators (DocGenerator), HTMLParsers and the webServer.
Therefore, it has been put into libbasic2.

copyright

COPYRIGHT (c) 2007 by eXept Software AG
All Rights Reserved

This software is furnished under a license and may be used only in accordance with the terms of that license and with the inclusion of the above copyright notice.
This software may not be provided or otherwise made available to, or used by, any other person.
No title to or ownership of the software is hereby transferred.

Class protocol:

common actions
o  openLauncherOnDisplay: displayName
obsolete - do not use

** This is an obsolete interface - do not use it (it may vanish in future versions) **

constants
o  ampersandEscapes
AmpersandEscapes := nil.
self ampersandEscapes at:#nbsp
self ampersandEscapes at:#ordf

o  htmlEntityToCharacter

o  mathAmpersandEscapes
These are obsolete now, since HTML4 has added the missing entities in the meantime.

helpers
o  characterFromHtmlEntityNamed: anHtmlEntityName
where to get the mapping???

o  combine: previousChar withDiacriticalMark: markCharacter
In HTML, you may write a&#x300; to combine the a with a diacritical mark.
This combines a mark with some previous character and returns a string.
Incomplete; only the most common ones are defined here; maybe someone will complete it.
(see https://en.wikipedia.org/wiki/Combining_Diacritical_Marks)

Usage example(s):

     self combine:$a withDiacriticalMark:(Character value:0x300). 
     self combine:$A withDiacriticalMark:(Character value:0x300).
     self combine:$A withDiacriticalMark:(Character value:0x308).
     self combine:$u withDiacriticalMark:(Character value:0x308).
     self combine:$1 withDiacriticalMark:(Character value:0x308).  
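
The combination described above can be modelled in Python with Unicode NFC normalization. This is only a sketch of the idea, not the ST/X implementation: the function name `combine` is illustrative, and NFC covers every precomposed character Unicode defines, whereas the method documented here says it handles only the most common cases.

```python
import unicodedata

def combine(previous_char: str, mark_char: str) -> str:
    """Merge previous_char with a combining mark.

    Returns a single precomposed character where Unicode defines one,
    otherwise the two characters unchanged (assumption: this mirrors the
    'returns a string' behaviour described for the Smalltalk method).
    """
    return unicodedata.normalize("NFC", previous_char + mark_char)

print(combine("a", "\u0300"))        # a + COMBINING GRAVE ACCENT
print(len(combine("a", "\u0300")))   # 1: a precomposed character exists
print(len(combine("1", "\u0308")))   # 2: no precomposed form for 1 + diaeresis
```

The last call shows why a string rather than a character is returned: when no precomposed form exists, both code points have to be kept.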

o  controlCharacters
Modified (comment): / 06-05-2015 / 16:17:31 / sr

o  copyReplaceCharactersWithHtmlEntitiesIn: aString

o  escapeCharacterEntities: aString
helper to escape invalid/dangerous characters in html strings.
These are:
control characters,
characters above 0x7F
'<', '&' and space -> %XX ascii as hex digits
% -> %%

Usage example(s):

     self escapeCharacterEntities:'a 'a<b'   
     self escapeCharacterEntities:'aöb'  => 'aöb'   

o  escapeCharacterEntities: aString andControlCharacters: controlCharacters
helper to escape invalid/dangerous characters in html strings.
These are:
control characters,
characters above 0x7F
'<', '>', '&' and space -> %XX ascii as hex digits
% -> %%

Usage example(s):

     self escapeCharacterEntities:'a

o  escapeCharacterEntities: aString andControlCharacters: controlCharacters on: aWriteStream
helper to escape invalid/dangerous characters in html strings.
These are:
control characters,
characters above 0x7F,
'<', '>', '&' and space -> %XX ascii as hex digits
% -> %%

Usage example(s):

     self escapeCharacterEntities:'a

o  escapeCharacterEntities: aString on: aStream
helper to escape invalid/dangerous characters in html strings.
These are:
control characters, '<', '&' and space -> %XX ascii as hex digits
% -> %%

Usage example(s):

     self escapeCharacterEntities:'a

o  extractCharSetEncodingFromContentType: contentTypeLine
self extractCharSetEncodingFromContentType:'text/html; charset=ascii'
self extractCharSetEncodingFromContentType:'text/html; charset='
self extractCharSetEncodingFromContentType:'text/html; fooBar=bla'
self extractCharSetEncodingFromContentType:'text/xml; charset=utf-8'
self extractCharSetEncodingFromContentType:'text/xml; charset=utf-8; bla=fasel'

o  extractMimeTypeFromContentType: contentTypeLine
self extractMimeTypeFromContentType:'text/html; charset=ascii'
self extractMimeTypeFromContentType:'text/html; '
self extractMimeTypeFromContentType:'text/html'
self extractMimeTypeFromContentType:'text/xml; charset=utf-8'
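
The two Content-Type helpers above split a header value into its MIME type and its charset parameter. A minimal Python sketch of that split follows. The function names are illustrative, and returning `None` for a missing or empty charset is an assumption, since the return values of the Smalltalk methods are not stated here.

```python
from typing import Optional

def extract_mime_type(content_type_line: str) -> str:
    """Return the part before the first ';', without surrounding blanks."""
    return content_type_line.split(";", 1)[0].strip()

def extract_charset(content_type_line: str) -> Optional[str]:
    """Return the value of the charset parameter, or None if absent/empty.

    None for the missing case is an assumption of this sketch.
    """
    for parameter in content_type_line.split(";")[1:]:
        name, separator, value = parameter.partition("=")
        if separator and name.strip().lower() == "charset":
            value = value.strip().strip('"')
            return value or None
    return None

print(extract_mime_type("text/xml; charset=utf-8; bla=fasel"))
print(extract_charset("text/xml; charset=utf-8; bla=fasel"))
print(extract_charset("text/html; fooBar=bla"))
```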

o  htmlEntityForCharacter: aCharacter

o  unEscape: aString
Convert escaped characters in a URL's arguments or post fields back to their proper characters.
Undoes the effect of #urlEncoded: and #urlEncoded2:.
These are:
+ -> space
%XX ascii as hex digits
%uXXXX unicode as hex digits NOTE: %u is non-standard, but implemented in MS IIS
%% -> %

Usage example(s):

     self unEscape:'a%20b'   
     self unEscape:'a%%b'
     self unEscape:'a+b' 
     self unEscape:'a%+b' 
     self unEscape:'a%' 
     self unEscape:'a%2' 
     self unEscape:'/Home/a%C3%A4%C3%B6%C3%BCa'
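
The decoding rules listed above can be sketched in Python as follows. This is a model of the documented rules, not the ST/X code: the name `url_unescape` is illustrative, decoded bytes are assumed to be UTF-8 (as the last usage example suggests), and leaving a malformed `%` sequence untouched is an assumption, because the behaviour for inputs such as 'a%' or 'a%2' is not stated here.

```python
import re

_HEX2 = re.compile(r"[0-9A-Fa-f]{2}")
_HEX4 = re.compile(r"[0-9A-Fa-f]{4}")

def url_unescape(text: str) -> str:
    """Apply the rules: '+' -> space, %XX, %uXXXX, '%%' -> '%'."""
    out = bytearray()
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "+":
            out += b" "
            i += 1
        elif ch == "%" and text[i + 1:i + 2] == "%":
            out += b"%"
            i += 2
        elif (ch == "%" and text[i + 1:i + 2] in ("u", "U")
              and _HEX4.fullmatch(text[i + 2:i + 6])):
            # non-standard %uXXXX form: one UTF-16 code unit as hex
            code = int(text[i + 2:i + 6], 16)
            out += chr(code).encode("utf-8", errors="replace")
            i += 6
        elif ch == "%" and _HEX2.fullmatch(text[i + 1:i + 3]):
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            # ordinary character, or a malformed '%' kept as-is (assumption)
            out += ch.encode("utf-8")
            i += 1
    # bytes collected from %XX sequences are interpreted as UTF-8 (assumption)
    return out.decode("utf-8", errors="replace")

print(url_unescape("a%20b"))
print(url_unescape("/Home/a%C3%A4%C3%B6%C3%BCa"))
```

Collecting bytes first and decoding once at the end is what lets multi-byte sequences such as %C3%A4 come out as a single character.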

o  unescapeCharacterEntities: aString
helper to unescape character entities in a string.
Normally, this is done by the HTMLParser when it scans text,
but seems to be also used in post-data fields which contain non-ascii characters
(for example: the login postdata of expeccALM).

Sequences are:
&<specialName>;
&#<decimal>;
&#x<hex>;

From Reference:
http://wiki.selfhtml.org/wiki/Referenz:HTML/Zeichenreferenz#HTML-eigene_Zeichen

Usage example(s):

     self unescapeCharacterEntities:'&;'            
     self unescapeCharacterEntities:'&16368;'            
     self unescapeCharacterEntities:'&16368;&16368'            
     self unescapeCharacterEntities:'&16368;&lt;'            
     self unescapeCharacterEntities:'&#1000;'    
     self unescapeCharacterEntities:'&#x1000;'    
     self unescapeCharacterEntities:'&#xAFFE;'    
     self unescapeCharacterEntities:'&quot;&lt;foo'      
     self unescapeCharacterEntities:'&funny;&lt;foo'     
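
The three sequence forms listed above (named, decimal, hexadecimal) can be illustrated with a short Python sketch. It is not the ST/X implementation: the name `unescape_entities` is illustrative, the name table comes from Python's `html.entities`, and leaving unknown or malformed sequences unchanged is an assumption of this sketch.

```python
import re
from html.entities import name2codepoint

# &name;  &#decimal;  &#xhex;  (the trailing ';' is required in this sketch)
_ENTITY = re.compile(r"&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")

def unescape_entities(text: str) -> str:
    def replace(match):
        body = match.group(1)
        if body[:2] in ("#x", "#X"):
            code = int(body[2:], 16)
        elif body.startswith("#"):
            code = int(body[1:])
        else:
            code = name2codepoint.get(body)
        if code is None or code > 0x10FFFF:
            return match.group(0)   # unknown name or out of range: keep text
        return chr(code)
    return _ENTITY.sub(replace, text)

print(unescape_entities("&quot;&lt;foo"))
print(unescape_entities("&funny;&lt;foo"))
```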

o  urlDecoded: aString
Convert escaped characters in a URL's arguments or post fields back to their proper characters.
Undoes the effect of #urlEncoded: and #urlEncoded2:.
These are:
+ -> space
%XX ascii as hex digits
%uXXXX unicode as hex digits NOTE: %u is non-standard, but implemented in MS IIS
%% -> %

Usage example(s):

     self urlDecoded:'a%20b'   
     self urlDecoded:'a%%b'
     self urlDecoded:'a+b' 
     self urlDecoded:'a%+b' 
     self urlDecoded:'a%' 
     self urlDecoded:'a%2' 
     self urlDecoded:'/Home/a%C3%A4%C3%B6%C3%BCa'

o  urlEncode2: aStringOrStream on: ws
helper to escape invalid/dangerous characters in a URL's arguments.
Similar to urlEncode, but treats '*','~' and spaces differently.
(some clients, such as bitTorrent seem to require this - time will tell...)
Any byte not in the set 0-9, a-z, A-Z, '.', '-', '_', is encoded using
the '%nn' format, where nn is the hexadecimal value of the byte.
see: RFC1738

** This is an obsolete interface - do not use it (it may vanish in future versions) **

o  urlEncode: aStringOrStream on: ws
helper to escape invalid/dangerous characters in a URL's arguments or post-fields.

Any byte not in the set 0-9, a-z, A-Z, '.', '-', '_' and '~',
is encoded using the '%nn' format, where nn is the hexadecimal value of the byte.
Characters outside the ASCII range are encoded into utf8 first.
Spaces are encoded as '+'.
see: application/x-www-form-urlencoded
see: https://tools.ietf.org/html/rfc3986 (obsoletes RFC1738)

o  urlEncoded2: aString
helper to escape invalid/dangerous characters in a URL's arguments or post-fields.
Similar to urlEncoded, but treats '*','~' and spaces differently.
(some clients, such as bitTorrent seem to require this - time will tell...)
Any byte not in the set 0-9, a-z, A-Z, '.', '-', '_' and '~', is encoded using
the '%nn' format, where nn is the hexadecimal value of the byte.
see: application/x-www-form-urlencoded
see: RFC1738

** This is an obsolete interface - do not use it (it may vanish in future versions) **

o  urlEncoded: aString
helper to escape invalid/dangerous characters in a URL's arguments or post-fields.

Any byte not in the set 0-9, a-z, A-Z, '.', '-', '_' and '~', is encoded using
the '%nn' format, where nn is the hexadecimal value of the byte.
Characters outside the ASCII range are encoded into utf8 first.
Spaces are encoded as '+'.
see: application/x-www-form-urlencoded
see: https://tools.ietf.org/html/rfc3986 (obsoletes RFC1738)

Usage example(s):

      self unEscape:(self urlEncoded:'_-.*Frankfurt(Main) Hbf')
      self urlEncoded:'_-.*Frankfurt(Main) Hbf'

      self unEscape:(self urlEncoded:'-_.*%exept;')
      self urlEncoded:'-_.*%exept;'

      self urlEncoded:'Не только в сервере, но и в ComSpec, чтобы дочерние КОНСОЛЬНЫЕ процессы могли пользоваться редиректами'
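
The encoding rule stated above (UTF-8 first, unreserved bytes kept, space as '+', everything else as '%nn') can be sketched in Python. The name `url_encoded` is illustrative, and upper-case hex digits are an assumption, since the description only says "hexadecimal value".

```python
import string

# the set named in the description: 0-9, a-z, A-Z, '.', '-', '_' and '~'
UNRESERVED = set(string.ascii_letters + string.digits + ".-_~")

def url_encoded(text: str) -> str:
    out = []
    for byte in text.encode("utf-8"):      # non-ASCII goes through UTF-8 first
        ch = chr(byte)
        if ch == " ":
            out.append("+")                # form-urlencoded style space
        elif ch in UNRESERVED:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))  # upper-case hex is an assumption
    return "".join(out)

print(url_encoded("_-.*Frankfurt(Main) Hbf"))
print(url_encoded("-_.*%exept;"))
```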

o  withAllSpecialHTMLCharactersEscaped: aStringOrCharacter
replace ampersand, less, greater and quotes by html-character escapes.
This DOES escape quote and doubleQuote characters.

Usage example(s):

     self withAllSpecialHTMLCharactersEscaped:'<>#&'     
     self withAllSpecialHTMLCharactersEscaped:$<
     self withAllSpecialHTMLCharactersEscaped:$#

o  withSpecialHTMLCharactersEscaped: aStringOrCharacter
replace ampersand, less and greater by html-character escapes.
Does NOT escape percent and control characters.
Does NOT escape quote and doubleQuote characters.

Usage example(s):

     self withSpecialHTMLCharactersEscaped:'<>#&'
     self withSpecialHTMLCharactersEscaped:$<
     self withSpecialHTMLCharactersEscaped:$#
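
The difference between the two escaping variants above (with and without quote characters) can be shown with Python's standard `html.escape`. This is only an analogy: the wrapper names are illustrative, and the exact entity the Smalltalk method emits for quote characters is not stated here and may differ from Python's choice.

```python
import html

def with_special_html_characters_escaped(text: str) -> str:
    """Escape only '&', '<' and '>'; quotes pass through unchanged."""
    return html.escape(text, quote=False)

def with_all_special_html_characters_escaped(text: str) -> str:
    """Escape '&', '<', '>' and both quote characters."""
    return html.escape(text, quote=True)

print(with_special_html_characters_escaped('<a href="x">'))
print(with_all_special_html_characters_escaped('<a href="x">'))
```

The quote-escaping variant is the one to use when the text ends up inside an attribute value; for element content the shorter variant is sufficient.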

queries
o  isUtilityClass
(comment from inherited method)
a utility class is one which is not to be instantiated,
but only provides a number of utility functions on the class side.
It is usually also abstract.

serving-helpers
o  escape: aString
helper to escape invalid/dangerous characters in a URL's arguments or post-fields.
These are:
control characters, dQuote, '+', ';', '?', '&' and space -> %XX ascii as hex digits
% -> %%

Usage example(s):

     self escape:'a b'      
     self escape:'a%b'    
     self escape:'a b'      
     self escape:'a+b'      
     self escape:'aäüöb'      

text processing helpers
o  convertFromMarkDown: markDownString
given some MarkDown (Wiki), convert to html.

o  convertFromMarkDown: markDownString bodyTag: writeBodyTag
given some MarkDown, convert to html.

Usage example(s):

        |mdString|
        mdString := '
# To Do
## At Home
* Wash dishes
* Install winter tires
## At Work
* Finish Report
* Book Team **101** meeting'.

        self convertFromMarkDown:mdString bodyTag:true.

o  convertFromWikiStyle: wikiStyleString
given some wiki text, convert to html.

o  convertFromWikiStyle: wikiStyleString bodyTag: writeBodyTag
given some wiki text, convert to html.

Usage example(s):

       |wikiString|
       wikiString := '== headline2
=== headline3
=== headline3b ===
* bullet1
* bullet2
line1
line2

line3
line4

line5
-'.
       self convertFromWikiStyle:wikiString bodyTag:true.

o  plainTextOfHTML: htmlString
given some HTML, extract the raw text.
Can be used to search for strings in some html text.

Usage example(s):

     self plainTextOfHTML:'
            bla1 bla2 
bla3
bla4
bla5

bla6'
     self plainTextOfHTML:'Hello World'
     self plainTextOfHTML:nil
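
Extracting the raw text of an HTML string, as described above, can be sketched in Python with the standard `html.parser` module. This is not the ST/X implementation: the names are illustrative, returning an empty string for a missing input is an assumption, and this sketch also keeps the content of script and style elements, which a real implementation may want to drop.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects character data and ignores all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def plain_text_of_html(html_string):
    if html_string is None:
        return ""              # assumption: the nil case yields empty text
    collector = _TextCollector()
    collector.feed(html_string)
    collector.close()          # flush any buffered text
    return "".join(collector.parts)

print(plain_text_of_html("<h1>Hello</h1> <b>World</b>"))
```

Searching the returned string then finds text regardless of the markup that surrounded it in the original HTML.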



ST/X 7.7.0.0; WebServer 1.702 at 20f6060372b9.unknown:8081; Sat, 21 Dec 2024 16:23:31 GMT