Smalltalk/X Webserver

Documentation of class 'HTMLParser':

Class: HTMLParser


Inheritance:

   Object
   |
   +--HTMLParser

Package:
stx:libhtml
Category:
System-Documentation
Version:
rev: 1.98 date: 2019/06/04 10:02:12
user: cg
file: HTMLParser.st directory: libhtml
module: stx stc-classLibrary: libhtml
Author:
Claus Gittinger

Description:


Notice & Warning:
    This HTML markup framework and the corresponding parser
    started as a quick hack (in the 1990s) to replace a buggy Mosaic
    X-widget with an HTML viewer written in Smalltalk.
    Its goals were to be fast enough for typical uses, to keep memory consumption low,
    and to provide the functionality required to display simple help documents.
    It was NOT meant to become a full-featured web-browser replacement.

    We plan to replace all uses of this parser with the newer HTML::HTMLParser,
    which generates a better DOM representation.

    This framework is still in use as the document viewer inside ST/X
    and is supported to the extent needed to display simple online help documents and HTML tooltips.
    However, there are no plans to enhance it further or to spend more time on its maintenance.

    If you need more sophisticated HTML/DOM/document functionality, consider either
    the HTMLTree framework or one of the free frameworks found in the goodies folder.

Instances of this class read HTML documents and build a collection
(a linked list) of markup elements, intended for simple online help documents.
This markup collection can be displayed by an HTMLDocumentView
or printed by the HTMLDocumentPrinter.


Related information:

    HTMLMarkup
    HTMLDocumentView
    HTMLDocumentPainter
    HTMLDocumentPrinter

Class protocol:

accessing
o  ampersandEscapes
backward compatibility only

** This is an obsolete interface - do not use it (it may vanish in future versions) **

o  mathAmpersandEscapes
backward compatibility only

** This is an obsolete interface - do not use it (it may vanish in future versions) **

initialization
o  initialize
save space by reusing common strings (empty lines and single spaces)

usage example(s):

     AmpersandEscapes := nil.
     HTMLParser initialize

     MathAmpersandEscapes := nil.
     HTMLParser initialize

parsing
o  parseText: aStringOrStream
parse aStringOrStream and answer the parsed document

usage example(s):

     self parseText:'hello world - this is easy'
     self parseText:'hello < world > - this is easy'
     self parseText:'hello world this is easy'
     self parseText:'hello
world

this is easy'
     self parseText:'hello

  • world
  • foo

this is easy'
     self parseText:'

this is easy'
     self parseText:('../../doc/online/english/TOP.html' asFilename contentsOfEntireFile asString)
     self parseText:('../../doc/online/english/TOP.html' asFilename readStream)
     self parseText:('../../doc/online/english/TOP.html' asFilename contentsOfEntireFile asString)

o  parseText: aStringOrStream characterEncoding: anEncodingString
parse aStringOrStream. The encoding of the character set is specified by anEncodingString
(e.g. #utf8 or 'iso8859-1').

Answer the parsed document

usage example(s):

     self
        parseText:('../../doc/online/english/TOP.html' 
                        asFilename contentsOfEntireFile asString) characterEncoding:#utf8
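
Since the argument may also be a stream, the file does not have to be read into a string first. The following is only a sketch: it assumes that the relative path used in the example above exists in your installation and that the file is UTF-8 encoded. The ensure: block is an addition that merely closes the stream even if parsing fails.

```smalltalk
|in document|

"path taken from the example above; adjust it to your installation"
in := '../../doc/online/english/TOP.html' asFilename readStream.
[
    "assumption: the file content is UTF-8 encoded"
    document := HTMLParser parseText:in characterEncoding:#utf8.
] ensure:[
    "close the stream whether or not parsing succeeded"
    in close
].
document inspect
```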


Instance protocol:

accessing
o  characterEncoding: aString
set the character set / encoding for the following text

error reporting
o  infoMessage: msg

scanning
o  ampersandEscape
parse an ampersand escape; the '&' has already been read.

o  ampersandEscape: aString
return a new string, containing the ampersand escape character.
Expects aString to NOT contain the initial ampersand.

usage example(s):

     (HTMLParser new) ampersandEscape:'lt'   
     (HTMLParser new) ampersandEscape:'ouml'  
     (HTMLParser new) ampersandEscape:'#32'  

     (HTMLParser new) parseText:'hello α β γ normal'    
     (HTMLParser new) parseText:'helloworld

this is easy'
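
To look at the results of several escapes at once, they can be written to the Transcript. This is a minimal sketch using only the escape names from the examples above; what is answered for a name the parser does not know is not documented here, so none is tried.

```smalltalk
|parser|

parser := HTMLParser new.
#('lt' 'ouml' '#32') do:[:name |
    "the argument must NOT include the leading ampersand"
    Transcript showCR:(name , ' -> ' , (parser ampersandEscape:name) printString)
].
```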

o  ampersandEscapeString
parse an ampersand escape; the '&' has already been read.
Return the escape string.

o  extractMetaInformationFrom: element
<mime-type> ; charset=

o  finishTextBlock
finish a scanned textBlock; add it to the markup list

o  parseMarkup
parse '<' and return a markup element

o  parseText: aStringOrStream
parse some string, return a list of markups

usage example(s):

     (HTMLParser new) parseText:'hello world - this is easy'
     (HTMLParser new) parseText:'helloworld - this is easy'
     (HTMLParser new) parseText:'hello < world > - this is easy'
     (HTMLParser new) parseText:'hello world this is easy'
     (HTMLParser new) parseText:'hello
world

this is easy'
     (HTMLParser new) parseText:'hello

  • world
  • foo

this is easy'
     (HTMLParser new) parseText:'

this is easy'
     (HTMLParser new) parseText:('../../doc/online/english/TOP.html' asFilename contentsOfEntireFile asString)
     (HTMLParser new) parseText:('../../doc/online/english/TOP.html' asFilename readStream)
     (HTMLParser new) parseText:('../../doc/online/english/programming/viewintro.html' asFilename contentsOfEntireFile asString)

o  parseText: aStringOrStream withBindings: metaBindings
parse some string, return a list of HTMLMarkups.
Ampersand variables (e.g. &url) are expanded as given in the
metaBindings dictionary.
(this seems to be non-standard HTML, but is used in hotjava).
The destination is only required for scripts, which may want to access
the document very early.

usage example(s):

     (HTMLParser new) parseText:'hello world - this is easy'
     (HTMLParser new) parseText:'hello < world > - this is easy'
     (HTMLParser new) parseText:'hello world this is easy'
     (HTMLParser new) parseText:'hello
world

this is easy'
     (HTMLParser new) parseText:'hello

  • world
  • foo

this is easy'
     (HTMLParser new) parseText:'

this is easy'
     (HTMLParser new) parseText:('../../doc/online/english/TOP.html' asFilename contentsOfEntireFile asString)
     (HTMLParser new) parseText:('../../doc/online/english/programming/viewintro.html' asFilename contentsOfEntireFile asString)
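
None of the usage examples above passes any bindings, so here is a hedged sketch of how an ampersand variable might be expanded. The key format is an assumption (a String holding the variable name without the leading ampersand), as are the sample text and the sample value. Check the result in the inspector before relying on it.

```smalltalk
|bindings document|

"assumption: keys are the variable names without the leading ampersand"
bindings := Dictionary new.
bindings at:'url' put:'www.exept.de'.

"hypothetical input text; &url is the variable named in the comment above"
document := HTMLParser new
                parseText:'see &url for details'
                withBindings:bindings.
document inspect
```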

o  parseText: aStringOrStream withBindings: metaBindings for: aDestination
parse some string, return a list of HTMLMarkups.
Ampersand variables (e.g. &url) are expanded as given in the
metaBindings dictionary.
(this seems to be non-standard HTML, but is used in hotjava).
The destination is only required for scripts, which may want to access
the document very early.

usage example(s):

     (HTMLParser new) parseText:'hello world - this is easy'
     (HTMLParser new) parseText:'hello < world > - this is easy'
     (HTMLParser new) parseText:'hello world this is easy'
     (HTMLParser new) parseText:'hello
world

this is easy'
     (HTMLParser new) parseText:'hello

  • world
  • foo

this is easy'
     (HTMLParser new) parseText:'

this is easy'
     (HTMLParser new) parseText:('../../doc/online/english/TOP.html' asFilename contentsOfEntireFile asString)
     (HTMLParser new) parseText:('../../doc/online/english/programming/viewintro.html' asFilename contentsOfEntireFile asString)

o  startNewTextBlock

scripts

o  parseJavaScriptFrom: scriptStream
HTML

o  parseSmalltalkScriptFrom: scriptStream

o  script: element
a <script> TAG was encountered.
check for the language (which defaults to javaScript) and dispatch
to a script language handler.

o  script_javascript: element
a <script language=javaScript> TAG was encountered.
parse the script, and construct the scriptObject

o  script_smalltalkscript: element
a <script language=smalltalkScript> TAG was encountered.
parse the script, and construct the scriptObject (which has the methods in
its anonymous class)


Examples:


    |p in document|

    p := HTMLParser new.
    in := '../../doc/online/english/TOP.html' asFilename readStream.
    document := p parseText:in.
    in close.
    document inspect

    |v document|

    v := HTMLDocumentView new openAndWait.

    v homeDocument:'../../doc/online/english/TOP.html'.

    |top v document|

    top := StandardSystemView extent:200@500.
    v := HVScrollableView for:HTMLDocumentView miniScrollerH:true in:top.
    v origin:0.0@ 0.0 corner:1.0@1.0.
    top openAndWait.

    v homeDocument:'../../doc/online/english/TOP.html'.

    |v document|

    v := HTMLDocumentView new openAndWait.

    document := (HTMLParser new) 
                        parseText:('../../doc/online/english/programming/viewintro.html' 
                                        asFilename readStream).
    v document:document.



ST/X 7.2.0.0; WebServer 1.670 at bd0aa1f87cdd.unknown:8081; Tue, 19 Mar 2024 09:40:17 GMT