Smalltalk/X Webserver

Documentation of class 'HTMLParser':

Class: HTMLParser


Inheritance:

   Object
   |
   +--HTMLParser

Package:
stx:libhtml
Category:
System-Documentation
Version:
rev: 1.119 date: 2024/04/22 17:41:25
user: stefan
file: HTMLParser.st directory: libhtml
module: stx stc-classLibrary: libhtml

Description:


Notice & Warning:
    This HTML markup framework and the corresponding parser
    started as a quick hack (in the 1990s), when a buggy Mosaic
    X-widget was replaced by an HTML viewer written in Smalltalk.
    Its goals were to be fast enough for typical uses, not to be too memory hungry,
    and to provide the functionality required to display simple help documents
    (i.e. the online doc).
    It was NOT meant to become a full-featured web-browser replacement.

    We plan to replace all uses of this parser by the newer HTML::HTMLParser,
    which generates a better DOM representation.

    This framework is still in use as the document viewer inside ST/X,
    and it is supported to the extent needed to display simple online help documents and HTML tooltips.
    However, there are no plans to enhance it further or to spend more time on its maintenance.

    If you need more sophisticated HTML/DOM/doc functionality, you may want to use either
    the HTMLTree framework or one of the free frameworks found in the goodies folder.

Instances of this class are used to read HTML documents
and to build a collection (linked list) of markup elements for simple online help documents.
This markup collection can be displayed using the HTMLDocumentViewer
or printed by the HTMLDocumentPrinter.
It can also be used to extract anchors from an HTML document.

copyright

COPYRIGHT (c) 1996 by Claus Gittinger
All Rights Reserved

This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice. This software may not
be provided or otherwise made available to, or used by, any
other person. No title to or ownership of the software is
hereby transferred.

Class protocol:

accessing
o  ampersandEscapes
backward compatibility only

** This is an obsolete interface - do not use it (it may vanish in future versions) **

o  mathAmpersandEscapes
backward compatibility only

** This is an obsolete interface - do not use it (it may vanish in future versions) **

parsing
o  parseText: aStringOrStream
parse aStringOrStream and answer the parsed document

Usage example(s):

     self parseText:'hello world - this is easy'
     self parseText:'hello < world > - this is easy'
     self parseText:'hello world this is easy'
     self parseText:'hello
world

this is easy'
     self parseText:'hello

  • world
  • foo

this is easy'
     self parseText:'

this is easy'
     self parseText:('../../doc/online/english/TOP.html' asFilename contentsOfEntireFile asString)
     self parseText:('../../doc/online/english/TOP.html' asFilename readStream)

o  parseText: aStringOrStream characterEncoding: anEncodingString
parse aStringOrStream, decoding its characters according to the encoding
named by anEncodingString (e.g. #utf8 or 'iso8859-1').

Answer the parsed document.

Usage example(s):

     self
        parseText:('../../doc/online/english/TOP.html' 
                        asFilename contentsOfEntireFile asString) 
        characterEncoding:#utf8


Instance protocol:

accessing
o  characterEncoding: aString
set the character set / encoding for the following text

error reporting
o  infoMessage: msg

scanning
o  ampersandEscape
parse an ampersand escape; the '&' has already been read.

o  ampersandEscape: aString
return a new string, containing the ampersand escape character.
Expects aString to NOT contain the initial ampersand.

Usage example(s):

     (HTMLParser new) ampersandEscape:'lt'   
     (HTMLParser new) ampersandEscape:'ouml'  
     (HTMLParser new) ampersandEscape:'#32'  

     (HTMLParser new) parseText:'hello α β γ normal'    
     (HTMLParser new) parseText:'helloworld

this is easy'

o  ampersandEscapeString
parse an ampersand escape; the '&' has already been read.
Return the escape string.

o  extractMetaInformationFrom: element
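extract meta information from element; its content is expected in the form:
<mime-type> ; charset=
followed by the name of the character set.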
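
The kind of input this method deals with can be illustrated by a small document whose head declares its encoding in a meta element. This is only a sketch: the document text is made up for the illustration, and it relies solely on parseText: and inspect as used elsewhere on this page. How the parser applies the extracted character set is not specified here, so no result is claimed.

```smalltalk
|html document|

"Hypothetical sample document; the meta element carries
 a mime-type followed by a charset parameter."
html := '<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body>hello world</body></html>'.

"Parse it and look at the resulting markup list."
document := HTMLParser new parseText:html.
document inspect
```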

o  finishTextBlock
finish a scanned textBlock; add it to the markup list

o  parseMarkup
parse markup after '<' and return a markup element

o  parseText: aStringOrStream
parse some string, return a list of markups

Usage example(s):

     (HTMLParser new) parseText:'hello world - this is easy'
     (HTMLParser new) parseText:'helloworld - this is easy'
     (HTMLParser new) parseText:'hello < world > - this is easy'
     (HTMLParser new) parseText:'hello world this is easy'
     (HTMLParser new) parseText:'hello
world

this is easy'
     (HTMLParser new) parseText:'hello

  • world
  • foo

this is easy'
     (HTMLParser new) parseText:'

this is easy'
     (HTMLParser new) parseText:('../../doc/online/english/TOP.html' asFilename contentsOfEntireFile asString)
     (HTMLParser new) parseText:('../../doc/online/english/TOP.html' asFilename readStream)
     (HTMLParser new) parseText:('../../doc/online/english/programming/viewintro.html' asFilename contentsOfEntireFile asString)

o  parseText: aStringOrStream withBindings: metaBindings
parse some string, return a list of HTMLMarkups.
Ampersand variables (e.g. &url) are expanded as given in the
metaBindings dictionary.
(This seems to be non-standard HTML, but it is used in HotJava.)
The destination (see parseText:withBindings:for:) is only required for scripts,
which may want to access the document very early.

Usage example(s):

     (HTMLParser new) parseText:'hello world - this is easy'
     (HTMLParser new) parseText:'hello < world > - this is easy'
     (HTMLParser new) parseText:'hello world this is easy'
     (HTMLParser new) parseText:'hello
world

this is easy'
     (HTMLParser new) parseText:'hello

  • world
  • foo

this is easy'
     (HTMLParser new) parseText:'

this is easy'
     (HTMLParser new) parseText:('../../doc/online/english/TOP.html' asFilename contentsOfEntireFile asString)
     (HTMLParser new) parseText:('../../doc/online/english/programming/viewintro.html' asFilename contentsOfEntireFile asString)
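
The usage examples listed for this method call plain parseText: and pass no bindings. A minimal sketch of a call that does pass them follows. It assumes that metaBindings is a Dictionary keyed by the variable name as a string, without the leading '&'; the key type is an assumption and is not stated on this page, and the sample text and URL are chosen only for illustration.

```smalltalk
|bindings document|

"Assumption: keys are the variable names without the ampersand."
bindings := Dictionary new.
bindings at:'url' put:'www.exept.de'.

"Parse a text which refers to the ampersand variable."
document := HTMLParser new
                parseText:'see &url for details'
                withBindings:bindings.

"Inspect the markup list to see how the variable was treated."
document inspect
```

If the variable does not appear expanded in the inspected result, the assumed key type is the first thing to check.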

o  parseText: aStringOrStream withBindings: metaBindings for: aDestination
parse some string, return a list of HTMLMarkups.
Ampersand variables (e.g. &url) are expanded as given in the
metaBindings dictionary.
(This seems to be non-standard HTML, but it is used in HotJava.)
The destination is only required for scripts, which may want to access
the document very early.

Usage example(s):

     (HTMLParser new) parseText:'hello world - this is easy'
     (HTMLParser new) parseText:'hello < world > - this is easy'
     (HTMLParser new) parseText:'hello world this is easy'
     (HTMLParser new) parseText:'hello
world

this is easy'
     (HTMLParser new) parseText:'hello

  • world
  • foo

this is easy'
     (HTMLParser new) parseText:'

this is easy'
     (HTMLParser new) parseText:('../../doc/online/english/TOP.html' asFilename contentsOfEntireFile asString)
     (HTMLParser new) parseText:('../../doc/online/english/programming/viewintro.html' asFilename contentsOfEntireFile asString)

o  startNewTextBlock

scripts

o  parseJavaScriptFrom: scriptStream
HTML

o  parseSmalltalkScriptFrom: scriptStream

o  script: element
a <script> TAG was encountered.
check for the language (which defaults to javaScript) and dispatch
to a script language handler.

o  script_javascript: element
a <script language=javaScript> TAG was encountered.
parse the script, and construct the scriptObject

o  script_smalltalkscript: element
a <script language=smalltalkScript> TAG was encountered.
parse the script, and construct the scriptObject (which has the methods in
its anonymous class)


Examples:


    |p in document|

    p := HTMLParser new.
    in := '../../doc/online/english/TOP.html' asFilename readStream.
    document := p parseText:in.
    in close.
    document inspect

    |v document|

    v := HTMLDocumentView new openAndWaitUntilVisible.

    v homeDocument:'../../doc/online/english/TOP.html'.

    |top v document|

    top := StandardSystemView extent:200@500.
    v := HVScrollableView for:HTMLDocumentView miniScrollerH:true in:top.
    v origin:0.0@ 0.0 corner:1.0@1.0.
    top openAndWaitUntilVisible.

    v homeDocument:'../../doc/online/english/TOP.html'.

    |v document|

    v := HTMLDocumentView new openAndWaitUntilVisible.

    document := (HTMLParser new) 
                        parseText:('../../doc/online/english/programming/viewintro.html' 
                                        asFilename readStream).
    v document:document.

    |p in document|

    p := HTMLParser new.
    in := '<html><body>combining: a&#768;rest' readStream.
    document := p parseText:in.
    in close.
    document inspect

    |p document|

    p := HTMLParser new.
    document := p parseText:'&auml; <script>
    bla &auml; bla
</script> &auml; <div> &auml; </div> <script>
    more bla &auml; bla
</script> '.
    document inspect


ST/X 7.7.0.0; WebServer 1.702 at 20f6060372b9.unknown:8081; Wed, 22 Jan 2025 05:55:57 GMT