|
Class: HTMLParser
Object
|
+--HTMLParser
- Package:
- stx:libhtml
- Category:
- System-Documentation
- Version:
- rev:
1.119
date: 2024/04/22 17:41:25
- user: stefan
- file: HTMLParser.st directory: libhtml
- module: stx stc-classLibrary: libhtml
Notice & Warning:
This HTML markup framework and the corresponding parser
started as a quick hack (in the 1990s) to replace a buggy Mosaic
X widget with an HTML viewer written in Smalltalk.
Its goals were to be fast enough for typical uses, not to be too memory hungry,
and to provide the functionality required to display simple help documents
(i.e. the online doc).
It was NOT meant to become a full-featured web-browser replacement.
We plan to replace all uses of this parser with the newer HTML::HTMLParser,
which generates a better DOM representation.
This framework is still in use as the document viewer inside ST/X,
and is supported to the extent needed to display simple online help documents and HTML tooltips.
However, there are no plans to enhance it further or to spend more time on its maintenance.
If you need more sophisticated HTML/DOM/doc functionality, you may want to use either
the HTMLTree framework or one of the free frameworks found in the goodies folder.
Instances of this class are used to read HTML documents
and build a collection (linked list) of markup elements for simple online help documents.
This markup collection can be displayed using the HTMLDocumentViewer
or printed by the HTMLDocumentPrinter.
It can also be used to extract anchors from an HTML document.
copyright
COPYRIGHT (c) 1996 by Claus Gittinger
All Rights Reserved
This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice. This software may not
be provided or otherwise made available to, or used by, any
other person. No title to or ownership of the software is
hereby transferred.
accessing
-
ampersandEscapes
-
backward compatibility only
** This is an obsolete interface - do not use it (it may vanish in future versions) **
-
mathAmpersandEscapes
-
backward compatibility only
** This is an obsolete interface - do not use it (it may vanish in future versions) **
parsing
-
parseText: aStringOrStream
-
parse aStringOrStream and answer the parsed document
Usage example(s):
self parseText:'hello world - this is easy'
self parseText:'hello < world > - this is easy'
self parseText:'hello world this is easy'
self parseText:'hello world this is easy'
self parseText:'hello this is easy'
self parseText:' this is easy'
self
parseText:('../../doc/online/english/TOP.html'
asFilename contentsOfEntireFile asString)
self
parseText:('../../doc/online/english/TOP.html'
asFilename readStream)
|
-
parseText: aStringOrStream characterEncoding: anEncodingString
-
parse aStringOrStream, decoding its characters as specified by anEncodingString
(e.g. #utf8 or 'iso8859-1').
Answer the parsed document.
Usage example(s):
self
parseText:('../../doc/online/english/TOP.html'
asFilename contentsOfEntireFile asString)
characterEncoding:#utf8
|
accessing
-
characterEncoding: aString
-
set the character set / encoding for the following text
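A minimal sketch of setting the encoding on a parser instance before parsing. It assumes that characterEncoding: accepts the same values as the anEncodingString argument of the class-side parseText:characterEncoding: (e.g. #utf8), and that the setting is still in effect for a subsequent parseText: on the same instance; neither is confirmed by this page. The file path is the one used in the other examples here.

```smalltalk
|p in document|

p := HTMLParser new.
"assumption: same encoding names as for parseText:characterEncoding:"
p characterEncoding:#utf8.
in := '../../doc/online/english/TOP.html' asFilename readStream.
document := p parseText:in.
in close.
document inspect
```

If the encoding is already known when the parse is started, the class-side parseText:characterEncoding: shown earlier is probably the simpler entry point.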
error reporting
-
infoMessage: msg
-
scanning
-
ampersandEscape
-
parse an ampersand escape; the '&' has already been read.
-
ampersandEscape: aString
-
return a new string, containing the ampersand escape character.
Expects aString to NOT contain the initial ampersand.
Usage example(s):
(HTMLParser new) ampersandEscape:'lt'
(HTMLParser new) ampersandEscape:'ouml'
(HTMLParser new) ampersandEscape:'#32'
(HTMLParser new) parseText:'hello α β γ normal'
(HTMLParser new) parseText:'hello
-
ampersandEscapeString
-
parse an ampersand escape; the '&' has already been read.
Return the escape string.
-
extractMetaInformationFrom: element
-
<mime-type> ; charset=
-
finishTextBlock
-
finish a scanned textBlock; add it to the markup list
-
parseMarkup
-
parse markup after '<' and return a markup element
-
parseText: aStringOrStream
-
parse some string, return a list of markups
Usage example(s):
(HTMLParser new) parseText:'hello world - this is easy'
(HTMLParser new) parseText:'helloworld - this is easy'
(HTMLParser new) parseText:'hello < world > - this is easy'
(HTMLParser new) parseText:'hello world this is easy'
(HTMLParser new) parseText:'hello world this is easy'
(HTMLParser new) parseText:'hello this is easy'
(HTMLParser new) parseText:' this is easy'
(HTMLParser new)
parseText:('../../doc/online/english/TOP.html'
asFilename contentsOfEntireFile asString)
(HTMLParser new)
parseText:('../../doc/online/english/TOP.html'
asFilename readStream)
(HTMLParser new)
parseText:('../../doc/online/english/programming/viewintro.html'
asFilename contentsOfEntireFile asString)
|
-
parseText: aStringOrStream withBindings: metaBindings
-
parse some string, return a list of HTMLMarkups.
Ampersand variables (e.g. &url) are expanded as given in the
metaBindings dictionary.
(This seems to be non-standard HTML, but is used in HotJava.)
The destination is only required for scripts, which may want to access
the document very early.
Usage example(s):
(HTMLParser new) parseText:'hello world - this is easy'
(HTMLParser new) parseText:'hello < world > - this is easy'
(HTMLParser new) parseText:'hello world this is easy'
(HTMLParser new) parseText:'hello world this is easy'
(HTMLParser new) parseText:'hello this is easy'
(HTMLParser new) parseText:' this is easy'
(HTMLParser new)
parseText:('../../doc/online/english/TOP.html'
asFilename contentsOfEntireFile asString)
(HTMLParser new)
parseText:('../../doc/online/english/programming/viewintro.html'
asFilename contentsOfEntireFile asString)
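The usage examples above do not pass any bindings, so here is a hedged sketch of how the expansion of an ampersand variable might be tried out. The key format (a String without the leading ampersand), the sample URL and the exact spelling of the variable in the text are assumptions made for illustration; check the method source before relying on them.

```smalltalk
|bindings document|

"assumption: keys are the variable names without the leading ampersand"
bindings := Dictionary new.
bindings at:'url' put:'http://www.example.com/'.

document := (HTMLParser new)
    parseText:'the current document is &url'
    withBindings:bindings.
document inspect
```

Inspecting the resulting markup list shows whether &url was replaced by the bound value or left as plain text.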
|
-
parseText: aStringOrStream withBindings: metaBindings for: aDestination
-
parse some string, return a list of HTMLMarkups.
Ampersand variables (e.g. &url) are expanded as given in the
metaBindings dictionary.
(This seems to be non-standard HTML, but is used in HotJava.)
The destination is only required for scripts, which may want to access
the document very early.
Usage example(s):
(HTMLParser new) parseText:'hello world - this is easy'
(HTMLParser new) parseText:'hello < world > - this is easy'
(HTMLParser new) parseText:'hello world this is easy'
(HTMLParser new) parseText:'hello world this is easy'
(HTMLParser new) parseText:'hello this is easy'
(HTMLParser new) parseText:' this is easy'
(HTMLParser new)
parseText:('../../doc/online/english/TOP.html'
asFilename contentsOfEntireFile asString)
(HTMLParser new)
parseText:('../../doc/online/english/programming/viewintro.html'
asFilename contentsOfEntireFile asString)
|
-
startNewTextBlock
-
scripts
-
parseJavaScriptFrom: scriptStream
-
HTML
-
parseSmalltalkScriptFrom: scriptStream
-
-
script: element
-
a <script> TAG was encountered.
check for the language (which defaults to javaScript) and dispatch
to a script language handler.
-
script_javascript: element
-
a <script language=javaScript> TAG was encountered.
parse the script, and construct the scriptObject
-
script_smalltalkscript: element
-
a <script language=smalltalkScript> TAG was encountered.
parse the script, and construct the scriptObject (which has the methods in
its anonymous class)
Examples:
|p in document|
p := HTMLParser new.
in := '../../doc/online/english/TOP.html' asFilename readStream.
document := p parseText:in.
in close.
document inspect
|
|v document|
v := HTMLDocumentView new openAndWaitUntilVisible.
v homeDocument:'../../doc/online/english/TOP.html'.
|
|top v|
top := StandardSystemView extent:200@500.
v := HVScrollableView for:HTMLDocumentView miniScrollerH:true in:top.
v origin:0.0@0.0 corner:1.0@1.0.
top openAndWaitUntilVisible.
v homeDocument:'../../doc/online/english/TOP.html'.
|
|v document|
v := HTMLDocumentView new openAndWaitUntilVisible.
document := (HTMLParser new)
parseText:('../../doc/online/english/programming/viewintro.html'
asFilename readStream).
v document:document.
|
|p in document|
p := HTMLParser new.
in := '<html><body>combining: àrest' readStream.
document := p parseText:in.
in close.
document inspect
|
|p document|
p := HTMLParser new.
document := p parseText:'ä <script>
bla ä bla
</script> ä <div> ä </div> <script>
more bla ä bla
</script> '.
document inspect
ST/X 7.7.0.0; WebServer 1.702 at 20f6060372b9.unknown:8081; Wed, 22 Jan 2025 05:55:57 GMT