Smalltalk/X Webserver

Documentation of class 'HTML::HTMLParser':

Class: HTMLParser (in HTML)


Inheritance:

   Object
   |
   +--HTML::HTMLParser

Package:
stx:goodies/webServer/htmlTree
Category:
Net-Documents-Utilities
Version:
rev: 1.28 date: 2009/07/03 15:10:46
user: cg
file: HTML__HTMLParser.st directory: goodies/webServer/htmlTree
module: stx stc-classLibrary: htmlTree
Author:
Claus Gittinger

Description:


Instances of this class are used to read HTML documents
and build a tree of HTML::Element objects.

IMPORTANT: textScannedSoFar is held in the character encoding of the input data.
Conversion takes place when a textBlock is finished.


Related information:

    Element

Class protocol:

initialization
o  initialize

o  initializeAmpersandEscapes
NOTE: there is an inconsistency here.
Ampersand escape characters are mapped to ISO-8859-1 codes,
but they are interpreted in some other encoding
if characterDecoder is set.

o  initializeElementTypes

o  initializeMathAmpersandEscapes
these are obsolete now, as HTML4 has since added the missing escapes.

parsing
o  parseText: aStringOrStream
parse aStringOrStream; answer the parsed document

o  parseText: aStringOrStream characterEncoding: anEncodingString
parse aStringOrStream. The character encoding of the input is specified by anEncodingString
(e.g. #utf8 or 'iso8859-1').

Answer the parsed document
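
A minimal sketch of the two class-side entry points listed above, meant for evaluation in a workspace. The HTML text and the variable names are illustrative only; the encoding argument #utf8 is the one named in the method comment.

```smalltalk
|document1 document2|

"parse a string; no encoding is given, so the parser's default applies"
document1 := HTML::HTMLParser parseText:'<html><body><p>Hello</p></body></html>'.

"parse a stream and name the encoding of the input explicitly"
document2 := HTML::HTMLParser
                parseText:'<html><body><p>Hello</p></body></html>' readStream
                characterEncoding:#utf8.
```

Sending #inspect to either result opens an inspector on the parsed tree, as done in the examples at the end of this page.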


Instance protocol:

accessing
o  characterEncoding: aString
set the character set / encoding for the following text
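
A possible instance-side use, sketched under the assumption that this setter accepts the same encoding names as the class-side parseText:characterEncoding: (that is, #utf8 or 'iso8859-1'). The input text is illustrative only.

```smalltalk
|p in document|

p := HTML::HTMLParser new.

"assumption: same encoding names as for parseText:characterEncoding:"
p characterEncoding:'iso8859-1'.

in := '<html><body><p>some text</p></body></html>' readStream.
document := p parseText:in.
in close.
```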

error reporting
o  infoMessage: msg

private
o  addElement: anElement

o  addProcessingInstruction: aProcessingInstruction

o  addText: aString

o  classForType: aTypeSymbol
internal interface - return a markup element's class, given a typeSymbol
(such as #b, #pre or #'/pre')

o  elementFor: aString
given a markup's string (such as 'b', 'pre' or '/pre'),
return a new markup instance

o  endElement: markupText

o  inPre
return true, if currently in a pre element.
(Do not strip separators of a text block if inside a pre)

public-scanning
o  parseText: aStringOrStream
parse some string, return a list of markups

o  parseText: aStringOrStream withBindings: metaBindings
parse some string, return a list of HTMLMarkups.
Ampersand variables (e.g. &url) are expanded as given in the
metaBindings dictionary.
(this seems to be non-standard HTML, but is used in hotjava).
The destination is only required for scripts, which may want to access
the document very early.

o  parseText: aStringOrStream withBindings: metaBindings for: aDestination
parse some string, return a list of HTMLMarkups.
Ampersand variables (e.g. &url) are expanded as given in the
metaBindings dictionary.
(this seems to be non-standard HTML, but is used in hotjava).
The destination is only required for scripts, which may want to access
the document very early.
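
The binding mechanism described above might be used as sketched here. Several details are assumptions not confirmed by this page: that the dictionary is keyed by the variable name as a String without the leading ampersand, that a trailing semicolon is accepted after the variable name, and that the bound value is a String. The binding itself is hypothetical.

```smalltalk
|p bindings in result|

"hypothetical binding; key type (String vs. Symbol) is an assumption"
bindings := Dictionary new.
bindings at:'url' put:'http://www.exept.de'.

p := HTML::HTMLParser new.

"assumption: the variable is written like a regular ampersand escape"
in := '<p>see &url; for details</p>' readStream.
result := p parseText:in withBindings:bindings.
in close.
```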

scanning
o  ampersandEscape
parse an ampersand escape; the '&' has already been read.

o  ampersandEscape: aString
return a new string, containing the ampersand escape character.
Expects aString to NOT contain the initial ampersand.

o  ampersandEscapeString
parse an ampersand escape; the '&' has already been read.
Return the escape string.

o  collectParametersFrom: text

o  extractMetaInformationFrom: metaElement

o  finishTextBlock
finish a scanned textBlock; add it to the markup list

o  parseMarkup
'<' has been detected; parse and return a markup element

o  startNewTextBlock

scripts
o  parseJavaScriptFrom: scriptStream

o  parseSmalltalkScriptFrom: scriptStream

o  script: element
a <script> TAG was encountered.
check for the language (which defaults to javaScript) and dispatch
to a script language handler.

o  script_javascript: element
a <script language=javaScript> TAG was encountered.
parse the script, and construct the scriptObject

o  script_smalltalkscript: element
a <script language=smalltalkScript> TAG was encountered.
parse the script, and construct the scriptObject (which has the methods in
its anonymous class)


Examples:


ElementTypes := nil. HTMLParser initializeElementTypes


  |p in document|

  p := HTML::HTMLParser new.
  in := '<head>
<? bla bla bla ?>
<!-- bla bla bla -->
<!-- 
bla bla bla -->
<!-- 
bla bla bla 
-->
</head>
' readStream.
  document := p parseText:in.
  in close.
  document inspect


  |p in document|

  p := HTML::HTMLParser new.
  in := '../../doc/online/english/TOP.html' asFilename readStream.
  document := p parseText:in.
  in close.
  document inspect


  |p in document|

  p := HTML::HTMLParser new. 
  in := '../../../exept/expecco/projects/not_delivered/buggyWebShopDemo/selenium_tests/buggyWebshop_bestellung'
               asFilename readStream.
  document := p parseText:in.
  in close.
  document inspect.


  |p in document|

  p := HTML::HTMLParser new. 
  in := '../../../exept/expecco/projects/not_delivered/buggyWebShopDemo/selenium_tests/buggyWebshop_checkImages'
               asFilename readStream.
  document := p parseText:in.
  in close.
  document inspect.


  |p in document|

  p := HTML::HTMLParser new. 
  in := '../../../exept/expecco/projects/not_delivered/buggyWebShopDemo/selenium_tests/buggyWebshop_checkLinks'
               asFilename readStream.
  document := p parseText:in.
  in close.
  document inspect.


  |p in document|

  p := HTML::HTMLParser new.
  in := '
<?xml version=''1.0'' encoding=''UTF-8''?>
<!DOCTYPE html PUBLIC ''-//W3C//DTD XHTML 1.0 Strict//EN'' ''http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd''>
<html xmlns=''http://www.w3.org/1999/xhtml'' xml:lang=''en'' lang=''en''>
<head profile=''http://selenium-ide.openqa.org/profiles/test-case''>
<meta http-equiv=''Content-Type'' content=''text/html; charset=UTF-8'' />
<link rel=''selenium.base'' href='''' />
<title>New Test</title>
</head>
<body>
<table cellpadding=''1'' cellspacing=''1'' border=''1''>
<thead>
<tr><td rowspan=''1'' colspan=''3''>New Test</td></tr>
</thead><tbody>

</tbody></table>
</body>
</html>
' readStream.
  document := p parseText:in.
  in close.
  document inspect


ST/X 6.1.1; WebServer 1.620 at exept:8081; Fri, 10 Feb 2012 14:11:15 GMT