Class: HTMLParser (in HTML)
Object
|
+--HTML::HTMLParser
- Package:
- stx:goodies/webServer/htmlTree
- Category:
- Net-Documents-Utilities
- Version:
- rev: 1.28 date: 2009/07/03 15:10:46
- user: cg
- file: HTML__HTMLParser.st directory: goodies/webServer/htmlTree
- module: stx stc-classLibrary: htmlTree
- Author:
- Claus Gittinger
Instances of this class are used to read HTML documents
and build a tree of HTML::Element objects.
IMPORTANT: textScannedSoFar is in the characterEncoding of the input data. Conversion takes place when a textBlock
is finished!
Element
initialization
-
initialize
-
-
initializeAmpersandEscapes
-
NOTE: there is an inconsistency here.
Ampersand escape chars are mapped to ISO-8859-1 codes,
but are then interpreted in some other encoding
if characterDecoder is set.
-
initializeElementTypes
-
-
initializeMathAmpersandEscapes
-
these are obsolete now, as HTML4 added the missing stuff in the meantime.
parsing
-
parseText: aStringOrStream
-
parse aStringOrStream; answer the parsed document
-
parseText: aStringOrStream characterEncoding: anEncodingString
-
parse aStringOrStream. The encoding of the character set is specified by anEncodingString
(e.g. #utf8 or 'iso8859-1').
Answer the parsed document
accessing
-
characterEncoding: aString
-
set the character set / encoding for the following text
error reporting
-
infoMessage: msg
-
private
-
addElement: anElement
-
-
addProcessingInstruction: aProcessingInstruction
-
-
addText: aString
-
-
classForType: aTypeSymbol
-
internal interface - return a markup elements class, given a typeSymbol
(such as #b, #pre or #'/pre')
-
elementFor: aString
-
given a markup string (such as 'b', 'pre' or '/pre'),
return a new markup instance
-
endElement: markupText
-
-
inPre
-
return true, if currently in a pre element.
(Do not strip separators of a text block if inside a pre)
public-scanning
-
parseText: aStringOrStream
-
parse some string, return a list of markups
-
parseText: aStringOrStream withBindings: metaBindings
-
parse some string, return a list of HTMLMarkups.
Ampersand variables (e.g. &url) are expanded as given in the
metaBindings dictionary.
(this seems to be non-standard HTML, but is used in HotJava).
The destination is only required for scripts, which may want to access
the document very early.
-
parseText: aStringOrStream withBindings: metaBindings for: aDestination
-
parse some string, return a list of HTMLMarkups.
Ampersand variables (e.g. &url) are expanded as given in the
metaBindings dictionary.
(this seems to be non-standard HTML, but is used in HotJava).
The destination is only required for scripts, which may want to access
the document very early.
scanning
-
ampersandEscape
-
parse an ampersand escape; the '&' has already been read.
-
ampersandEscape: aString
-
return a new string, containing the ampersand escape character.
Expects aString to NOT contain the initial ampersand.
-
ampersandEscapeString
-
parse an ampersand escape; the '&' has already been read.
Return the escape string.
-
collectParametersFrom: text
-
-
extractMetaInformationFrom: metaElement
-
-
finishTextBlock
-
finish a scanned textBlock; add it to the markup list
-
parseMarkup
-
'<' has been detected; parse and return a markup element
-
startNewTextBlock
-
scripts
-
parseJavaScriptFrom: scriptStream
-
-
parseSmalltalkScriptFrom: scriptStream
-
-
script: element
-
a <script> TAG was encountered.
check for the language (which defaults to javaScript) and dispatch
to a script language handler.
-
script_javascript: element
-
a <script language=javaScript> TAG was encountered.
parse the script, and construct the scriptObject
-
script_smalltalkscript: element
-
a <script language=smalltalkScript> TAG was encountered.
parse the script, and construct the scriptObject (which has the methods in
its anonymous class)
ElementTypes := nil.
HTMLParser initializeElementTypes
|p in document|
p := HTML::HTMLParser new.
in := '<head>
<? bla bla bla ?>
<!-- bla bla bla -->
<!--
bla bla bla -->
<!--
bla bla bla
-->
</head>
' readStream.
document := p parseText:in.
in close.
document inspect
|
|p in document|
p := HTML::HTMLParser new.
in := '../../doc/online/english/TOP.html' asFilename readStream.
document := p parseText:in.
in close.
document inspect
|
|p in document|
p := HTML::HTMLParser new.
in := '../../../exept/expecco/projects/not_delivered/buggyWebShopDemo/selenium_tests/buggyWebshop_bestellung'
asFilename readStream.
document := p parseText:in.
in close.
document inspect.
|
|p in document|
p := HTML::HTMLParser new.
in := '../../../exept/expecco/projects/not_delivered/buggyWebShopDemo/selenium_tests/buggyWebshop_checkImages'
asFilename readStream.
document := p parseText:in.
in close.
document inspect.
|
|p in document|
p := HTML::HTMLParser new.
in := '../../../exept/expecco/projects/not_delivered/buggyWebShopDemo/selenium_tests/buggyWebshop_checkLinks'
asFilename readStream.
document := p parseText:in.
in close.
document inspect.
|
|p in document|
p := HTML::HTMLParser new.
in := '
<?xml version=''1.0'' encoding=''UTF-8''?>
<!DOCTYPE html PUBLIC ''-//W3C//DTD XHTML 1.0 Strict//EN'' ''http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd''>
<html xmlns=''http://www.w3.org/1999/xhtml'' xml:lang=''en'' lang=''en''>
<head profile=''http://selenium-ide.openqa.org/profiles/test-case''>
<meta http-equiv=''Content-Type'' content=''text/html; charset=UTF-8'' />
<link rel=''selenium.base'' href='''' />
<title>New Test</title>
</head>
<body>
<table cellpadding=''1'' cellspacing=''1'' border=''1''>
<thead>
<tr><td rowspan=''1'' colspan=''3''>New Test</td></tr>
</thead><tbody>
</tbody></table>
</body>
</html>
' readStream.
document := p parseText:in.
in close.
document inspect
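The examples above all use parseText:. As a minimal sketch of the parseText:characterEncoding: variant listed under "parsing", the following passes the encoding explicitly. It assumes that #utf8 is an accepted encoding name (the method comment gives it as an example) and uses a made-up input string; it is not taken from the class's own example set.

```smalltalk
|p in document|
"create a parser, as in the examples above"
p := HTML::HTMLParser new.
"hypothetical input; plain ASCII, so it reads the same in UTF-8"
in := '<html><body><p>Hello</p></body></html>' readStream.
"parse, stating the encoding of the input data explicitly"
document := p parseText:in characterEncoding:#utf8.
in close.
document inspect
```

According to the accessing protocol, sending characterEncoding: to the parser before parseText: should be another way to set the encoding for the text that follows.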