Class: XMLParser (in XML)
Object
|
+--XML::XMLParser
|
+--XML::XMLParserStX
- Package: stx:goodies/xml/vw
- Category: XML-VW-Parsing
- Version: rev: 1.99 date: 2023/10/23 15:31:42
- user: stefan
- file: XMLParser.st directory: goodies/xml/vw
- module: stx stc-classLibrary: vw
This class represents the main XML processor in the system.
This XMLParser may be used as a validating or non-validating parser to scan and process an XML document
and provide access to its content and structure to a Smalltalk application.
This XMLParser tries to follow the guidelines laid out in the W3C XML Version 1.0 Recommendation,
plus the XML Namespaces Recommendation.
Instance Variables:
sourceStack <XML.StreamWrapper> stack of input streams that handles inclusion.
hereChar <Character> the current character being parsed
lastSource <XML.StreamWrapper> record of previous source used to check correct nesting
currentSource <XML.StreamWrapper> current input stream (the top of sourceStack)
documentNode <XML.Document> the document created by parsing
dtd <XML.DocumentType> the document type definition for the current document
unresolvedIDREFs <Collection> collection of IDREFs that have yet to be resolved; used for validation
builder <XML.NodeBuilder> node builder
validating <Boolean> if true then the parse validates the XML
ignore <Boolean> ?
eol <Character> the end-of-line character in the source stream
attribute processing
-
isValidName: aTag
-
-
isValidNmToken: aTag
-
defaults
-
defaultNormalizeAttributes: aBoolean
-
dialects
-
concreteClass
-
return the concrete parser class, per smalltalk dialect
instance creation
-
new
-
(comment from inherited method)
return an instance of myself without indexed variables
-
on: aStream
-
-
on: aStream protocol: protocolString name: name
-
-
parse: aStringOrStream
-
parse the xml in aStringOrStream;
return a DOM-tree
-
parseDtdAsPatterns: aStringOrStream
-
parse a document type from aStringOrStream.
Do not normalize the DTD patterns, so they can be used for type construction.
Answer an XML::DocumentType.
-
parseDtdString: aStringOrStream
-
parse a DTD from aStringOrStream
-
parseFile: aFilename
-
-
parseText: aStringOrStream
-
parse the xml in aStringOrStream;
return a DOM-tree.
For API compatibility with HTMLParser
-
processDocumentInFilename: aFilename
-
-
processDocumentInFilename: aFilename beforeScanDo: aBlock
-
-
processDocumentStream: aStream
-
-
processDocumentStream: aStream beforeScanDo: aBlock
-
-
processDocumentString: aString
-
-
processDocumentString: aString beforeScanDo: aBlock
-
private
-
isValidNameChar: c
-
cg: this is not correct:
^ c isLetterOrDigit or: [c == $- or:[c ==$_]]
a name may also contain much more...
-
isValidNameStart: c
-
cg: this is not correct;
^ c isLetter or: [c ==$_ ]
a name may contain much more...
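The two comments above note that the letter/digit tests are too narrow. As a rough illustration only (not the parser's actual code), the following workspace sketch extends them with the remaining ASCII characters that the XML 1.0 Name production allows: ':' as a start character, and '.' and ':' as name characters. The block names isNameStart, isNameChar and isName are hypothetical, and the non-ASCII ranges of the production (combining characters, extenders and so on) are still left out.

```smalltalk
| isNameStart isNameChar isName |

"ASCII-only approximation of the XML 1.0 Name production.
 Assumption: these blocks are illustrative helpers, not methods of XMLParser."
isNameStart := [:c |
    c isLetter or: [c == $_ or: [c == $:]]].

isNameChar := [:c |
    c isLetterOrDigit
        or: [c == $- or: [c == $_ or: [c == $. or: [c == $:]]]]].

"A name is non-empty, begins with a start character,
 and contains no character that fails the name-character test."
isName := [:aString |
    aString notEmpty
        and: [(isNameStart value: aString first)
        and: [(aString
                detect: [:c | (isNameChar value: c) not]
                ifNone: [nil]) isNil]]].

isName value: 'Hallo_Welt'.    "true"
isName value: 'xsl:template'.  "true"
isName value: 'Hallo$Welt'.    "false"
isName value: '1abc'.          "false: a digit may not start a name"
```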
utilities
-
invalid: aString
-
-
malformed: aString
-
-
mapEncoding: anEncoding
-
VisualWorks specific: map XML encoding names to VW encodedStream names
-
warn: aString
-
Added to unify warnings for SAX. REW
DTD processing
-
conditionalSect
-
-
dtdEntry
-
-
dtdFile: newURI
-
So we don't lose hereChar.
-
dtdStream: aStream rootElement: rootElementNameString
-
set the DTD from the contents of aStream
-
externalID: usage
-
Usage may be #docType, #entity, or #notation.
DocType is treated specially, since PE references are not allowed.
Notation is treated specially since the system identifier of the
PUBLIC form is optional.
-
inInternalSubset
-
-
markUpDecl
-
-
notationDecl
-
-
pubIdLiteral
-
Modified (format): / 11-06-2021 / 22:39:57 / cg
-
systemLiteral
-
-
uriResolver
-
IDs
-
checkUnresolvedIDREFs
-
-
rememberIDREF: anID
-
self
-
resolveIDREF: anID
-
accessing
-
builder
-
return the value of the instance variable 'builder' (automatically generated)
-
document
-
cg: added for twoFlower *compatibility with newer XMLParser framework
-
dtd
-
-
encoding
-
-
eol
-
-
isEncodeChecking
-
-
isEncodeChecking: aBoolean
-
-
isTreeBuilding
-
answer true, if we build a tree of xml elements.
This is false for SAX parsing
-
isTreeBuilding: something
-
-
normalizeAttributes
-
-
normalizeAttributes: aBoolean
-
Controls whether attribute values like ' foo bar ' are normalized to
'foo bar'. The default is true.
If you have to parse non-standard XML, set this to false
before parsing.
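A minimal sketch of switching normalization off before the scan starts, modelled on the processDocumentStream:beforeScanDo: examples at the end of this page. The XML text and the variable name doc are illustrative, and what the call answers is not stated on this page, so the result is only stored here.

```smalltalk
| doc |

"Assumption: the beforeScanDo: block receives the parser before scanning
 begins, as in the examples at the end of this page."
doc := XML::XMLParser
    processDocumentString: '<item label=" foo bar " />'
    beforeScanDo: [:parser |
        parser
            validate: false;
            normalizeAttributes: false
    ].
```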
-
normalizeDtd
-
-
normalizeDtd: something
-
-
sourceWrapper
-
Modified (comment): / 23-02-2022 / 00:41:47 / cg
-
validate: aBoolean
-
api
-
comment
-
self
-
docTypeDecl
-
-
latestURI
-
-
misc
-
comment or PI
-
parseDtd
-
parse a plain dtd
-
pi
-
self
-
prolog
-
This is optional.
-
pushSource: aStreamWrapper
-
-
scanDocument
-
MessageTally spyOn:[
attribute def processing
-
attListDecl
-
-
completeNotationType
-
-
defaultDecl
-
^(self skipIf: '#REQUIRED')
-
enumeration
-
attribute processing
-
attValue
-
cg: must eat all other spaces ...
do it here, to limit changes to one place.
Q: is this true?
-
attribute
-
-
isValidName: arg
-
-
isValidNmToken: arg
-
-
processAttributes
-
-
quotedString
-
-
validateAttributes: attributes for: tag
-
element def processing
-
completeChildren: str
-
-
completeMixedContent: str
-
we already have the #PCDATA finished.
-
contentsSpec
-
^(self skipIf: 'ANY')
-
cp
-
-
elementDecl
-
element processing
-
charEntity: data startedIn: str1
-
parse a character entity and add it to data.
cg: separated into parsing the entity and adding to the stream
-
closeTag: tag return: elements
-
-
completeCDATA: str1
-
data := CharacterWriteStream on:(String new: 32).
-
completeComment: str1
-
OLD:
-
completePI: str1
-
pi := self upToAll_positionBefore:'?>'
-
element
-
-
elementAtPosition: startPosition
-
self mustFind:'<'.
-
elementContent: tag openedIn: str
-
(data findString: ']]>' startingAt: 1) = 0
ifFalse: [self halt: 'including ]]> in element content'].
-
generalEntityInText: str canBeExternal: external
-
-
isValidTag: aTag
-
-
parseCharEntityStartedIn: str1
-
parse a character entity.
cg: separated into parsing and separate adding to the stream
entity processing
-
PERef: refType
-
if we are in IGNORE conditional, this is not an error. gj
-
entityDecl
-
peDef modified for SAX. REW
-
entityDef: entityName
-
Parameter entityName added for SAX. REW
-
entityValue
-
-
generalEntity: str
-
-
nDataDecl
-
^self skipSpaceInDTD
-
peDef: entityName
-
Parameter entityName added for SAX. REW
initialization
-
builder: anXMLNodeBuilder
-
-
lineEndLF
-
-
on: inputStream
-
-
on: inputStream protocol: protocolString name: name
-
-
wrapStream: aStream protocol: protocolString name: name
-
private
-
checkForWrongRootNode
-
-
closeAllFiles
-
-
documentNode
-
-
error: aStringOrMessage
-
(comment from inherited method)
Raise an error with error message aString.
The error is reported by raising the Error exception,
which is non-proceedable.
If no handler has been setup, a debugger is entered.
-
expected: string
-
-
fullSourceStack
-
-
getDottedName
-
-
getElement
-
cg: added for twoFlower *compatibility with newer XMLParser framework
-
getQualifiedName
-
original:
-
getSimpleName
-
-
invalid: aString
-
-
malformed: aString
-
-
nmToken
-
-
notPermitted: string
-
-
validateEncoding: encName
-
validate the encoding string in encName.
Set the encoding instVar as a side effect.
-
validateText: data from: start to: stop testBlanks: testBlanks
-
cg: added for twoFlower *compatibility with newer XMLParser framework
-
warn: aString
-
Modified to unify the warning system for SAX. REW
-
with: list add: node
-
streaming
-
atEnd
-
-
forceSpace
-
-
forceSpaceInDTD
-
-
getNextChar
-
-
mustFind: str
-
-
nextChar
-
avoid #atEnd if possible (let #next return nil)
-
skipIf: str
-
-
skipSpace
-
answer true, if any whitespace was skipped
-
skipSpaceInDTD
-
-
upTo: aCharacter
-
Answer a subcollection from the current position to the occurrence (if any, exclusive) of aCharacter.
The stream is left positioned after aCharacter.
If aCharacter is not found, answer everything.
-
upToAll: target
-
Answer a subcollection from the current position
up to the occurrence (if any, not inclusive) of target,
and leave the stream positioned after the occurrence.
If no occurrence is found, answer the entire remaining
stream contents, and leave the stream positioned at the end.
We are going to cheat here, and assume that the first
character in the target only occurs once in the target, so
that we don't have to backtrack.
-
upToAll_positionBefore: target
-
Answer a subcollection from the current position
up to the occurrence (if any, not inclusive) of target,
and leave the stream positioned before the occurrence.
If no occurrence is found, answer the entire remaining
stream contents, and leave the stream positioned at the end.
We are going to cheat here, and assume that the first
character in the target only occurs once in the target, so
that we don't have to backtrack.
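The no-backtracking shortcut described in the two comments above can be illustrated with a small workspace sketch. This is not the parser's implementation; the block name scan is hypothetical and it works on a plain ReadStream. After a failed partial match it only checks whether the current character starts a new match, which is sufficient exactly when the first character of the target occurs once in the target.

```smalltalk
| scan |

"Assumption: illustrative helper only, not a method of XMLParser.
 Answers the text before target and leaves aStream after the match."
scan := [:aStream :target |
    | result matched |
    result := WriteStream on: String new.
    matched := 0.
    [matched < target size and: [aStream atEnd not]] whileTrue: [
        | c |
        c := aStream next.
        c = (target at: matched + 1)
            ifTrue: [matched := matched + 1]
            ifFalse: [
                "Give up the partial match without re-reading any input."
                result nextPutAll: (target copyFrom: 1 to: matched).
                c = target first
                    ifTrue: [matched := 1]
                    ifFalse: [matched := 0. result nextPut: c]]].
    "An unfinished match at the end of the stream is ordinary text."
    matched < target size
        ifTrue: [result nextPutAll: (target copyFrom: 1 to: matched)].
    result contents].

"'?' occurs once in '?>', so the shortcut is safe."
scan value: 'a??>b' readStream value: '?>'.    "'a?'"

"'a' occurs twice in 'aab': the match starting at the second
 character is missed and the whole input is answered."
scan value: 'aaab' readStream value: 'aab'.    "'aaab'"
```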
testing
-
documentHasDTD
-
-
hasExpanded: anEntity
-
-
isIllegalCharacter: anInteger
-
answer true if anInteger is an illegal Unicode code point in an XML file
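For orientation, the Char production of the XML 1.0 Recommendation can be written as a code-point test. The sketch below is an assumption about what "illegal" means here, not the method's actual source; the block name isLegalXmlChar is hypothetical, and it answers the opposite sense (true for legal code points).

```smalltalk
| isLegalXmlChar |

"XML 1.0 Char production:
 #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"
isLegalXmlChar := [:codePoint |
    codePoint = 9
        or: [codePoint = 10
        or: [codePoint = 13
        or: [(codePoint between: 16r20 and: 16rD7FF)
        or: [(codePoint between: 16rE000 and: 16rFFFD)
        or: [codePoint between: 16r10000 and: 16r10FFFF]]]]]].

isLegalXmlChar value: 0.        "false: NUL is not allowed"
isLegalXmlChar value: 16r41.    "true: $A"
isLegalXmlChar value: 16rD800.  "false: surrogate range"
isLegalXmlChar value: 16rFFFE.  "false: excluded non-character"
```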
-
isValidating
-
-
shouldTestWFCEntityDeclared
-
XML::XMLParser
processDocumentStream:'<HalloWelt />' readStream
beforeScanDo:[:parser |
parser
validate:false
].
XML::XMLParser
processDocumentStream:'<Hallo_Welt />' readStream
beforeScanDo:[:parser |
parser
validate:false
].
Fails (invalid character):
XML::XMLParser
processDocumentStream:'<Hallo$Welt />' readStream
beforeScanDo:[:parser |
parser
validate:false
].