eXept Software AG

Smalltalk/X Webserver

Documentation of class 'XML::XMLParser':


Class: XMLParser (in XML)


Inheritance:

   Object
   |
   +--XML::XMLParser
      |
      +--XML::XMLParserStX

Package:
stx:goodies/xml/vw
Category:
XML-VW-Parsing
Version:
rev: 1.77 date: 2019/05/28 12:35:59
user: stefan
file: XMLParser.st directory: goodies/xml/vw
module: stx stc-classLibrary: vw

Description:


This class represents the main XML processor in the system. XMLParser may be used as a validating or non-validating parser to scan and process an XML document and to give a Smalltalk application access to its content and structure. XMLParser tries to follow the W3C XML 1.0 Recommendation and the Namespaces in XML Recommendation.

Instance Variables:
	sourceStack     <XML.StreamWrapper>  stack of input streams that handles inclusion.
	hereChar        <Character>  the current character being parsed
	lastSource      <XML.StreamWrapper>  record of previous source used to check correct nesting
	currentSource   <XML.StreamWrapper>  current input stream (the top of sourceStack)
	documentNode    <XML.Document>  the document created by parsing
	dtd     <XML.DocumentType>  the document type definition for the current document
	unresolvedIDREFs        <Collection>  collection of IDREFs that have yet to be resolved; used for validation
	builder <XML.NodeBuilder>  node builder
	validating      <Boolean>  if true, the parser validates the XML
	ignore  <Boolean>  (purpose not documented)
	eol     <Character>  the end-of-line character in the source stream


Class protocol:

attribute processing
o  isValidName: aTag

o  isValidNmToken: aTag

class initialization
o  initialize
XMLParser initialize

defaults
o  defaultNormalizeAttributes: aBoolean

dialects
o  concreteClass
return the concrete parser class for the current Smalltalk dialect

instance creation
o  new

o  on: aStream

o  on: aStream protocol: protocolString name: name

o  parse: aStringOrStream
parse the XML in aStringOrStream;
return a DOM tree

o  parseDtdAsPatterns: aStringOrStream
parse a document type from aStringOrStream.
Do not normalize the DTD patterns, so they can be used for type construction.
Answer an XML::DocumentType.

o  parseDtdString: aStringOrStream
parse a DTD from aStringOrStream

o  processDocumentInFilename: aFilename

o  processDocumentInFilename: aFilename beforeScanDo: aBlock

o  processDocumentStream: aStream

o  processDocumentStream: aStream beforeScanDo: aBlock
UTF-8 byte-order mark: EF BB BF
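The comment above names the UTF-8 byte-order mark, the three bytes EF BB BF that may precede a UTF-8 document. The sketch below shows, in Python, one way a parser can detect and strip such a mark before scanning. It illustrates only the idea. The function name strip_utf8_bom is hypothetical, and whether processDocumentStream:beforeScanDo: does exactly this is an assumption.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # the UTF-8 byte-order mark: EF BB BF

def strip_utf8_bom(data):
    """Return (data without a leading UTF-8 BOM, whether a BOM was present)."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):], True
    return data, False

# Usage: remove the mark before decoding, so it does not show up as U+FEFF text.
body, had_bom = strip_utf8_bom(b"\xef\xbb\xbf<root/>")
print(had_bom, body.decode("utf-8"))
```

Removing the mark at the byte level keeps the scanner itself free of encoding special cases.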

o  processDocumentString: aString

o  processDocumentString: aString beforeScanDo: aBlock

private
o  isValidNameChar: c
cg: this is not correct:
^ c isLetterOrDigit or: [c == $- or: [c == $_]]
a name may also contain much more (e.g. '.', ':', combining marks)...

o  isValidNameStart: c
cg: this is not correct:
^ c isLetter or: [c == $_]
a name may also start with much more (e.g. ':')...
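cg's comments note that both private checks are narrower than the specification. For comparison, the Python sketch below encodes the NameStartChar and NameChar productions of the XML 1.0 (Fifth Edition) Recommendation. It also encodes the Name and Nmtoken rules that isValidName: and isValidNmToken: presumably enforce. This is a reference sketch of the spec, not the ST/X code, and all function names are hypothetical. Earlier editions of XML 1.0 defined name characters through different character tables, so which edition to follow is itself a design decision.

```python
# NameStartChar code-point ranges, XML 1.0 (Fifth Edition).
_NAME_START_RANGES = [
    (0x3A, 0x3A),        # ':'
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # '_'
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Additional ranges allowed in NameChar, but not as the first character.
_NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),        # '-' and '.'
    (0x30, 0x39),        # 0-9
    (0xB7, 0xB7),        # middle dot
    (0x300, 0x36F),      # combining diacritical marks
    (0x203F, 0x2040),
]

def _in_ranges(c, ranges):
    cp = ord(c)
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_name_start_char(c):
    return _in_ranges(c, _NAME_START_RANGES)

def is_name_char(c):
    return is_name_start_char(c) or _in_ranges(c, _NAME_EXTRA_RANGES)

def is_valid_name(s):
    """Name ::= NameStartChar (NameChar)*"""
    return bool(s) and is_name_start_char(s[0]) and all(map(is_name_char, s[1:]))

def is_valid_nmtoken(s):
    """Nmtoken ::= (NameChar)+"""
    return bool(s) and all(map(is_name_char, s))
```

Under these rules 'Hallo_Welt' and 'xsl:template' are names and '1abc' is only an Nmtoken. 'Hallo$Welt' is rejected, which matches the failing case in the Examples section.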

utilities
o  invalid: aString

o  malformed: aString

o  mapEncoding: anEncoding
VisualWorks-specific: map XML encoding names to VW encodedStream names

o  warn: aString
Added to unify warnings for SAX. REW


Instance protocol:

DTD processing
o  conditionalSect

o  dtdEntry

o  dtdFile: newURI
So we don't lose hereChar.

o  dtdStream: aStream rootElement: rootElementNameString
set the DTD from the contents of aStream

o  externalID: usage
Usage may be #docType, #entity, or #notation.
DocType is treated specially, since PE references are not allowed.
Notation is treated specially since the system identifier of the
PUBLIC form is optional.
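The ExternalID grammar behind externalID: has two forms. SYSTEM is followed by a system literal. PUBLIC is followed by a public-identifier literal and then a system literal, except in notation declarations, where the system literal may be omitted. The Python sketch below parses both forms from a string. It assumes plain text input and does not model the docType restriction on parameter-entity references. All names are hypothetical.

```python
import string

# PubidChar from XML 1.0: space, CR, LF, ASCII letters/digits, and a few symbols.
_PUBID_CHARS = set(string.ascii_letters + string.digits + " \r\n-'()+,./:=?;!*#@$_%")

def _skip_space(text, i, required=False):
    """Skip XML whitespace from index i; optionally insist on at least one char."""
    j = i
    while j < len(text) and text[j] in " \t\r\n":
        j += 1
    if required and j == i:
        raise ValueError("whitespace expected at %d" % i)
    return j

def _literal(text, i):
    """Read a single- or double-quoted literal at text[i]; return (value, next index)."""
    if i >= len(text) or text[i] not in "\"'":
        raise ValueError("quoted literal expected at %d" % i)
    end = text.find(text[i], i + 1)
    if end < 0:
        raise ValueError("unterminated literal")
    return text[i + 1:end], end + 1

def parse_external_id(text, usage="entity"):
    """Return (public_id, system_id, index after the ExternalID)."""
    i = _skip_space(text, 0)
    if text.startswith("SYSTEM", i):
        system_id, i = _literal(text, _skip_space(text, i + 6, required=True))
        return None, system_id, i
    if text.startswith("PUBLIC", i):
        public_id, i = _literal(text, _skip_space(text, i + 6, required=True))
        if not set(public_id) <= _PUBID_CHARS:
            raise ValueError("illegal character in public identifier")
        j = _skip_space(text, i)
        if j < len(text) and text[j] in "\"'":
            if j == i:
                raise ValueError("whitespace expected before system literal")
            system_id, i = _literal(text, j)
            return public_id, system_id, i
        if usage == "notation":
            # Notation declarations may use PUBLIC with only a public identifier.
            return public_id, None, i
        raise ValueError("system literal expected after public identifier")
    raise ValueError("SYSTEM or PUBLIC expected")
```

For example, parse_external_id('PUBLIC "-//X//N"', usage="notation") yields the public identifier with no system identifier. The same input with the default usage raises an error.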

o  inInternalSubset

o  markUpDecl

o  notationDecl

o  pubIdLiteral

o  systemLiteral

o  uriResolver

IDs
o  checkUnresolvedIDREFs

o  rememberIDREF: anID

o  resolveIDREF: anID
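Together with the unresolvedIDREFs instance variable, these three methods suggest the usual bookkeeping for the XML 1.0 ID and IDREF validity constraints. An IDREF may appear before the element that declares the matching ID, so unmatched references are remembered and checked once the whole document has been read. The Python sketch below shows that bookkeeping. The class and method names are hypothetical, and how XMLParser divides the work among its methods is an assumption.

```python
class IdTracker:
    """Sketch of ID/IDREF checking for a validating parser (hypothetical names)."""

    def __init__(self):
        self.ids = set()          # ID values declared so far
        self.unresolved = set()   # IDREFs seen before their ID was declared

    def declare_id(self, an_id):
        # Validity constraint: ID values must be unique within a document.
        if an_id in self.ids:
            raise ValueError("duplicate ID: %s" % an_id)
        self.ids.add(an_id)
        self.unresolved.discard(an_id)   # a forward reference is now resolved

    def remember_idref(self, an_id):
        if an_id not in self.ids:
            self.unresolved.add(an_id)

    def check_unresolved(self):
        # Call at the end of the document: every IDREF must match some ID.
        if self.unresolved:
            raise ValueError("unresolved IDREFs: " + ", ".join(sorted(self.unresolved)))
```

Removing an entry as soon as its ID is declared keeps the final check cheap: at the end, the parser only needs to test whether the pending set is empty.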

accessing
o  builder
return the value of the instance variable 'builder' (automatically generated)

o  document
cg: added for twoFlower compatibility with the newer XMLParser framework

o  dtd

o  encoding

o  eol

o  isEncodeChecking

o  isEncodeChecking: aBoolean

o  isTreeBuilding
answer true if we build a tree of XML elements.
This is false for SAX parsing.

o  isTreeBuilding: something

o  normalizeAttributes

o  normalizeAttributes: aBoolean
controls whether attribute values such as ' foo bar ' are normalized to
'foo bar'. The default is true.
When parsing non-standard XML, you can set this to false
before parsing.
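Attribute-value normalization is defined in section 3.3.3 of XML 1.0. Every whitespace character becomes a space. For attributes whose declared type is not CDATA, leading and trailing spaces are also dropped and runs of spaces collapse to one. The ' foo bar ' to 'foo bar' example above matches that second step. The Python sketch below shows both steps. The is_cdata parameter is hypothetical, and the listing does not say whether normalizeAttributes: applies the collapse only to non-CDATA attributes. In ST/X itself, the option could presumably be switched off in the same style as the Examples section, by passing [:parser | parser normalizeAttributes: false] as the beforeScanDo: block.

```python
def normalize_attribute_value(value, is_cdata=True):
    """Attribute-value normalization after XML 1.0, section 3.3.3.
    Assumes character/entity references are already expanded and
    line ends are already normalized to LF."""
    # Step 1: each whitespace character (tab, LF, CR, space) becomes a space.
    value = "".join(" " if c in "\t\n\r " else c for c in value)
    if not is_cdata:
        # Step 2 (non-CDATA types only): trim, and collapse runs of spaces.
        value = " ".join(part for part in value.split(" ") if part)
    return value
```

Leaving is_cdata true preserves the surrounding spaces and only maps tabs and newlines to spaces.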

o  normalizeDtd

o  normalizeDtd: something

o  sourceWrapper
last

o  validate: aBoolean

api
o  comment

o  docTypeDecl

o  latestURI

o  misc
comment or PI

o  parseDtd
parse a plain DTD

o  pi

o  prolog
This is optional.

o  pushSource: aStreamWrapper

o  scanDocument
MessageTally spyOn:[

attribute def processing
o  attListDecl

o  completeNotationType

o  defaultDecl
^(self skipIf: '#REQUIRED')
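The fragment in defaultDecl suggests that keywords such as #REQUIRED are tested with skipIf:. This sketch assumes that skipIf: consumes the keyword and answers true when the input continues with it, and otherwise answers false without moving. The listing below does not document skipIf:, so that behaviour is an assumption. The Python sketch applies this pattern to the DefaultDecl production: '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue). The Scanner class and default_decl function are hypothetical, and the AttValue handling skips the checks for references and '<'.

```python
class Scanner:
    """Minimal string scanner with skipIf-style keyword tests (hypothetical)."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def skip_if(self, keyword):
        # Consume keyword only if the input continues with it; report success.
        if self.text.startswith(keyword, self.pos):
            self.pos += len(keyword)
            return True
        return False

    def skip_space(self):
        # Answer True if any whitespace was skipped.
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] in " \t\r\n":
            self.pos += 1
        return self.pos > start

    def quoted(self):
        # Read a quoted value; references and '<' are not checked here.
        if self.pos >= len(self.text) or self.text[self.pos] not in "\"'":
            raise ValueError("quoted attribute value expected")
        end = self.text.index(self.text[self.pos], self.pos + 1)
        value = self.text[self.pos + 1:end]
        self.pos = end + 1
        return value

def default_decl(s):
    """DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)"""
    if s.skip_if("#REQUIRED"):
        return ("required", None)
    if s.skip_if("#IMPLIED"):
        return ("implied", None)
    if s.skip_if("#FIXED"):
        if not s.skip_space():
            raise ValueError("whitespace required after #FIXED")
        return ("fixed", s.quoted())
    return ("default", s.quoted())
```

Because skip_if leaves the position unchanged on failure, the alternatives can be tried one after another without explicit backtracking.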

o  enumeration

attribute processing
o  attValue
cg: must eat all other spaces ...
do it here, to limit changes to one place.
Q: is this true?

o  attribute

o  isValidName: arg

o  isValidNmToken: arg

o  processAttributes
(attributes collect: [:i | i key]) asSet size = attributes size
    ifFalse: [self notPermitted: 'two attributes with the same name']
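The Smalltalk fragment above checks the XML 1.0 well-formedness constraint "Unique Att Spec": no attribute name may appear twice in the same start tag. It does so by comparing the size of the set of names with the number of attributes. The Python sketch below performs the same check but also reports which name is duplicated. The function name is hypothetical.

```python
def check_unique_attributes(attributes):
    """Raise ValueError if an attribute name occurs twice in one start tag.
    `attributes` is a list of (name, value) pairs in document order."""
    seen = set()
    for name, _value in attributes:
        if name in seen:
            raise ValueError("two attributes with the same name: %s" % name)
        seen.add(name)
```

Walking the list once instead of comparing sizes costs the same, but it allows the error message to name the offending attribute.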


o  quotedString

o  validateAttributes: attributes for: tag

element def processing
o  completeChildren: str

o  completeMixedContent: str
we already have the #PCDATA finished.

o  contentsSpec
^(self skipIf: 'ANY')

o  cp

o  elementDecl

element processing
o  charEntity: data startedIn: str1
parse a character entity and add it to data.
cg: separated into parsing the entity and adding it to the stream

o  closeTag: tag return: elements

o  completeCDATA: str1
data := CharacterWriteStream on:(String new: 32).

o  completeComment: str1
OLD:

o  completePI: str1
pi := self upToAll_positionBefore:'?>'

o  element

o  elementAtPosition: startPosition

o  elementContent: tag openedIn: str
(data findString: ']]>' startingAt: 1) = 0
    ifFalse: [self halt: 'including ]]> in element content'].
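The fragment above flags ']]>' in element content. XML 1.0 forbids that literal sequence in character data, except where it closes a CDATA section. The Python sketch below performs the same test and reports the offset instead of halting. The function name is hypothetical.

```python
def check_char_data(data):
    """Raise ValueError if ']]>' occurs in character data outside CDATA."""
    pos = data.find("]]>")
    if pos >= 0:
        raise ValueError("']]>' not allowed in element content (offset %d)" % pos)
```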

o  generalEntityInText: str canBeExternal: external

o  isValidTag: aTag

o  parseCharEntityStartedIn: str1
parse a character entity.
cg: separated into parsing and (separately) adding to the stream
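What these comments call a character entity is what XML 1.0 calls a character reference: '&#' decimal digits ';' or '&#x' hex digits ';'. The referenced code point must also be a legal XML character. The Python sketch below covers only the parsing half of the split described above and leaves appending to the data stream to the caller. The function name and return convention are hypothetical.

```python
def parse_char_ref(text, i):
    """Parse a character reference ('&#NNN;' or '&#xHHH;') starting at index i.
    Return (character, index just after the ';')."""
    if not text.startswith("&#", i):
        raise ValueError("'&#' expected")
    j = i + 2
    if j < len(text) and text[j] == "x":
        j += 1
        digits, base = "0123456789abcdefABCDEF", 16
    else:
        digits, base = "0123456789", 10
    start = j
    while j < len(text) and text[j] in digits:
        j += 1
    if j == start or j >= len(text) or text[j] != ";":
        raise ValueError("malformed character reference")
    code = int(text[start:j], base)
    # WFC "Legal Character": the code point must match the Char production.
    if not (code in (0x9, 0xA, 0xD) or 0x20 <= code <= 0xD7FF
            or 0xE000 <= code <= 0xFFFD or 0x10000 <= code <= 0x10FFFF):
        raise ValueError("reference to illegal character %#x" % code)
    return chr(code), j + 1
```

Returning the next index rather than appending directly mirrors the separation cg describes: the caller decides where the character goes.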

entity processing
o  PERef: refType
if we are in IGNORE conditional, this is not an error. gj

o  entityDecl
peDef modified for SAX. REW

o  entityDef: entityName
Parameter entityName added for SAX. REW

o  entityValue

o  generalEntity: str

o  nDataDecl
^self skipSpaceInDTD

o  peDef: entityName
Parameter entityName added for SAX. REW

initialization
o  builder: anXMLNodeBuilder

o  lineEndLF

o  on: inputStream

o  on: inputStream protocol: protocolString name: name

o  wrapStream: aStream protocol: protocolString name: name

private
o  checkForWrongRootNode

o  closeAllFiles

o  documentNode

o  error: aStringOrMessage

o  expected: string

o  fullSourceStack

o  getDottedName

o  getElement
cg: added for twoFlower compatibility with the newer XMLParser framework

o  getQualifiedName
original:

o  getSimpleName

o  invalid: aString

o  malformed: aString

o  nmToken

o  notPermitted: string

o  validateEncoding: encName
validate the encoding string in encName.
Set the encoding instVar as a side effect.
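Syntactically, the encoding name in an XML declaration must match the EncName production: a letter followed by letters, digits, '.', '_' or '-'. XML 1.0 also recommends matching encoding names case-insensitively. The Python sketch below checks only the syntax. It does not check whether an encoding is supported, and it does not model the instance-variable side effect. The function name is hypothetical.

```python
import string

def is_valid_enc_name(name):
    """EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*"""
    letters = string.ascii_letters
    rest = letters + string.digits + "._-"
    return bool(name) and name[0] in letters and all(c in rest for c in name[1:])
```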

o  validateText: data from: start to: stop testBlanks: testBlanks
cg: added for twoFlower compatibility with the newer XMLParser framework

o  warn: aString
Modified to unify the warning system for SAX. REW

o  with: list add: node

streaming
o  atEnd

o  forceSpace

o  forceSpaceInDTD

o  getNextChar

o  mustFind: str

o  nextChar
avoid #atEnd if possible (let #next return nil)

o  skipIf: str

o  skipSpace
answer true if any whitespace was skipped

o  skipSpaceInDTD

o  upTo: aCharacter
Answer a subcollection from the current position up to (but not including) the first occurrence of aCharacter.
The stream is left positioned after aCharacter.
If aCharacter is not found, answer the rest of the stream.

o  upToAll: target
Answer a subcollection from the current position
up to the occurrence (if any, not inclusive) of target,
and leave the stream positioned after the occurrence.
If no occurrence is found, answer the entire remaining
stream contents, and leave the stream positioned at the end.
We are going to cheat here, and assume that the first
character in the target only occurs once in the target, so
that we don't have to backtrack.

o  upToAll_positionBefore: target
Answer a subcollection from the current position
up to the occurrence (if any, not inclusive) of target,
and leave the stream positioned before the occurrence.
If no occurrence is found, answer the entire remaining
stream contents, and leave the stream positioned at the end.
We are going to cheat here, and assume that the first
character in the target only occurs once in the target, so
that we don't have to backtrack.
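The two upToAll variants differ only in where they leave the stream: after the match or before it. Both rely on the same shortcut, namely that the first character of the target occurs only once in the target. The Python sketch below implements both variants with a flag and makes the shortcut explicit: after a partial match, the scan resumes past the characters that matched instead of backing up. The CharStream class is hypothetical and only illustrates the technique described in the comments.

```python
class CharStream:
    """Tiny positionable character stream (hypothetical, for illustration)."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def up_to_all(self, target, position_before=False):
        """Return the text from pos up to (not including) target.
        pos ends after target, or before it if position_before is true.
        If target is absent, return the rest and move pos to the end.
        No backtracking: assumes target[0] occurs only once in target."""
        if not target:
            return ""
        start, text, n = self.pos, self.text, len(target)
        i = start
        while i < len(text):
            if text[i] == target[0]:
                k = 1
                while k < n and i + k < len(text) and text[i + k] == target[k]:
                    k += 1
                if k == n:
                    self.pos = i if position_before else i + n
                    return text[start:i]
                i += k   # the shortcut: resume after the partially matched chars
            else:
                i += 1
        self.pos = len(text)
        return text[start:]
```

The shortcut fails silently when its assumption does not hold. ']]>' repeats its first character, so in this sketch the stream 'x]]]>' is not recognized as containing ']]>'.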

testing
o  documentHasDTD

o  hasExpanded: anEntity

o  isIllegalCharacter: anInteger
answer true if anInteger is an illegal Unicode code point in an XML file
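The legal code points are those matched by the Char production of XML 1.0. It allows tab, LF and CR, most of the Basic Multilingual Plane, and the supplementary planes, but excludes the surrogate block and U+FFFE and U+FFFF. The Python sketch below is a direct transcription of that production. The function name is hypothetical.

```python
def is_illegal_character(code):
    """True if code is not matched by the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    return not (code in (0x9, 0xA, 0xD)
                or 0x20 <= code <= 0xD7FF
                or 0xE000 <= code <= 0xFFFD
                or 0x10000 <= code <= 0x10FFFF)
```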

o  isValidating

o  shouldTestWFCEntityDeclared


Examples:


   XML::XMLParser processDocumentStream:'<HalloWelt />' readStream beforeScanDo:[:parser | parser validate:false ].

   XML::XMLParser processDocumentStream:'<Hallo_Welt />' readStream beforeScanDo:[:parser | parser validate:false ].

Fails ('$' is not a valid character in an element name):

   XML::XMLParser processDocumentStream:'<Hallo$Welt />' readStream beforeScanDo:[:parser | parser validate:false ].

ST/X 7.2.0.0; WebServer 1.670 at bd0aa1f87cdd.unknown:8081; Fri, 19 Apr 2024 01:13:05 GMT