Smalltalk/X Webserver

Documentation of class 'HTML::HTMLToTextConverter':

Class: HTMLToTextConverter (in HTML)


Inheritance:

   Object
   |
   +--HTML::Visitor
      |
      +--HTML::TextExtractor
         |
         +--HTML::RichTextExtractor
            |
            +--HTML::HTMLToTextConverter

Package:
stx:goodies/webServer/htmlTree
Category:
Net-Documents-HTML-Utilities
Version:
rev: 1.3 date: 2018/06/05 05:35:19
user: cg
file: HTML__HTMLToTextConverter.st directory: goodies/webServer/htmlTree
module: stx stc-classLibrary: htmlTree
Author:
Claus Gittinger

Description:


Similar to TextExtractor, this is a tool to extract the text of some HTML
(either from a constructed tree, or from a parser).

In contrast to TextExtractor, this one tries to generate a nicer-looking
output string by handling formatting elements (<P>, <BR>, <UL> etc.)
and producing useful ASCII output for them (e.g. line breaks, empty lines,
bullet lists with '*').
It can be used to display HTML in a tooltip, or as a basis for HTML
to WikiText converters, etc.

CAVEAT:
    This is a quick-and-dirty hack, and more might be needed.


Class protocol:

extraction
o  generateTextFromDocument: domTree

o  generateTextFromHtmlString: htmlString


Instance protocol:

initialization
o  initialize

visiting
o  appendString: aStringOrText

o  break

o  paragraph

o  visitBreak: element

o  visitHeading: aHeadingElement
(comment from inherited method)
A heading gets visited.

o  visitListItem: element

o  visitParagraph: anElement
(comment from inherited method)
A paragraph gets visited.

o  visitUnorderedList: element


Examples:


     |b document x|

     b := HTML::TreeBuilder new beginWith:(document := Document new).
     b 
        head;
        headEnd;
        body;
          bold; text:'Bla Bla Bla'; boldEnd;
          br; 
          text:'Line2 bla bla'; 
          p; 
            text:'Paragraph';
          pEnd;
          ul; 
            li; text:'bullet1'; liEnd;
            li; 
              ul; 
                li; text:'bullet1.1'; liEnd;
                li; text:'bullet1.2'; liEnd;
              ulEnd; 
            liEnd;
            li; text:'bullet2'; liEnd;
          ulEnd;
          table;
            tr;
              td; text:'aaa'; tdEnd;
              td; text:'bbb'; tdEnd;
            trEnd;
          tableEnd;
        bodyEnd.

     Transcript showCR:document htmlString.
     x := HTML::HTMLToTextConverter generateTextFromDocument:document.
     Transcript showCR:x.
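
When the HTML is only available as a string rather than as a constructed tree, the class-side entry point generateTextFromHtmlString: (listed under the class protocol above) can presumably be used in the same way. The following is a minimal sketch; the HTML fragment is made up for illustration, and no output is shown because the exact layout depends on how the converter handles each element.

```smalltalk
|html text|

"/ a made-up HTML fragment, for illustration only
html := '<p>Hello <b>world</b></p><ul><li>one</li><li>two</li></ul>'.

"/ convert the string to plain text and show the result
text := HTML::HTMLToTextConverter generateTextFromHtmlString:html.
Transcript showCR:text.
```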


ST/X 7.2.0.0; WebServer 1.670 at bd0aa1f87cdd.unknown:8081; Sun, 27 Nov 2022 08:02:25 GMT