Documentation of class 'HTML::HTMLToTextConverter':

Class: HTMLToTextConverter (in HTML)


Inheritance:

   Object
   |
   +--HTML::Visitor
      |
      +--HTML::TextExtractor
         |
         +--HTML::RichTextExtractor
            |
            +--HTML::HTMLToTextConverter

Package:
stx:goodies/webServer/htmlTree
Category:
Net-Documents-HTML-Utilities
Version:
rev: 1.6 date: 2023/04/25 21:29:54
user: cg
file: HTML__HTMLToTextConverter.st directory: goodies/webServer/htmlTree
module: stx stc-classLibrary: htmlTree

Description:


Like TextExtractor, this is a tool to extract the text of some HTML
(either from a constructed tree, or from a parser).

In contrast to TextExtractor, this class tries to generate nicer-looking
output by handling formatting elements (<P>, <BR>, <UL> etc.)
and producing useful ASCII output for them (e.g. line breaks, empty lines,
bullet lists with '*').
It can be used to display HTML in a tooltip, or as a basis for
HTML-to-WikiText converters, etc.

CAVEAT:
    This is a quick-and-dirty hack, and more might be needed.

copyright

COPYRIGHT (c) 2018 by eXept Software AG
All Rights Reserved

This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice. This software may not
be provided or otherwise made available to, or used by, any
other person. No title to or ownership of the software is
hereby transferred.

Class protocol:

extraction
o  generateTextFromDocument: domTree

o  generateTextFromHtmlString: htmlString


Instance protocol:

initialization
o  initialize

visiting
o  appendString: aStringOrText

o  break

o  paragraph

o  visitBreak: element
(comment from inherited method)
A line break gets visited.

o  visitHeading: aHeadingElement
(comment from inherited method)
A heading gets visited.

o  visitListItem: element
(comment from inherited method)
A list item gets visited.

o  visitParagraph: anElement
(comment from inherited method)
A paragraph gets visited.

o  visitPre: anElement
(comment from inherited method)
A pre gets visited.

o  visitUnorderedList: element
(comment from inherited method)
An unordered list gets visited.


Examples:


     |b document x|

     b := HTML::TreeBuilder new beginWith:(document := Document new).
     b 
        head;
        headEnd;
        body;
          bold; text:'Bla Bla Bla'; boldEnd;
          br; 
          text:'Line2 bla bla'; 
          p; 
            text:'Paragraph';
          pEnd;
          ul; 
            li; text:'bullet1'; liEnd;
            li; 
              ul; 
                li; text:'bullet1.1'; liEnd;
                li; text:'bullet1.2'; liEnd;
              ulEnd; 
            liEnd;
            li; text:'bullet2'; liEnd;
          ulEnd;
          table;
            tr;
              td; text:'aaa'; tdEnd;
              td; text:'bbb'; tdEnd;
            trEnd;
          tableEnd;
        bodyEnd.

     Transcript showCR:document htmlString.
     x := HTML::HTMLToTextConverter generateTextFromDocument:document.
     Transcript showCR:x.
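
The class protocol also lists generateTextFromHtmlString:, which presumably
parses an HTML string itself instead of taking a constructed tree. A minimal
sketch of that entry point follows; the HTML fragment and the variable names
are made up for illustration, and the exact text produced depends on the
converter, so no output is shown:

     |html text|

     "hypothetical input, chosen only for illustration"
     html := '<h1>Title</h1><p>First paragraph<br>second line</p><ul><li>one</li><li>two</li></ul>'.

     "assumption: the result is a string, as with generateTextFromDocument:"
     text := HTML::HTMLToTextConverter generateTextFromHtmlString:html.
     Transcript showCR:text.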


ST/X 7.7.0.0; WebServer 1.702; Sun, 22 Dec 2024 12:01:19 GMT