Class: HTMLToTextConverter (in HTML)
Object
|
+--HTML::Visitor
   |
   +--HTML::TextExtractor
      |
      +--HTML::RichTextExtractor
         |
         +--HTML::HTMLToTextConverter
- Package: stx:goodies/webServer/htmlTree
- Category: Net-Documents-HTML-Utilities
- Version: rev: 1.6 date: 2023/04/25 21:29:54
- user: cg
- file: HTML__HTMLToTextConverter.st directory: goodies/webServer/htmlTree
- module: stx stc-classLibrary: htmlTree
Similar to TextExtractor, this is a tool to extract the text of an HTML
document (either a constructed tree or the output of a parser).
In contrast to TextExtractor, it tries to generate nicer-looking output
by handling formatting elements (<P>, <BR>, <UL>, etc.)
and producing useful ASCII output for them (i.e. line breaks, empty lines,
and bullet lists marked with '*').
Can be used to display HTML in a tooltip or as a basis for
HTML-to-WikiText converters, etc.
CAVEAT:
This is a quick-and-dirty hack, and more might be needed.
copyright
COPYRIGHT (c) 2018 by eXept Software AG
All Rights Reserved
This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice. This software may not
be provided or otherwise made available to, or used by, any
other person. No title to or ownership of the software is
hereby transferred.
extraction
- generateTextFromDocument: domTree
- generateTextFromHtmlString: htmlString
initialization
- initialize
visiting
- appendString: aStringOrText
- break
- paragraph
- visitBreak: element
  (comment from inherited method)
  A line break gets visited.
- visitHeading: aHeadingElement
  (comment from inherited method)
  A heading gets visited.
- visitListItem: element
  (comment from inherited method)
  A list item gets visited.
- visitParagraph: anElement
  (comment from inherited method)
  A paragraph gets visited.
- visitPre: anElement
  (comment from inherited method)
  A pre gets visited.
- visitUnorderedList: element
  (comment from inherited method)
  An unordered list gets visited.
|b document x|

"build a small document tree programmatically"
b := HTML::TreeBuilder new beginWith:(document := Document new).
b
    head;
    headEnd;
    body;
        bold; text:'Bla Bla Bla'; boldEnd;
        br;
        text:'Line2 bla bla';
        p;
            text:'Paragraph';
        pEnd;
        ul;
            li; text:'bullet1'; liEnd;
            li;
                ul;
                    li; text:'bullet1.1'; liEnd;
                    li; text:'bullet1.2'; liEnd;
                ulEnd;
            liEnd;
            li; text:'bullet2'; liEnd;
        ulEnd;
        table;
            tr;
                td; text:'aaa'; tdEnd;
                td; text:'bbb'; tdEnd;
            trEnd;
        tableEnd;
    bodyEnd.

"show the generated HTML, then its plain-text rendering"
Transcript showCR:document htmlString.
x := HTML::HTMLToTextConverter generateTextFromDocument:document.
Transcript showCR:x.
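
The string-based entry point listed under extraction can presumably be used in the same way. The following is a minimal sketch, assuming that generateTextFromHtmlString: is sent to the class (like generateTextFromDocument: in the example above) and that it accepts a complete HTML document as a string. The sample markup is made up for illustration, and no expected output is shown, because the exact layout depends on the converter's formatting rules.

```smalltalk
|html text|

"hypothetical sample input: a line break, a paragraph and a bullet list"
html := '<html><body>First line<br>Second line<p>A paragraph</p><ul><li>one</li><li>two</li></ul></body></html>'.

"assumption: class-side method, invoked like generateTextFromDocument:"
text := HTML::HTMLToTextConverter generateTextFromHtmlString:html.
Transcript showCR:text.
```

If the assumption holds, this avoids building the tree by hand with HTML::TreeBuilder when the HTML is already available as text, for example when filling a tooltip.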