Class: HTMLToTextConverter (in HTML)
Object
|
+--HTML::Visitor
   |
   +--HTML::TextExtractor
      |
      +--HTML::RichTextExtractor
         |
         +--HTML::HTMLToTextConverter
- Package: stx:goodies/webServer/htmlTree
- Category: Net-Documents-HTML-Utilities
- Version: rev: 1.3 date: 2018/06/05 05:35:19
- user: cg
- file: HTML__HTMLToTextConverter.st directory: goodies/webServer/htmlTree
- module: stx stc-classLibrary: htmlTree
- Author: Claus Gittinger
Similar to TextExtractor, this is a tool to extract the text of an HTML document
(either from a constructed tree or from a parser).
In contrast to TextExtractor, this class tries to generate nicer-looking
output by handling formatting elements (<P>, <BR>, <UL>, etc.)
and producing useful ASCII output for them (i.e. line breaks, empty lines,
and bullet lists marked with '*').
It can be used to display HTML in a tooltip, or as a basis for HTML
to WikiText converters, etc.
CAVEAT:
This is a quick-and-dirty hack, and more might be needed.
extraction
- generateTextFromDocument: domTree
- generateTextFromHtmlString: htmlString
initialization
- initialize
visiting
- appendString: aStringOrText
- break
- paragraph
- visitBreak: element
- visitHeading: aHeadingElement
  (comment from inherited method)
  A heading gets visited.
- visitListItem: element
- visitParagraph: anElement
  (comment from inherited method)
  A paragraph gets visited.
- visitUnorderedList: element
    |b document x|

    b := HTML::TreeBuilder new beginWith:(document := Document new).
    b
        head;
        headEnd;
        body;
            bold; text:'Bla Bla Bla'; boldEnd;
            br;
            text:'Line2 bla bla';
            p;
                text:'Paragraph';
            pEnd;
            ul;
                li; text:'bullet1'; liEnd;
                li;
                    ul;
                        li; text:'bullet1.1'; liEnd;
                        li; text:'bullet1.2'; liEnd;
                    ulEnd;
                liEnd;
                li; text:'bullet2'; liEnd;
            ulEnd;
            table;
                tr;
                    td; text:'aaa'; tdEnd;
                    td; text:'bbb'; tdEnd;
                trEnd;
            tableEnd;
        bodyEnd.

    Transcript showCR:document htmlString.
    x := HTML::HTMLToTextConverter generateTextFromDocument:document.
    Transcript showCR:x.
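The example above starts from a constructed tree. The class description also mentions input that comes from a parser, so a minimal sketch of the string-based entry point may be useful. It assumes that generateTextFromHtmlString: can be sent to the class in the same way as generateTextFromDocument: is in the example above, and that it accepts a plain HTML fragment. Neither assumption is confirmed by this page. The sample markup and the variable names are made up for illustration.

```smalltalk
|html text|

"Hypothetical sample input: a paragraph followed by a short bullet list"
html := '<p>First paragraph</p><ul><li>one</li><li>two</li></ul>'.

"Assumption: class-side entry point, as with generateTextFromDocument:"
text := HTML::HTMLToTextConverter generateTextFromHtmlString:html.

Transcript showCR:text.
```

Going by the class description, the paragraph and the list items should come out on separate lines, with the items marked by '*'. The exact spacing is not documented here, so no expected output is shown.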