Documentation of class 'KeywordInContextIndexBuilder':

Class: KeywordInContextIndexBuilder


Inheritance:

   Object
   |
   +--KeywordInContextIndexBuilder

Package:
stx:libbasic2
Category:
Collections-Support
Version:
rev: 1.15 date: 2016/11/04 12:14:03
user: cg
file: KeywordInContextIndexBuilder.st directory: libbasic2
module: stx stc-classLibrary: libbasic2
Author:
Claus Gittinger (cg@alan)

Description:


A support class for building KWIC (Keyword in Context) or KWOC (Keyword out of Context) indexes
(for example, to build such indexes on HTML pages or class documentation).

To generate a kwic, add each line together with a reference (a page number, or any other object),
using addLine:reference:.
Then, when finished, enumerate the kwic and print it as kwic or kwoc.

To ignore fill words (such as 'and', 'the', 'in', etc.),
define those with the #excluded: message.

The keyword handling is configurable by providing actions/lists for:
    separatorAlgorithm      a block which separates lines into individual words
                            gets a line; delivers a collection of words

    excluded                a collection of words which are to be ignored

    unquoteAlgorithm        a block to remove quotes around words. 
                            gets word as argument, delivers unquoted word

    keywordMappingAlgorithm 
                            maps keywords; for example, can be used to map 'startsWith'
                            to 'start', so they appear in the same section.
                            Gets the word and the set-of-all-words as arguments,
                            delivers the key into which the word's entries should be placed  
                            
    matchSorter             determines the order in which keywords are listed
    

[examples:]
    see examples method

    



Class protocol:

instance creation
o  forMethodComments
return an indexer for method comments

o  forMethodSelectorIndex
return an indexer for method selector components, with word separation at case boundaries

o  new


Instance protocol:

accessing
o  excluded: aListOfExcludedWords
define words which are to be ignored.
Typically, this is a list of fillwords, such as 'and', 'the', 'in', etc.

o  exclusionFilter: aBlock
define an additional filter to exclude more complicated patterns.
This is invoked after filtering by the exclusion list.
If defined, the block should return true if the word is to be excluded.

o  matchSorter: aSortBlock
if set, matches will be enumerated in that sort order.

o  separatorAlgorithm: aBlock
define the algorithm to split a given string into words.
The default is to split at punctuation and whitespace
(see #initialize)

o  unquoteAlgorithm: aBlock
define the algorithm to unquote words.
The default is to unquote single and double quotes
(see #initialize)
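
The accessing messages above can be combined to customize how lines are split and which words are dropped. The following is only a minimal sketch: the variable names splitter and shortWordFilter are made up for illustration, and it assumes that both blocks receive a single string argument, as the comments above suggest. The default algorithms set up in #initialize may behave differently.

```smalltalk
|kwic splitter shortWordFilter|

"split at whitespace only (unlike the default, punctuation stays attached to the words)"
splitter := [:line | line asCollectionOfWords].

"exclude very short words and anything that starts with a digit;
 the block answers true if the word is to be excluded"
shortWordFilter := [:word | word size < 3 or:[word first isDigit]].

kwic := KeywordInContextIndexBuilder new.
kwic excluded:#('the' 'and' 'with').
kwic separatorAlgorithm:splitter.
kwic exclusionFilter:shortWordFilter.

kwic addLine:'the 3rd man with a dog' reference:1.
```

Keeping the blocks in variables makes it easy to try them on sample strings before handing them to the builder.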

building
o  addLine: aLine reference: opaqueReference
add a text line; the line is split into words and entered into the kwic.
The reference argument is stored as 'value' of the generated entries.
It can be anything.

o  addLine: aLine reference: opaqueReference ignoreCase: ignoreCase
add a line to the kwic.
The line is split up into words, and a reference to opaqueReference
is added for each word.
The reference argument is stored as 'value' of the generated entries;
it can be anything

o  remapKeywordsWith: keywordMappingAlgorithm
allows for an additional mapper to be applied (after the kwic has been constructed).
This can map multiple different words to the same keyword.
It is given the word and the set of already known words as arguments.
It may, for example, figure out that a word with a long common prefix is already in the
list and decide that the new word should be placed into the same bucket;
for example, if 'starts' is already in the list and 'startsWith' is encountered.
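
As a rough illustration of such a mapper, the sketch below folds simple plural forms into their singular keyword. The name pluralMapper and the plural rule are assumptions made for this example only; the argument order (word first, then the set of known words) follows the description above.

```smalltalk
|kwic pluralMapper|

"map a word ending in 's' onto its singular form,
 but only if that singular form is already a known keyword"
pluralMapper := [:word :knownWords |
    |singular|

    ((word size > 1) and:[word endsWith:'s'])
        ifTrue:[
            singular := word copyButLast:1.
            (knownWords includes:singular) ifTrue:[singular] ifFalse:[word]
        ]
        ifFalse:[word]
].

kwic := KeywordInContextIndexBuilder new.
kwic addLine:'the dog barks' reference:1.
kwic addLine:'two dogs bark' reference:2.

"apply the mapper after all lines have been added"
kwic remapKeywordsWith:pluralMapper.
```

With this mapper, the entries for 'dogs' are expected to end up under 'dog', whereas a word like 'barks' stays as it is, because 'bark' is only mapped to if it is itself a known keyword.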

enumerating
o  entriesDo: aFourToSixArgBlock
evaluate the argument for each entry.
If it is a 4-arg block, it is called with:
kwic-word,
left-text,
right-text
and reference.
If it is a 5-arg block, the original text is passed as an additional argument.
If it is a 6-arg block, the original text and the context are passed as additional arguments.
(stupid, but done for backward compatibility)
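
The 5-arg form can be sketched as follows. This is an illustrative snippet only: the variable rows is hypothetical, and it assumes that the fifth argument is the line as it was passed to addLine:reference:.

```smalltalk
|kwic rows|

kwic := KeywordInContextIndexBuilder new.
kwic excluded:#('the' 'with').
kwic addLine:'the man with the dog' reference:6.

"collect keyword -> original text pairs and print one line per entry"
rows := OrderedCollection new.
kwic entriesDo:[:word :left :right :ref :fullText |
    rows add:(word -> fullText).
    Transcript showCR:(word , ' <- ' , fullText , ' [' , ref printString , ']').
].
```

The left and right arguments are still passed in the 5-arg form; they are simply unused here.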

initialization
o  initialize


Examples:


Building a kwic; print as kwic and kwoc:
|kwic|

kwic := KeywordInContextIndexBuilder new.
kwic excluded:#('the' 'and' 'a' 'an' 'in').

kwic addLine:'bla bla bla' reference:1.
kwic addLine:'foo, bar. baz' reference:2.
kwic addLine:'one two three' reference:3.
kwic addLine:'a cat and a dog' reference:4.
kwic addLine:'the man in the middle' reference:5.
kwic addLine:'the man with the dog' reference:6.

Transcript showCR:'Printed as KWIC:'.
kwic 
    entriesDo:[:word :left :right :ref |
        Transcript 
            show:((left contractTo:20) leftPaddedTo:20);
            space;
            show:((word contractTo:10) leftPaddedTo:10) allBold;
            space;
            show:((right contractTo:20) leftPaddedTo:20);
            space;
            show:'['; show:ref; show:']';
            cr    
    ].

Transcript cr.
Transcript showCR:'Printed as KWOC:'.
kwic 
    entriesDo:[:word :left :right :ref :fullText :context |
        Transcript 
            show:((word contractTo:10) paddedTo:10) allBold;
            space;
            show:((context contractTo:60) paddedTo:60);
            space;
            show:'['; show:ref; show:']';
            cr    
    ].

KWIC index over method selector components; build a little browser window:
|kwic v s c list refs|

kwic := KeywordInContextIndexBuilder new.
Smalltalk allClassesDo:[:eachClass |
    eachClass instAndClassSelectorsAndMethodsDo:[:sel :mthd |
        kwic addLine:sel reference:mthd.
    ]
].

v := StandardSystemView new.
v addComponent:(s := HVScrollableView for:SelectionInListView).
s origin:0.0@0.0 corner:1.0@0.5.
v addComponent:(c := HVScrollableView for:CodeView).
c origin:0.0@0.5 corner:1.0@1.0.

refs := OrderedCollection new.
list := OrderedCollection new.
kwic 
    entriesDo:[:word :left :right :ref |
        list add:(word,' ',left,' ',word allBold,' ',right,' (',ref mclass name,')').
        refs add:ref].
s list:list.
s action:[:lNr | c contents:(refs at:lNr) source].
v open.

KWIC index over method selector components, with word separation:
|kwic|

kwic := KeywordInContextIndexBuilder forMethodSelectorIndex.

Smalltalk allClassesDo:[:eachClass |
    eachClass instAndClassSelectorsAndMethodsDo:[:sel :mthd |
        kwic addLine:sel reference:mthd.
    ]
].
kwic

KWIC index over method comments:
|kwic v s c refs list|

kwic := KeywordInContextIndexBuilder forMethodComments.

Smalltalk allClassesDo:[:eachClass |
    eachClass instAndClassSelectorsAndMethodsDo:[:sel :mthd |
        |comment|

        (sel == #documentation) ifTrue:[
            comment := mthd comment.
            comment notNil ifTrue:[
                kwic addLine:comment reference:mthd mclass ignoreCase:true.
            ]
        ] ifFalse:[
            (sel ~~ #examples
            and:[ sel ~~ #copyright
            and:[ sel ~~ #version]]) ifTrue:[
                comment := mthd comment.
                comment notNil ifTrue:[
                    kwic addLine:comment reference:mthd ignoreCase:true.
                ]
            ]
        ]
    ]
].
kwic.

KWIC index over class comments:
|kwic|

kwic := KeywordInContextIndexBuilder forMethodComments.

Smalltalk allClassesDo:[:eachClass |
    |mthd comment|

    mthd := eachClass theMetaclass compiledMethodAt:#documentation.
    mthd notNil ifTrue:[
        comment := mthd comment.
        comment notNil ifTrue:[
            kwic addLine:comment reference:eachClass theNonMetaclass ignoreCase:true.
        ]
    ]
].
kwic

