Class: KeywordInContextIndexBuilder
Object
  |
  +--KeywordInContextIndexBuilder
- Package: stx:libbasic2
- Category: Collections-Support
- Version: rev: 1.19 date: 2021/06/20 12:13:50
- user: cg
- file: KeywordInContextIndexBuilder.st directory: libbasic2
- module: stx stc-classLibrary: libbasic2
A support class for building KWIC (Keyword in Context) or KWOC (Keyword out of Context) indexes,
for example on HTML pages or on class documentation.
To generate a KWIC index, add each line together with a reference (a page number, a method, or any other object)
using #addLine:reference:.
When finished, enumerate the entries with #entriesDo: and print them in KWIC or KWOC layout.
To ignore fill words (such as 'and', 'the', 'in', etc.),
define them with the #excluded: message.
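The build-then-enumerate cycle boils down to turning each line into one entry per keyword: the keyword, the text to its left, the text to its right, and the reference. Below is a minimal, self-contained sketch of that idea in plain Smalltalk. It does not use the class, splits only at whitespace, and represents an entry as a 4-element Array; that representation is an assumption made for illustration, not the class's internal one.

```smalltalk
"Self-contained sketch (does not use KeywordInContextIndexBuilder):
 one entry per non-excluded word, stored as a hypothetical 4-element Array
 (keyword, left text, right text, reference)."
| line reference excluded words current join entries |
line := 'a cat and a dog'.
reference := 4.
excluded := #('the' 'and' 'a' 'an' 'in').
"split at whitespace only; the real class uses its separatorAlgorithm"
words := OrderedCollection new.
current := WriteStream on: String new.
line do: [:ch |
    ch isSeparator
        ifTrue: [
            current contents isEmpty ifFalse: [words add: current contents].
            current := WriteStream on: String new]
        ifFalse: [current nextPut: ch]].
current contents isEmpty ifFalse: [words add: current contents].
"join the words startIndex..stopIndex with single blanks (empty string if none)"
join := [:startIndex :stopIndex |
    | out |
    out := WriteStream on: String new.
    startIndex to: stopIndex do: [:i |
        i > startIndex ifTrue: [out nextPut: Character space].
        out nextPutAll: (words at: i)].
    out contents].
entries := OrderedCollection new.
1 to: words size do: [:i |
    | word |
    word := words at: i.
    (excluded includes: word) ifFalse: [
        entries add: (Array
            with: word
            with: (join value: 1 value: i - 1)
            with: (join value: i + 1 value: words size)
            with: reference)]].
entries
```

With the fill words excluded, only 'cat' and 'dog' produce entries; these four values correspond to the arguments of a 4-argument block passed to #entriesDo:.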
The keyword handling is configurable by providing actions/lists for:
  separatorAlgorithm        a block which separates lines into individual words;
                            gets a line, delivers a collection of words
  excluded                  a collection of words which are to be ignored
  unquoteAlgorithm          a block to remove quotes around words;
                            gets a word as argument, delivers the unquoted word
  keywordMappingAlgorithm   maps keywords; for example, it can be used to map 'startsWith'
                            to 'start', so that both appear in the same section.
                            Gets the word and the set of all words as arguments and
                            delivers the key into which the word's entries should be placed
  matchSorter               determines the order in which keywords are listed
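As a sketch of wiring up these hooks, the snippet below configures a fresh builder through the documented setters #excluded:, #separatorAlgorithm: and #unquoteAlgorithm:, then collects the resulting keywords via #entriesDo:. The block bodies are illustrative assumptions; they are not the defaults installed by #initialize. #asCollectionOfWords is a Smalltalk/X string method that splits at whitespace.

```smalltalk
"Sketch: configure a builder through its documented hooks.
 The block bodies are illustrative assumptions, not the #initialize defaults."
| kwic keywords |
kwic := KeywordInContextIndexBuilder new.
kwic excluded: #('the' 'and' 'a').
"split at whitespace only (asCollectionOfWords is a Smalltalk/X string method)"
kwic separatorAlgorithm: [:line | line asCollectionOfWords].
"strip one surrounding pair of single quotes, if present"
kwic unquoteAlgorithm: [:word |
    ((word size >= 2) and: [word first = $' and: [word last = $']])
        ifTrue: [word copyFrom: 2 to: word size - 1]
        ifFalse: [word]].
kwic addLine: 'the cat and the dog' reference: 1.
"collect the keywords that made it into the index"
keywords := Set new.
kwic entriesDo: [:word :left :right :ref | keywords add: word].
keywords
```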
[examples:]
see examples method
copyright
COPYRIGHT (c) 2003 by eXept Software AG
All Rights Reserved
This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice. This software may not
be provided or otherwise made available to, or used by, any
other person. No title to or ownership of the software is
hereby transferred.
instance creation
- forMethodComments
    return an indexer for method comments
- forMethodSelectorIndex
    return an indexer for method selector components, with word separation at case boundaries
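One way to picture "word separation at case boundaries" is the hypothetical splitter below, which breaks a selector at non-alphanumeric characters (such as colons) and before every uppercase letter. It is a plain-Smalltalk sketch, not the separator actually installed by #forMethodSelectorIndex.

```smalltalk
"Hypothetical case-boundary splitter (not the class's own code):
 a non-alphanumeric character ends the current word, and an uppercase
 letter starts a new one."
| selector parts current |
selector := 'addLine:reference:'.
parts := OrderedCollection new.
current := WriteStream on: String new.
selector do: [:ch |
    ch isLetterOrDigit
        ifFalse: [
            "colons and other punctuation end the current word"
            current contents isEmpty ifFalse: [parts add: current contents].
            current := WriteStream on: String new]
        ifTrue: [
            (ch isUppercase and: [current contents isEmpty not]) ifTrue: [
                "a case boundary: flush the word collected so far"
                parts add: current contents.
                current := WriteStream on: String new].
            current nextPut: ch]].
current contents isEmpty ifFalse: [parts add: current contents].
parts
```

Such a block, wrapped as [:line | ...], could be handed to #separatorAlgorithm: on an indexer created with #new.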
- new
    (comment from inherited method)
    return an instance of myself without indexed variables
queries
- defaultFillWordsEnglish
- defaultFillWordsFrench
- defaultFillWordsGerman
- fillWordsEnglish
- fillWordsFrench
- fillWordsGerman
accessing
- excluded: aListOfExcludedWords
    define words which are to be ignored.
    Typically, this is a list of fill words, such as 'and', 'the', 'in', etc.
- exclusionFilter: aBlock
    define an additional filter to exclude words matching more complicated patterns.
    It is invoked after filtering by the exclusion list.
    If defined, it should return true if the word is to be excluded.
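The two filtering stages can be sketched as follows: a word is dropped if it is in the exclusion list, and only otherwise is the filter block consulted. The filter body (dropping one-letter words and pure numbers) is a made-up example, and the assumption that the block given to #exclusionFilter: takes the word as its single argument is not stated above.

```smalltalk
"Sketch of the documented order: exclusion list first, then the filter block.
 Hypothetical filter: drop one-letter words and words consisting only of digits.
 Assumes a one-argument filter block (the word)."
| words excluded exclusionFilter kept |
words := #('the' 'cat' 'x' '42' 'dog').
excluded := #('the' 'and' 'a').
exclusionFilter := [:w |
    w size < 2
        or: [(w detect: [:ch | ch isDigit not] ifNone: [nil]) isNil]].
"the filter is only consulted for words that survived the exclusion list"
kept := words reject: [:w |
    (excluded includes: w) or: [exclusionFilter value: w]].
kept
```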
- matchSorter: aSortBlock
    if set, matches are enumerated in that sort order.
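A sort block for #matchSorter: might, for example, order keywords case-insensitively. The sketch below assumes the block receives two keywords and answers true when the first should come first, which is the usual sort-block convention. What exactly the class passes to the block is not documented here.

```smalltalk
"Hypothetical case-insensitive sort block, as one might pass to #matchSorter:.
 Assumes two keyword strings as arguments."
| sorter keywords |
sorter := [:a :b | a asLowercase <= b asLowercase].
keywords := #('dog' 'Cat' 'bird') asSortedCollection: sorter.
keywords asArray
```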
- separatorAlgorithm: aBlock
    define the algorithm to split a given string into words.
    The default is to split at punctuation and whitespace
    (see #initialize)
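For a custom #separatorAlgorithm:, any one-argument block that answers a collection of words will do. The sketch below splits at every character that is neither a letter nor a digit, which approximates "punctuation and whitespace"; it is an illustration, not the default installed by #initialize.

```smalltalk
"Sketch of a separator block: every non-alphanumeric character separates words.
 Illustrative only; not the default from #initialize."
| separator words |
separator := [:line |
    | result current |
    result := OrderedCollection new.
    current := WriteStream on: String new.
    line do: [:ch |
        ch isLetterOrDigit
            ifTrue: [current nextPut: ch]
            ifFalse: [
                current contents isEmpty ifFalse: [result add: current contents].
                current := WriteStream on: String new]].
    current contents isEmpty ifFalse: [result add: current contents].
    result].
words := separator value: 'foo, bar. baz'.
words asArray
```

Because this splitter also treats quote characters as separators, it leaves nothing for the unquote step to do; a separator that keeps quotes attached would rely on #unquoteAlgorithm: instead.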
- unquoteAlgorithm: aBlock
    define the algorithm to unquote words.
    The default is to unquote single and double quotes
    (see #initialize)
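A matching #unquoteAlgorithm: block receives one word and answers it without its surrounding quotes. This sketch strips exactly one pair of matching single or double quotes and leaves everything else unchanged; that choice is an assumption about what "unquote" should mean, not a copy of the default.

```smalltalk
"Sketch of an unquote block: remove one pair of matching single or double
 quotes around a word; mismatched or missing quotes leave the word unchanged."
| unquote |
unquote := [:word |
    ((word size >= 2)
        and: [(word first = $' or: [word first = $"]) and: [word last = word first]])
            ifTrue: [word copyFrom: 2 to: word size - 1]
            ifFalse: [word]].
unquote value: '''foo'''
```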
building
- addLine: aLine reference: opaqueReference
    add a text line; the line is split into words, which are entered into the kwic.
    The reference argument is stored as the 'value' of the generated entries;
    it can be any object.
- addLine: aLine reference: opaqueReference ignoreCase: ignoreCase
    add a line to the kwic.
    The line is split up into words, and a reference to opaqueReference
    is added for each word.
    The reference argument is stored as the 'value' of the generated entries;
    it can be any object.
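How the reference ends up attached to each keyword can be pictured as a dictionary from keyword to the references of the lines that contain it. In this hypothetical stand-in for #addLine:reference:ignoreCase:, passing true folds words to lowercase before they are used as keys; that reading of the ignoreCase: flag is an assumption, since the description above does not spell it out.

```smalltalk
"Hypothetical stand-in for addLine:reference:ignoreCase: (not the class's code).
 Assumption: ignoreCase = true folds words to lowercase before indexing.
 asCollectionOfWords is a Smalltalk/X string method splitting at whitespace."
| index addLine |
index := Dictionary new.
addLine := [:line :ref :ignoreCase |
    line asCollectionOfWords do: [:word |
        | key |
        key := ignoreCase ifTrue: [word asLowercase] ifFalse: [word].
        (index at: key ifAbsentPut: [OrderedCollection new]) add: ref]].
addLine value: 'The cat' value: 1 value: true.
addLine value: 'the dog' value: 2 value: true.
(index at: 'the') asArray
```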
- remapKeywordsWith: keywordMappingAlgorithm
    allows for an additional mapper to be applied after the kwic has been constructed.
    This can map multiple different words to the same keyword.
    The mapper is given the word and the set of already known words as arguments.
    It may, for example, find that a word sharing a long prefix is already in the
    list and decide that the new word should go into the same bucket:
    if 'start' is already in the list and 'startsWith' is encountered,
    'startsWith' can be placed under 'start'.
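A mapper of the kind described above might look like the following two-argument block (word, set of known words). It is a hypothetical example: it places a word under the longest already-known word of at least four characters that is a proper prefix of it, and otherwise keeps the word as its own key.

```smalltalk
"Hypothetical keyword mapper for remapKeywordsWith: (word, known words -> key).
 Picks the longest known word (>= 4 chars) that is a proper prefix of the word."
| mapper known |
mapper := [:word :knownWords |
    | best |
    best := nil.
    knownWords do: [:each |
        ((each size >= 4 and: [word size > each size])
            and: [(word copyFrom: 1 to: each size) = each])
                ifTrue: [
                    (best isNil or: [each size > best size]) ifTrue: [best := each]]].
    best isNil ifTrue: [word] ifFalse: [best]].
known := Set withAll: #('start' 'stop').
mapper value: 'startsWith' value: known
```

On a populated builder, the same block would be applied with "kwic remapKeywordsWith: mapper".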
enumerating
- entriesDo: aFourToSixArgBlock
    evaluate the argument for each entry.
    If it is a 4-arg block, it is called with:
        kwic-word,
        left text,
        right text
        and reference.
    If it is a 5-arg block, the original text is passed as an additional argument.
    If it is a 6-arg block, the original text and the context are passed as additional arguments.
    (clumsy, but kept for backward compatibility)
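The arity-based dispatch of #entriesDo: can be sketched with #numArgs and #valueWithArguments:. The entry values below are made up and the context is simply set to the full line; the sketch only shows how one enumeration method can serve 4-, 5- and 6-argument blocks, not how the class stores its entries.

```smalltalk
"Sketch of dispatching on block arity. The entry is a made-up example:
 keyword, left text, right text, reference, full text, context."
| entry dispatch results |
entry := #('cat' 'a' 'and a dog' 4 'a cat and a dog' 'a cat and a dog').
dispatch := [:block |
    (block numArgs between: 4 and: 6)
        ifFalse: [self error: 'expected a 4- to 6-argument block'].
    "pass only as many values as the block accepts"
    block valueWithArguments: (entry copyFrom: 1 to: block numArgs)].
results := OrderedCollection new.
dispatch value: [:w :l :r :ref | results add: w].
dispatch value: [:w :l :r :ref :text :ctx | results add: text].
results asArray
```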
initialization
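- initialize
    (comment from inherited method)
    does nothing; it merely allows #initialize to be sent to objects which do not need it.
    In this class, #initialize also sets up the default separator and unquote algorithms
    (see #separatorAlgorithm: and #unquoteAlgorithm:).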
Building a KWIC index; print it as KWIC and as KWOC:
|kwic|
kwic := KeywordInContextIndexBuilder new.
kwic excluded:#('the' 'and' 'a' 'an' 'in').
kwic addLine:'bla bla bla' reference:1.
kwic addLine:'foo, bar. baz' reference:2.
kwic addLine:'one two three' reference:3.
kwic addLine:'a cat and a dog' reference:4.
kwic addLine:'the man in the middle' reference:5.
kwic addLine:'the man with the dog' reference:6.
Transcript showCR:'Printed as KWIC:'.
kwic
    entriesDo:[:word :left :right :ref |
        "left context right-aligned, right context left-aligned"
        Transcript
            show:((left contractTo:20) leftPaddedTo:20);
            space;
            show:((word contractTo:10) leftPaddedTo:10) allBold;
            space;
            show:((right contractTo:20) paddedTo:20);
            space;
            show:'['; show:ref printString; show:']';
            cr
    ].
Transcript cr.
Transcript showCR:'Printed as KWOC:'.
kwic
    entriesDo:[:word :left :right :ref :fullText :context |
        Transcript
            show:((word contractTo:10) paddedTo:10) allBold;
            space;
            show:((context contractTo:60) paddedTo:60);
            space;
            show:'['; show:ref printString; show:']';
            cr
    ].
|
KWIC index over method selector components; build a little browser window:
|kwic v s c list refs|
kwic := KeywordInContextIndexBuilder new.
Smalltalk allClassesDo:[:eachClass |
    eachClass instAndClassSelectorsAndMethodsDo:[:sel :mthd |
        kwic addLine:sel reference:mthd.
    ]
].
v := StandardSystemView new.
v addComponent:(s := HVScrollableView for:SelectionInListView).
s origin:0.0@0.0 corner:1.0@0.5.
v addComponent:(c := HVScrollableView for:CodeView).
c origin:0.0@0.5 corner:1.0@1.0.
refs := OrderedCollection new.
list := OrderedCollection new.
kwic
    entriesDo:[:word :left :right :ref |
        list add:(word,' ',left,' ',word allBold,' ',right,' (',ref mclass name,')').
        refs add:ref
    ].
s list:list.
s action:[:lNr | c contents:(refs at:lNr) source].
v open.
|
KWIC index over method selector components, with word separation:
|kwic|
kwic := KeywordInContextIndexBuilder forMethodSelectorIndex.
Smalltalk allClassesDo:[:eachClass |
    eachClass instAndClassSelectorsAndMethodsDo:[:sel :mthd |
        kwic addLine:sel reference:mthd.
    ]
].
kwic
|
KWIC index over method comments:
|kwic|
kwic := KeywordInContextIndexBuilder forMethodComments.
Smalltalk allClassesDo:[:eachClass |
    eachClass instAndClassSelectorsAndMethodsDo:[:sel :mthd |
        |comment|
        (sel == #documentation) ifTrue:[
            comment := mthd comment.
            comment notNil ifTrue:[
                kwic addLine:comment reference:mthd mclass ignoreCase:true.
            ]
        ] ifFalse:[
            (sel ~~ #examples
             and:[ sel ~~ #copyright
             and:[ sel ~~ #version]]) ifTrue:[
                comment := mthd comment.
                comment notNil ifTrue:[
                    kwic addLine:comment reference:mthd ignoreCase:true.
                ]
            ]
        ]
    ]
].
kwic.
|
KWIC index over class comments:
|kwic|
kwic := KeywordInContextIndexBuilder forMethodComments.
Smalltalk allClassesDo:[:eachClass |
    |mthd comment|
    mthd := eachClass theMetaclass compiledMethodAt:#documentation.
    mthd notNil ifTrue:[
        comment := mthd comment.
        comment notNil ifTrue:[
            kwic addLine:comment reference:eachClass theNonMetaclass ignoreCase:true.
        ]
    ]
].
kwic
|