NAME |
Synopsis |
Description |
Distributions |
Constructor and initialization |
Method: build_index() |
Required Modules |
Example code |
Author |
Copyright |
NAME
SynopsisSee below for a warning about Set::Array V 0.11. my($dataset) = [
{ # Index: 0.
address => 'Here',
event => 'Start',
time => 'Now',
},
{ # Index: 1.
address => 'Heaven',
event => 'Exit',
time => 'Then',
},
{ # Index: 2.
address => 'There',
event => 'Finish',
time => 'Thus',
}
];
my($keyset) = [qw/address time/];
my($index) = Search::InvertedIndex::Simple -> new
(
dataset => $dataset,
keyset => $keyset,
) -> build_index();
my(@index) = $$index{'address'}{'He'} -> intersection($$index{'time'}{'T'}) );
This code is discussed in the next section. See t/test.t for a complete program. Description
You might like to run the example program before reading this explanation. The input to new(dataset => $a, keyset => $k) is an arrayref of data (each element of which is a hashref), and an arrayref of keys. The arrayref of data is in the format returned by many DBI methods,
eg DBI's The arrayref of keys is used to select a subset of the keys within each hashref. These selected keys become the primary keys in the hashref returned by the method In the example in the synopsis, The values (assumed to be strings) from the arrayref of data corresponding to those keys are used to create a set of secondary keys under each of these primary keys. The secondary keys are created by taking these values, growing them one character at a time, and using these
generated strings as the secondary keys in the hashref returned by the method In the example in the synopsis, This means that all data values for the key 'address', and all prefixes of those values, are used to create entries in the returned hashref. Similary, the primary key 'time' will have a set of secondary keys. It should be clear by now that these sets of secondary keys can be used for searching for the existence of values, eg by using as input user-supplied data of any length. At the same time, any number of keys can be searched for simultaneously. Hence we have (using the example data above): my($indexer) = Search::InvertedIndex::Simple -> new(...); my($index) = $indexer -> build_index(); where $$index{'address'} is a hashref, and
$$index{'address'}{'H'} is X, and
$$index{'address'}{'He'} is Y.
But what are X and Y? They are objects of type Set::Array, and they contain lists of array indexes. The values of these indexes are the indexes from the original dataset which contributed to the construction of the hashref of secondary keys under each primary key. That is: $$index{'address'}{'H'} is an object of type Set::Array -> new(0, 1)
because indexes 0 and 1 in the dataset contain an address starting with 'H'. And: $$index{'address'}{'He'} is an object of Set::Array -> new(0, 1)
for the same reason. But: $$index{'address'}{'Hea'} is an object of Set::Array -> new(1)
$$index{'address'}{'Her'} is an object of Set::Array -> new(0)
because address 'Hea' comes from index 1 in the dataset, and address 'Her' comes from index 0. Similarly: $$index{'time'}{'T'} is an object of Set::Array -> new(1, 2)
because time 'T' comes from dataset indexes 1 (time => Then) and 2 (time => Thus). Now we can tell instantaneously which elements of the dataset contain the results of a multi-key search: @index = $$index{'address'}{'He'} -> intersection($$index{'time'}{'T'}) );
That is, @index = (1). In other words, $$dataset[1] contains the only hashref where we have an address value starting with 'He' and a time value starting with 'T'. Here, DistributionsThis module is available both as a Unix-style distro (*.tgz) and an ActiveState-style distro (*.ppd). The latter is shipped in a *.zip file. See http://savage.net.au/Perl-modules.html for details. See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing each type of distro. Constructor and initializationnew(...) returns a This is the class's contructor. Parameters to
Method:
|
| Top of page |