indri::index::TermBitmap Class Reference
#include <TermBitmap.hpp>
Detailed Description
TermBitmap is used to convert termIDs when many DiskIndexes are merged together. The add() function has very strict preconditions: both from and to must increase on every call, and from must never be greater than to.
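The preconditions on add() can be sketched as a small standalone check. AddChecker and its members are hypothetical names, not part of TermBitmap.hpp, and the sketch assumes that from may equal to, as the dense example (1,1), (2,2), (3,3) suggests.

```cpp
#include <cassert>

// Hypothetical helper, not part of indri: remembers the last accepted pair
// and reports whether a new (from, to) pair satisfies the preconditions.
struct AddChecker {
    bool hasLast = false;
    int lastFrom = 0;
    int lastTo = 0;

    // Returns true and records the pair if it is acceptable.
    bool accept(int from, int to) {
        // Assumption: from == to is allowed, from > to is not.
        if (from > to) return false;
        // Both values must strictly increase from one call to the next.
        if (hasLast && (from <= lastFrom || to <= lastTo)) return false;
        hasLast = true;
        lastFrom = from;
        lastTo = to;
        return true;
    }

    // Debug-build variant: abort when a precondition is violated.
    void require(int from, int to) {
        bool ok = accept(from, to);
        assert(ok);
        (void)ok;  // silence the unused-variable warning when NDEBUG is set
    }
};
```

A merge loop could call require() in debug builds before each add() to catch out-of-order pairs early.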
This data is stored in 32-byte bitmap chunks with the following form:
    4 bytes - fromBase
    4 bytes - toBase
    24 bytes - bitmap
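For illustration, the chunk layout can be written as a struct. BitmapChunk and testBit are hypothetical names; the header may well treat chunks as raw bytes instead, and the bit order within each byte is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative layout only; not a type declared in TermBitmap.hpp.
struct BitmapChunk {
    std::int32_t fromBase;     // 4 bytes
    std::int32_t toBase;       // 4 bytes
    unsigned char bitmap[24];  // 24 bytes = 192 bits
};

static_assert(sizeof(BitmapChunk) == 32, "a chunk occupies 32 bytes");
static_assert(offsetof(BitmapChunk, bitmap) == 8, "bitmap follows both bases");

// Reads bit `index` (0-based), taking the most significant bit of each
// byte first. The real bit order is an assumption.
inline bool testBit(const BitmapChunk& chunk, int index) {
    assert(index >= 0 && index < 192);
    return ((chunk.bitmap[index / 8] >> (7 - index % 8)) & 1) != 0;
}
```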
Each bit set in the bitmap region corresponds to a (from, to) pair.
Suppose the beginning of the bitmap looks like this: 000100100110000.... Counting bit positions from 1, this could be represented by the following pairs: (1, 4) (2, 7) (3, 10) (4, 11). For instance, the (2, 7) pair says that the second non-zero bit is at index 7. This (2, 7) pair is translated to mean that (fromBase + 2, toBase + 7) is a pair stored by some explicit add() call.
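The example bitmap can be decoded mechanically. The sketch below uses a hypothetical helper, decodePairs, that takes the bitmap as a string of '0' and '1' characters and counts positions from 1, matching the example; it illustrates the encoding and is not the code in TermBitmap.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: lists the (from offset, to offset) pairs encoded by
// a bitmap written as a string of '0' and '1' characters.
std::vector<std::pair<int, int>> decodePairs(const std::string& bits) {
    std::vector<std::pair<int, int>> pairs;
    int setSoFar = 0;  // number of set bits seen so far
    for (std::size_t i = 0; i < bits.size(); ++i) {
        assert(bits[i] == '0' || bits[i] == '1');
        if (bits[i] == '1') {
            ++setSoFar;
            // The n-th set bit, found at 1-based position i + 1.
            pairs.emplace_back(setSoFar, static_cast<int>(i) + 1);
        }
    }
    return pairs;
}
```

Calling decodePairs("000100100110000") yields the four pairs listed above; adding fromBase and toBase to each component gives the stored termID pairs.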
To save on heap overhead, we manage blocks of 64K each in Buffer objects, which are stored in the vector called _maps.
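The block scheme can be sketched as a pool that makes one heap allocation per 64K block rather than one per chunk. ChunkPool is a hypothetical stand-in for the Buffer objects held in _maps, and the sketch assumes that 64K means 65536 bytes, so each block holds 2048 chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

constexpr std::size_t kChunkBytes = 32;
constexpr std::size_t kBlockBytes = 64 * 1024;  // assumption: 64K = 65536 bytes
constexpr std::size_t kChunksPerBlock = kBlockBytes / kChunkBytes;  // 2048

// Hypothetical stand-in for the Buffer-backed storage in _maps.
class ChunkPool {
public:
    // Returns zeroed storage for one new 32-byte chunk, allocating a fresh
    // block only when the current one is full.
    unsigned char* appendChunk() {
        if (_blocks.empty() || _usedInLast == kBlockBytes) {
            _blocks.emplace_back(new unsigned char[kBlockBytes]());
            _usedInLast = 0;
        }
        assert(_usedInLast + kChunkBytes <= kBlockBytes);
        unsigned char* chunk = _blocks.back().get() + _usedInLast;
        _usedInLast += kChunkBytes;
        return chunk;
    }

    std::size_t blockCount() const { return _blocks.size(); }

private:
    std::vector<std::unique_ptr<unsigned char[]>> _blocks;
    std::size_t _usedInLast = 0;  // bytes handed out from the last block
};
```

Because each block is allocated once and never resized, chunk pointers stay valid as the pool grows.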
The TermBitmap is used because, in the ideal case, it is much more space efficient than the simpler approach of using an array mapping. In an array, we'd need 32 bits for each (from, to) pair. In the case where the (from, to) pairs are optimally dense [e.g. (1,1), (2,2), (3,3) ... ], the TermBitmap uses 1.33 bits per pair.
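The 1.33 figure follows from the chunk layout: a 32-byte chunk costs 256 bits and its 24-byte bitmap can describe up to 192 pairs. The sketch below works through that arithmetic; bitsPerPair is a hypothetical helper and the constants are taken from the description above.

```cpp
#include <cassert>

constexpr int kChunkBits = 32 * 8;   // whole chunk: 256 bits
constexpr int kBitmapBits = 24 * 8;  // bitmap region: at most 192 pairs

// Storage cost per pair for a chunk that holds `pairsInChunk` pairs.
double bitsPerPair(int pairsInChunk) {
    assert(pairsInChunk >= 1 && pairsInChunk <= kBitmapBits);
    return static_cast<double>(kChunkBits) / pairsInChunk;
}
```

Here bitsPerPair(192) is 256/192, about 1.33, the dense case. By the same arithmetic bitsPerPair(8) is exactly 32, so a chunk needs more than 8 pairs to beat the 32 bits per pair quoted for the array.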
Constructor & Destructor Documentation
indri::index::TermBitmap::TermBitmap ( ) [inline]
indri::index::TermBitmap::~TermBitmap ( ) [inline]
Member Function Documentation
void indri::index::TermBitmap::_addBufferIfNecessary ( ) [inline, private]
int indri::index::TermBitmap::_bitsSet ( unsigned char c ) [inline, private]
const char* indri::index::TermBitmap::_findInBuffer ( indri::utility::Buffer * b, int from ) [inline, private]
void indri::index::TermBitmap::add ( int from, int to ) [inline]
void indri::index::TermBitmap::add ( int to ) [inline]
int indri::index::TermBitmap::get ( int from ) [inline]
int indri::index::TermBitmap::lastFrom ( ) [inline]
size_t indri::index::TermBitmap::memorySize ( ) [inline]
Member Data Documentation
The documentation for this class was generated from the following file:
Generated on Tue Jun 15 11:03:00 2010 for Lemur by Doxygen 1.3.4