Installing and Running Lemur
(Version 4.12)
Contents
- Installation on Unix
- Installation on Windows (NT and XP)
- Running Applications
- Testing the Toolkit on Sample Data
- Using the API to Write Your Own Application
- Modifying the Toolkit Libraries
1. Installation on Unix
After downloading the Unix Lemur package, complete the following steps to install it:
- Unpack the source
- Configure the makefiles
- Compile Lemur
- Install Lemur library
- Problems with installation
Support for versions of gcc older than 3.2 has been dropped. Solutions to some problems with installing Lemur have been posted on the Lemur Forums (or on the older, archived forums).
On the command line, type the following commands to unpack the package. This should create a directory named lemur-4.12.
> gunzip lemur-4.12.tar.gz
> tar -xvf lemur-4.12.tar
Go to directory lemur-4.12 and run the configuration script configure. This will generate a file named "MakeDefns", which has some customized definitions to be used in makefiles. configure accepts the following arguments:
--enable-distrib compiles and installs the distributed retrieval components. Default is disabled.
--enable-summarization compiles and installs the summarization components. Default is disabled.
--enable-cluster compiles and installs the clustering components. Default is disabled.
--enable-assert enables assert statements in the code. Default is disabled.
--prefix=<path> specifies the directory for the installed toolkit. Default is /usr/local.
--enable-java compiles and installs the swig generated java wrappers. Default is disabled.
--enable-php compiles and installs the swig generated php wrappers. Default is disabled.
--enable-csharp compiles and installs the swig generated C# wrappers. Default is disabled.
--with-javahome=<path> Path to JAVAHOME for compiling the swig generated shared library.
--with-php-config=<path> Path to php-config binary. Only required if php-config is not on the path.
--with-swig=<path> Path to swig binary. Only required if the wrapper interfaces are changed.
--with-site-seed=<hostname> Hostname to use as the seed for building a site search index.
For example, to configure Lemur with the default libraries:
lemur-4.12>./configure
Or to configure Lemur with some modules:
lemur-4.12>./configure --enable-distrib --enable-summarization
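The options above can be combined in one invocation. The following is a minimal sketch, not a required procedure: PREFIX and JAVA_HOME_DIR are hypothetical variables with example values, and the script only runs configure when it finds the script in the current directory.

```shell
#!/bin/sh
# Sketch: combine an install prefix with the Java wrappers.
# PREFIX and JAVA_HOME_DIR are hypothetical example values; adjust them
# for your system.
PREFIX="${PREFIX:-$HOME/lemur}"
JAVA_HOME_DIR="${JAVA_HOME_DIR:-/usr/lib/jvm/default-java}"

# Build the argument list once so the printed and executed commands match.
set -- "--prefix=$PREFIX" --enable-java "--with-javahome=$JAVA_HOME_DIR"
CONFIGURE_LINE="./configure $*"
echo "$CONFIGURE_LINE"

# Run configure only from inside the unpacked source tree.
if [ -x ./configure ]; then
    ./configure "$@"
else
    echo "configure not found; change to the lemur-4.12 directory first" >&2
fi
```

Printing the command first makes it easy to check which modules are enabled before anything is generated.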
With directory lemur-4.12 as the current working directory, type in "make". This will compile the whole Lemur toolkit and link all the Lemur applications.
lemur-4.12> make
After compiling Lemur, type in "make install". This will install the Lemur library and include files according to the directory specified by the prefix option of the configure script. If you change the prefix, be sure to enable all of the modules that you want to install.
For example:
lemur-4.12> ./configure --prefix=/usr0/mydir-for-lemur
lemur-4.12> make install
will create /usr0/mydir-for-lemur/lib/liblemur.a and install the header files in /usr0/mydir-for-lemur/include/. The application executables will all be in /usr0/mydir-for-lemur/bin.
If configured with --enable-java, documentation for the Lemur JNI will be installed in <install-directory>/share/lemur/JNIdoc. The file index.html points into the javadoc generated documentation.
If configured with --enable-java, the shared library will be installed in <install-directory>/lib/liblemur_jni.so and the Java class files will be installed in <install-directory>/share/lemur/lemur.jar and <install-directory>/share/lemur/indri.jar, for the Lemur and Indri APIs. You will need to add <install-directory>/lib to your LD_LIBRARY_PATH and add the appropriate jar file(s) to your CLASSPATH to use the JNI interface.
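As an illustration of the environment setup for the JNI interface, the sketch below assumes a Bourne-style shell. LEMUR_PREFIX is a hypothetical variable standing in for <install-directory>, and listing both jar files is only an example; include just the one(s) you need.

```shell
# Sketch: make the JNI shared library and jar files visible to Java.
# LEMUR_PREFIX is a hypothetical variable; set it to the --prefix given to
# configure (/usr/local by default).
LEMUR_PREFIX="${LEMUR_PREFIX:-/usr/local}"

# Prepend the library directory, preserving any existing value.
LD_LIBRARY_PATH="$LEMUR_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

# Add the jar file(s) for the API you use: lemur.jar, indri.jar, or both.
CLASSPATH="$LEMUR_PREFIX/share/lemur/lemur.jar:$LEMUR_PREFIX/share/lemur/indri.jar${CLASSPATH:+:$CLASSPATH}"
export CLASSPATH
```

Placing these lines in a shell startup file avoids repeating them in every session.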
Four additional jar files are installed. RetUI.jar provides a basic document retrieval GUI for interactive queries, using the Indri API. IndexUI.jar provides a basic collection indexing GUI for building an Indri repository. LemurRet.jar provides a basic document retrieval GUI for interactive queries using the Lemur API. LemurIndex.jar provides a basic collection indexing GUI for building Lemur indexes. All are installed in <install-directory>/share/lemur and can be run with:
java -jar <jarfilename>
If configured with --enable-php, the shared library will be installed in <install-directory>/lib/libindri_php.so. You will need to manually install it in the correct extensions directory for your php configuration. Note that only portions of the Indri API are wrapped for use with PHP.
If configured with --enable-csharp, the shared library will be installed in <install-directory>/lib/liblemur_csharp.so. The C# wrapper classes assembly will be installed in <install-directory>/lib/LemurCsharp.dll. This assembly should be referenced by your C# program.
For users who are only interested in using Lemur as a library and application suite, the original source tree (i.e., the lemur-4.12 directory) can be removed after this step.
2. Installation on Windows (NT and XP)
Download and install the Lemur toolkit with the Windows executable installer. The Lemur applications, libraries, and include files will be installed in the selected target directory (default is C:\Program Files\Lemur\Lemur 4.12), with the applications in the subfolder bin, the library lemur.lib in the subfolder lib, and the include files in the subfolder include. The installer can add the bin subfolder to the search path to facilitate running the Lemur applications.
If you wish to compile the toolkit from source using Visual Studio, please see the instructions on using the toolkit with Visual Studio.
Using lemur.lib to build an application
After installing the Lemur toolkit, you can use the library by adding the subfolder include of the target directory to the C/C++ General Additional Include Directories property for your project, e.g.:
C:\Program Files\Lemur\Lemur 4.12\include
and adding the subfolder lib of the target directory to the Linker General Additional Library Directories property for your project, e.g.:
C:\Program Files\Lemur\Lemur 4.12\lib
and adding lemur.lib and wsock32.lib to the Linker Input Additional Dependencies property for your project.
If your project is configured as Debug, you should choose the Multi-threaded Debug DLL (/MDd) runtime library. If your project is configured as Release, you should choose the Multi-threaded DLL (/MD) runtime library. The Lemur library and applications were built in Release mode using Multi-threaded. You should have C/C++ Language Enable Run-Time Type Info set to yes.
Building the libraries and applications
The installer can optionally install the full Lemur toolkit source tree, placing it in the src subfolder of the target directory. That folder contains the Visual Studio solution file Lemur.sln. There is a separate project file for each library and for each application in Lemur.
By default the project configurations are built in "Debug" mode. To compile with fewer warnings and run at higher efficiency, open the "Build" menu and choose "Configuration Manager". In the menu for "Active Solution Configuration", choose "Release".
When built from source, there is a separate library for each of the sub-libraries that are compiled into "lemur.lib". The combined library, "lemur.lib", is built in the lemur subfolder, with output in either Release or Debug, depending on configuration.
3. Running Applications
Most Lemur Applications
The executables for Lemur applications are generated in the directory app/obj; they will be copied to prefix/bin (as configured by configure) after running "make install".
The usage for different applications may vary, but most applications tend to have the following general usage.
Create a parameter file with value definitions for all the input variables of an application. Parameter files are in XML format. The top level element in the parameter file is named parameters. For example,
<parameters>
  <dataFiles>/usr3/web/sourcelist</dataFiles>
  <index>/usr3/web/myindex</index>
  <indexType>inv</indexType>
  <memory>128000000</memory>
  <docFormat>web</docFormat>
  <position>true</position>
</parameters>
In general, all file paths must be absolute paths in the form required by your operating system. Lemur does not have the capability of searching for files along different paths.
Run the application program with the parameter file as the only argument, or as the first argument if the application can take other parameters from the command line. Most applications only recognize parameters defined in the parameter file, but there are some exceptions.
For example, if the parameter file above is named buildparam in the directory /usr3/web, then run:
/usr3/web> BuildIndex buildparam
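The two steps above (write a parameter file, then pass it to the application) can be sketched as a small script. This is an illustration only: WORK_DIR is a hypothetical location, the sourcelist file it refers to is assumed to exist already, and the script skips the run when BuildIndex is not on the search path.

```shell
#!/bin/sh
# Sketch: write a BuildIndex parameter file and run the application on it.
# WORK_DIR is a hypothetical absolute path; Lemur expects absolute paths
# inside the parameter file.
WORK_DIR="${WORK_DIR:-${TMPDIR:-/tmp}/lemur-work}"
mkdir -p "$WORK_DIR"
PARAM_FILE="$WORK_DIR/buildparam"

# The unquoted EOF lets the shell expand $WORK_DIR into absolute paths.
# $WORK_DIR/sourcelist is assumed to exist and list the data files.
cat > "$PARAM_FILE" <<EOF
<parameters>
  <dataFiles>$WORK_DIR/sourcelist</dataFiles>
  <index>$WORK_DIR/myindex</index>
  <indexType>inv</indexType>
  <memory>128000000</memory>
  <docFormat>web</docFormat>
  <position>true</position>
</parameters>
EOF

# Run BuildIndex only if it is on the search path.
if command -v BuildIndex >/dev/null 2>&1; then
    BuildIndex "$PARAM_FILE"
else
    echo "BuildIndex not found; add the Lemur bin directory to PATH" >&2
fi
```

Generating the file from a script keeps the paths absolute even when the working directory changes.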
Most applications will display a usage message or a list of required input variables if you run them with the "--help" option. For more information about the specific applications and their parameters, please see Lemur Modules and Applications.
4. Testing the Toolkit on Sample Data
The Lemur Toolkit comes with a sample data directory which includes a small public information retrieval testing collection (i.e., the CACM collection). This sample data is provided to let you easily try the toolkit and will help you to understand the capabilities of the toolkit as well as how to use them.
The directory has some test scripts, including test_indri_index.sh, test_pos_index.sh, test_key_index.sh, and test_struct_query.sh. The test index scripts use the specified indexes and demonstrate most of the functionality of Lemur, from formatting a database and building an index to running various kinds of retrieval experiments. clean.sh cleans up any files generated by any of the testing scripts. For more information about the indexes and how they differ, please see the indexing guide.
The scripts start from a source database file and a query file in a simple SGML format. They build an index of the database and a support file that is needed to make some retrieval algorithms fast, and then run different retrieval experiments with different parameter files. The retrieval results can be evaluated with trec_eval to generate a precision-recall summary file for each experiment. Edit the scripts to point to your installed version of trec_eval. If it is not found, the scripts will run without generating the summary report.
You can change some of the settings in the parameter files and see how they affect retrieval performance.
5. Using the Lemur API to Write Your Own Application
For writing applications using Visual Studio .NET or Visual Studio 2005, please see this page on using the Lemur Toolkit API with Visual Studio.
To use the Lemur API on Unix, you will need both the Lemur library file and all the header files. The installation script of Lemur puts the library file in prefix/lib/liblemur.a and all the header files in prefix/include/. C header files have the extension .h, while C++ header files have the extension .hpp.
An application level Makefile that you can use for your own applications has been included. To use it:
- Copy Makefile.app from the top level lemur directory to the directory with your application's source code. Edit the file and fill in values for the following:
  OBJS -- list of each of the object files needed to build your application.
  PROG -- name for your application.
- Use make -f Makefile.app to build your application.
You will use the Lemur library in exactly the same way as you would use any other C++ library. This means you generally do the following:
- In your C++ code, include the relevant Lemur header files.
- When compiling your code, use -Iprefix/include as an option so that the compiler can find the included files (where prefix is as specified when running configure).
- When linking your code, use -Lprefix/lib as an option so that the linker can find the Lemur library. Also, you need to specify -llemur as a linking option to indicate that you want the Lemur library to be linked with your code. See Makefile.app for the list of other libraries that are required to link with -llemur. You may need to be careful about the order of the libraries you specify. The order reflects the assumed dependency among the libraries.
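The compile and link steps listed above can be sketched as follows. This only prints the commands, since the full library list is installation-specific: LEMUR_PREFIX, the file name myapp.cpp, and EXTRA_LIBS are hypothetical, g++ is an assumed compiler, and the real set of additional libraries should be taken from Makefile.app.

```shell
#!/bin/sh
# Sketch: assemble compile and link commands for one source file.
# LEMUR_PREFIX, myapp.cpp, and EXTRA_LIBS are hypothetical; copy the real
# list of additional libraries from Makefile.app into EXTRA_LIBS.
LEMUR_PREFIX="${LEMUR_PREFIX:-/usr/local}"
EXTRA_LIBS="${EXTRA_LIBS:-}"

# -I lets the compiler find the Lemur header files.
COMPILE_CMD="g++ -I$LEMUR_PREFIX/include -c myapp.cpp -o myapp.o"

# -L lets the linker find liblemur.a; the libraries that -llemur depends on
# are placed after it on the command line.
LINK_CMD="g++ myapp.o -L$LEMUR_PREFIX/lib -llemur $EXTRA_LIBS -o myapp"

echo "$COMPILE_CMD"
echo "$LINK_CMD"
```

Once EXTRA_LIBS is filled in, the printed lines can be run directly or used as a starting point for a hand-written makefile.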
6. Modifying the Toolkit Libraries
Modifying the toolkit is not recommended, but individual users may need to customize its behavior.
- To modify an existing file or add a file to an existing directory:
  - Make the changes.
  - Follow the instructions above for making the toolkit library and applications.
  - Follow the instructions above for installing the toolkit library.
- To add a new (library) module to the toolkit:
  - Add the module subdirectory to lemur, for example "<new-module-dir>".
  - Put all include files in the subdirectory named "<new-module-dir>/include".
  - Put all implementation files in the subdirectory named "<new-module-dir>/src".
  - Add the module directory name to the Makefile variable LIBDIRS and to the MakeDefns variables MODULES and ALLMODULES. New modules should be placed at the front of the lists. Note: If you rerun configure, you will have to make this change again. Advanced users should edit MakeDefns.in to add the module directory name to ALLMODULES and edit configure.ac to add an AC_ARG_ENABLE for the new module (see the distrib entry in configure.ac) and then use autoconf to generate a new configure script.
  - Copy a Makefile from an existing module directory (e.g., index/src/Makefile) to <new-module-dir>/src, and change the variable MODULE to the name of the new module (<new-module-dir>).
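The directory layout for a new module can be created with a few commands. The sketch below covers only the directory and Makefile steps; LEMUR_SRC and NEW_MODULE are hypothetical names, and the edits to LIBDIRS, MODULES, ALLMODULES, and MODULE still have to be made by hand.

```shell
#!/bin/sh
# Sketch: create the directory skeleton for a new library module.
# LEMUR_SRC (top of the source tree) and NEW_MODULE are hypothetical names.
LEMUR_SRC="${LEMUR_SRC:-$(pwd)}"
NEW_MODULE="${NEW_MODULE:-mymodule}"

# Header files go in include, implementation files in src.
mkdir -p "$LEMUR_SRC/$NEW_MODULE/include" "$LEMUR_SRC/$NEW_MODULE/src"

# Start from an existing module's Makefile when the source tree is present.
if [ -f "$LEMUR_SRC/index/src/Makefile" ]; then
    cp "$LEMUR_SRC/index/src/Makefile" "$LEMUR_SRC/$NEW_MODULE/src/Makefile"
    echo "Now set MODULE to $NEW_MODULE in $NEW_MODULE/src/Makefile"
else
    echo "index/src/Makefile not found; run from the lemur source directory" >&2
fi
```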