GENERAL INFORMATION
-------------------

This module implements version 2 of the Porter Stemmer algorithm to
improve English-language searching with Drupal's built-in Search
module. Information about the algorithm can be found at
http://snowball.tartarus.org/algorithms/english/stemmer.html

Stemming reduces a word to its basic root or stem (e.g. 'blogging' to
'blog') so that variations on a word ('blogs', 'blogger', 'blogging',
'blog') are considered equivalent when searching. This generally
produces more relevant search results.
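
The effect of stemming on search can be sketched in a few lines. The
example below is an illustration only: toy_stem() is a hypothetical,
deliberately tiny suffix stripper, NOT the Porter Stemmer algorithm and
not code from this module, and the documents are made up. It shows only
the general idea that indexing and querying by stem makes word
variations match each other.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical toy stemmer for illustration; NOT the Porter algorithm."""
    word = word.lower()
    for suffix in ("ing", "er", "s"):
        # Strip at most one suffix, and only if at least 3 letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Undouble a trailing doubled consonant ("blogg" -> "blog").
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


def build_index(documents):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Stem the query the same way as the indexed text, then look it up."""
    return sorted(index.get(toy_stem(query), set()))


# Made-up sample documents.
documents = {
    1: "She blogs daily",
    2: "A blogger wrote this",
    3: "Blogging tips",
    4: "Unrelated page",
}
index = build_index(documents)

# Every variation of "blog" finds the same three documents.
print(search(index, "blog"))
print(search(index, "blogging"))
```

Because the query is stemmed with the same function as the indexed
text, a search for any one variation matches documents containing any
of the others.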

Note that a few parts of the Porter Stemmer algorithm work better for American
English than British English, so some British spellings will not be stemmed
correctly.

This module will use the PECL "stem" library's implementation of the Porter
Stemmer algorithm, if it is installed on your server. If the PECL "stem"
library is not available, the module uses its own PHP implementation of the
algorithm. The output is the same in either case. More information about the
PECL "stem" library: http://pecl.php.net/package/stem


INSTALLATION
------------

See the INSTALL.txt file for installation instructions.


TESTING
-------

The Porter Stemmer module includes tests for the stemming algorithm.
If you would like to run the tests, install the SimpleTest module from
http://drupal.org/project/simpletest, and then navigate to Administer
> Site building > Testing. 

Each "Stemming output" test for the Porter Stemmer module includes
approximately 2000 individual word stemming tests, which check the
module against a standard word list downloaded from the Snowball site
listed under GENERAL INFORMATION. Because of the way SimpleTest
displays output, you may run into browser timeout or memory issues if
you try to run all 16 of the "Stemming output" tests in the same test
run.

Tests are provided both for the internal algorithm and the PECL library.