Unsolved problems:
- What type of lang to pass around in "stemmer" a struct fts_lang or
  just char?
- Should stemmer funcs require length of input or not?
- Should api return bools or 0/-1 for ok/error?
  http://wiki2.dovecot.org/Design/Code says int, but initial design
  was with bool.
- Automake problem: When fts-tokenizer-hdr.c uses funcs from lib-mail,
  should libfts_la depend on $(LIBDOVECOT) stuff or just the
  executables using the library?
- tokenizer: Can an input stream contain multiple emails or is it one
  tokenizer for one istream with one mail?
- stopwords: comparison and hash table lookup, should it assume some
  normalization, decomposition??
- Normalizer: Inspecting err with U_FAILURE() and U_SUCCESS() does not
  reveal warnings, only errors, but it is the way libicu documentation
  encourages to handle err. if (U_SUCCESS(err) && err != U_ZERO_ERROR)
  , there is a warning. u_errorName() prints warning strings also.
- Tokenized text is not yet normalized. Effects on word boundary
  detection unknown.
- Handling of quotes and double quotes in tokenizing. Should we try to
  find "quoted segments" as a single token?

Known issues/fixmes/todos:
- Returned, processed tokens/text are now allocated from a alloconly
  pool with potentially a long lifetime. This will hurt the memory
  footprint.
- fts-normalize make_uchar helper memory allocation for conversion is
  problematic to say the least
- Memory allocation strategies are very varying and nonsensical in all
  of fts-normalize.c. 
- fts-normalize.c: Conversion utf8-->utf16-->utf8 is potentially a
  performance issue.
- Internally (make_uchar, make_utf8 etc) use string_t instead of
  passing around char bufs? OR use a buffer_t in struct fts_normalizer
  instead of separate allocations for each call.
- Bad word skipping not done
- Stemmer needs lower cased words?
- filter: Have languages in classes, or have them in _create()?
    + in classes: some filters(e.g. stopwords) really only support known languages
    + in _create(): error handling easier, _find() must return nulls otherwise (or -1)
    * currently it is very illogical and fts_filter.class_languages is not used at all
    * needs design decisions and refactoring

Misc:
- Too much comfort code, as usual.
