TokenStream (open)

Split text into tokens

Syntax

LOADLIB "wh::util/langspecific.whlib";

OBJECTTYPE TokenStream

Description

This object reads text and returns the tokens found in it. Tokens are units of text, such as words, whitespace, or punctuation. Words are normalized (accents removed, converted to lowercase) and stemmed (reduced to a base form) if a language is specified.
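
To make the token categories and the normalization step concrete, the following is a minimal conceptual sketch in Python. It is not the TokenStream API and does not reflect how TokenStream is implemented: the names `tokenize`, `normalize_word`, and `TOKEN_PATTERN` are hypothetical, and the regular-expression split is an assumed approximation of "words, whitespace, or punctuation". Stemming is left out because it depends on the language and is not specified here.

```python
import re
import unicodedata

# Hypothetical pattern: a run of word characters, a run of whitespace,
# or a single character that is neither (treated as punctuation).
TOKEN_PATTERN = re.compile(
    r"(?P<word>\w+)|(?P<whitespace>\s+)|(?P<punctuation>[^\w\s])"
)


def normalize_word(word):
    """Remove accents and convert to lowercase."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", word)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def tokenize(text):
    """Return a list of (kind, value) pairs for the tokens in text."""
    # Compose first so that accented letters are single word characters.
    text = unicodedata.normalize("NFC", text)
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        value = match.group(0)
        if kind == "word":
            value = normalize_word(value)
        tokens.append((kind, value))
    return tokens
```

With this sketch, `tokenize("Café, Olé!")` yields the word `cafe`, the punctuation `,`, a whitespace token, the word `ole`, and the punctuation `!`. Only word tokens are normalized; whitespace and punctuation are passed through unchanged.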

Constructor

Properties

Functions