|[edição não verificada]||[edição não verificada]|
Geeser Project intends to give a toolbox for data analyists to work with massive tweet streams in a distributed and scalable environment. For that, Geeser is based on Storm, an open-source fault-tolerant stream processing system. Geeser implements natural language processing activities and simple statistics, such as word counting and trending topics.
In this first stage of the project (that is shown here), we are proposing a distributed way to preprocess tweet streams to extract main textual features. More than that, we are showing practical use of Storm stream processing framework in a challenging context.