The Geeser project aims to provide a toolbox for data analysts to work with massive tweet streams in a distributed and scalable environment. To that end, Geeser is built on Storm, an open-source, fault-tolerant stream processing system. Geeser implements natural language processing tasks and simple statistics, such as word counting and trending topics.
In this first stage of the project, presented here, we propose a distributed way to preprocess tweet streams and extract their main textual features. Beyond that, we demonstrate a practical use of the Storm stream processing framework in a challenging context. It is important to stress that this project does '''not aim to optimize or speed up any specific algorithm'''. Our only concern is to '''make it feasible to run some algorithms on massive streams'''.


