Massive Data Processing/Design and Implementation of Big Data Applications/Tweet Stream Processing

Storm requires three main software components to work: Nimbus, ZooKeeper, and Workers. Nimbus is the component responsible for deploying code to the worker nodes. Apache ZooKeeper is used for coordination between Nimbus and the worker nodes. The load on ZooKeeper is quite low, since it only holds cluster coordination state and tuples do not pass through it. If fault tolerance is a requirement, the number of ZooKeeper processes should be increased; in most cases, a single running process is enough. For details on how to install these requirements, see the Install section.
[[File:Storm-components.png|centro|Storm Components]]
The system bootstrap works as follows. All worker nodes report to ZooKeeper as soon as the code is submitted to Nimbus. Then the binary code is distributed to each worker node. When the worker nodes are ready to take a job, the tuples emitted by the spouts are routed to the nodes to be processed. This continues until a spout sends a terminate signal.
The Exclamation topology is a very simple topology with a single objective: adding exclamation marks to the end of random words. In this example, we have two instances of the same object, ExclamationBolt. The tuple in this case is just a simple string. One interesting fact about this example is that the order of the tuples is not important, so the topology can be parallelized freely.
[[File:ExclamationTopology.png|centro|Exclamation Topology]]
In this case, we have 10 processes for the spout, 3 for Exclamation Bolt 1, and 2 for Exclamation Bolt 2. The run is quite fast and the overhead is low.
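The data flow of the Exclamation topology can be illustrated with a minimal plain-Python sketch. It does not use the Storm API. The function names and the choice of three exclamation marks per bolt are assumptions made for illustration only. The final check shows why tuple order does not matter: shuffling the input produces the same collection of results.

```python
import random
from collections import Counter


def exclamation_bolt(word):
    """Append exclamation marks to one tuple (here, a plain string)."""
    # Assumption: each bolt appends three marks; the text does not say how many.
    return word + "!!!"


def run_topology(words):
    """Pass every word through two chained instances of the same bolt."""
    return [exclamation_bolt(exclamation_bolt(w)) for w in words]


words = ["storm", "tweet", "stream", "bolt"]
shuffled = random.sample(words, len(words))

# The same results come out regardless of the order in which tuples arrive.
assert Counter(run_topology(words)) == Counter(run_topology(shuffled))
```

Because each tuple is processed independently of the others, any number of bolt instances can work on different tuples at the same time.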
The Word Count topology is another simple topology, used to count the words in sentences. A spout repeatedly emits a sentence chosen at random from a set of 5 different sentences. Then, a bolt implemented in Python splits the sentences into words. Finally, another bolt counts the word frequencies.
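The split and count stages can be sketched in plain Python as follows. This is only an illustration of the logic, not the Storm multi-language bolt interface. The names `split_bolt`, `CountBolt`, and `run_word_count` are hypothetical, and splitting on whitespace is an assumption.

```python
from collections import Counter


def split_bolt(sentence):
    """Split one sentence tuple into word tuples (whitespace split assumed)."""
    return sentence.split()


class CountBolt:
    """Keep a running frequency for every word seen so far."""

    def __init__(self):
        self.counts = Counter()

    def process(self, word):
        self.counts[word] += 1
        # Emit the word together with its updated count.
        return word, self.counts[word]


def run_word_count(sentences):
    """Feed sentences through the split stage and then the count stage."""
    counter = CountBolt()
    for sentence in sentences:
        for word in split_bolt(sentence):
            counter.process(word)
    return counter.counts
```

In this sketch the counts live only in the memory of the `CountBolt` instance, so they are lost when the process ends.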
A good improvement to this topology might be to add persistence to the word count structure. That way, another bolt could look up the frequency of a given word. This is shown in the following sections. For now, the focus is on the syntax, the implementation, and the results.
This is the most complex example in this section. Here it is necessary to implement methods that perform the join, taking into account that the tuples arrive unordered. For that, the communication buffer is used to hold a tuple until its matching tuple arrives. Timeouts are also used to solve starvation problems in this approach. Because of that, joins in Storm topologies might introduce bottlenecks and should be avoided whenever possible.
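The buffering and timeout idea described above can be sketched in plain Python. This is a simplified illustration, not the implementation used in the Storm example. The class name `JoinBolt`, the `"left"` and `"right"` stream labels, and the explicit `now` argument (used instead of a real clock to keep the sketch deterministic) are all assumptions.

```python
class JoinBolt:
    """Join tuples from two streams by key, buffering until a match arrives."""

    def __init__(self, timeout):
        self.timeout = timeout
        # key -> (side, value, arrival_time) for tuples still waiting for a match
        self.pending = {}

    def expire(self, now):
        """Drop buffered tuples that waited longer than the timeout."""
        stale = [k for k, (_, _, t) in self.pending.items()
                 if now - t > self.timeout]
        for key in stale:
            del self.pending[key]
        return stale

    def process(self, side, key, value, now):
        """Handle one tuple; return the joined tuple, or None if still waiting."""
        self.expire(now)
        if key in self.pending and self.pending[key][0] != side:
            other_side, other_value, _ = self.pending.pop(key)
            pair = {side: value, other_side: other_value}
            return (key, pair["left"], pair["right"])
        # No partner yet: buffer this tuple until the other side shows up.
        self.pending[key] = (side, value, now)
        return None
```

A short usage example: with `join = JoinBolt(timeout=5)`, calling `join.process("left", 1, "a", now=0)` buffers the tuple and returns `None`, and a later `join.process("right", 1, "A", now=2)` returns `(1, "a", "A")`. Every unmatched tuple occupies buffer space until it expires, which is one way the bottleneck mentioned above can arise.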

