Hey guys, I'm back with the third blog in the AWS Kinesis series. From my previous blogs you will now be well versed with what exactly AWS Kinesis is and with the building blocks that make it up, i.e. the streams, shards, partition keys, sequence numbers, etc. If you missed them, I suggest you take a quick glance at the links for a better understanding. So now that we know what the building blocks are and what they mean, let's see how they fit together to bring the Kinesis story to life!
Let us Understand the High-Level Architecture
[Figure: Amazon Kinesis high-level architecture. Courtesy: AWS website]
So let's break down and discuss each of the components in detail.
An Amazon Kinesis application is called the "Consumer". And if there is a "Consumer", there must also be a "Producer".
So what exactly are a Consumer and a Producer?
Well, we know what a Kinesis stream is from our earlier blogs; Kinesis can receive data from any source that is capable of making a PUT request. So the "Producers" are the applications that make PUT calls and continuously throw data (like the logs of wood in the earlier blog's example) into the Kinesis stream.
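To make this concrete, here is a minimal producer sketch using the PutRecord API from the AWS SDK for Java. The stream name my-demo-stream and partition key sensor-42 are hypothetical placeholders, not anything from this series:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.kinesis.model.PutRecordResult;

public class SimpleProducer {
    public static void main(String[] args) {
        // Picks up credentials and region from the default provider chain
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        PutRecordRequest request = new PutRecordRequest()
                .withStreamName("my-demo-stream")   // hypothetical stream name
                .withPartitionKey("sensor-42")      // hashed by Kinesis to pick the target shard
                .withData(ByteBuffer.wrap("hello kinesis".getBytes(StandardCharsets.UTF_8)));

        PutRecordResult result = kinesis.putRecord(request);
        System.out.println("Stored in shard " + result.getShardId()
                + " at sequence number " + result.getSequenceNumber());
    }
}
```

Notice how the partition key from our earlier blog shows up here: it is what Kinesis hashes to decide which shard a record lands in.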
OK, so now the data has entered the Kinesis stream. Next, that data has to be picked out of the stream to be processed. This is done by the "Consumer", also known as a "Worker".
Since data is being continuously added, there needs to be a provision that continuously checks the stream for new data. Fault tolerance must also be taken into consideration: if our application or the hardware fails, we need to pick up from the last point up to which the data in the stream was processed. AWS provides this fault tolerance through the KCL, or Kinesis Client Library.
[ What are the Building Blocks that make up AWS Kinesis ]
What Role Does the KCL Play?
The KCL takes care of all of the above-mentioned activities. It keeps checking for new data in the stream and creates checkpoints at regular intervals for recovery in case of a failure. It also takes responsibility for creating a "worker" for every shard: as the number of shards increases or decreases, it increases or reduces the number of workers accordingly, and it restarts workers if they have failed.
In short, everything related to handling the Kinesis stream is done by the client library itself. We just need to integrate it into our Java application that contains the processing code. In other words, we only need to concentrate on the processing logic!
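To see where our processing logic actually plugs in, here is a minimal sketch of a record processor against the KCL 1.x interface. The class name LogRecordProcessor is hypothetical; only the body of processRecords() is our code, the rest is the contract the KCL drives for us:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessor;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorCheckpointer;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownReason;
import com.amazonaws.services.kinesis.model.Record;

public class LogRecordProcessor implements IRecordProcessor {

    @Override
    public void initialize(String shardId) {
        // Called once when the KCL assigns this worker a shard
    }

    @Override
    public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) {
        for (Record record : records) {
            // Our processing logic goes here; everything around it is the KCL's job
            System.out.println(new String(record.getData().array(), StandardCharsets.UTF_8));
        }
        try {
            // Checkpoint so a restarted worker resumes from this point after a failure
            checkpointer.checkpoint();
        } catch (Exception e) {
            // A real application would log and retry here
        }
    }

    @Override
    public void shutdown(IRecordProcessorCheckpointer checkpointer, ShutdownReason reason) {
        if (reason == ShutdownReason.TERMINATE) {
            try {
                checkpointer.checkpoint(); // the shard is closing; record the final position
            } catch (Exception e) {
            }
        }
    }
}
```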
When creating a Kinesis application we must make sure it has a unique name, scoped to the AWS account and region used by the application. This is because whenever a Kinesis application is launched, the application name is used as the name of the control table in Amazon DynamoDB and as the namespace for the Amazon CloudWatch metrics.
When the Kinesis application starts up, it creates an Amazon DynamoDB table to store the application state, connects to the specified Amazon Kinesis stream, and then starts consuming data from the stream. After processing, the data can be sent to Amazon S3, DynamoDB, or Redshift for long-term storage or further in-depth processing.
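Putting the pieces together, a consumer's entry point might look like the sketch below, again using the KCL 1.x classes. The names my-kinesis-app and my-demo-stream are hypothetical, and LogRecordProcessor is the processor sketched above:

```java
import java.util.UUID;

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorFactory;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker;

public class ConsumerApp {
    public static void main(String[] args) {
        // The application name becomes the DynamoDB control table name and the
        // CloudWatch metrics namespace, so it must be unique per account and region.
        KinesisClientLibConfiguration config = new KinesisClientLibConfiguration(
                "my-kinesis-app",                          // hypothetical application name
                "my-demo-stream",                          // hypothetical stream name
                new DefaultAWSCredentialsProviderChain(),
                UUID.randomUUID().toString());             // unique id for this worker instance

        IRecordProcessorFactory factory = LogRecordProcessor::new; // processor from the sketch above
        Worker worker = new Worker(factory, config);
        worker.run(); // blocks: discovers shards, spawns a processor per shard, checkpoints
    }
}
```

Everything the KCL does behind worker.run() is exactly what we described above: polling shards for new data, scaling workers with the shard count, and checkpointing progress into the DynamoDB control table.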
So now we are ready to build our own Kinesis application! Still not confident? Don't worry, in the next few blogs we will walk through building a Kinesis application from scratch.