Sunday, September 18, 2011

Crisps abt Scalability Part 3

New to this series? Start with Crisps abt Scalability Part 1 and Part 2 first.
So, are you convinced by master-slave? Let us move on to sharding, or shared-nothing architecture.
What is it?
Before that, I'll answer a different question.
Who is it for??
It is for applications whose data can be partitioned. Here is a scenario. You have 1 billion customers, and all of them are maintained in, and served from, a single data store. But the beauty of your application is that nothing is shared between any two customers. So why keep them together?? Just split them, because every customer reads only their own data and writes only their own data. Put another way, no one reads or writes your data except you. So you can split your billion customers into two groups of half a billion each and add another data store to handle the second group. I hope you now understand what sharding is.
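Here is a minimal sketch of that split, assuming integer customer IDs from 0 to 999,999,999 and range partitioning (first half of the IDs to one store, second half to the other). `ShardedStore` is a hypothetical name, and the in-memory `Map`s only stand in for real data stores. Hash partitioning (for example `customerId % shardCount`) is a common alternative that spreads load more evenly.

```javascript
// Sketch of sharding customers across independent data stores.
// Assumptions: customer IDs are integers in [0, totalCustomers), and each
// "data store" is an in-memory Map standing in for a real database.
class ShardedStore {
  constructor(totalCustomers, shardCount) {
    this.shardSize = Math.ceil(totalCustomers / shardCount);
    this.shards = Array.from({ length: shardCount }, () => new Map());
  }

  // Range partitioning: IDs [0, shardSize) go to shard 0, the next range to shard 1, ...
  shardFor(customerId) {
    return Math.floor(customerId / this.shardSize);
  }

  // Only the shard that owns this customer is touched; nothing is shared.
  write(customerId, data) {
    this.shards[this.shardFor(customerId)].set(customerId, data);
  }

  read(customerId) {
    return this.shards[this.shardFor(customerId)].get(customerId);
  }
}

const store = new ShardedStore(1000000000, 2);
store.write(42, { name: "Alice" });
store.write(750000000, { name: "Bob" });
console.log(store.shardFor(42));        // 0
console.log(store.shardFor(750000000)); // 1
console.log(store.read(42).name);       // Alice
```

Because no request ever needs data from two shards, each store can be scaled, backed up or moved on its own.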
That is awesome :)  
Ya, really, provided your application is of that kind :)
A final touch on Consistency, Availability and Partition tolerance [CAP]
Why these three?
According to Brewer's theorem on distributed systems, a distributed system can guarantee at most two of these three properties at the same time. It is a tradeoff. And the best thing is, this theorem has a proof as well (Gilbert and Lynch, 2002).
Cool, what are they?
Nice delayed question :) Consistency - This can be broadly split into read consistency and write consistency. Read consistency -> the assurance that a user reads the most recent data: once an update happens, it should be reflected to any user whose request arrives after the update. Write consistency -> a scenario: let X be a global variable that both P1 and P2 wish to write to. By the timeline, P1 does the first write and P2 the second, so in the end X should hold P2's value. If this doesn't happen, write consistency is said to have failed. Depending on its needs, a system provides a desired level of consistency, such as strict, causal or weak. Only causal looks odd. It is nothing but event-based consistency: if e2 is an effect of e1, we write e1->e2, and reads and writes must be performed in that order. The problem is this: when the events occur within a single system, everything is fine, because we can determine the causal relationship from time and process execution order. Here everything happens across systems, so we need a way to order events across machines, such as logical clocks (Lamport or vector clocks), or physical clocks kept in sync by a protocol like NTP. Remember, I'm discussing distributed systems: the reads and writes I mention can happen at any point in the system, but they should be reflected properly at all points :)
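To make "e1->e2 across systems" concrete, here is a minimal vector-clock sketch. Vector clocks are one standard way to track causality between machines without a shared physical clock; they are my choice of illustration, not something the post prescribes. Node names "A", "B", "C" and the helper names are hypothetical.

```javascript
// Vector-clock sketch for tracking causality (e1 -> e2) across nodes.
// Each clock is a plain object mapping node name -> event counter.
function increment(clock, node) {
  return { ...clock, [node]: (clock[node] || 0) + 1 };
}

// On receiving a message: take the element-wise max, then tick the local node.
function merge(local, remote, node) {
  const merged = { ...local };
  for (const [n, t] of Object.entries(remote)) {
    merged[n] = Math.max(merged[n] || 0, t);
  }
  return increment(merged, node);
}

// a happened before b iff a <= b in every component and a != b.
function happenedBefore(a, b) {
  const nodes = new Set([...Object.keys(a), ...Object.keys(b)]);
  let strictlyLess = false;
  for (const n of nodes) {
    const x = a[n] || 0;
    const y = b[n] || 0;
    if (x > y) return false;
    if (x < y) strictlyLess = true;
  }
  return strictlyLess;
}

// e1 happens on node A; A sends it to B, and B's next event e2 depends on it.
const e1 = increment({}, "A");  // { A: 1 }
const e2 = merge({}, e1, "B");  // { A: 1, B: 1 }
const e3 = increment({}, "C");  // { C: 1 }, an unrelated event on node C

console.log(happenedBefore(e1, e2)); // true  -> e1 -> e2
console.log(happenedBefore(e2, e1)); // false
console.log(happenedBefore(e1, e3) || happenedBefore(e3, e1)); // false -> concurrent
```

A causally consistent store would use such clocks to make sure no replica exposes e2 before it has applied e1, while e1 and e3 may be seen in either order.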
Availability - This is simple but difficult :P What if one of your machines fails, whether because of network unavailability, floods, blah blah and all? Requests to that machine go unanswered, and the data it holds may be lost. So, how do we handle this?? The solution is replica management. Instead of writing data to a single server, write it to n servers, so that the data is retained, and can still be served, when one of them is lost. This n is usually called the replication factor.
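A minimal sketch of writing with a replication factor n, assuming the nodes sit on a ring, a key's "home" node is chosen by a simple string hash, and the next n-1 nodes on the ring hold the copies (similar in spirit to Cassandra's simple placement, but not its actual code). All names here are hypothetical.

```javascript
// Replica placement sketch with replication factor n (n must be <= number of nodes).
function hashKey(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0; // unsigned 32-bit
  return h;
}

// The home node plus the next n-1 nodes on the ring hold the key.
function replicasFor(key, nodes, n) {
  const start = hashKey(key) % nodes.length;
  return Array.from({ length: n }, (_, i) => nodes[(start + i) % nodes.length]);
}

const nodes = ["node1", "node2", "node3", "node4", "node5"];
const up = new Set(nodes); // nodes that are currently alive
const storage = new Map(nodes.map((name) => [name, new Map()]));

function write(key, value, n) {
  for (const node of replicasFor(key, nodes, n)) storage.get(node).set(key, value);
}

// A read succeeds as long as at least one replica is still up.
function read(key, n) {
  for (const node of replicasFor(key, nodes, n)) {
    if (up.has(node)) return storage.get(node).get(key);
  }
  throw new Error(`all ${n} replicas of ${key} are down`);
}

write("cart:42", ["book"], 3);
up.delete(replicasFor("cart:42", nodes, 3)[0]); // the home node fails
console.log(read("cart:42", 3)[0]); // book
```

With n = 3, the data survives the loss of any two of its three replicas; the price is three writes per update and the job of keeping those copies consistent.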
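Partition Tolerance - Consider 5 nodes arranged in a ring (1-2-3-4-5-1), where each node talks only to its two neighbours. What if nodes 2 and 4 go down? Node 3 can no longer reach 1 and 5 (and vice versa), even though all three are still running. This split is called a network partition. Your application should be able to keep working correctly through it.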
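The ring example can be checked with a breadth-first search over the live nodes. This is only a sketch of the 5-node scenario above, assuming each node can talk to its two ring neighbours and nothing else.

```javascript
// The 5-node ring from the example: each node lists its two neighbours.
const ring = { 1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1] };

// Breadth-first search from `start`, skipping nodes that are down.
function reachableFrom(start, down) {
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const next of ring[node]) {
      if (!down.has(next) && !seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return [...seen].sort((a, b) => a - b);
}

const down = new Set([2, 4]);
console.log(reachableFrom(3, down).join(", ")); // 3     -> node 3 is cut off
console.log(reachableFrom(1, down).join(", ")); // 1, 5  -> 1 and 5 still see each other
```

Two islands, {3} and {1, 5}, are both alive, and a partition-tolerant system has to decide what each island may still read and write.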

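As I said earlier, choose two and you are good to go :) Cassandra chooses AP, Google Bigtable chooses CP, and MySQL (on a single server, where partitions are not a concern) chooses CA. And hence they all perform well for what they were built for :)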

Crisps abt Scalability Part 2

If you are reading this series for the first time, start with my previous post, Crisps abt Scalability Part 1.
So, what are the two alien terms I mentioned earlier?
WT* is Master and Slave Architecture??
Simple, as its name depicts: there will be one master and the slaves will listen to him. Soooorrry :P Let me be clear. In one implementation of master-slave architecture, all writes go only to the master, and reads can be served by either the master or any slave. Ex: Redis replication. In another implementation [I hope], the master acts as a holder of metadata about where the data lives among its slaves. Ex: HBase (this is totally different from the former and will be explained soon).
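A minimal sketch of the first (Redis-style) flavour: writes always go to the master, reads are spread round-robin over the master and its slaves. `ReplicatedStore` is a hypothetical name, and copying each write to every slave immediately is a simplifying assumption; when and how slaves really get updated is exactly the tricky part discussed below.

```javascript
// Master-slave read/write routing sketch (names are hypothetical).
class ReplicatedStore {
  constructor(slaveCount) {
    this.master = new Map();
    this.slaves = Array.from({ length: slaveCount }, () => new Map());
    this.nextReader = 0;
  }

  // All writes go to the master only.
  write(key, value) {
    this.master.set(key, value);
    // Simplification: synchronously copy the write to every slave.
    for (const slave of this.slaves) slave.set(key, value);
  }

  // Reads rotate over master, slave 1, slave 2, ...
  read(key) {
    const readers = [this.master, ...this.slaves];
    const node = readers[this.nextReader % readers.length];
    this.nextReader += 1;
    return node.get(key);
  }
}

const store = new ReplicatedStore(2);
store.write("cart:42", "book");
console.log(store.read("cart:42"), store.read("cart:42"), store.read("cart:42"));
// book book book  (served by master, slave 1 and slave 2 in turn)
```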
What is the advantage of this??? 
Consistency -> [Conditions apply]
How??
Simple: consider the same shopping-cart scenario. The problem was not reading the data but writing it, right?? So we converge all the writes on one common location, so that consistency conflicts are resolved at a single point. As far as writes are concerned, the application now looks almost like the usual single-server model. Problem solved?? Yesssss :) But the answer is no :P
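Applied to the last-piece shopping-cart case, funnelling every order through one master means the orders are applied one after another, so only one customer gets the item. This is a sketch; `OrderMaster` is a hypothetical name and the stock table is an in-memory `Map`.

```javascript
// All writes (orders) funnel through one master, which applies them one by one.
// Assumption: the master owns the stock table; slaves would only serve reads.
class OrderMaster {
  constructor(stock) {
    this.stock = new Map(Object.entries(stock));
  }

  placeOrder(customer, product) {
    const left = this.stock.get(product) || 0;
    if (left === 0) return { customer, status: "unavailable" };
    this.stock.set(product, left - 1);
    return { customer, status: "ordered" };
  }
}

const master = new OrderMaster({ phone: 1 });
// C1 and C2 may read the product status from different slaves,
// but both orders hit the same master.
console.log(master.placeOrder("C1", "phone").status); // ordered
console.log(master.placeOrder("C2", "phone").status); // unavailable
```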
WhyyyyyyT*?
Now comes the concept of a single point of failure :)
WT* is that?
Ha ha he he, I'm there. It means your application has only one entry point for writes. So what if that entry point fails :( Everything is gone :( That is why there are many other mechanisms to handle such failovers, like hardware, software and information redundancy, blah blah and all :)
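One common software-side failover scheme, sketched naively: a monitor watches heartbeats, and if the master goes silent it promotes the live slave that has applied the most writes. The 3-second timeout, the `cluster` shape and the function name are all assumptions for illustration. Real systems must also make sure the old master cannot keep accepting writes afterwards (the "split brain" problem), which this sketch ignores.

```javascript
// Naive failover sketch. Assumptions: a monitor knows each node's last heartbeat
// time (ms) and how many writes each slave has applied; all names are hypothetical.
const HEARTBEAT_TIMEOUT_MS = 3000;

function checkAndFailover(cluster, now) {
  const masterAlive = now - cluster.master.lastHeartbeat <= HEARTBEAT_TIMEOUT_MS;
  if (masterAlive) return cluster.master.name;

  // Only slaves that are themselves alive are candidates.
  const candidates = cluster.slaves.filter(
    (s) => now - s.lastHeartbeat <= HEARTBEAT_TIMEOUT_MS
  );
  if (candidates.length === 0) throw new Error("no live slave to promote");

  // Promote the most up-to-date slave to lose as few writes as possible.
  candidates.sort((a, b) => b.appliedWrites - a.appliedWrites);
  const promoted = candidates[0];
  cluster.slaves = cluster.slaves.filter((s) => s !== promoted);
  cluster.master = promoted;
  return promoted.name;
}

const cluster = {
  master: { name: "m1", lastHeartbeat: 0 },
  slaves: [
    { name: "s1", lastHeartbeat: 9500, appliedWrites: 120 },
    { name: "s2", lastHeartbeat: 9800, appliedWrites: 118 },
  ],
};
console.log(checkAndFailover(cluster, 10000)); // s1  (m1 has been silent for 10 s)
```

Note that s1's two extra writes over s2 show why promotion is lossy: any write the master accepted but never shipped to s1 is gone.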
Oops! So, are we all done now??
You want me to say yes?? The answer again is no :P There is one more simple thing. 
What is that?? Is that really simple??
Ha ha :) You got me right :P It is only seemingly simple. The write consistency problem is solved, but what about read consistency??? The question is how the master will update its slaves, and when. There are two ways: push-based and pull-based. In push-based replication, whenever there is an update, the master sends it to all its slaves to keep them consistent. In pull-based replication, the slave asks the master for updates to a piece of data. Next, when?? Either periodically or trigger-based, depending on the level of consistency the application needs.
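A sketch contrasting the two, assuming the master keeps an append-only log of writes and each slave remembers how many log entries it has applied. `LogMaster` and `LogSlave` are hypothetical names, not any particular database's replication protocol.

```javascript
// Push- vs pull-based replication sketch (names are hypothetical).
class LogMaster {
  constructor() {
    this.log = [];         // append-only list of writes
    this.pushTargets = []; // slaves that want updates pushed to them
  }

  write(key, value) {
    this.log.push({ key, value });
    // Push-based: send the new entry to every registered slave right away.
    for (const slave of this.pushTargets) slave.apply(this.log.length - 1, { key, value });
  }
}

class LogSlave {
  constructor() {
    this.data = new Map();
    this.applied = 0; // number of log entries applied so far
  }

  apply(index, entry) {
    this.data.set(entry.key, entry.value);
    this.applied = index + 1;
  }

  // Pull-based: ask the master for every entry this slave has not applied yet.
  pull(master) {
    for (let i = this.applied; i < master.log.length; i++) this.apply(i, master.log[i]);
  }
}

const master = new LogMaster();
const pushed = new LogSlave();
const puller = new LogSlave();
master.pushTargets.push(pushed);

master.write("stock:phone", 1);
console.log(pushed.data.get("stock:phone")); // 1
console.log(puller.data.get("stock:phone")); // undefined (stale until it pulls)
puller.pull(master); // could run periodically (e.g. setInterval) or on a trigger
console.log(puller.data.get("stock:phone")); // 1
```

The window between a write and the puller's next `pull` is exactly the period in which a read from that slave can be stale; pushing shrinks the window but makes every write cost more on the master.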
At least now is it done??
Sorry to say it again, but the answer is no, because there is one more issue hidden in the previous one :P Will the master or a slave be able to respond to requests while it is sending or receiving updates?? Once again, it comes down to availability. In Redis (at the time of writing), replication is non-blocking on the master side, but on the slave side the initial synchronization is blocking. God should help :P So special care needs to be taken while an update cycle is running.

So, I'm done with master-slave architecture.

Let's list the overheads (and maybe demerits) now:
1. SPF (single point of failure)
2. The master might get overloaded in write-heavy apps
3. The update cycle
4. Availability

Did you see any of these overheads in your single-server implementation??
1, 2 and 4 are problems in that model too. Only 3 is a new overhead, but it makes the big difference: if 3 is not handled properly, read consistency will not be achieved, and then all is gone.

I hope you now understand why scaling out might even degrade your performance instead of enhancing it, if you choose a perfectly wrong architecture. I stress "perfectly wrong" because it will not always degrade, provided you understand whether what you are implementing is appropriate for your application. After all, Google and Facebook serve billions of requests using distributed architectures that are appropriate for them.

In Part 3 I'll deal with the other one, sharding or shared-nothing architecture, and we will see how it helps :)

Saturday, September 17, 2011

Crisps abt Scalability

Hi all, it was a great day with "Cassandra: The Definitive Guide". The thing I liked best about that book is the way it explains scalability (and of course Cassandra too :P).
So, 
What is scalability?
It is nothing but yet another performance measure. To be precise, it measures how well the application you built performs as you add more resources to the environment it runs on. Ex: adding more cores to the processor, increasing main memory size, etc.
Why should I care abt it?
Because adding more resources does not guarantee that the performance of your system (your app) will increase. Performance may even decrease.
Can you explain in more detail?
Yes, for sure. There are two ways you can scale the processing environment: horizontally and vertically. Horizontal scaling means adding more systems instead of enhancing the existing one. Ex: adding more web servers to serve the requests, with a load balancer in front of them. Vertical scaling means enhancing the existing system. Ex: moving from 2 GB RAM to 64 GB RAM, a 16-core processor, etc.
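One well-known model of how added resources can make things worse is Neil Gunther's Universal Scalability Law, which is not from the book and is used here only as an illustration: relative capacity C(N) = N / (1 + α(N - 1) + βN(N - 1)), where α is the cost of contention (work that must be serialised) and β the cost of coherency (nodes keeping each other in sync). The α and β values below are made up purely for illustration.

```javascript
// Relative capacity C(N) under Gunther's Universal Scalability Law.
// alpha: contention (serialised work); beta: coherency (keeping nodes in sync).
function relativeCapacity(n, alpha, beta) {
  return n / (1 + alpha * (n - 1) + beta * n * (n - 1));
}

// Hypothetical coefficients, chosen only to show the shape of the curve.
const alpha = 0.05;
const beta = 0.01;
for (const n of [1, 4, 8, 10, 16, 32]) {
  console.log(n, relativeCapacity(n, alpha, beta).toFixed(2));
}
```

With these numbers, capacity climbs until roughly 10 nodes and then falls, so 32 nodes deliver less than 10 would. With β = 0 the curve only flattens and never drops, which is why the sync cost between copies matters so much.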
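The load balancer in front of horizontally scaled web servers can be as simple as round-robin. This is a sketch with hypothetical server names; real balancers also do health checks and weighting.

```javascript
// Round-robin load balancer sketch for horizontal scaling (names are hypothetical).
class RoundRobinBalancer {
  constructor(servers) {
    this.servers = servers;
    this.next = 0;
  }

  // Hand out servers in turn: web1, web2, web1, ...
  pick() {
    const server = this.servers[this.next];
    this.next = (this.next + 1) % this.servers.length;
    return server;
  }

  // Scaling out horizontally = adding another server to the pool.
  addServer(server) {
    this.servers.push(server);
  }
}

const lb = new RoundRobinBalancer(["web1", "web2"]);
console.log([lb.pick(), lb.pick(), lb.pick()].join(" ")); // web1 web2 web1
lb.addServer("web3");
console.log([lb.pick(), lb.pick(), lb.pick()].join(" ")); // web2 web3 web1
```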
What are the limitations?
Perfect question. The problems with vertical scaling are limits and cost. How far can you extend the cores and main memory?? There is always a limit. And the cost grows steeply, often faster than the configuration you get. Next, horizontal scaling. The main issue with horizontal scaling [distributed systems] is maintenance: keeping all the copies in sync. I'll explain with a scenario. Assume that before scaling you had only one web server handling all the requests hitting your system. Then you need not worry about consistency of data, because everyone reads from the same place. Now you replicate your content and host one more web server to handle the traffic, so part of your traffic reads from server 1 and the rest from server 2. Here comes the problem: if you update one letter in a word on one of the pages of your web application, you need to sync the same change across all the web servers. It becomes a mega issue if the server you replicated is a database server instead of a web server. I hope you understand the seriousness of the latter (the database server).
Seriousness?? What Seriousness??
Cool. I'm there again :) Your web server serves static or dynamic content. Where does this dynamicity come from?? From the data store behind it. So, once again a scenario. You have a system that implements a shopping cart, and 2 customers are viewing the status of the same product. The best part is that C1 gets data from DB Server 1 and C2 gets data from DB Server 2. Now comes the question: what is the issue with that?? Let us make it more serious: the product they wish to buy has only one piece left. If both access the same server, there is no issue. Whoever ordered first wins, the conflict is resolved based on the order timestamps, and the other customer is given a proper status of unavailability. But here the issue is different: they contact two different servers. Sooooooooooooooo how will consistency be achieved :(
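Here is a sketch of what goes wrong, assuming two replicated DB servers that each hold their own copy of the stock table and do not coordinate on writes. `makeServer` and `placeOrder` are hypothetical helpers.

```javascript
// The double-sell problem: two replicas that accept writes independently.
// Assumption: each "server" is an independent in-memory copy of the stock table.
function makeServer(stock) {
  return { stock: new Map(Object.entries(stock)) };
}

function placeOrder(server, customer, product) {
  const left = server.stock.get(product) || 0;
  if (left === 0) return `${customer}: unavailable`;
  server.stock.set(product, left - 1);
  return `${customer}: ordered`;
}

// Both servers start as identical replicas: one piece left.
const db1 = makeServer({ phone: 1 });
const db2 = makeServer({ phone: 1 });

console.log(placeOrder(db1, "C1", "phone")); // C1: ordered
console.log(placeOrder(db2, "C2", "phone")); // C2: ordered  <- the same last piece, sold twice
```

Each server is internally consistent, yet together they have sold two phones out of a stock of one.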
OOOOOOOPs what should we do then?
There are two different solutions, depending on the application: one is master-slave architecture and the other is sharding, or shared-nothing architecture.
What are they? 
Wait for Part 2.

Sunday, September 4, 2011

Abt jquery drilldown

It was a boring weekend once again, so I decided to author a jQuery plugin for drill-down operations. At last I did it and committed it to Git. Link to Git Repo
Really awesome job by the jQuery team: creating a plugin never felt difficult. It was full of fun, and my day passed in a useful way :)


Abt my plugin,
This plugin helps you implement drill-down with jQuery.
To start working with it, you need the latest version of jQuery.
You need to include either jquery.drilldown.min.js or jquery.drilldown.js as a script source in your page.


Usage:

$(selector).drillDown({
  animate: true,           // Default true. Optional
  container: "#spanName",  // No default value. Required: a selector such as ".divname" or "#spanName"
  listParam: "list",       // Default "list"; see the demo for an explanation
  direction: 0,            // Default 0. 0 -> vertical, 1 -> horizontal
  callBack: test           // Default null. Your own function (here named test), called on selection
});


Styles:
Items inside the drill-down are li elements with the class name drillItem, so you can write a customized style for them.
The container is the element specified by the user, so styling it is up to the user.