|
Hi all, this one is about Hadoop cluster configuration. Every flavour of Linux (Ubuntu/CentOS/Red Hat) comes with its own configuration, data processing speeds, etc. Let's say we have a 2-node cluster: one node runs Red Hat Linux and the other runs CentOS. Since these two operating systems differ in some ways, such as how they process blocks of data, is there any possibility of triggering 'speculative execution' every time we start a MapReduce job on our 2-node cluster? |
|
Are you saying you always want every task of a job to run twice, once on each type of machine? |
|
No, no. The usual operations apply: the input file is divided into 64 MB blocks, which are then given to the datanodes for processing. Now suppose these datanodes (in our case, a 2-node cluster) have different operating systems installed. Is there any difference in the processing time these datanodes take on the input file because of the difference in operating system? That is, would the block processed by the Red Hat datanode finish considerably earlier than the one processed by the CentOS datanode?

It is quite conceivable that you will see very different performance on these two nodes, though not so much because of the difference in operating system. The simple, practical reason is that if you have different operating systems, the two nodes were likely built at different times on different hardware, with potentially differing (mis)configurations. On identical hardware, running equivalent versions with identical configuration and without any hardware failures or degradation, Red Hat and CentOS should produce essentially identical performance. If any of these conditions is violated, any degree of difference could be observed. Absent misconfiguration, and within reasonable tolerances for hardware, the two nodes should run very nearly the same.
(02 May '12, 11:23)
TedDunning ♦♦
|
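One way to check whether the two nodes really do run "very nearly the same" is to compare task durations per node after a job finishes. The following is a minimal sketch, not Hadoop's own speculation logic: the function name `slowdown_ratio`, the node names, and the sample timings are all made up for illustration, and the durations are assumed to have been collected by hand from wherever you read task times for your job.

```python
from statistics import mean


def slowdown_ratio(durations_by_node):
    """Return (slowest_node, slowest_mean / fastest_mean).

    durations_by_node maps a node name to a list of task durations in
    seconds. This is a hypothetical helper, not part of any Hadoop API.
    """
    means = {node: mean(times) for node, times in durations_by_node.items()}
    slowest = max(means, key=means.get)
    fastest = min(means, key=means.get)
    return slowest, means[slowest] / means[fastest]


# Made-up task durations (seconds) for a 2-node cluster.
sample = {
    "redhat-node": [41.0, 39.5, 40.2, 42.1],
    "centos-node": [40.8, 41.3, 39.9, 41.0],
}

node, ratio = slowdown_ratio(sample)
print(f"{node} has the slowest mean task time, {ratio:.2f}x the fastest node")
```

A ratio close to 1.0 would be consistent with the answer above. A ratio well above 1.0 suggests looking at hardware or configuration differences on the slower node before blaming the operating system.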