|
I have seen Hadoop MapReduce jobs that process huge amounts of data already stored in the Hadoop system (HDFS, HBase, etc.). That is most likely offline processing. However, I wonder whether there is any system for large-scale data ingestion that uses Hadoop MapReduce capabilities for near-real-time data feeds. Of course, the data feeds will most likely originate outside the Hadoop system, for example a live email server or a web log. |
|
With MapR you can use NFS to deposit data directly into Hadoop. MapR can make the entire Hadoop cluster (consisting of hundreds of nodes) appear as one giant NFS server capable of ingesting data in real time at very high bandwidth (> 100 GB/sec). Essentially, each node can act as an NFS server, and every node provides the exact same view of the cluster. Better yet, the NFS server provided by MapR can run alongside your application server on the same physical box, allowing access to any Hadoop server with a single hop. |
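
To make the NFS approach concrete, here is a minimal sketch of what ingestion could look like from the application side. It assumes the cluster has already been NFS-mounted on the host; the mount path `/mapr/my.cluster.com/ingest`, the file name `events.log`, and the helper `append_records` are hypothetical names chosen for illustration, not part of any MapR API. Once the mount exists, writing into the cluster is ordinary file I/O:

```python
import os
from pathlib import Path

# Hypothetical mount point: assumes the cluster is already NFS-mounted on this
# host. Replace with whatever path your own mount uses.
DEFAULT_MOUNT = Path("/mapr/my.cluster.com/ingest")


def append_records(records, directory=DEFAULT_MOUNT, filename="events.log"):
    """Append each record as one line to a file under the mounted directory."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename
    with open(target, "a", encoding="utf-8") as f:
        for record in records:
            # Normalize so every record ends with exactly one newline.
            f.write(record.rstrip("\n") + "\n")
        f.flush()
        # Ask the OS to push buffered data out instead of leaving it in cache.
        os.fsync(f.fileno())
    return target
```

A mail server or web server hook could call `append_records(["GET /index.html 200"])` for each batch of events. Nothing in the sketch is Hadoop-specific, which is the point of the NFS route: existing tools that write files need no changes. Using one writer per file is probably the safer choice, since appends from several NFS clients to the same file may interleave.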