I have seen Hadoop map/reduce jobs that process huge amounts of data already stored in the Hadoop system (HDFS, HBase, etc.). Most likely this is offline processing.

However, I wonder whether there is any system for large-scale data ingestion that uses Hadoop map/reduce capabilities for near-real-time data feeds. Of course, the data feeds will most likely originate outside the Hadoop system: for example, a live email server or a web log.

asked 11 Apr '12, 07:59

IArch

edited 11 Apr '12, 10:03


With MapR you can use NFS to deposit data directly into Hadoop. MapR can make an entire Hadoop cluster (consisting of hundreds of nodes) appear as one giant NFS server capable of ingesting data in real time at very high bandwidth (> 100 GB/sec). Essentially, each node can act as an NFS server, and each node provides the exact same view of the cluster. Better yet, the NFS server provided by MapR can run alongside your application server on the same physical box, allowing access to any Hadoop server with a single hop.
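
Because the cluster is exposed over NFS, an external feed can write into it with ordinary file I/O and needs no Hadoop client library. Below is a minimal Python sketch of that idea, not an official MapR example. The function name `ingest_records`, the relative path `ingest/weblogs/access.log`, and the sample log lines are all hypothetical; the demo writes to a temporary directory standing in for the real NFS mount point, which you would substitute on an actual deployment.

```python
import os
import tempfile
from pathlib import Path


def ingest_records(records, mount_root, relative_path):
    """Append text records to a file under an NFS-mounted cluster path.

    mount_root is assumed to be the local directory where the cluster's
    NFS export is mounted (hypothetical; depends on your setup).
    """
    target = Path(mount_root) / relative_path
    # Create the destination directory on first use.
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "a", encoding="utf-8") as f:
        for record in records:
            # Normalise to exactly one newline per record.
            f.write(record.rstrip("\n") + "\n")
        # Push buffered data out to the file system before returning.
        f.flush()
        os.fsync(f.fileno())
    return target


if __name__ == "__main__":
    # A temporary directory stands in for the NFS mount point here.
    mount_root = tempfile.mkdtemp()
    sample = [
        "10.0.0.1 GET /index.html 200",
        "10.0.0.2 GET /about.html 404",
    ]
    path = ingest_records(sample, mount_root, "ingest/weblogs/access.log")
    print(path.read_text(encoding="utf-8"), end="")
```

Once the file is visible in the cluster, a map/reduce job can read it like any other input. How often to flush and when to roll over to a new file are design choices that depend on how fresh the data needs to be.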

answered 11 Apr '12, 08:09

MC Srivas ♦♦

Asked: 11 Apr '12, 07:59

Seen: 973 times

Last updated: 31 Oct '12, 05:41
