I am using bzip2 to compress large files. My custom InputFormat returns true from isSplitable, but my jobs fail on large input files.
BZip2 splitting support was added to Apache Hadoop in the 0.21.0 release, but I am running 0.20. Is that my problem?
Answer by Ted Dunning · Jun 30, 2011 at 06:51 PM
With MapR, compression is built in and is entirely transparent.
This means that your compressed inputs are split just like any other file. No action is required on your part.