I am using bzip2 to compress large files. My custom InputFormat returns true for isSplitable, but I get failures with large input files.
BZip2 splitting support was added to Apache Hadoop in the 0.21.0 release, but I am running 0.20. Is that my problem?
Answer by Ted Dunning · Jun 30, 2011 at 06:51 PM
With MapR, compression is built into the file system and is entirely transparent.
This means that your compressed inputs are split just like any other file. No action is required on your part.
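For readers on plain Apache Hadoop 0.20, which lacks the splittable bzip2 support mentioned in the question, a common safeguard is for a custom InputFormat to report compressed files as non-splittable, so that each one is read whole by a single mapper. The following is a minimal sketch in plain Java with no Hadoop dependency. The class `SplitPolicy`, its `isSplitable(String)` method, and the list of suffixes are hypothetical stand-ins chosen for illustration, not Hadoop APIs.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: sketches the decision a custom
// InputFormat's isSplitable() could make on Hadoop 0.20, where bzip2
// input cannot be split.
public class SplitPolicy {

    // Assumed suffixes for compressed input. A real InputFormat would instead
    // ask Hadoop's CompressionCodecFactory whether a codec matches the path.
    private static final List<String> COMPRESSED_SUFFIXES =
            List.of(".bz2", ".gz", ".deflate");

    // Returns true only for files that can safely be cut into several splits.
    public static boolean isSplitable(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String suffix : COMPRESSED_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return false; // compressed: must be read by a single mapper
            }
        }
        return true; // uncompressed: safe to split
    }

    public static void main(String[] args) {
        System.out.println(isSplitable("logs/part-00000.txt")); // true
        System.out.println(isSplitable("logs/big-input.bz2"));  // false
    }
}
```

In a real 0.20 InputFormat, the same check would live inside the overridden `isSplitable` method, typically by testing whether `CompressionCodecFactory.getCodec(path)` returns a codec. Returning false avoids broken splits but gives up parallelism on each large compressed file.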