I am using bzip2 to compress large files. My custom InputFormat returns true from isSplitable, but I get failures with large input files.
BZip2 splitting support was added to Apache Hadoop in the 0.21.0 release, but I am running 0.20. Is that my problem?
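To make the question concrete, here is a minimal sketch of the conservative check I could fall back to: report a file as splittable only if it is uncompressed, or if the runtime can actually start reading a compressed file mid-stream. This is plain Java with no Hadoop dependency; the class `SplitCheck`, its method signatures, the `splitAwareReader` flag, and the single-entry extension list are all hypothetical names for illustration, not Hadoop's API. In a real InputFormat the override receives a Path, and Hadoop's own text input format makes a similar decision by looking up a compression codec for the file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SplitCheck {
    // Hypothetical extension list, for illustration only.
    private static final List<String> COMPRESSED_EXTENSIONS = Arrays.asList(".bz2");

    // True if the file name ends with one of the compressed extensions above.
    public static boolean isCompressed(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String ext : COMPRESSED_EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    // splitAwareReader is an assumption of this sketch: it stands for
    // "the runtime can begin decompressing at a split boundary".
    public static boolean isSplitable(String fileName, boolean splitAwareReader) {
        return !isCompressed(fileName) || splitAwareReader;
    }

    public static void main(String[] args) {
        System.out.println(isSplitable("logs.bz2", false)); // prints false
        System.out.println(isSplitable("logs.bz2", true));  // prints true
        System.out.println(isSplitable("logs.txt", false)); // prints true
    }
}
```

With a check like this, a compressed file on a runtime without split support would go to a single mapper as a whole instead of being cut at arbitrary byte offsets.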
asked 22 Jun '11, 08:57
With MapR, compression is built in and is entirely transparent.
This means that your compressed inputs are split just like any other file. No action is required on your part.
answered 30 Jun '11, 18:51