New in 2.0.0
Pydoop 2.0.0 adds Python 3 and Hadoop 3 support, and features a complete
overhaul of the mapreduce subpackage, which is now easier to use and more
efficient. Like any major software release, Pydoop 2 also makes some
backwards-incompatible changes, mainly by dropping old, seldom-used
features. Finally, it includes several bug fixes and performance
improvements. Here is a more detailed list of changes:
Python 3 support.
Hadoop 3 support.
The sercore extension, together with most of the pydoop.mapreduce subpackage, has been rewritten from scratch. Now it’s simpler and slightly faster (much faster when using a combiner).
JobConf is now fully compatible with dict.
pydoop submit now works when the default file system is local.
Compilation of avro-parquet-based examples is now much faster.
Many utilities for guessing Hadoop environment details have been either removed or drastically simplified (affects hadoop_utils and related package-level functions). Pydoop now assumes that the hadoop command is in the PATH, and uses only that information to try fallback values when HADOOP_HOME and/or HADOOP_CONF_DIR are not defined.
The hadut module has been stripped down to contain little more than what’s required by pydoop submit. In particular, PipesRunner is gone. Running applications with mapred pipes still works, but with caveats (e.g., it does not work on the local fs, and controlling remote task environment is not trivial).
The hdfs module no longer provides a default value for LIBHDFS_OPTS.
The Hadoop simulator has been dropped.
Bug fixes and performance improvements.
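
To illustrate the simplified environment detection described above, here is a minimal sketch of a PATH-based fallback. It is not Pydoop's actual code: the function names guess_hadoop_home and guess_hadoop_conf_dir are hypothetical, and the sketch assumes the conventional layout where the hadoop executable lives in <home>/bin and the configuration in <home>/etc/hadoop.

```python
import os
import shutil


def guess_hadoop_home(environ=None):
    """Return a best-effort Hadoop home directory, or None.

    Hypothetical helper for illustration; not part of Pydoop's API.
    """
    environ = os.environ if environ is None else environ
    # An explicitly defined HADOOP_HOME always wins.
    home = environ.get("HADOOP_HOME")
    if home:
        return home
    # Otherwise, the only information used is the hadoop command on the PATH.
    hadoop = shutil.which("hadoop", path=environ.get("PATH"))
    if hadoop is None:
        return None
    # Assumption: the executable is installed as <home>/bin/hadoop.
    bin_dir = os.path.dirname(os.path.realpath(hadoop))
    return os.path.dirname(bin_dir)


def guess_hadoop_conf_dir(environ=None):
    """Return a best-effort Hadoop configuration directory, or None.

    Hypothetical helper for illustration; not part of Pydoop's API.
    """
    environ = os.environ if environ is None else environ
    # An explicitly defined HADOOP_CONF_DIR always wins.
    conf_dir = environ.get("HADOOP_CONF_DIR")
    if conf_dir:
        return conf_dir
    home = guess_hadoop_home(environ)
    # Assumption: the conventional <home>/etc/hadoop layout.
    return os.path.join(home, "etc", "hadoop") if home else None
```

Calling guess_hadoop_home() with no arguments inspects the current process environment. If the guess does not match a given installation, defining HADOOP_HOME and HADOOP_CONF_DIR explicitly avoids relying on any fallback.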