Tutorial

Pydoop is a package that provides a Python API for Hadoop MapReduce and HDFS. Pydoop has several advantages [1] over Hadoop’s built-in solutions for Python programming, i.e., Hadoop Streaming and Jython. Being a CPython package, it gives you access to the entire standard library and to third-party modules, some of which (e.g., SciPy) may not be available for other Python implementations. In addition, Pydoop provides a Python HDFS API which, to the best of our knowledge, is not available in other solutions.

Footnotes

[1] Simone Leo, Gianluigi Zanetti. Pydoop: a Python MapReduce and HDFS API for Hadoop. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pages 819–825, 2010.