News Archive

New in 1.1.0

  • Added support for HDP 2.2.
  • Pyavroc is now automatically loaded if installed, enabling much faster (30-40x) Avro (de)serialization.
  • Added Timer objects to help debug performance issues.
  • NoSeparatorTextOutputFormat is now available for all MR versions.
  • Added Avro support to the Hadoop Simulator.
  • Bug fixes and performance improvements.

New in 1.0.0

  • Pydoop now features a brand new, more pythonic MapReduce API (see the first sketch after this list).
  • Added built-in Avro support (for now, only with Hadoop 2). By setting a few flags in the submitter and selecting AvroContext as your application’s context class, you can read and write Avro data, transparently manipulating records as Python dictionaries. See the Avro I/O docs for further details.
  • The new pydoop submit tool drastically simplifies job submission, in particular when running applications without installing Pydoop and other dependencies on the cluster nodes (see Installation-free Usage).
  • Added support for testing Pydoop programs in a simulated Hadoop framework.
  • Added experimental support for MapReduce v2 input/output formats (see Writing a Custom InputFormat).
  • The path module offers many new functions that serve as the HDFS-aware counterparts of those in os.path (see the second sketch after this list).
  • The pipes backend (except for the performance-critical serialization section) has been reimplemented in pure Python.
  • An alternative (optional) JPype HDFS backend is available (currently slower than the one based on libhdfs).
  • Added support for CDH5 and Apache Hadoop 2.4.1, 2.5.2 and 2.6.0.
  • Removed support for CDH3 and Apache Hadoop 0.20.2.
  • Installation has been greatly simplified: Pydoop no longer requires any external library to build its native extensions.
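
For illustration, here is a minimal word count written against the new API (a sketch along the lines of the tutorial; the module layout is that of Pydoop 1.0):

import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pipes

class Mapper(api.Mapper):

    def map(self, context):
        # context.value holds the current input record (here, a line of text)
        for word in context.value.split():
            context.emit(word, 1)

class Reducer(api.Reducer):

    def reduce(self, context):
        # context.values iterates over all counts emitted for context.key
        context.emit(context.key, sum(context.values))

def __main__():
    # entry point invoked by the Pydoop pipes framework
    pipes.run_task(pipes.Factory(Mapper, Reducer))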
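
A second sketch shows a few of the new hdfs.path functions (the paths below are hypothetical):

import pydoop.hdfs as hdfs

# hdfs.path mirrors os.path, but is aware of HDFS URIs
print(hdfs.path.join("hdfs://namenode:8020/user/myuser", "data.txt"))
print(hdfs.path.isdir("/user/myuser"))
print(hdfs.path.isfile("/user/myuser/data.txt"))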

New in 0.12.0

  • YARN is now fully supported
  • Added support for CDH 4.4.0 and CDH 4.5.0

New in 0.11.1

  • Added support for Hadoop 2.2.0
  • Added support for Hadoop 1.2.1

New in 0.10.0

  • Added support for CDH 4.3.0
  • Added a walk() method to hdfs instances (works similarly to os.walk() from Python’s standard library; see the sketch after this list)
  • The Hadoop version parser is now more flexible. It should be able to parse version strings for all CDH releases, including older ones (note that most of them are not supported)
  • Pydoop script can now handle modules whose file name has no extension
  • Fixed “unable to load native-hadoop library” problem (thanks to Liam Slusser)
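
For illustration, a minimal sketch of the new method (the path is hypothetical, and the exact keys of the info dictionaries may vary across versions):

import pydoop.hdfs as hdfs

fs = hdfs.hdfs()  # connect to the default HDFS instance
try:
    # unlike os.walk(), walk() yields one info dictionary per visited
    # file or directory in the tree rooted at the given path
    for info in fs.walk("/user/myuser"):
        print("%s (%s)" % (info["name"], info["kind"]))
finally:
    fs.close()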

New in 0.9.0

  • Added explicit support for:
    • Apache Hadoop 1.1.2
    • CDH 4.2.0
  • Added support for Cloudera from-parcels layout (as installed by Cloudera Manager)
  • Added pydoop.hdfs.move() (see the sketch after this list)
  • Record writers can now be used in map-only jobs
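
A minimal sketch of the new function (the paths are hypothetical):

import pydoop.hdfs as hdfs

# move a file to a new location, possibly across file systems
hdfs.move("/user/myuser/old_name.txt", "/user/myuser/new_name.txt")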

New in 0.8.1

  • Fixed a problem that was breaking installation from PyPI via pip install

New in 0.8.0

  • Added support for Apple OS X Mountain Lion
  • Added support for Hadoop 1.1.1
  • Patches now include a fix for HDFS-829
  • Restructured docs
    • A separate tutorial section collects and expands introductory material

New in 0.7.0

  • Added Debian package

New in 0.7.0-rc3

  • Fixed a bug in the hdfs instance caching method

New in 0.7.0-rc2

  • Support for HDFS append open mode (see the sketch after this list)
    • opening a file in append mode fails if your Hadoop version and/or configuration does not support HDFS append
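
A minimal sketch (the path is hypothetical; the open call fails unless your Hadoop version and configuration support append):

import pydoop.hdfs as hdfs

# open an existing file in append mode and add a record at the end
f = hdfs.open("/user/myuser/events.log", "a")
f.write("one more event\n")
f.close()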

New in 0.7.0-rc1

  • Works with CDH4, with the following limitations:
    • support for MapReduce v1 only
    • CDH4 must be installed from dist-specific packages (no tarball)
  • Tested with the latest releases of other Hadoop versions
    • Apache Hadoop 0.20.2, 1.0.4
    • CDH 3u5, 4.1.2
  • Simpler build process
    • the source code we need is now included, rather than searched for at compile time
  • Pydoop scripts can now accept user-defined configuration parameters
    • New examples show how to use the new feature
  • New wrapper object makes it easier to interact with the JobConf
  • New hdfs.path functions: isdir, isfile, kind
  • HDFS: support for string description of permission modes in chmod
  • Several bug fixes

New in 0.6.6

Fixed a bug that was causing the pipes runner to incorrectly preprocess command line options.

New in 0.6.4

Fixed several bugs triggered by using a local file system as the default file system for Hadoop. This happens when you set a file: path as the value of fs.default.name in core-site.xml. For instance:

<property>
  <name>fs.default.name</name>
  <value>file:///var/hadoop/data</value>
</property>

New in 0.6.0

  • The HDFS API features new high-level tools for easier manipulation of files and directories. See the API docs for more info; a short sketch follows this list
  • Examples have been thoroughly revised in order to make them easier to understand and run
  • Several bugs were fixed; we also introduced a few optimizations, most notably the automatic caching of HDFS instances
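
For illustration, a short sketch of some of the new high-level tools (the paths are hypothetical):

import pydoop.hdfs as hdfs

# dump/load transfer whole strings to and from HDFS in a single call
hdfs.dump("hello, world\n", "/user/myuser/hello.txt")
text = hdfs.load("/user/myuser/hello.txt")

# cp copies a file (or directory tree) to a new location
hdfs.cp("/user/myuser/hello.txt", "/user/myuser/hello_copy.txt")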

New in 0.5.0

  • Pydoop now works with Hadoop 1.0
  • Multiple versions of Hadoop can now be supported by the same installation of Pydoop. See the section on building for multiple Hadoop versions for the details
  • We have added a command line tool that makes it trivially simple to write short scripts for simple problems (see the sketch after this list).
  • In order to work out-of-the-box, Pydoop now requires Python 2.7. Python 2.6 can be used provided that you install a few additional modules (see the installation page for details).
  • We have dropped support for the 0.21 branch of Hadoop, which has been marked as unstable and unsupported by Hadoop developers.
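
For illustration, a complete word count written for the new tool fits in a few lines (a sketch following the Pydoop Script conventions; the script name and any paths involved are hypothetical):

# wc.py -- submit with: pydoop script wc.py input output
def mapper(input_key, input_value, writer):
    for word in input_value.split():
        writer.emit(word, 1)

def reducer(word, counts, writer):
    # intermediate values reach the reducer as strings, hence int()
    writer.emit(word, sum(map(int, counts)))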