News Archive

New in 1.1.0

Added support for HDP 2.2.

Pyavroc is now automatically loaded if installed, enabling much faster (30-40x) Avro (de)serialization.

Added Timer objects to help debug performance issues.

NoSeparatorTextOutputFormat is now available for all MR versions.

Added Avro support to the Hadoop Simulator.

Bug fixes and performance improvements.

New in 1.0.0

Pydoop now features a brand new, more pythonic MapReduce API

Added built-in Avro support (for now, only with Hadoop 2). By setting a few flags in the submitter and selecting AvroContext as your application’s context class, you can read and write Avro data, transparently manipulating records as Python dictionaries. See the Avro I/O docs for further details.

The new pydoop submit tool drastically simplifies job submission, in particular when running applications without installing Pydoop and other dependencies on the cluster nodes (see Installation-free Usage).

Added support for testing Pydoop programs in a simulated Hadoop framework

Added support (experimental) for MapReduce V2 input/output formats (see Writing a Custom InputFormat)

The path module offers many new functions that serve as the HDFS-aware counterparts of those in os.path

The pipes backend (except for the performance-critical serialization section) has been reimplemented in pure Python

An alternative (optional) JPype HDFS backend is available (currently slower than the one based on libhdfs)

Added support for CDH5 and Apache Hadoop 2.4.1, 2.5.2 and 2.6.0

Removed support for CDH3 and Apache Hadoop 0.20.2

Installation has been greatly simplified: now Pydoop does not require any external library to build its native extensions

New in 0.12.0

YARN is now fully supported

Added support for CDH 4.4.0 and CDH 4.5.0

New in 0.11.1

Added support for Hadoop 2.2.0

Added support for Hadoop 1.2.1

New in 0.10.0

Added support for CDH 4.3.0

Added a walk() method to hdfs instances (works similarly to os.walk() from Python’s standard library)

The Hadoop version parser is now more flexible. It should be able to parse version strings for all CDH releases, including older ones (note that most of them are not supported)

Pydoop script can now handle modules whose file name has no extension

Fixed “unable to load native-hadoop library” problem (thanks to Liam Slusser)
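
The walk() method listed above is described as working similarly to os.walk(). For readers unfamiliar with that traversal pattern, here is a minimal sketch that uses only the standard-library os.walk() on a local temporary directory. It does not call Pydoop, and the values yielded by the hdfs walk() method may differ in shape. The helper name list_tree is made up for illustration.

```python
import os
import tempfile


def list_tree(top):
    """Return sorted paths, relative to top, of every directory and file under it."""
    found = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, top))
    return sorted(found)


# Build a tiny local tree (a/b/data.txt) and walk it.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "a", "b"))
    open(os.path.join(top, "a", "b", "data.txt"), "w").close()
    tree = list_tree(top)
```

The same top-down, visit-everything loop is what a recursive HDFS traversal is expected to provide, with HDFS paths in place of local ones.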

New in 0.9.0

Added explicit support for:
- Apache Hadoop 1.1.2
- CDH 4.2.0
Added support for Cloudera from-parcels layout (as installed by Cloudera Manager)
Added pydoop.hdfs.move()
Record writers can now be used in map-only jobs

New in 0.8.1

Fixed a problem that was breaking installation from PyPI via pip install

New in 0.8.0

Added support for Apple OS X Mountain Lion
Added support for Hadoop 1.1.1
Patches now include a fix for HDFS-829
Restructured docs
- A separate tutorial section collects and expands introductory material

New in 0.7.0

Added Debian package

New in 0.7.0-rc3

Fixed a bug in the hdfs instance caching method

New in 0.7.0-rc2

Support for HDFS append open mode
- fails if your Hadoop version and/or configuration does not support HDFS append

New in 0.7.0-rc1

Works with CDH4, with the following limitations:
- support for MapReduce v1 only
- CDH4 must be installed from dist-specific packages (no tarball)
Tested with the latest releases of other Hadoop versions
- Apache Hadoop 0.20.2, 1.0.4
- CDH 3u5, 4.1.2
Simpler build process
- the source code we need is now included, rather than searched for at compile time
Pydoop scripts can now accept user-defined configuration parameters
- New examples show how to use the new feature
New wrapper object makes it easier to interact with the JobConf
New hdfs.path functions: isdir, isfile, kind
HDFS: support for string description of permission modes in chmod
Several bug fixes

New in 0.6.6

Fixed a bug that was causing the pipes runner to incorrectly preprocess command line options.

New in 0.6.4

Fixed several bugs triggered by using a local fs as the default fs for Hadoop. This happens when you set a file: path as the value of fs.default.name in core-site.xml. For instance:

<property>
  <name>fs.default.name</name>
  <value>file:///var/hadoop/data</value>
</property>
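
To check whether a given configuration falls into this case, one option is to read the property value directly. The following is a minimal sketch, not part of Pydoop: the helper names default_fs and uses_local_fs are hypothetical, and the XML is passed in as a string rather than read from an actual core-site.xml.

```python
import xml.etree.ElementTree as ET

# Sample configuration mirroring the property shown above.
CORE_SITE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///var/hadoop/data</value>
  </property>
</configuration>"""


def default_fs(xml_text):
    """Return the value of fs.default.name, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.default.name":
            return prop.findtext("value")
    return None


def uses_local_fs(xml_text):
    """Return True if the default file system is given as a file: path."""
    value = default_fs(xml_text)
    return value is not None and value.startswith("file:")


local = uses_local_fs(CORE_SITE)
```

With the sample above, default_fs returns the file: URI and uses_local_fs reports that the default file system is local.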

New in 0.6.0

The HDFS API features new high-level tools for easier manipulation of files and directories. See the API docs for more info
Examples have been thoroughly revised in order to make them easier to understand and run
Several bugs were fixed; we also introduced a few optimizations, most notably the automatic caching of HDFS instances

New in 0.5.0

Pydoop now works with Hadoop 1.0.
Multiple versions of Hadoop can now be supported by the same installation of Pydoop. See the section on building for multiple Hadoop versions for details.
We have added a command line tool that makes it easy to write short scripts for simple problems.
In order to work out-of-the-box, Pydoop now requires Python 2.7. Python 2.6 can be used provided that you install a few additional modules (see the installation page for details).
We have dropped support for the 0.21 branch of Hadoop, which has been marked as unstable and unsupported by Hadoop developers.
