Pydoop is developed by: CRS4


Pydoop includes several usage examples: you can find them in the “examples” subdirectory of the distribution root.

Home Directory

If you have installed Pydoop or other modules locally, i.e., into ~/.local/lib/python2.7/site-packages, the Python code that runs within Hadoop tasks might not be able to find them. Depending on your Hadoop version or configuration, those tasks might run as a different user with a different home directory. In Hadoop 1, you can work around this problem by setting the mapreduce.admin.user.home.dir configuration parameter (see the automatic run/check scripts in the examples directory mentioned above).
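As a rough illustration of the workaround, the sketch below builds a generic Hadoop "-D property=value" option that points mapreduce.admin.user.home.dir at a given home directory. The helper name home_dir_option is hypothetical and not part of Pydoop; whether the option takes effect depends on your Hadoop version and on how the job is launched, so treat the run/check scripts in the examples directory as the reference.

```python
import os


def home_dir_option(home=None):
    """Return a '-D' option list setting mapreduce.admin.user.home.dir.

    Hypothetical helper, for illustration only: it just formats the
    option and does not talk to Hadoop.
    """
    if home is None:
        # Assumption: the submitting user's home is the one that holds
        # the locally installed modules.
        home = os.path.expanduser("~")
    return ["-D", "mapreduce.admin.user.home.dir=%s" % home]


# Show the option as it would appear on a command line.
print(" ".join(home_dir_option()))
```

The resulting list can be spliced into the argument list of whatever command submits the job, before the application-specific arguments.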

NOTE: In any event, to allow another user to execute your locally installed code, you must set permissions accordingly, e.g.:

chmod -R 755 ~/.local
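To check whether the permissions are already sufficient before (or after) running chmod, one option is to walk the local installation tree and list every directory that other users cannot read and enter. This is a minimal sketch using only the Python standard library; the function name blocked_dirs is an illustrative choice, and it inspects directories only, not individual files.

```python
import os
import stat


def blocked_dirs(root):
    """Return directories under root that lack r-x permission for 'others'."""
    needed = stat.S_IROTH | stat.S_IXOTH
    blocked = []
    for dirpath, dirnames, filenames in os.walk(root):
        mode = os.stat(dirpath).st_mode
        if (mode & needed) != needed:
            blocked.append(dirpath)
    return blocked


# Report on ~/.local if it exists; an empty result means every directory
# in the tree is readable and traversable by other users.
local_root = os.path.expanduser("~/.local")
if os.path.isdir(local_root):
    for path in blocked_dirs(local_root):
        print(path)
```

If the script prints nothing, the directory permissions under ~/.local already allow another user to reach your locally installed code.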

Input Data

Most examples, by default, take their input from a free version of Lewis Carroll’s “Alice’s Adventures in Wonderland” available at Project Gutenberg (see the examples/input subdirectory).