Pydoop is a Python interface to Hadoop that allows you to write MapReduce applications in pure Python:
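
    import pydoop.mapreduce.api as api

    class Mapper(api.Mapper):
        def map(self, context):
            # Emit a (word, 1) pair for every token in the input record.
            for w in context.value.split():
                context.emit(w, 1)

    class Reducer(api.Reducer):
        def reduce(self, context):
            # Sum the counts collected for this word.
            context.emit(context.key, sum(context.values))
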
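
To see how data moves through the two methods above without a Hadoop cluster, the following sketch runs the same map and reduce bodies locally. It does not use Pydoop: MapContext, ReduceContext and run_local are hypothetical stand-ins written for this illustration, and they mimic only the value, key, values and emit members used in the word count example.

```python
from collections import defaultdict


class MapContext:
    """Hypothetical stand-in for the context passed to map()."""

    def __init__(self, value, sink):
        self.value = value
        self._sink = sink

    def emit(self, key, value):
        self._sink.append((key, value))


class ReduceContext:
    """Hypothetical stand-in for the context passed to reduce()."""

    def __init__(self, key, values, sink):
        self.key = key
        self.values = values
        self._sink = sink

    def emit(self, key, value):
        self._sink.append((key, value))


def map_words(context):
    # Same body as Mapper.map: one (word, 1) pair per token.
    for w in context.value.split():
        context.emit(w, 1)


def reduce_counts(context):
    # Same body as Reducer.reduce: sum the counts for one key.
    context.emit(context.key, sum(context.values))


def run_local(lines):
    # Map phase: feed each input line to the map function.
    mapped = []
    for line in lines:
        map_words(MapContext(line, mapped))
    # Shuffle phase: group the emitted values by key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce phase: one call per distinct key, in sorted key order.
    reduced = []
    for key in sorted(grouped):
        reduce_counts(ReduceContext(key, grouped[key], reduced))
    return reduced


result = run_local(["the quick fox", "the lazy dog", "the end"])
print(result)
```

In a real job the framework performs the grouping step between the map and reduce phases; the in-memory dictionary here only imitates that behavior for small inputs.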
Feature highlights:
* a rich HDFS API;
* a MapReduce API that allows you to write pure Python record readers / writers, partitioners and combiners;
* transparent Avro (de)serialization.
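
As a rough illustration of what partitioners and combiners do, the sketch below implements both as plain Python functions. It is not Pydoop code: partition and combine are hypothetical names chosen for this example, and the hash-modulo scheme is only assumed here as a typical partitioning strategy.

```python
import zlib
from collections import Counter


def partition(key, n_reduces):
    # Use a deterministic hash so a key always maps to the same reducer
    # index (Python's built-in hash() for str is randomized per process).
    return zlib.crc32(key.encode("utf-8")) % n_reduces


def combine(pairs):
    # Pre-aggregate (word, count) pairs on the map side, so that fewer
    # records need to be sent to the reducers.
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


map_output = [("the", 1), ("fox", 1), ("the", 1), ("dog", 1), ("the", 1)]
combined = combine(map_output)

# Route each combined pair to one of two hypothetical reducers.
buckets = {}
for key, value in combined:
    buckets.setdefault(partition(key, 2), []).append((key, value))

print(combined)
print(buckets)
```

A combiner is only safe when the reduce operation is associative and commutative, as summation is here; the partitioner only has to guarantee that equal keys always land on the same reducer.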
Pydoop enables MapReduce programming through a Python client for Hadoop Pipes that is pure Python except for a performance-critical serialization section, and provides HDFS access through an extension module based on libhdfs.
To get started, read the tutorial. Full docs, including installation instructions, are listed below.