pydoop.hadut: Hadoop shell interaction
Provides access to some functionalities available via the Hadoop shell.
-
exception pydoop.hadut.RunCmdError(returncode, cmd, output=None)
Raised by run_tool_cmd() and all functions that make use of it to indicate that the call failed (returned non-zero).
-
pydoop.hadut.collect_output(mr_out_dir, out_file=None)
Return all mapreduce output in mr_out_dir.
Append the output to out_file if provided. Otherwise, return the result as a single string (it is the caller’s responsibility to ensure that the amount of data retrieved fits into memory).
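The two modes described above (append to a file, or return one string) can be illustrated with a local-filesystem analog. This is only a sketch, not pydoop code: `collect_local_output` is a hypothetical helper, it reads a local directory rather than HDFS, it assumes the usual Hadoop convention of `part-*` output files, and returning `None` when `out_file` is given is an assumption.

```python
import glob
import os


def collect_local_output(out_dir, out_file=None):
    """Local analog of the append-or-return behaviour (hypothetical helper)."""
    chunks = []
    # Assumption: job results live in files named part-*, as Hadoop jobs
    # conventionally write them; marker files such as _SUCCESS are skipped.
    for path in sorted(glob.glob(os.path.join(out_dir, "part-*"))):
        with open(path) as f:
            data = f.read()
        if out_file is None:
            # Everything is held in memory, so the caller must make sure it fits.
            chunks.append(data)
        else:
            # Append mode: existing content of out_file is preserved.
            with open(out_file, "a") as fo:
                fo.write(data)
    return "".join(chunks) if out_file is None else None
```

Passing `out_file` keeps memory use bounded by the size of one part file in this sketch, which is the reason to prefer it for large job outputs.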
-
pydoop.hadut.run_class(class_name, args=None, properties=None, classpath=None, hadoop_conf_dir=None, logger=None, keep_streams=True)
Run a Java class with Hadoop (equivalent of running hadoop <class_name> from the command line).
Additional HADOOP_CLASSPATH elements can be provided via classpath (either as a non-string sequence where each element is a classpath element or as a ':'-separated string). Other arguments are passed to run_cmd().
Note: HADOOP_CLASSPATH makes dependencies available only on the client side. If you are running a MapReduce application, use args=['-libjars', 'jar1,jar2,...'] to make them available to the server side as well.
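The two accepted classpath forms and the -libjars argument list can be sketched with a pair of small helpers. Both `classpath_elements` and `libjars_args` are hypothetical names introduced here for illustration; they are not part of pydoop, and the normalization shown is an assumption about how the two forms relate, not the library's implementation.

```python
def classpath_elements(classpath):
    """Normalize either accepted classpath form into a list of elements."""
    if classpath is None:
        return []
    if isinstance(classpath, str):
        # ':'-separated string form; empty pieces are dropped.
        return [p for p in classpath.split(":") if p]
    # Non-string sequence form: each item is already one classpath element.
    return list(classpath)


def libjars_args(jars):
    """Build the args list that ships jars to the server side."""
    # -libjars takes a single comma-separated value.
    return ["-libjars", ",".join(jars)]


# Hypothetical usage (class name and jar paths are placeholders):
# from pydoop.hadut import run_class
# run_class("org.example.MyTool",
#           args=libjars_args(["dep1.jar", "dep2.jar"]),
#           classpath=["dep1.jar", "dep2.jar"])
```

Passing the same jars through both classpath and -libjars covers the client side and the server side, as the note above describes.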
-
pydoop.hadut.run_cmd(cmd, args=None, properties=None, hadoop_home=None, hadoop_conf_dir=None, logger=None, keep_streams=True)
Runs the hadoop command.
Calls run_tool_cmd() with "hadoop" as the first argument.
-
pydoop.hadut.run_tool_cmd(tool, cmd, args=None, properties=None, hadoop_conf_dir=None, logger=None, keep_streams=True)
Run a Hadoop command.
If keep_streams is set to True (the default), the stdout and stderr of the command will be buffered in memory. If the command succeeds, the former will be returned; if it fails, a RunCmdError will be raised with the latter as the message. This mode is appropriate for short-running commands whose “result” is represented by their standard output (e.g., rval = run_tool_cmd("hdfs", "dfsadmin", ["-safemode", "get"])).
If keep_streams is set to False, the command will write directly to the stdout and stderr of the calling process, and the return value will be empty. This mode is appropriate for long-running commands that do not write their “real” output to stdout.
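The difference between the two keep_streams modes can be sketched with the standard subprocess module. This is an illustration of the described behaviour, not pydoop's implementation: `run` and `CmdError` are hypothetical stand-ins for run_tool_cmd() and RunCmdError, and raising on failure in the unbuffered mode is an assumption.

```python
import subprocess
import sys


class CmdError(RuntimeError):
    """Stand-in for RunCmdError (hypothetical, same constructor arguments)."""

    def __init__(self, returncode, cmd, output=None):
        super().__init__(output or "")
        self.returncode = returncode
        self.cmd = cmd


def run(argv, keep_streams=True):
    """Run argv, either buffering its streams or inheriting the caller's."""
    if keep_streams:
        # Buffered mode: stdout and stderr are captured in memory.
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, text=True)
        if proc.returncode != 0:
            # On failure, stderr becomes the exception message.
            raise CmdError(proc.returncode, " ".join(argv), proc.stderr)
        return proc.stdout
    # Unbuffered mode: the child writes straight to our stdout and stderr.
    returncode = subprocess.call(argv)
    if returncode != 0:
        # Assumption: a failure still raises, but with no captured output.
        raise CmdError(returncode, " ".join(argv))
    return ""


if __name__ == "__main__":
    # Uses the Python interpreter as a portable stand-in for a Hadoop tool.
    print(run([sys.executable, "-c", "print('Safe mode is OFF')"]), end="")
```

Buffered mode suits status-style queries whose output the caller wants to parse; unbuffered mode avoids holding the logs of a long job in memory.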