This module provides basic, stand-alone Hadoop simulators for debugging support.
Simulates the invocation of program components in a Hadoop workflow.
from my_mr_app import Factory
hs = HadoopSimulatorLocal(Factory())
job_conf = {...}
hs.run(fin, fout, job_conf)
counters = hs.get_counters()
Run the simulator as configured by job_conf, with num_reducers reducers. If file_in is not None, simulate the behavior of Hadoop’s TextLineReader, creating a record for each line in file_in. Otherwise, assume that the factory argument given to the constructor defines a RecordReader, and that job_conf provides a suitable InputSplit. Similarly, if file_out is None, assume that factory defines a RecordWriter with appropriate parameters in job_conf.
Simulates the invocation of program components in a Hadoop workflow using network connections to communicate with a user-provided pipes program.
program_name = '../wordcount/new_api/wordcount_full.py'
data_in = '../input/alice.txt'
output_dir = './output'
data_in_path = os.path.realpath(data_in)
data_in_uri = 'file://' + data_in_path
data_in_size = os.stat(data_in_path).st_size
os.makedirs(output_dir)
output_dir_uri = 'file://' + os.path.realpath(output_dir)
conf = {
"mapred.job.name": "wordcount",
"mapred.work.output.dir": output_dir_uri,
"mapred.task.partition": "0",
}
input_split = InputSplit.to_string(data_in_uri, 0, data_in_size)
hsn = HadoopSimulatorNetwork(program=program_name, logger=logger,
loglevel=logging.INFO)
hsn.run(None, None, conf, input_split=input_split)
The Pydoop application program will be launched sleep_delta seconds after framework initialization.
Run the program through the simulated Hadoop infrastructure, piping the contents of file_in to the program similarly to what Hadoop’s TextInputFormat does. Setting file_in to None implies that the program is expected to get its data from its own RecordReader, using the provided input_split. Analogously, the final results will be written to file_out unless it is set to None, in which case the program is expected to have a RecordWriter.