YADP - Yet Another Distributed Project

Description:

yadp is a distributed application. You can, for example, parallelize the processing of large data sets or run a task across several hosts. The focus is both on the parallelization itself and on the module that supports the development of distributed code. Computation is simplified by parallelizing the program over multiple individual, unreliable, cheap machines. The mechanism is based on a multi-agent system: each machine can host one or more agents, and the system is able to split and distribute tasks over multiple agents, monitor their execution, process other tasks simultaneously, register new agents, remove disconnected agents, and collect the results. Moreover, by design, yadp should reduce the time spent on, and simplify, the development of code that has to be distributed and executed on remote hosts. The module provides a common interface with a few concrete class implementations, and is designed to be fully extensible so that it can fit any project. The goal of yadp is to hide the complexity of parallelization while keeping its power.
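
The split, distribute, and collect cycle described above can be illustrated with a minimal sketch. It does not use yadp's interface: the names split and run_distributed are hypothetical, and a local thread pool stands in for remote agents, purely to show the shape of the idea.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Split data into at most n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional element.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when data is shorter than n_chunks
            chunks.append(data[start:end])
        start = end
    return chunks


def run_distributed(task, data, n_workers=4):
    """Apply task to each chunk in parallel and collect the partial results.

    Assumption: threads play the role of agents; in a real deployment each
    chunk would be sent to an agent on a possibly different host.
    """
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves the order of the chunks in its results.
        return list(pool.map(task, chunks))


# Usage: sum the integers 1..100 as four partial sums, then combine them.
partials = run_distributed(sum, list(range(1, 101)))
total = sum(partials)
```

The caller only supplies the task and the data; how the work is cut up and where it runs stays hidden behind run_distributed, which is the kind of separation the description aims for.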

Download:

You can download the source code here. You can also view a screenshot here.

Features:

Architecture and communication between the agents:

The current overall communication architecture between the agents follows this schema:


Moreover, another figure (here) shows the minimal configuration required for the system to work, while a further figure (here) presents a more advanced configuration. Note that each node can be located on a different physical host.

Fault tolerance:

Fault tolerance has been a top priority since the earliest stages of development. The ability to deal with unreliable hardware, networks, code, and humans is mandatory, so these requirements are the foundation of yadp's design. Whether an agent is stopped intentionally or fails unexpectedly, the handling mechanism is the same. If a worker agent stops, at worst its assigned task is re-scheduled on another worker. If an applicant stops, its tasks are cancelled. If an outputgenerator agent stops, it no longer produces any output; however, if that agent was the producer, another outputgenerator (if one is running) becomes the new producer. The failure of one outputgenerator never breaks the others, even if it was the producer. If a taskscheduler stops, the applicant detects its inactivity and re-posts the uncompleted tasks to another taskscheduler. If the agenda, which is the central component of the system, stops, the agents can no longer communicate: they stop their jobs and wait for a connection to a new agenda (one with the same address as the former). Once the connection is re-established, the remaining tasks are re-scheduled without additional cost (i.e. without re-executing tasks). The only cost is a function of the time during which the agenda was down.
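
The worker case above (a task is re-scheduled on another worker when its worker goes down) can be sketched as follows. This is only an illustration under stated assumptions, not yadp's actual code: the Worker class, the WorkerDown exception, and the schedule function are hypothetical names, and a failure is simulated with a simple flag rather than a lost network connection.

```python
from collections import deque


class WorkerDown(Exception):
    """Raised when a (simulated) worker agent cannot be reached."""


class Worker:
    """Hypothetical stand-in for a worker agent."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def run(self, task):
        if not self.alive:
            raise WorkerDown(self.name)
        return task()


def schedule(tasks, workers):
    """Run every task, re-queuing a task whenever its worker is down."""
    pending = deque(enumerate(tasks))
    active = list(workers)
    results = {}
    turn = 0
    while pending:
        if not active:
            raise RuntimeError("no worker left to run the remaining tasks")
        task_id, task = pending.popleft()
        worker = active[turn % len(active)]  # simple round-robin assignment
        try:
            results[task_id] = worker.run(task)
            turn += 1
        except WorkerDown:
            active.remove(worker)            # unregister the disconnected agent
            pending.append((task_id, task))  # re-schedule on another worker
    return [results[i] for i in range(len(tasks))]


# Usage: the second worker is down, yet all five tasks still complete.
workers = [Worker("w1"), Worker("w2", alive=False), Worker("w3")]
tasks = [lambda i=i: i * i for i in range(5)]
squares = schedule(tasks, workers)
```

Treating an intentional stop and an unexpected failure through the same exception path mirrors the point made above: the scheduler does not need to know why a worker disappeared, only that its task must be handed to someone else.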

Technical details and recommendations:

sebastien.martini at gmail.com

Last update: 01-2006