GPU_CODELETS

PI Name Jose Manuel Monsalve Diaz
PI Institution Argonne National Laboratory
Project Description

The Codelet model is a model of computation that combines principles of the Von Neumann model with those of several Dataflow models. Computation is represented as a graph in which each node, a Codelet, is a latency-bounded operation defined as a sequence of instructions that could be executed on different architectures. Although Codelets may at first seem similar to the tasks of tasking frameworks, they differ in two respects. First, Codelets are not only data driven but also event driven: a Codelet must wait for both its required data and its event signals before it is enabled for execution. Second, the operations defined within a Codelet are restricted to operations of known latency. This enables further optimizations at the graph level, as well as better scheduling decisions and static analysis of the required resources. Codelets are grouped together in Threaded Procedures, which can be invoked from other Codelets and which represent the execution environment of the Codelets they contain. The state of the Codelets is stored in the Threaded Procedure, while the Codelets themselves are stateless (Codelets are functional).
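The firing rule described above can be illustrated with a minimal, single-threaded C++ sketch. It is not the API of the existing C++ implementation: every name in it (Frame, Codelet, ThreadedProcedure, add, signal, run, run_example) is hypothetical and chosen only for illustration. Each Codelet carries a counter of outstanding data and event signals and becomes ready when that counter reaches zero, while all state lives in the frame of the enclosing Threaded Procedure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is an illustrative sketch only.

// State shared by the Codelets of one Threaded Procedure.
struct Frame {
    int a = 0;
    int b = 0;
    int sum = 0;
};

class ThreadedProcedure;

// A Codelet is enabled once all of its data and event signals have arrived.
struct Codelet {
    int pending;                                   // signals still required
    std::function<void(ThreadedProcedure&)> body;  // bounded, non-blocking work
};

class ThreadedProcedure {
public:
    Frame frame;  // Codelet state lives here, not in the Codelets themselves

    // Register a Codelet that needs `deps` signals; returns its identifier.
    std::size_t add(int deps, std::function<void(ThreadedProcedure&)> body) {
        codelets_.push_back(Codelet{deps, std::move(body)});
        std::size_t id = codelets_.size() - 1;
        if (deps == 0) ready_.push(id);
        return id;
    }

    // Deliver one data or event signal to a Codelet.
    void signal(std::size_t id) {
        if (--codelets_[id].pending == 0) ready_.push(id);
    }

    // Fire enabled Codelets until none remain; returns how many fired.
    std::size_t run() {
        std::size_t fired = 0;
        while (!ready_.empty()) {
            std::size_t id = ready_.front();
            ready_.pop();
            auto body = codelets_[id].body;  // copy so the body may safely call add()
            body(*this);
            ++fired;
        }
        return fired;
    }

private:
    std::vector<Codelet> codelets_;
    std::queue<std::size_t> ready_;
};

// Builds a three-Codelet graph: two producers signal one adder.
int run_example(int a, int b) {
    ThreadedProcedure tp;
    std::size_t adder = tp.add(2, [](ThreadedProcedure& p) {
        p.frame.sum = p.frame.a + p.frame.b;
    });
    tp.add(0, [a, adder](ThreadedProcedure& p) { p.frame.a = a; p.signal(adder); });
    tp.add(0, [b, adder](ThreadedProcedure& p) { p.frame.b = b; p.signal(adder); });
    tp.run();
    return tp.frame.sum;
}
```

In this sketch the adder Codelet cannot fire until both producers have signaled it, which mirrors the requirement that a Codelet wait for all of its data and events. A real runtime would dispatch ready Codelets to multiple processing elements rather than draining a single queue.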

This project intends to extend the definition of Codelets from traditional x86 architectures to more general, heterogeneous system architectures. There is currently a C++ implementation of the Codelet model that allows programmers to define programs in terms of Codelets and Threaded Procedures. So far, this implementation has been used only on x86 computers. Given the growing interest in heterogeneous computing, it is important to provide an extended version of the current implementation that can use newly available architectures.

Initially, we intend to use the graph-definition features introduced in CUDA 10.0 (CUDA Graphs). These would allow us to create Threaded Procedures that are defined for the GPU architecture, and to express the interaction between GPU and CPU computation in the form of a graph.