Loci is a framework for intra-application coordination of fine-grained numerical kernels and methods. In contrast, approaches such as MDICE and NPSS focus on inter-application coordination. Both approaches are valuable for software reuse. In the early to mid 1990s there were many attempts to apply object-oriented technology at a fine grain, with mixed success. For example, an early implementation of the ALEGRA code at Sandia National Laboratories was a finite-element ALE (Arbitrary Lagrangian-Eulerian) code in which objects represented fundamental numerical components such as tensors, material models, stresses, and forces. Although the code was an excellent object-oriented design that maximized reuse by creating composable objects with simple semantics, this implementation of ALEGRA was $>10$ times slower than traditional Fortran codes. The main performance bottlenecks were operator overloading and dynamic dispatch. A later re-implementation of ALEGRA removed operator overloading and reduced the use of dynamic dispatch. It achieved performance comparable to Fortran codes, at the cost of significantly reduced design flexibility. More recently, techniques such as expression templates, and tools such as PETE (from Los Alamos National Laboratory) and Blitz++, have made it possible to avoid the costs associated with operator overloading. However, dynamic dispatch at the lowest level of an application design remains a fundamental optimization problem, because it hides from the compiler opportunities for optimizations such as register allocation and instruction reordering.
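The cost of fine-grained dynamic dispatch can be illustrated with a small sketch. The code below is not taken from ALEGRA or Loci; the names `Material`, `LinearElastic`, `stress_per_element`, and `linear_elastic_kernel` are hypothetical and chosen only for illustration. The first routine makes one virtual call per element, while the second receives loop bounds and runs a plain loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fine-grained design: each stress evaluation is a virtual call.
struct Material {
    virtual ~Material() = default;
    virtual double stress(double strain) const = 0;
};

struct LinearElastic : Material {
    double modulus;
    explicit LinearElastic(double e) : modulus(e) {}
    double stress(double strain) const override { return modulus * strain; }
};

// One dynamic dispatch per element. Unless the compiler can devirtualize the
// call, it generally cannot inline the body, which limits unrolling,
// vectorization, and register allocation across iterations.
void stress_per_element(const Material& m,
                        const std::vector<double>& strain,
                        std::vector<double>& out) {
    assert(out.size() == strain.size());
    for (std::size_t i = 0; i < strain.size(); ++i) {
        out[i] = m.stress(strain[i]);
    }
}

// Coarse-grained alternative: the choice of kernel is made once, outside the
// loop, and the kernel itself is an ordinary loop over [begin, end).
void linear_elastic_kernel(double modulus, const double* strain, double* out,
                           std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        out[i] = modulus * strain[i];
    }
}
```

Both routines compute the same values; the difference lies only in where the dispatch decision is made, per element or once per loop.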
Loci provides the flexibility of building abstractions from fundamental composable objects with simple semantics without incurring the costs associated with dynamic dispatch. It accomplishes this by introducing a run-time logic deduction engine that performs deductions on aggregates of simple types. The semantics of these aggregations are documented as a Loci ``rule'', which is used to deduce loop bounds that are passed into computational subroutines. Because the subroutines then contain ordinary loops over known bounds, modern compilers are able to perform register scheduling, loop unrolling, and instruction scheduling to achieve high performance. The deduction engine itself induces very little overhead, since deductions on aggregates can be performed in $O(1)$ time in most cases. In the CHEM code (see the next section for details), for example, the deduction overhead consumes $\ll 1\%$ of the overall execution time.
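The idea of deducing loop bounds on aggregates and handing them to a plain kernel can be sketched as follows. This is a minimal illustration under stated assumptions, not Loci's actual API: the types `Interval`, `Store`, `Facts`, and `Rule`, the function `apply`, the variable names `rho`, `T`, and `p`, and the ideal-gas kernel are all hypothetical. Aggregates are assumed to be described by a single half-open index interval, so the deduction reduces to a constant-time interval intersection.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Half-open interval [begin, end) of entity indices (hypothetical type).
struct Interval {
    int begin;
    int end;
    bool empty() const { return begin >= end; }
};

// O(1): the aggregate is described by its bounds, not by a list of elements.
Interval intersect(Interval a, Interval b) {
    return { std::max(a.begin, b.begin), std::min(a.end, b.end) };
}

// A stored variable: values indexed by entity, plus the interval over which
// those values are actually defined.
struct Store {
    std::vector<double> data;
    Interval domain;
};

using Facts = std::map<std::string, Store>;

// A "rule" in this sketch: named inputs, one named output, and a loop kernel.
struct Rule {
    std::vector<std::string> inputs;
    std::string output;
    void (*kernel)(const Facts&, Store& out, Interval range);
};

// Example kernel: p = rho * R * T over the deduced range. The loop body has
// no virtual calls, so the compiler sees a plain loop over raw arrays.
void ideal_gas_kernel(const Facts& f, Store& out, Interval r) {
    const double* rho = f.at("rho").data.data();
    const double* T = f.at("T").data.data();
    double* p = out.data.data();
    const double R = 287.0;  // assumed gas constant, for illustration only
    for (int i = r.begin; i < r.end; ++i) {
        p[i] = rho[i] * R * T[i];
    }
}

// Deduce the range where every input is defined, then call the kernel once.
// n is the total number of entities. facts.at() throws if an input is missing.
Interval apply(const Rule& rule, Facts& facts, int n) {
    Interval range{0, n};
    for (const auto& name : rule.inputs) {
        range = intersect(range, facts.at(name).domain);
    }
    Store out{std::vector<double>(n, 0.0), range};
    if (!range.empty()) {
        rule.kernel(facts, out, range);
    }
    facts[rule.output] = out;
    return range;
}
```

With `rho` defined on $[0, 6)$ and `T` defined on $[2, 5)$, `apply` deduces the range $[2, 5)$ and invokes the kernel once over it; the single indirect call through `kernel` is paid per loop rather than per element.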
The advantages of this approach are numerous: 1) since the rules represent fundamental computational components, their semantics are simple and easily captured; 2) since the semantics are simple, rules can be composed automatically using logic deduction; 3) the semantics of applications formed by these compositions can be, and are, automatically checked for internal consistency; and 4) intra-application resource management, such as automatic parallelization, cache optimization, memory management, and checkpointing, becomes possible.
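Points 2) and 3) can be illustrated with a small forward-chaining sketch. It is an assumption-laden illustration rather than Loci's scheduler: `RuleSig`, `Schedule`, and `schedule` are hypothetical names, rules are reduced to their input and output variable names, and "internal consistency" is simplified to the question of whether the requested variable can be deduced from the given facts.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Signature of a rule: which variables it needs and which one it produces.
struct RuleSig {
    std::string name;
    std::vector<std::string> inputs;
    std::string output;
};

// Result of the deduction: whether the goal is derivable, and the order in
// which rules were fired.
struct Schedule {
    bool consistent;
    std::vector<std::string> order;
};

// Simple forward chaining: repeatedly fire any rule whose inputs are all
// known until the goal is known or no rule can fire. This may fire rules the
// goal does not strictly need; a real scheduler would prune them.
Schedule schedule(const std::vector<RuleSig>& rules,
                  std::set<std::string> known,
                  const std::string& goal) {
    std::vector<std::string> order;
    std::vector<bool> fired(rules.size(), false);
    bool progress = true;
    while (progress && known.count(goal) == 0) {
        progress = false;
        for (std::size_t i = 0; i < rules.size(); ++i) {
            if (fired[i]) continue;
            bool ready = true;
            for (const auto& in : rules[i].inputs) {
                if (known.count(in) == 0) { ready = false; break; }
            }
            if (!ready) continue;
            known.insert(rules[i].output);
            order.push_back(rules[i].name);
            fired[i] = true;
            progress = true;
        }
    }
    return Schedule{known.count(goal) != 0, order};
}
```

Given a rule `eos` that computes `p` from `rho` and `T`, and a rule `flux` that computes `flux` from `p` and `velocity`, requesting `flux` from the facts `rho`, `T`, and `velocity` yields a schedule that runs `eos` before `flux`, regardless of the order in which the rules were registered. If `T` is missing, no schedule exists and the inconsistency is reported before any kernel runs.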