We present a novel algorithm to evaluate and render Loop subdivision surfaces. The algorithm exploits the fact that Loop subdivision surfaces are piecewise polynomial and uses the forward difference technique for efficiently computing uniform samples on the limit surface. The main advantage of our algorithm is that it only requires a small and constant amount of memory that does not depend on the subdivision depth. The simple structure of the algorithm enables a scalable degree of hardware implementation. By low-level parallelization of the computations, we can reduce the critical computation costs to a theoretical minimum of about one float-operation per triangle.