Generating the local memory access sequence is a critical issue in distributed-memory implementations of data-parallel languages. In this paper, for arrays distributed block-cyclically across multiple processors, we introduce a novel approach to local memory access sequence generation based on the theory of permutations. By compressing the active elements of a block into an integer, called a compress number, and exploiting the repeating pattern in the access sequence, we obtain the global block cycle. We then show that the local block cycle can be enumerated efficiently in closed form using a permutation of the global block cycle. Decompressing the compress numbers in the local block cycle restores the local block patterns, from which the local memory access sequence can be generated quickly. Unlike other works, our approach incurs no run-time overhead.
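To make the problem statement concrete, the following is a minimal brute-force sketch of what a "local memory access sequence" is under a block-cyclic distribution. It is not the permutation-based method of the paper: it simply scans every accessed global index. The function name, the parameter names, and the layout convention (global index g lives on processor (g // block_size) % num_procs, at local address (g // (num_procs * block_size)) * block_size + g % block_size) are assumptions chosen for illustration.

```python
def local_access_sequence(lower, upper, stride, num_procs, block_size, proc):
    """Brute-force baseline (illustrative only, not the paper's algorithm).

    Returns the local addresses that processor `proc` touches when the
    array section lower:upper:stride (upper inclusive) is accessed and the
    array is distributed block-cyclically with the given block size.
    """
    row_len = num_procs * block_size  # global elements per round of blocks
    sequence = []
    for g in range(lower, upper + 1, stride):
        owner = (g // block_size) % num_procs
        if owner == proc:
            # Each completed round contributes one full block to local memory.
            local = (g // row_len) * block_size + g % block_size
            sequence.append(local)
    return sequence


def gaps(sequence):
    """Differences between consecutive local addresses."""
    return [b - a for a, b in zip(sequence, sequence[1:])]


if __name__ == "__main__":
    seq = local_access_sequence(0, 47, 3, num_procs=2, block_size=4, proc=0)
    print(seq)        # [0, 3, 5, 10, 12, 15, 17, 22]
    print(gaps(seq))  # [3, 2, 5, 2, 3, 2, 5]
```

In this small case the gap pattern 3, 2, 5, 2 repeats, which is the kind of repeating pattern the abstract refers to. The brute-force scan visits every accessed element on every processor, which is the cost that a closed-form enumeration aims to avoid.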
Relation:
Journal of Systems Architecture, Volume 47, Issue 6, June 2001, Pages 505-515