Differences

This shows you the differences between two versions of the page.

publications:bibtex [2021/02/12 20:36] – Add further publications (hnv)
publications:bibtex [2021/02/12 20:40] – fix url (hnv)
Line 62:
   volume    = {abs/1912.05234},
   year      = {2019},
-  url       = {http://arxiv.org/abs/1912.05234},
   archivePrefix = {arXiv},
   eprint    = {1912.05234},
   timestamp = {Thu, 02 Jan 2020 18:08:18 +0100},
-  biburl    = {https://dblp.org/rec/journals/corr/abs-1912-05234.bib},
-  bibsource = {dblp computer science bibliography, https://dblp.org}
 }
  
Line 81 → Line 78:
 publisher = {Association for Computing Machinery},
 address = {New York, NY, USA},
-url = {https://doi.org/10.1145/3310232.3310242},
 doi = {10.1145/3310232.3310242},
 abstract = {In this paper we present an optimisation for reference counting based garbage collection. The optimisation aims at reducing the total number of calls to the heap manager while preserving the key benefits of reference counting, i.e. the opportunities for in-place updates as well as memory deallocation without global garbage collection. The key idea is to carefully extend the lifetime of variables so that memory deallocations followed by memory allocations of the same size can be replaced by a direct memory reuse. Such memory reuse turns out particularly useful in the context of innermost loops of compute-intensive applications. It leads to a runtime behaviour that performs pointer swaps between buffers in the same way it would be implemented manually in languages that require explicit memory management, e.g. C. We have implemented the proposed optimisation in the context of the Single-Assignment C compiler tool chain. The paper provides an algorithmic description of our optimisation and an evaluation of its effectiveness over a collection of benchmarks including a subset of the Rodinia benchmarks and the NAS Parallel Benchmarks. We show that for several benchmarks with allocations within loops our optimisation reduces the amount of allocations by a few orders of magnitude. We also observe no negative impact on the overall memory footprint nor on the overall runtime. Instead, for some sequential executions we find mild improvement, and on GPU devices we observe speedups of up to a factor of 4x.},
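The pointer-swap buffer reuse that the abstract describes can be illustrated with a minimal C sketch. This is hypothetical illustration code, not code from the paper or the Single-Assignment C compiler: an iterative loop that would naively free its old buffer and allocate a fresh one each pass instead keeps one scratch buffer of the same size and swaps pointers, which is exactly the pattern the optimisation aims to recover automatically.

```c
#include <stdlib.h>
#include <assert.h>

/* One pass of a simple 3-point smoothing stencil (hypothetical example). */
static void smooth_step(const double *src, double *dst, size_t n) {
    for (size_t i = 1; i + 1 < n; i++)
        dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0;
    dst[0] = src[0];
    dst[n - 1] = src[n - 1];
}

/* Runs `iters` smoothing passes over a heap-allocated buffer of length n.
 * Instead of free()/malloc() per iteration, a single scratch buffer of the
 * same size is reused via a pointer swap. Takes ownership of `buf`; the
 * caller must use (and eventually free) the returned pointer. */
double *smooth(double *buf, size_t n, int iters) {
    double *scratch = malloc(n * sizeof *scratch);  /* one extra buffer, reused */
    for (int it = 0; it < iters; it++) {
        smooth_step(buf, scratch, n);
        double *tmp = buf;   /* pointer swap replaces dealloc + realloc */
        buf = scratch;
        scratch = tmp;
    }
    free(scratch);
    return buf;
}
```

The swap keeps exactly two live allocations regardless of the iteration count, which mirrors the abstract's claim that the optimisation preserves the memory footprint while eliminating per-iteration heap-manager calls.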