|dc.description.abstract||Recent research in deterministic record-replayseeks to ease debugging, security, and fault tolerance on otherwise nondeterministic multicore systems. The important challenge of handling shared memory races (that can occur on any memory reference) can be made more efficient with hardware support. Recent proposals record how long threads run in isolation on top of snooping coherence (IMRR), implicit transactions (DeLorean), or directory coherence (Rerun). As core counts scale, Rerun?s directory-based parallel record gets more attractive, but its nearly sequential replay becomes unacceptably slow.
This paper proposes Karma for both scalable recording and replay. Karma builds episodic memory race recorder using a conventional directory protocol and records order of the episodes as a directed acyclic graph. Karma also enables extension of episodes even after some conflicts. During replay, Karma uses wakeup messages to trigger a partially ordered parallel episode replay. Results with several commercial workloads on a 16-core system show that Karma can achieve replay speed (a) within 19%-28% of native execution speed without record-replay and (b) four times faster than even an idealized Rerun replay. Additional results explore tradeoffs between log size and replay speed.||en_US