host implementations of atomic functions? #6

seibert · 2013-06-17T15:27:22Z

I think it would be useful to offer simple implementations of functions like atomicAdd(), atomicInc(), etc. for use when compiling hemi code for CPU execution. Currently I have to use #ifdef HEMI_DEV_CODE to hide uses of atomicAdd() from the host compiler.

If this sounds reasonable, I'm happy to do the implementation and issue a pull request.
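As a rough sketch of the proposal, the fallbacks could live in a header that the host compiler sees and the device pass skips. This assumes that hemi defines `HEMI_DEV_CODE` only during the device compilation pass, which is what the `#ifdef` workaround relies on. The set of overloads is illustrative; the semantics follow CUDA's documented `atomicAdd()` and `atomicInc()`, and like the proposal they are only safe when the host code path is single-threaded.

```cpp
// Hypothetical host fallbacks for CUDA atomics.
// Assumption: HEMI_DEV_CODE is defined only when compiling for the device,
// so these definitions are seen by the host compiler alone.
#ifndef HEMI_DEV_CODE

// Adds val to *address and returns the previous value, like CUDA's atomicAdd.
// Not thread-safe: assumes the host code path runs on a single thread.
inline int atomicAdd(int *address, int val)
{
  int old = *address;
  *address = old + val;
  return old;
}

// Same contract for the float overload that CUDA also provides.
inline float atomicAdd(float *address, float val)
{
  float old = *address;
  *address = old + val;
  return old;
}

// Mirrors CUDA's atomicInc: stores (old >= val) ? 0 : old + 1, returns old.
inline unsigned int atomicInc(unsigned int *address, unsigned int val)
{
  unsigned int old = *address;
  *address = (old >= val) ? 0u : old + 1u;
  return old;
}

#endif  // HEMI_DEV_CODE
```

With such a header included, kernel bodies that call atomicAdd() would compile unchanged for the CPU, without per-call #ifdefs.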

harrism · 2013-06-17T22:13:00Z

I would consider a pull request, definitely. But first: how would you implement the atomics on the CPU?

Also, nice to hear from a Hemi user. Can you tell me how you are using it?

Thanks!
Mark


seibert · 2013-06-18T00:37:42Z

Since the CPU code path in Hemi is inherently single-threaded, I imagined that the atomic operations, when compiling for the CPU, would literally be things like:

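```cpp
// Sequential host fallback: returns the old value, like the CUDA version.
// Not thread-safe; assumes the CPU code path runs on a single thread.
inline int atomicAdd(int *address, int val)
{
  int old = *address;
  *address = old + val;
  return old;
}
```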

This code has a race condition, of course, but I don't see generating multi-threaded CPU code as a use case for Hemi. (If I'm incorrect about that, then my proposed solution won't work.) This assumption would need to be clearly stated in the documentation, just in case someone tries to do something bizarre.

We're using Hemi to implement a new CUDA program where we want to preserve the option of compiling the code in CPU-only mode. We are working around cases that Hemi doesn't easily handle yet (like atomics, streams and shared memory) with #ifdefs in our code. The atomics have an easy solution, so that's why I proposed it first.

Eventually, it would be nice to be able to compile Hemi code in CPU mode without the CUDA headers at all, but I think that would require stubbing out a few things.
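One way such stubbing might start, sketched under the assumption that the only CUDA-specific tokens in the shared code are the `__host__`, `__device__` and `__global__` qualifiers: when `__CUDACC__` (defined by nvcc) is absent, map them to nothing so a plain C++ compiler accepts the code. The function `saxpyElement` is a made-up example, and a real port would also need stand-ins for types such as `dim3` and for any runtime API calls. Names with leading double underscores are reserved in C++, so this is a pragmatic workaround rather than strictly portable code.

```cpp
// Host-only build (no nvcc): turn CUDA function qualifiers into no-ops.
// Assumption: nothing else from the CUDA headers is used by this code.
#ifndef __CUDACC__
#define __host__
#define __device__
#define __global__
#endif

// Hypothetical function written once for both targets; in a host-only
// build the qualifiers expand to nothing.
__host__ __device__ inline float saxpyElement(float a, float x, float y)
{
  return a * x + y;
}
```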

harrism · 2013-06-18T04:23:25Z

I would like to preserve the ability to use, for example, OpenMP with Hemi.


seibert · 2013-06-18T11:14:12Z

That makes this harder, since I can't think of a generic way to implement the atomics that works for any multi-threading situation. If Hemi specifically supported only OpenMP for multi-threading on the CPU, then I think this would work:

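```cpp
// Host fallback for OpenMP builds: the read-modify-write happens inside a
// critical section, so only one thread at a time can execute it.
inline int atomicAdd(int *address, int val)
{
  int old;
  #pragma omp critical
  {
    old = *address;
    *address = old + val;
  }
  return old;
}
```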
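Because all unnamed `critical` regions in a program share a single lock, the version above serializes against every other unnamed critical section. A possibly lighter alternative, assuming a compiler that supports OpenMP 3.1 or newer, is `#pragma omp atomic capture`, which guards only the one memory location. This is a swapped-in technique, not something proposed in the thread:

```cpp
// Sketch: atomicAdd via OpenMP's atomic capture construct (OpenMP 3.1+).
// The structured block reads the old value and updates *address as one
// atomic operation. Without OpenMP enabled the pragma is ignored and this
// behaves like the sequential fallback.
inline int atomicAdd(int *address, int val)
{
  int old;
#pragma omp atomic capture
  { old = *address; *address += val; }
  return old;
}
```

This only covers operations OpenMP can express atomically (simple arithmetic and bitwise updates); something like atomicCAS would still need a lock or a compiler builtin.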

harrism · 2013-06-26T03:36:10Z

I would prefer to do this in as flexible and unobtrusive a way as possible. I would put it in a separate header and provide only the sequential CPU implementation. An omp critical could then be added around the calls to atomicAdd rather than inside them, for example.
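A sketch of that suggestion, assuming the header ships only the sequential fallback; `sumWithCallerSideCritical` is a hypothetical caller. The `critical` pragma sits at the call site, so code that does not use OpenMP pays nothing, and a compiler run without OpenMP support ignores both pragmas and executes the loop sequentially.

```cpp
// Sequential host fallback, as proposed earlier in the thread.
inline int atomicAdd(int *address, int val)
{
  int old = *address;
  *address = old + val;
  return old;
}

// Hypothetical caller: synchronization is added where atomicAdd is used,
// not inside it. 'total' is declared outside the parallel region, so it is
// shared by all threads, and the critical section protects each update.
inline int sumWithCallerSideCritical(const int *values, int n)
{
  int total = 0;
#pragma omp parallel for
  for (int i = 0; i < n; ++i) {
#pragma omp critical
    atomicAdd(&total, values[i]);
  }
  return total;
}
```

Wrapping every call in a critical section serializes the updates, so this trades throughput for keeping the header free of OpenMP.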

lordofhyphens · 2014-07-25T22:12:03Z

~~Since OpenMP is compiler-based, no harm in leaving the pragmas for the critical section in Hemi.~~ In OpenMP, critical sections declared with the directive are identified by global names (all unnamed ones share a single name), so coming up with an efficient scheme is likely beyond the scope of Hemi. A lock/unlock pattern using the lock primitives is probably as good as we're going to get.
I'd probably overload these functions to allow passing in a mutex or some other platform-specific synchronization object.

This is flexible and should be very unobtrusive: if you know you're using something special, you can pass in what you want. The overloads wouldn't even need to duplicate code; each would just call the basic sequential function inside whatever guards are needed.
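A sketch of the overload idea. The commit referenced in this thread says it uses OpenMP locks; this version swaps in C++11 `std::mutex` as the caller-supplied primitive, and the extra-parameter signatures are hypothetical, not hemi's API. Each overload only takes the lock and forwards to the sequential function, so no logic is duplicated.

```cpp
#include <mutex>

// Sequential host fallbacks (not thread-safe on their own).
inline int atomicAdd(int *address, int val)
{
  int old = *address;
  *address = old + val;
  return old;
}

// Mirrors CUDA's atomicCAS: store val only if *address == compare; return old.
inline int atomicCAS(int *address, int compare, int val)
{
  int old = *address;
  if (old == compare) *address = val;
  return old;
}

// Hypothetical overloads taking a caller-supplied mutex. They hold the lock
// for the duration of the call and delegate to the sequential versions.
inline int atomicAdd(int *address, int val, std::mutex &m)
{
  std::lock_guard<std::mutex> guard(m);
  return atomicAdd(address, val);
}

inline int atomicCAS(int *address, int compare, int val, std::mutex &m)
{
  std::lock_guard<std::mutex> guard(m);
  return atomicCAS(address, compare, val);
}
```

Threads that share a counter would also share one mutex and pass it to every call; the plain two- and three-argument versions remain available for single-threaded host builds.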

The only "trick" I can think of right now is that we'd want to replicate the behavior of the device functions, because on the device a thread never waits on a lock held by another thread: atomicCAS, for example, simply fails its comparison and returns the current value if another thread got there first.
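To illustrate that non-blocking style on the host: CUDA code typically builds other read-modify-write operations from `atomicCAS()` with a retry loop, where a failed comparison means another thread updated the value first and the loop tries again instead of waiting. The sketch below uses a hypothetical `atomicMaxViaCAS` (CUDA already provides `atomicMax()` natively; this only shows the pattern) on top of a sequential host `atomicCAS` fallback. On a single host thread the comparison never fails, so the loop body runs once.

```cpp
#include <algorithm>

// Sequential host fallback with CUDA atomicCAS semantics.
inline int atomicCAS(int *address, int compare, int val)
{
  int old = *address;
  if (old == compare) *address = val;
  return old;
}

// Hypothetical helper: atomic max built from atomicCAS with a retry loop.
// If *address changed between the read and the CAS, atomicCAS returns a
// value different from 'assumed' and the loop retries with the new value.
// Returns the value stored at *address before the update.
inline int atomicMaxViaCAS(int *address, int val)
{
  int old = *address;
  int assumed;
  do {
    assumed = old;
    old = atomicCAS(address, assumed, std::max(assumed, val));
  } while (assumed != old);
  return old;
}
```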


harrism mentioned this issue Jul 21, 2014

Common math function support? #9

Closed

lordofhyphens referenced this issue in lordofhyphens/hemi Jul 25, 2014

Preliminary atomics implementation, with atomicAdd and atomicCAS supported. Uses OpenMP locks as an example.

31e4b46
