host implementations of atomic functions? #6
Comments
I would consider a pull request, definitely. But first: how would you Also, nice to hear from a Hemi user. Can you tell me how you are using it? Thanks! On Tue, Jun 18, 2013 at 1:27 AM, Stan Seibert [email protected]:
Since the CPU version of the code-path with Hemi is inherently single-threaded, I imagined that the atomic operations when compiling for the CPU would literally be things like:

inline int atomicAdd(int *address, int val)
{
    int old = *address;
    *address = old + val;
    return old;
}

This code has a race condition, of course, but I don't see generating multi-threaded CPU code as a use case for Hemi. (If I'm incorrect about that, then my proposed solution won't work.) This assumption would need to be clearly stated in the documentation, just in case someone tries to do something bizarre.

We're using Hemi to implement a new CUDA program where we want to preserve the option of compiling the code in CPU-only mode. We are working around cases that Hemi doesn't easily handle yet (like atomics, streams and shared memory) with #ifdefs in our code. The atomics have an easy solution, so that's why I proposed it first. Eventually, it would be nice to be able to compile Hemi in CPU mode without the CUDA headers at all, but that would require stubbing out a few things, I think.
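As a rough sketch of what such a host-only header could contain (not Hemi's actual code), the same sequential pattern extends to atomicInc(), which in CUDA wraps the counter to 0 once the old value reaches val. The `__CUDACC__` guard and the header layout are assumptions for illustration; neither function is thread-safe.

```cpp
// Hypothetical host-side stand-ins for CUDA atomics.
// Assumption: host code is single-threaded, so no locking is done here.
#ifndef __CUDACC__  // only define these when not compiling with nvcc

inline int atomicAdd(int *address, int val)
{
    int old = *address;      // read the current value
    *address = old + val;    // store the sum
    return old;              // CUDA atomics return the old value
}

inline unsigned int atomicInc(unsigned int *address, unsigned int val)
{
    unsigned int old = *address;
    // Same rule as the device function: wrap to 0 once old reaches val.
    *address = (old >= val) ? 0u : old + 1u;
    return old;
}

#endif  // __CUDACC__
```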
I would like to preserve the ability to use, for example, OpenMP with Hemi. On Tue, Jun 18, 2013 at 10:37 AM, Stan Seibert [email protected]:
That makes this harder, since I can't think of a generic way to implement the atomic that works for any multi-threading situation. If Hemi specifically only supported OpenMP for multi-threading on the CPU, then I think this would work:

inline int atomicAdd(int *address, int val)
{
    int old;
    #pragma omp critical
    {
        old = *address;
        *address = old + val;
    }
    return old;
}
I would prefer to do this in as flexible and unobtrusive a way as possible. I would include it in a separate header, and just provide the sequential CPU implementation. For example, omp critical could be added around the calls to atomicAdd rather than inside them.
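A minimal sketch of that arrangement, with the critical section at the call site and the helper itself left sequential. The names `seqAtomicAdd` and `countBins` are hypothetical, and the keys are assumed to be non-negative. Without OpenMP enabled the pragmas are ignored and the loop simply runs sequentially.

```cpp
#include <vector>

// Sequential stand-in (hypothetical name); not thread-safe on its own.
inline int seqAtomicAdd(int *address, int val)
{
    int old = *address;
    *address = old + val;
    return old;
}

// Count how many keys land in each bin (key modulo numBins).
inline std::vector<int> countBins(const std::vector<int> &keys, int numBins)
{
    std::vector<int> bins(numBins, 0);
    const long n = static_cast<long>(keys.size());

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        // The caller, not the helper, supplies the mutual exclusion.
        #pragma omp critical
        {
            seqAtomicAdd(&bins[keys[i] % numBins], 1);
        }
    }
    return bins;
}
```

This keeps the header free of any dependency on a particular threading model, at the cost of asking every multi-threaded caller to add its own guard.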
Flexible, should be very unobtrusive (if you know you're using something special, then you can pass in what you want). Even the overloads wouldn't need to duplicate code -- they'd just be a call to the basic sequential function with whatever's needed in guards. The only "trick" I can think of right now is that we'd want to ensure we replicate the behavior of the device function, specifically because there isn't any waiting on the device if another thread tries simultaneous access. The operation simply fails.
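One way the overload idea might look is sketched below, using std::mutex as the guard purely for illustration; the thread does not say which lock type was intended. The names `seqAdd` and `guardedAtomicAdd` are hypothetical. The guarded overload only wraps the sequential function, so no logic is duplicated.

```cpp
#include <mutex>

// Basic sequential version (hypothetical name); no locking.
inline int seqAdd(int *address, int val)
{
    int old = *address;
    *address = old + val;
    return old;
}

// Overload taking a caller-supplied lock. Any type with lock()/unlock()
// works; std::mutex is just the example used here.
template <typename Lockable>
inline int guardedAtomicAdd(int *address, int val, Lockable &lock)
{
    std::lock_guard<Lockable> hold(lock);  // released when hold goes out of scope
    return seqAdd(address, val);
}
```

Note that this version blocks until the lock is available, which is a design choice of the sketch rather than something settled in the discussion.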
…orted. Uses OpenMP locks as an example.
I think it would be useful to offer a simple implementation of functions like atomicAdd(), atomicInc(), etc., that are used when compiling Hemi code for CPU execution. Currently I have to use #ifdef HEMI_DEV_CODE to hide uses of atomicAdd() from the host compiler.
If this sounds reasonable, I'm happy to do the implementation and issue a pull request.
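For context, the workaround described above looks roughly like the following. This is a sketch, not code from the project: `bumpCounter` is a hypothetical function, and HEMI_DEV_CODE is assumed to be defined only when compiling device code. A real Hemi function would also carry the library's host/device qualifiers, omitted here so the host path compiles as plain C++.

```cpp
// Hypothetical example of hiding atomicAdd() from the host compiler.
inline int bumpCounter(int *counter)
{
#ifdef HEMI_DEV_CODE
    // Device path: use the CUDA atomic.
    return atomicAdd(counter, 1);
#else
    // Host path: hand-written sequential equivalent (not thread-safe).
    int old = *counter;
    *counter = old + 1;
    return old;
#endif
}
```

A host implementation of atomicAdd() would let the #ifdef disappear from user code like this.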