-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible problems with register values #12
Comments
Hi, on a similar note it seems that calls to rotate with the second value as an Int vector will work on the emulator but not on the actual QPU. It seems like something that wouldn't be possible on the QPU so I'm guessing this is a mistake on the emulator side. However I have an issue where I'd like to rotate a vector by a constant amount, but that constant amount is in an Int vector not an int. I'm forced to use an Int here because when using an int in a normal for loop there is a segfault at compilation of the kernel as I outlined above. Is there a way to extract the int value from the Int? I tried using Int.expr->intLit but this value seems to always be 0. Please let me know if there is a way I can get an int value which matches the value of an Int in a For loop, thanks! |
Hi @cbayley108,
So that could be a bug in QPULib. Even if the size of the kernel code (after unrolling) exceeds some limit, the
That is the expected behaviour, because |
I think this should work on the actual QPU too (as long as the value is in the range 0..15). I'll take a look (might be a few days before I can do that).
Remember the The only way to do it is to write the Thanks for the reports. |
Thanks for the response! The QPU doesn't seem to like receiving an Int as an argument for rotate. Or possibly it isn't an issue with the actual QPU but when running without the emulator an assert fails in Encode.cpp. |
Also because this project is for a school assignment I am not able to make the code publicly available according to our course policy. However, the issue occurs when there are (highly) nested for loops, or even nested loops of either For or for. In general it also seems to occur when there are more nested loops added in terms of breadth, not depth. In my code because I am performing multiple 2D convolutions over multiple images the total depth of the nested for loops gets pretty high, up to 6 deep. I understand this is probably more than was intended but I'm now at the point where I cannot use local data structures as I cannot create more for loops to iterate through them, and they cannot be indexed by an Int in a For loop. For example, I'd like to precompute some data and store it in a 2d array as follows: Overall the library has worked pretty well and I've managed to get a working 2D convolution of an image which is not 16 aligned working slightly faster than the CPU version. If I can use local values as mentioned above it seems it may be possible to double that performance as well. Also I'd like to note that some of your scalar times for the example code seems a bit off from the performance that I see. For HeatMapScalar the listed run time is about 400s, but in practice I seem to see about 100s, however the parallel performance matches very well. |
Turns out this was due to a single character typo in the assertion. I've pushed a fix: c0f798b. Looks like it's working now. Thanks for the report. |
It's a pity you're not allowed to provide a code sample, especially as you're seeing undesirable behaviour in emulation mode where it should be easy for me to diagnose. It's difficult for me to fix if I can't easily reproduce the problem, but thanks anyway for the report.
Nice! Glad to hear you at least get some speed up using the library. I guess there is huge scope to improve the performance of the library, but I don't have much time to work on it these days.
My performance measurements where done on the Pi 1 Model B. Perhaps you're using a Pi 2 or a Pi 3? In your assignment, there are a few reasons you might like to also include results for the Pi 1 Model B (which should be more favourable):
|
Thanks for making that fix! I can talk with my professor about it, he'll likely allow me to send you the code for debugging purposes but it would have to be through email or something less public than Github. And that makes some sense in terms of the scalar run times; I am running on a Pi 3 model B which has a 1.2 Ghz processor compared to the Pi 1's 700 Mhz. I'm not sure where the factor of 4 difference is coming from as the clock is only about 2x faster, but I suppose this could be due to a number of differences between the Pi's. |
That would be great, if you're allowed.
A couple of other factors, beyond just clock frequency: the cache sizes are different, and probably the CPI (cycles per instruction) too. |
Hi @mn416 , I've had a couple of issues which might stem from some problems with register values. The first problem is that there are segfaults or invalid frees from just calling compile on my kernel function. I believe this was due to excessive loop unrolling and changing for(int ... loops into For(Int... loops seems to fix the problem. Another problem I've seen a couple times is that int values do not seem to keep their values between iterations of a For loop. I'm not sure if that is intended behavior and I shouldn't use int values in a For loop over an Int value, so please let me know what the correct usage would be. As an example the following code would write a value of 1 for every output value:
int val = 0;
For(Int i = 0, i < len, i = i +16)
val++;
out[i] = val;
End
The text was updated successfully, but these errors were encountered: