-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize code for HLS #42
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some ideas but think generic & functional programming! :-)
The |
constexpr float infinity = std::numeric_limits<float>::infinity(); | ||
constexpr float pi = 3.1415926535897932385f; | ||
constexpr real_t infinity = std::numeric_limits<real_t>::infinity(); | ||
constexpr real_t pi = 3.1415926535897932385f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
literals are still assuming float which means that if we change real_t to something else we will get some conversions. maybe make our own user-defined literal could help abstract this.
real_t operator "" _r(long double d) { return d; }
this would force conversion to be on the literals directly such that they can be constant folded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use constinit
.
I guess a lot of constexpr
should be constinit
for FPGA. To mention in the poster...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
But I am not convinced by the new design.
I cannot see a reason to use this explicit visitor pattern from a pre-modern C++ or Java world when there was no generic lambdas...
You have created a new explicit visitor with all the object-dependent implementation in it breaking the encapsulation by in-lining here all what was in the hit
function before.
After looking at the code now with the increased complexity with new kitchen-sink visitor requiring direct access to public object members or über-friendship, I am even less convinced...
Having a distributed hit
like before does prevent to add some hierarchy if you have 2 different implementations of sphere for example, by having a hit
implemented in some sphere specific class which is inherited by the various spheres.
To avoid blocking this PR I suggest to split it in 2, by moving this complex refactoring in a new one where we can think more about it.
Thanks for this experimentation! It would be interesting if there is any impact on performance and QoR now we have it...
} | ||
}; | ||
} // namespace raytracer::visitor | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing end-of-line.
Yes, this is good to have this in its own file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious... An IDE configuration bug?
include/hit_record.hpp
Outdated
point p; // hit point | ||
vec normal; // normal at hit point | ||
bool front_face; // to check if hit point is on the outer surface | ||
/*local coordinates for rectangles | ||
and mercator coordinates for spheres */ | ||
real_t u; | ||
real_t v; | ||
|
||
// To set if the hit point is on the front face |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are here, add more Doxygen
point p; // hit point | |
vec normal; // normal at hit point | |
bool front_face; // to check if hit point is on the outer surface | |
/*local coordinates for rectangles | |
and mercator coordinates for spheres */ | |
real_t u; | |
real_t v; | |
// To set if the hit point is on the front face | |
point p; ///< Hit point | |
vec normal; ///< Normal at hit point | |
bool front_face; ///< To check if hit point is on the outer surface | |
/// Local coordinates for rectangles and mercator coordinates for spheres | |
real_t u; | |
real_t v; | |
/// To set if the hit point is on the front face |
Discussion about the refactoring can be continued on #43 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functional + generic C++ = Zen++ :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looking at the code I wonder whether we cannot merge dev_visit(monostate_dispatch
into a simpler monostate_visit
including the visit
in the implementation.
Where are we about this? |
@lforg37 what's up with this? |
The current version is based on this branch, commits have been added. |
@@ -26,7 +26,7 @@ class box { | |||
} | |||
|
|||
/// Compute ray interaction with the box | |||
bool hit(auto& ctx, const ray& r, real_t min, real_t max, hit_record& rec, | |||
bool hit(const ray& r, real_t min, real_t max, hit_record& rec, | |||
material_t& hit_material_type) const { | |||
hit_record temp_rec; | |||
material_t temp_material_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the feeling that if we remove the std::monostate
from the material it will always create a sphere
here even when there is no intersection.
include/constant_medium.hpp
Outdated
@@ -62,7 +60,8 @@ class constant_medium { | |||
/// Distance between the two hitpoints affect of probability | |||
/// of the ray hitting a smoke particle | |||
const auto distance_inside_boundary = (rec2.t - rec1.t) * ray_length; | |||
const auto hit_distance = neg_inv_density * sycl::log(rng.float_t()); | |||
auto rng = LocalPseudoRNG { toseed(r.direction()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I realize you have removed the context passed everywhere and replaced basically the random generator with some local computation.
What was the rationale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As getting a value from the RNG update its internal state, having a shared RNG creates a very long chain of read after write dependency that prevent the HLS compiler to parallelize the otherwise independent computation steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could have this context with some special HLS decorations to ignore some dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, perhaps something like this pseudo HLS-SYCL code:
auto rng(auto local_seed = std::source_location::current()) {
return [=] {
static auto state = local_seed.line();
#pragma HLS dependence variable=state inter false
state = crunch(state);
return mix(state);
};
}
or
auto rng(auto local_seed = std::source_location::current()) {
return [state = local_seed.line()] mutable {
#pragma HLS dependence variable=state inter false
state = crunch(state);
return mix(state);
};
}
Perhaps we need extension intel/llvm#3746 to have the big picture working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above is basically a good idea, if I understand the intention, to separate the state into multiple independent RNGs. I'm not sure what you're trying to achieve with the false dependence pragma, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephenneuendorffer yes this is the idea to distribute the state here in a lazy way to even initialize the seed with something depending from the source location (here with the line number, we could also use the file name, etc.).
I has put probably too much here with the
#pragma HLS dependence variable=state inter false
as a way to have an II=1
without consuming too much latency at the price of a semantics change in the code and hoping we still have enough randomness...
But I agree this is a second order optimization if we are still in trouble...
Perhaps this code will not work with SYCL-HLS because it uses some function pointer. If so, we can try with a macro to avoid returning the lambda, or perhaps just have a macro which inline a random generator wherever we want one... Ahem... :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some comparisons between this version and the main one.
You have not update the README about how to run the program.
It looks like time ./sycl-rt 800 480 50 100
is a way to get the same behaviour as before.
It is a little bit faster than before, I guess because there are some parts of the variant which have been removed...
Also I wonder how it is possible to have the motion blur working without a random generator... by having the time or motion parameter included in the hash? It starts being painful...
include/rtweekend.hpp
Outdated
uint32_t shifted4 = (y2 & 63) << 10; | ||
uint32_t shifted5 = (z1 & 63) << 5; | ||
uint32_t shifted6 = (z2 & 63); | ||
return shifted1 ^ shifted2 ^ shifted3 ^ shifted4 ^ shifted5 ^ shifted6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I read too quickly and was thinking to 63 for 5 bits...
Actually that seems like a good idea then to have an xor. Would more than 6 bits better than? 127 or 255?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still some super old comments and a merge conflict
@lforg37 where are we on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there!
} | ||
}; | ||
} // namespace raytracer::visitor | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious... An IDE configuration bug?
auto& rng = ctx.rng; | ||
bool scatter(auto& ctx, const ray& r_in, const hit_record& rec, color& attenuation, | ||
ray& scattered) const { | ||
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; | |
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
auto& rng = ctx.rng; | ||
bool scatter(auto&, const ray& r_in, const hit_record& rec, color& attenuation, | ||
ray& scattered) const { | ||
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; | |
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
// Attenuation of the ray hitting the object is modified based on the color | ||
// at hit point | ||
auto& rng = ctx.rng; | ||
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; | |
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
auto& rng = ctx.rng; | ||
bool scatter(auto& ctx, const ray& r_in, const hit_record& rec, color& attenuation, | ||
ray& scattered) const { | ||
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; | |
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) }; |
I do not know why I am getting bored here...
std::memcpy(&x2, &val2.x(), sizeof(uint32_t)); | ||
std::memcpy(&y2, &val2.y(), sizeof(uint32_t)); | ||
std::memcpy(&z2, &val2.z(), sizeof(uint32_t)); | ||
uint32_t shifted1 = x1 << 26; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use auto
everywhere
uint32_t toseed(vec const& val1, vec const& val2) { | ||
uint32_t x1, y1, z1, x2, y2, z2; | ||
std::memcpy(&x1, &val1.x(), sizeof(uint32_t)); | ||
std::memcpy(&y1, &val1.y(), sizeof(uint32_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::memcpy(&y1, &val1.y(), sizeof(uint32_t)); | |
std::memcpy(&y1, &val1.y(), sizeof y1); |
everywhere.
This is clearer to use the type of the destination. Also it is shorter since you can avoid ()
with sizeof
expression. :-)
#include <algorithm> | ||
#include <chrono> | ||
#include <cstdint> | ||
#include <ctime> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, why?
@return uint32_t | ||
*/ | ||
uint32_t toseed(vec const& val1, vec const& val2) { | ||
uint32_t x1, y1, z1, x2, y2, z2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the C++ version everywhere instead of the C version : std::uint32_t
int main(int argc, char* argv[]) { | ||
if (argc < 5 || argc > 7) { | ||
std::cerr << "Usage: sycl-rt OUT_WIDTH OUT_HEIGHT DEPTH SAMPLES " | ||
"[SPHERE_INC [RAND_SEED]]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the README to describe this new API
Add monostate in front of each variant (#41), replace remaining
float
withreal_t