Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize code for HLS #42

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open

Standardize code for HLS #42

wants to merge 15 commits into from

Conversation

lforg37
Copy link
Contributor

@lforg37 lforg37 commented Apr 14, 2021

Add monostate in front of each variant (#41), replace remaining float with real_t

@lforg37 lforg37 requested review from keryell and Ralender April 14, 2021 16:55
Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some ideas but think generic & functional programming! :-)

include/constant_medium.hpp Show resolved Hide resolved
@keryell
Copy link
Member

keryell commented Apr 15, 2021

The float_t will allow some fixed point artistic stuff by trying some famous bibliothèques lyonnaises. :-)

constexpr float infinity = std::numeric_limits<float>::infinity();
constexpr float pi = 3.1415926535897932385f;
constexpr real_t infinity = std::numeric_limits<real_t>::infinity();
constexpr real_t pi = 3.1415926535897932385f;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

literals are still assuming float which means that if we change real_t to something else we will get some conversions. maybe make our own user-defined literal could help abstract this.

real_t operator "" _r(long double d) { return d; }

this would force conversion to be on the literals directly such that they can be constant folded

Copy link
Member

@keryell keryell Apr 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use constinit.
I guess a lot of constexpr should be constinit for FPGA. To mention in the poster...

Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!
But I am not convinced by the new design.
I cannot see a reason to use this explicit visitor pattern from a pre-modern C++ or Java world when there was no generic lambdas...
You have created a new explicit visitor with all the object-dependent implementation in it breaking the encapsulation by in-lining here all what was in the hit function before.
After looking at the code now with the increased complexity with new kitchen-sink visitor requiring direct access to public object members or über-friendship, I am even less convinced...
Having a distributed hit like before does prevent to add some hierarchy if you have 2 different implementations of sphere for example, by having a hit implemented in some sphere specific class which is inherited by the various spheres.
To avoid blocking this PR I suggest to split it in 2, by moving this complex refactoring in a new one where we can think more about it.
Thanks for this experimentation! It would be interesting if there is any impact on performance and QoR now we have it...

}
};
} // namespace raytracer::visitor
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing end-of-line.
Yes, this is good to have this in its own file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious... An IDE configuration bug?

Comment on lines 10 to 17
point p; // hit point
vec normal; // normal at hit point
bool front_face; // to check if hit point is on the outer surface
/*local coordinates for rectangles
and mercator coordinates for spheres */
real_t u;
real_t v;

// To set if the hit point is on the front face
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are here, add more Doxygen

Suggested change
point p; // hit point
vec normal; // normal at hit point
bool front_face; // to check if hit point is on the outer surface
/*local coordinates for rectangles
and mercator coordinates for spheres */
real_t u;
real_t v;
// To set if the hit point is on the front face
point p; ///< Hit point
vec normal; ///< Normal at hit point
bool front_face; ///< To check if hit point is on the outer surface
/// Local coordinates for rectangles and mercator coordinates for spheres
real_t u;
real_t v;
/// To set if the hit point is on the front face

include/hitable_visitor.hpp Outdated Show resolved Hide resolved
include/localrandom.hpp Outdated Show resolved Hide resolved
include/primitives.hpp Outdated Show resolved Hide resolved
@lforg37
Copy link
Contributor Author

lforg37 commented Apr 16, 2021

Discussion about the refactoring can be continued on #43

Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional + generic C++ = Zen++ :-)

include/hitable.hpp Outdated Show resolved Hide resolved
@lforg37 lforg37 requested a review from keryell April 20, 2021 13:12
Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By looking at the code I wonder whether we cannot merge dev_visit(monostate_dispatch into a simpler monostate_visit including the visit in the implementation.

@keryell
Copy link
Member

keryell commented May 18, 2021

Where are we about this?

@keryell
Copy link
Member

keryell commented Jun 9, 2021

@lforg37 what's up with this?
Probably you should also at least make a WIP branch with the current version we discussed today with the random generator with low spatial quality.

@lforg37 lforg37 marked this pull request as draft June 9, 2021 18:56
@lforg37
Copy link
Contributor Author

lforg37 commented Jun 9, 2021

The current version is based on this branch, commits have been added.
I still need to integrate some suggestion from the review.
I'll "undraft" the PR once its done.

@@ -26,7 +26,7 @@ class box {
}

/// Compute ray interaction with the box
bool hit(auto& ctx, const ray& r, real_t min, real_t max, hit_record& rec,
bool hit(const ray& r, real_t min, real_t max, hit_record& rec,
material_t& hit_material_type) const {
hit_record temp_rec;
material_t temp_material_type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the feeling that if we remove the std::monostate from the material it will always create a sphere here even when there is no intersection.

@@ -62,7 +60,8 @@ class constant_medium {
/// Distance between the two hitpoints affect of probability
/// of the ray hitting a smoke particle
const auto distance_inside_boundary = (rec2.t - rec1.t) * ray_length;
const auto hit_distance = neg_inv_density * sycl::log(rng.float_t());
auto rng = LocalPseudoRNG { toseed(r.direction()) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I realize you have removed the context passed everywhere and replaced basically the random generator with some local computation.
What was the rationale?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As getting a value from the RNG update its internal state, having a shared RNG creates a very long chain of read after write dependency that prevent the HLS compiler to parallelize the otherwise independent computation steps.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could have this context with some special HLS decorations to ignore some dependencies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, perhaps something like this pseudo HLS-SYCL code:

auto rng(auto local_seed = std::source_location::current()) {
  return [=] {
     static auto state = local_seed.line();
     #pragma HLS dependence variable=state inter false
     state = crunch(state);
     return mix(state);
  };
}

or

auto rng(auto local_seed = std::source_location::current()) {
  return [state = local_seed.line()] mutable {
     #pragma HLS dependence variable=state inter false
     state = crunch(state);
     return mix(state);
  };
}

Perhaps we need extension intel/llvm#3746 to have the big picture working.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aisoard, @yu810226 any feedback?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above is basically a good idea, if I understand the intention, to separate the state into multiple independent RNGs. I'm not sure what you're trying to achieve with the false dependence pragma, however.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenneuendorffer yes this is the idea to distribute the state here in a lazy way to even initialize the seed with something depending from the source location (here with the line number, we could also use the file name, etc.).
I has put probably too much here with the

#pragma HLS dependence variable=state inter false

as a way to have an II=1 without consuming too much latency at the price of a semantics change in the code and hoping we still have enough randomness...
But I agree this is a second order optimization if we are still in trouble...
Perhaps this code will not work with SYCL-HLS because it uses some function pointer. If so, we can try with a macro to avoid returning the lambda, or perhaps just have a macro which inline a random generator wherever we want one... Ahem... :-)

include/hitable.hpp Outdated Show resolved Hide resolved
@lforg37 lforg37 marked this pull request as ready for review June 11, 2021 12:03
@lforg37 lforg37 marked this pull request as draft June 11, 2021 12:05
@lforg37 lforg37 marked this pull request as ready for review June 14, 2021 13:26
Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some comparisons between this version and the main one.
You have not update the README about how to run the program.
It looks like time ./sycl-rt 800 480 50 100 is a way to get the same behaviour as before.
It is a little bit faster than before, I guess because there are some parts of the variant which have been removed...
Also I wonder how it is possible to have the motion blur working without a random generator... by having the time or motion parameter included in the hash? It starts being painful...

include/rtweekend.hpp Show resolved Hide resolved
include/rtweekend.hpp Show resolved Hide resolved
include/rtweekend.hpp Outdated Show resolved Hide resolved
src/main.cpp Outdated Show resolved Hide resolved
uint32_t shifted4 = (y2 & 63) << 10;
uint32_t shifted5 = (z1 & 63) << 5;
uint32_t shifted6 = (z2 & 63);
return shifted1 ^ shifted2 ^ shifted3 ^ shifted4 ^ shifted5 ^ shifted6;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I read too quickly and was thinking to 63 for 5 bits...
Actually that seems like a good idea then to have an xor. Would more than 6 bits better than? 127 or 255?

Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still some super old comments and a merge conflict

@keryell
Copy link
Member

keryell commented Jul 5, 2021

@lforg37 where are we on this?

Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

}
};
} // namespace raytracer::visitor
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious... An IDE configuration bug?

auto& rng = ctx.rng;
bool scatter(auto& ctx, const ray& r_in, const hit_record& rec, color& attenuation,
ray& scattered) const {
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) };
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) };

auto& rng = ctx.rng;
bool scatter(auto&, const ray& r_in, const hit_record& rec, color& attenuation,
ray& scattered) const {
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) };
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) };

// Attenuation of the ray hitting the object is modified based on the color
// at hit point
auto& rng = ctx.rng;
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) };
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) };

auto& rng = ctx.rng;
bool scatter(auto& ctx, const ray& r_in, const hit_record& rec, color& attenuation,
ray& scattered) const {
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) };
LocalPseudoRNG rng { toseed(r_in.direction(), r_in.origin()) };

I do not know why I am getting bored here...

std::memcpy(&x2, &val2.x(), sizeof(uint32_t));
std::memcpy(&y2, &val2.y(), sizeof(uint32_t));
std::memcpy(&z2, &val2.z(), sizeof(uint32_t));
uint32_t shifted1 = x1 << 26;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use auto everywhere

uint32_t toseed(vec const& val1, vec const& val2) {
uint32_t x1, y1, z1, x2, y2, z2;
std::memcpy(&x1, &val1.x(), sizeof(uint32_t));
std::memcpy(&y1, &val1.y(), sizeof(uint32_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::memcpy(&y1, &val1.y(), sizeof(uint32_t));
std::memcpy(&y1, &val1.y(), sizeof y1);

everywhere.
This is clearer to use the type of the destination. Also it is shorter since you can avoid () with sizeof expression. :-)

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <ctime>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, why?

@return uint32_t
*/
uint32_t toseed(vec const& val1, vec const& val2) {
uint32_t x1, y1, z1, x2, y2, z2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the C++ version everywhere instead of the C version : std::uint32_t

int main(int argc, char* argv[]) {
if (argc < 5 || argc > 7) {
std::cerr << "Usage: sycl-rt OUT_WIDTH OUT_HEIGHT DEPTH SAMPLES "
"[SPHERE_INC [RAND_SEED]]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the README to describe this new API

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants