-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libutils: util.h: corrections on some ROUND*() macros #7183
Conversation
e1cceb7
to
ef05d0e
Compare
I've force pushed a new series (instead of appendix fixup commits). I've preserved the PTA tests on Last commit "[RFC] tree-wide: use ROUND{UP|DOWN}_FAST() where applicable" proposes few places where |
I've changed my view on Before this PR, all the calls to It would be nice to evaluate the Calls that don't use a size of power of 2 should use a special macro since that's the exception. I can't come up with a good name at the moment, both |
Sorry, I missed that.
|
How about #define ROUNDUP_CONST(v, sz) \
(DIV_ROUND_UP(__builtin_choose_expr(__builtin_constant_p(v), \
v, (void)0), \
__builtin_choose_expr(__builtin_constant_p(sz) && \
__builtin_popcountl(sz) == 1, \
sz, (void)0)) * (sz))
uint8_t foo[ROUNDUP_CONST(SIZE, ALIGN)]; We may need some wrapping for the |
Hi @jenswi-linaro, sorry to bother you but I fail to use these builtin helpers. They don't behave as I would expect they do. /* Trigger a build error if constant value is not a power of 2. */
#define CONST_IS_POWER_OF_2(v) \
(__builtin_choose_expr(__builtin_popcountl(v) == 1, 1, (void)0))
/* Return whether variable value is power of 2, enforce constant value is. */
#define IS_POWER_OF_2(v) \
(__builtin_choose_expr(__builtin_constant_p(v), \
CONST_IS_POWER_OF_2(v), \
(((v) != 0) && (((v) & (~(v) + 1)) == (v)))))
int main(int argc, char **argv)
{
if (IS_POWER_OF_2(16) ||
IS_POWER_OF_2(argc) ||
IS_POWER_OF_2(3))
return 0;
return 1;
} I would expect build produce an error at line 15 (
It looks like the compile evaluates In the scope of this P-R, it means I failed to define a unique macro Any ideas? |
My idea with
It might be possible to work around it with macro trickery a la |
Agree.
The issue is to have the compiler not evaluating the part that doesn't apply (e.g. don't evaluate the My proposal: preserve currently massive use of |
I didn't mean that we should use |
Sorry, I can't find my way out of this, considering case known constant, variables and that Coming back to the initial goal of this P-R, I see that building core using either the "slow" variants of Of course, when macros are used with a variable size argument,the resulting instructions differ a bit: Therefore I think the simplest way would be to use the "slow" variants for For the few functions when really want to optimize rounding with a variable size argument (I assume mult/div instructions can take few more CPU cycles than add/mask instructions), we can add new The below C-source produce the very same instructions with the 2 variants of 134: static void rem_um_region(struct user_mode_ctx *uctx, struct vm_region *r)
135: {
136: vaddr_t begin = ROUNDDOWN(r->va, CORE_MMU_PGDIR_SIZE);
137: vaddr_t last = ROUNDUP(r->va + r->size, CORE_MMU_PGDIR_SIZE);
138: struct vm_region *r2 = NULL;
139:
...
------------
000000000e11dcd0 <rem_um_region>:
rem_um_region():
.../optee-4.4.0-qemuv8/optee_os/core/mm/vm.c:135
e11dcd0: a9bd7bfd stp x29, x30, [sp, #-48]!
.../optee-4.4.0-qemuv8/optee_os/core/mm/vm.c:137
e11dcd4: b24053e3 mov x3, #0x1fffff // #2097151
.../optee-4.4.0-qemuv8/optee_os/core/mm/vm.c:135
e11dcd8: 910003fd mov x29, sp
e11dcdc: a90153f3 stp x19, x20, [sp, #16]
e11dce0: aa0103f3 mov x19, x1
.../optee-4.4.0-qemuv8/optee_os/core/mm/vm.c:136
e11dce4: f9400821 ldr x1, [x1, #16]
.../optee-4.4.0-qemuv8/optee_os/core/mm/vm.c:135
e11dce8: a9025bf5 stp x21, x22, [sp, #32]
e11dcec: aa0003f6 mov x22, x0
.../optee-4.4.0-qemuv8/optee_os/core/mm/vm.c:137
e11dcf0: f9400e62 ldr x2, [x19, #24]
.../optee-4.4.0-qemuv8/optee_os/core/mm/vm.c:136
e11dcf4: 926ba835 and x21, x1, #0xffffffffffe00000
.../optee-4.4.0-qemuv8/optee_os/core/mm/vm.c:137
e11dcf8: 8b020022 add x2, x1, x2
e11dcfc: 8b030054 add x20, x2, x3 @jenswi-linaro, I know you said you would preferred to preserve the power-of-2 constraint on Regarding ROUNDUP_OVERFLOW(), we can keep the existing constraint and add an assertion on the size being a power of 2. |
I'm OK with your approach in general, with some small changes to make them easier to use. I've checked what other projects (Linux, OpenBSD, NetBSD, Zephyr) do about this and it's more or less the same as we have upstream. I've seen the optimization we have with a bitmask. NetBSD has an interesting optimization in their I propose adding new macros with the power-of-two constraint. Adding |
OK, sounds good to me, I'll update this P-R accordingly. Thanks for the feedback. |
To summarize and be sure I'm synced with you: |
6d58a70
to
24de64b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience and effort. This looks good. I only have a few comments.
lib/libutils/ext/include/util.h
Outdated
* ROUNDUP_OVERFLOW(v, size, res) | ||
* | ||
* @v: Input value to round | ||
* @size: Rouding operand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Rouding/Rounding/
I did not port the NetBSD roundup implementation because of license terms of our util.h file vs param.h. I think it's doable but rather with a dedicated util2.h header, maybe included from util.h. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience too :-) and your (always precious) help.
No worries, we can do that in another PR. |
I've tested 32b and 64bit with the NetBSD macro: it tends to add 1 instruction but not always, depending on the rounding operand and the surrounded instructions. In the end, heavy using the optimized |
Comments addressed. |
Please squash in the updates. |
c29503f
to
1e07db3
Compare
I've squashed the fixup commits, fixed few remaining typos. |
lib/libutils/ext/include/util.h
Outdated
if (__roundup_mod) \ | ||
__roundup_add = (typeof(v))(size) - __roundup_mod; \ | ||
ADD_OVERFLOW((v), __roundup_add, &__roundup_tmp) ? 1 : \ | ||
((void)(*(res) = __roundup_tmp), 0); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be simplified:
- ADD_OVERFLOW((v), __roundup_add, &__roundup_tmp) ? 1 : \
- ((void)(*(res) = __roundup_tmp), 0); \
+ ADD_OVERFLOW((v), __roundup_add, res); \
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed with:
@@ -68,15 +68,13 @@
*/
#define ROUNDUP_OVERFLOW(v, size, res) \
(__extension__({ \
- typeof(*(res)) __roundup_tmp = 0; \
typeof(v) __roundup_mod = 0; \
typeof(v) __roundup_add = 0; \
\
__roundup_mod = (v) % (typeof(v))(size); \
if (__roundup_mod) \
__roundup_add = (typeof(v))(size) - __roundup_mod; \
- ADD_OVERFLOW((v), __roundup_add, &__roundup_tmp) ? 1 : \
- ((void)(*(res) = __roundup_tmp), 0); \
+ ADD_OVERFLOW((v), __roundup_add, (res)); \
}))
/*
For "libutils: util.h: Relax ROUNDUP() and add ROUNDUP2()",
With that fixed. Please apply my RB for all commits: |
Correct inline description comment for ROUNDUP(), ROUNDUP_OVERFLOW(), ROUNDDOWN() where the second argument is expected to be a power of 2, not a multiple of 2. Add an inline description comment to ROUNDUP_OVERFLOW() to state this requirement. Signed-off-by: Etienne Carriere <[email protected]> Reviewed-by: Jens Wiklander <[email protected]>
Remove constraint on ROUNDUP_DIV() to have its size argument being a power of 2 and add new ROUNDUP2_DIV() macro with that constraint. The previous implementation of ROUNDUP_DIV(), optimized for when size argument is a power of 2, is now used for ROUNDUP2_DIV() but this latter also asserts (in debug build mode) that the size argument conforms to this condition. Performance sensitive routines should now on use ROUNDUP2_DIV() to leverage the power-of-2 rounding optimization. The new implementation of ROUNDUP_DIV() is less optimal but modern compilers produce the same optimized assembly code with this macro when the size argument is a constant value known from the compiler so all use of ROUNDUP_DIV() with a known constant value do not need move to ROUNDUP2_DIV(). By the way, fix the indentation in the macro implementation for consistency of the header file implementation. Signed-off-by: Etienne Carriere <[email protected]> Reviewed-by: Jens Wiklander <[email protected]>
Remove constraint on ROUNDUP_OVERFLOW() to have its size argument being a power of 2 and add new ROUNDUP2_OVERFLOW() macro with that constraint. The previous implementation of ROUNDUP_OVERFLOW(), optimized for when size argument is a power of 2, is now used for ROUNDUP2_OVERFLOW() but this latter also asserts (in debug build mode) that the size argument conforms to this condition. Performance sensitive routines should now on use ROUNDUP2_OVERFLOW() to leverage the power-of-2 rounding optimization. The new implementation of ROUNDUP_OVERFLOW() is less optimal but modern compilers produce the same optimized assembly code with this macro when the size argument is a constant value known from the compiler so all use of ROUNDUP_OVERFLOW() with a known constant value do not need move to ROUNDUP2_OVERFLOW(). By the way, fix the indentation in the macro implementation for consistency of the header file implementation and extend the inline description comment. Signed-off-by: Etienne Carriere <[email protected]> Reviewed-by: Jens Wiklander <[email protected]>
Remove constraint on ROUNDUP() to have its size argument being a power of 2 and add new ROUNDUP2() macro with that constraint. The previous implementation of ROUNDUP(), optimized for when size argument is a power of 2, is now used for ROUNDUP2() but this latter also asserts (in debug build mode) that the size argument conforms to this condition. The new implementation of ROUNDUP() is less optimal but modern compilers produce the same optimized assembly code with this macro when the size argument is a constant value known from the compiler so all use of ROUNDUP() with a known constant value do not need move to ROUNDUP2(). Performance sensitive routines should now on use ROUNDUP2() to leverage the power-of-2 rounding optimization. Signed-off-by: Etienne Carriere <[email protected]> Reviewed-by: Jens Wiklander <[email protected]>
Remove constraint on ROUNDDOWN() to have its size argument being a power of 2 and add new ROUNDDOWN2() macro with that constraint. The previous implementation of ROUNDDOWN(), optimized for when size argument is a power of 2, is now used for ROUNDDOWN2() but this latter also asserts (in debug build mode) that the size argument conforms to this condition. The new implementation of ROUNDDOWN() is less optimal but modern compilers produce the same optimized assembly code with this macro when the size argument is a constant value known from the compiler so all use of ROUNDDOWN() with a known constant value do not need move to ROUNDDOWN2(). Performance sensitive routines should now on use ROUNDDOWN2() to leverage the power-of-2 rounding optimization. Signed-off-by: Etienne Carriere <[email protected]> Reviewed-by: Jens Wiklander <[email protected]>
Use ROUNDUP2(), ROUNDUP2_OVERFLOW(), ROUNDUP2_DIV() and ROUNDDOWN2() at places where the rounding argument is a variable value and we want to leverage the implementation of these routines optimized for a power-of-2 rounding argument. Signed-off-by: Etienne Carriere <[email protected]> Reviewed-by: Jens Wiklander <[email protected]>
Remove trailing space char in inline description comment of DIV_ROUND_UP() macro. Signed-off-by: Etienne Carriere <[email protected]> Reviewed-by: Jens Wiklander <[email protected]>
Fix indentation of the value defined for ROUNDDOWN() macro and add inline comment to better highlight the surrounded macros are defined for assembler and like source files. Signed-off-by: Etienne Carriere <[email protected]> Reviewed-by: Jens Wiklander <[email protected]>
1e07db3
to
c09151c
Compare
Commit message fixed and |
It seems we haven't fixed all the issues (QEMUv8 with pager, regression_1006 timeout). Last time I checked, on my local build machine that test took ~5 minutes. The timeout in qemu-check.exp is 900 seconds (15 minutes), perhaps it's not enough on the CI machines. I will experiment a bit. In the meantime this PR can be merged. |
Thanks @jforissier. The qemu_armv8a test I ran on my side on this P-R were ok. bur I did not test all the 4 configs 'CI check QEMUv8 1 / 2' runs. |
Correct inline description comments for ROUND*() macros that request an operand to be a power of 2, not a multiple of 2.
Assert this condition macros where assertion are applicable.
Propose a implementation for relax this constraint for
ROUNDUP()
andROUNDDOWN()
macros where we can't use assertions.