-
Notifications
You must be signed in to change notification settings - Fork 801
[SYCL] queue use perfect forwarding #20867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
Conversation
e8e1aa1 to
0567a41
Compare
0567a41 to
d6a5400
Compare
d6a5400 to
d8e2a34
Compare
| detail::code_location::current()) { | ||
| std::enable_if_t<ext::oneapi::experimental::is_property_list< | ||
| std::decay_t<PropertiesT>>::value, | ||
| event> single_task(PropertiesT &&Properties, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are changing a public API. Is it really necessary?
@gmlueck Could you please comment if spec gives us freedom to accept Properties by universal reference?
| void>::value)) { | ||
| return detail::submit_kernel_direct_single_task<KernelName, true>( | ||
| *this, KernelFunc, {}, Properties, TlsCodeLocCapture.query()); | ||
| *this, KernelFunc, {}, std::forward<PropertiesT>(Properties), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why haven't you updated the else path? And what about the subsequent calls down to the stack? If you forward here, but all next calls in the stack still accept by value, then this optimization makes little sense.
| detail::code_location::current()) { | ||
| std::enable_if_t<ext::oneapi::experimental::is_property_list< | ||
| std::decay_t<PropertiesT>>::value, | ||
| event> single_task(event DepEvent, PropertiesT &&Properties, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here as above
| const detail::code_location &CodeLoc) { | ||
| detail::tls_code_loc_t TlsCodeLocCapture(CodeLoc); | ||
| return impl->memset(Ptr, Value, Count, {DepEvent}, | ||
| return impl->memset(Ptr, Value, Count, {std::move(DepEvent)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The right fix here should be to use sycl::span in the queue_impl::memset to accept DepEvents to avoid temporary std::vector creation.
Use perfect forwarding in queue inline implementations. In the same place use event move if possible. For sake of performance.
As the change is small one. Most benchmarks show no performance improvement, but some have improved, e.g. (see difference between green lines above and two last points of red line marked in a circle)


Not using perfect forwarding in the API taking universal references is a wrong pattern. I want it to be fixed, in order to