batch rays together, decimate allocations and speedup ~3x #4

Open
fjbarter wants to merge 1 commit into main from batch_tracing

Conversation

@fjbarter

Owner

rather large effort to batch rays together for passing to ImplicitBVH. overloads the LVT traversal algorithm to only trace 'active' rays, i.e. those that have not been terminated (hit a sink or the bbox, reached max bounces or max length, etc)

this aims to effectively eliminate allocations in the ray tracing loop: traversal caches are now properly reused for ray tracing, and direction + position matrices no longer need to be created per traverse_rays call. the RayBatchBuffer is simply mutated in place
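A minimal sketch of the "mutate the buffer, trace only active rays" idea described above. The field names and the helpers `compact_active!` and `advance!` are hypothetical and only illustrate the pattern; the PR's actual `RayBatchBuffer` layout is not shown in this conversation.

```julia
# Hypothetical buffer: field names are illustrative, not the PR's actual layout.
struct RayBatchBuffer
    positions::Matrix{Float64}   # 3 x N, one column per ray
    directions::Matrix{Float64}  # 3 x N, one column per ray
    active::Vector{Int}          # indices of rays still being traced
end

# Allocate once, up front, for n rays; every ray starts out active.
RayBatchBuffer(n::Integer) = RayBatchBuffer(zeros(3, n), zeros(3, n), collect(1:n))

# Drop terminated rays from the active list in place.
# `terminated[i]` is true once ray i has stopped (sink, bbox, max bounces, ...).
function compact_active!(buf::RayBatchBuffer, terminated::AbstractVector{Bool})
    filter!(i -> !terminated[i], buf.active)
    return buf
end

# Advance every active ray by distance t[i] along its direction, in place.
function advance!(buf::RayBatchBuffer, t::AbstractVector{Float64})
    @inbounds for i in buf.active, k in 1:3
        buf.positions[k, i] += t[i] * buf.directions[k, i]
    end
    return buf
end
```

Because the matrices are allocated once and only their contents change, the per-bounce loop itself does not need to create new arrays; only the `active` index list shrinks as rays terminate.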

strong scaling is decent but not amazing: 1000 rays -> 11.3 s on 1 thread, 3.6 s on 4 threads for a ~3.1x speedup
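For reference, the quoted timings can be turned into a speedup and a parallel efficiency as below. The serial-fraction estimate assumes Amdahl's law applies, which is a modelling assumption and not something measured in this PR; `strong_scaling` is a hypothetical helper name.

```julia
# Strong-scaling summary from two wall-clock timings (seconds) and a thread count.
function strong_scaling(t_serial::Float64, t_parallel::Float64, n::Int)
    speedup = t_serial / t_parallel
    efficiency = speedup / n
    # Amdahl's law, S = 1 / (f + (1 - f) / n), solved for the serial fraction f.
    serial_fraction = (n / speedup - 1) / (n - 1)
    return (speedup = speedup, efficiency = efficiency, serial_fraction = serial_fraction)
end

s = strong_scaling(11.3, 3.6, 4)   # timings quoted above: 1 thread vs 4 threads
```

With these numbers the efficiency comes out a little under 80%, which matches the "decent but not amazing" description.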

@abhirup-roy

Collaborator

AK dep missing in Project.toml!

@abhirup-roy abhirup-roy left a comment

Collaborator

See comments

Comment thread src/bvh.jl
verts::Vector{SVector{3,Float64}}
verts::Vector{NTuple{3,Float64}}
tris::Vector{NTuple{3,Int32}}
kinds::Vector{SurfaceKind}

Collaborator

We could make this a BitVector, seeing as there are only 2 kinds? Maybe turn it into an is_sink field
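A sketch of what that could look like, assuming `SurfaceKind` is a two-member enum. The member names `Wall` and `Sink` and the helper `sink_mask` are guesses for illustration; the real definition is not shown in this thread.

```julia
# Assumed two-member enum; the PR's actual SurfaceKind definition is not shown here.
@enum SurfaceKind Wall Sink

# Build a one-bit-per-triangle mask from the enum vector.
function sink_mask(kinds::Vector{SurfaceKind})
    is_sink = falses(length(kinds))   # BitVector, initialised to all false
    @inbounds for i in eachindex(kinds)
        is_sink[i] = kinds[i] == Sink
    end
    return is_sink
end

kinds = [Wall, Sink, Wall, Wall, Sink]
is_sink = sink_mask(kinds)
```

One design point to weigh: a `BitVector` packs 64 flags per word, so reads involve a shift and mask, and concurrent writes to neighbouring bits are not thread-safe. If the mask is only read during tracing that should not matter, and a plain `Vector{Bool}` is a simpler alternative if it does.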

Comment thread src/intersections.jl
tcur = wall_t[ray_idx]
idxcur = wall_idx[ray_idx]
if (t < tcur) || ((t == tcur) && (idxcur == 0 || leaf_idx < idxcur))
n = triangle_unit_normal(v0, v1, v2)

Collaborator

I think we might get a speedup by precomputing the normals and storing them in SurfaceBVH?
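A rough sketch of the precomputation, under the assumption that normals would be stored as one `NTuple{3,Float64}` per triangle next to `verts` and `tris`. `precompute_normals` is a hypothetical name, and the small vector helpers are restated here only to keep the example self-contained; the PR has its own versions.

```julia
# Helpers restated so the sketch runs on its own.
sub3(a, b) = (a[1] - b[1], a[2] - b[2], a[3] - b[3])
cross3(a, b) = (a[2] * b[3] - a[3] * b[2],
                a[3] * b[1] - a[1] * b[3],
                a[1] * b[2] - a[2] * b[1])

# Unit normal of the triangle (v0, v1, v2); assumes a non-degenerate triangle.
function triangle_unit_normal(v0, v1, v2)
    n = cross3(sub3(v1, v0), sub3(v2, v0))
    inv_len = 1.0 / sqrt(n[1]^2 + n[2]^2 + n[3]^2)
    return (n[1] * inv_len, n[2] * inv_len, n[3] * inv_len)
end

# Hypothetical build-time step: one normal per triangle, computed once.
function precompute_normals(verts::Vector{NTuple{3,Float64}},
                            tris::Vector{NTuple{3,Int32}})
    normals = Vector{NTuple{3,Float64}}(undef, length(tris))
    @inbounds for (i, (a, b, c)) in enumerate(tris)
        normals[i] = triangle_unit_normal(verts[a], verts[b], verts[c])
    end
    return normals
end
```

In the quoted hunk the normal is only computed when a closer hit is found, so the saving depends on how often that branch is taken; the cost is 24 extra bytes per triangle and one more memory read per accepted hit. Worth benchmarking before committing to it.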

Comment thread src/intersections.jl
hit_found = true
tcur = sphere_t[ray_idx]
idxcur = sphere_idx[ray_idx]
if (t < tcur) || ((t == tcur) && (idxcur == 0 || Int(leaf_idx) < idxcur))

Collaborator

Not sure how much performance this gives but... if we change the ray_triangle_intersect negative case to Inf, we 1. get a Float64 (instead of a Union) and 2. get to use a plain boolean op here. In my head the gains from the boolean op add up over time?
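A small illustration of the sentinel idea using a ray-sphere test, since this thread sits on the sphere branch. `ray_sphere_t` and `closest_hit` are hypothetical stand-ins, not the PR's functions; the sketch assumes a unit direction and a ray origin outside the sphere.

```julia
dot3(a, b) = a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

# Returns the hit distance, or Inf on a miss, so the return type is always Float64.
# Assumes `d` is a unit vector and the origin `p` lies outside the sphere.
function ray_sphere_t(p, d, c, r)
    oc = (p[1] - c[1], p[2] - c[2], p[3] - c[3])
    b = dot3(oc, d)
    disc = b * b - (dot3(oc, oc) - r * r)
    disc < 0 && return Inf
    t = -b - sqrt(disc)
    return t > 0 ? t : Inf
end

# Closest hit over a list of sphere centres with a common radius.
function closest_hit(p, d, centres, r)
    tbest, ibest = Inf, 0
    for (i, c) in enumerate(centres)
        t = ray_sphere_t(p, d, c, r)
        if t < tbest            # a miss (Inf) never passes, no `nothing` check needed
            tbest, ibest = t, i
        end
    end
    return tbest, ibest
end
```

Two caveats. Julia already handles small unions such as `Union{Nothing,Float64}` fairly efficiently, so the gain may be modest. And if the tie-break from the quoted condition is kept, a miss could satisfy `t == tcur` when both are `Inf` and `idxcur == 0` (assuming the stored distances start at `Inf`), so that branch would need an extra guard.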

Comment thread src/utils.jl
@inline add3(a::NTuple{3,Float64}, b::NTuple{3,Float64}) = (a[1] + b[1], a[2] + b[2], a[3] + b[3])
@inline sub3(a::NTuple{3,Float64}, b::NTuple{3,Float64}) = (a[1] - b[1], a[2] - b[2], a[3] - b[3])
@inline mul3(a::NTuple{3,Float64}, s::Float64) = (a[1] * s, a[2] * s, a[3] * s)
@inline madd3(a::NTuple{3,Float64}, s::Float64, b::NTuple{3,Float64}) = (a[1] + s * b[1], a[2] + s * b[2], a[3] + s * b[3])

Collaborator

Yk there's a builtin func called muladd (found out by accident when i was showing someone what mullah means in arabic)

Collaborator

I reckon we can use it here and in dot3 and cross3
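A sketch of how the helpers might look with `muladd`, which computes `x * y + z` and lets the compiler fuse the operation where the hardware supports it. The `dot3` and `cross3` bodies here are assumptions, since their current definitions are not quoted in this thread.

```julia
# a + s * b, written so each component can compile to a single fused multiply-add.
@inline madd3(a::NTuple{3,Float64}, s::Float64, b::NTuple{3,Float64}) =
    (muladd(s, b[1], a[1]), muladd(s, b[2], a[2]), muladd(s, b[3], a[3]))

# Dot product as a chain of multiply-adds.
@inline dot3(a::NTuple{3,Float64}, b::NTuple{3,Float64}) =
    muladd(a[1], b[1], muladd(a[2], b[2], a[3] * b[3]))

# Cross product: each component is x * y - (z * w).
@inline cross3(a::NTuple{3,Float64}, b::NTuple{3,Float64}) = (
    muladd(a[2], b[3], -(a[3] * b[2])),
    muladd(a[3], b[1], -(a[1] * b[3])),
    muladd(a[1], b[2], -(a[2] * b[1])),
)
```

Since fusing skips one rounding step, results can differ in the last bit from the plain `a + s * b` form, and `cross3(a, a)` is no longer guaranteed to be exactly zero. That may matter for the exact `t == tcur` tie-break used elsewhere in the PR.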

Comment thread src/intersections.jl
tmin = -Inf
tmax = Inf
@inbounds for k in 1:3
dk = d[k]

Collaborator

Lowkey i think we're overthinking this function? We can make it like

# inside the loop over k
invd = 1.0 / d[k]
t1 = (mins[k] - p[k]) * invd
t2 = (maxs[k] - p[k]) * invd
tmin = max(tmin, min(t1, t2))
tmax = min(tmax, max(t1, t2))

# after the loop
if tmax < max(tmin, eps)
    return nothing # or Inf if you like my previous idea
end

return tmin > eps ? tmin : tmax

cus if the ray is exactly parallel to the axis, invd will be Inf ygm?
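A runnable sketch of the simplified slab test suggested above, with one extra guard. When `d[k] == 0` and the origin sits exactly on a slab plane, `0 * Inf` gives `NaN`, and Julia's `min`/`max` propagate `NaN`, so the sketch skips such an axis instead. The name `ray_aabb_t` and the keyword `tol` (standing in for `eps` in the comment) are assumptions, not the PR's names.

```julia
# Slab test: returns the entry distance (or exit distance if the origin is inside
# the box), or `nothing` on a miss. `tol` plays the role of `eps` in the comment.
function ray_aabb_t(p::NTuple{3,Float64}, d::NTuple{3,Float64},
                    mins::NTuple{3,Float64}, maxs::NTuple{3,Float64};
                    tol::Float64 = 1e-12)
    tmin, tmax = -Inf, Inf
    @inbounds for k in 1:3
        invd = 1.0 / d[k]                 # +Inf or -Inf when d[k] == 0
        t1 = (mins[k] - p[k]) * invd
        t2 = (maxs[k] - p[k]) * invd
        lo, hi = min(t1, t2), max(t1, t2)
        # NaN appears only for a parallel ray starting exactly on a slab plane;
        # treat that axis as unconstrained rather than letting NaN propagate.
        tmin = ifelse(isnan(lo), tmin, max(tmin, lo))
        tmax = ifelse(isnan(hi), tmax, min(tmax, hi))
    end
    tmax < max(tmin, tol) && return nothing
    return tmin > tol ? tmin : tmax
end
```

The `ifelse` calls keep the loop body free of explicit branches. Whether a ray lying exactly in a boundary plane should count as inside the slab is a policy choice; this sketch counts it as inside.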

Collaborator

Hopefully this means it will parallelise better?
