Some instructions are not able to swizzle their sources, so we conservatively
refused to propagate moves into them to avoid needing a swizzle on the source.
This is too conservative: we only need to do this if the move swizzles. If there
is only an identity swizzle on the move, we can propagate it without issue. This
will mitigate some instruction count regression from the later modifier
propagation, which will leave lots of moves that need to be propagated.
total instructions in shared programs: 1514834 -> 1514477 (-0.02%)
instructions in affected programs: 132297 -> 131940 (-0.27%)
helped: 349
HURT: 3
Instructions are helped.
total bundles in shared programs: 645093 -> 645069 (<.01%)
bundles in affected programs: 9650 -> 9626 (-0.25%)
helped: 42
HURT: 23
Bundles are helped.
total quadwords in shared programs: 1130751 -> 1130469 (-0.02%)
quadwords in affected programs: 78790 -> 78508 (-0.36%)
helped: 269
HURT: 21
Quadwords are helped.
total registers in shared programs: 90563 -> 90577 (0.02%)
registers in affected programs: 163 -> 177 (8.59%)
helped: 4
HURT: 16
Registers are HURT.
total spills in shared programs: 1400 -> 1399 (-0.07%)
spills in affected programs: 2 -> 1 (-50.00%)
helped: 1
HURT: 0
total fills in shared programs: 5276 -> 5273 (-0.06%)
fills in affected programs: 151 -> 148 (-1.99%)
helped: 1
HURT: 3
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
If we need to insert a mov in order to schedule a branch, we do not schedule
anything writer to the source of that mov in the same bundle to avoid a
data race between the read and the write. That's too conservative, though: it is
legitimate to write in the first part of the ALU word (VMUL/SADD stages) and
then read from the second part (VADD/SMUL/VLUT stages). Reset the
predicate.exclude when going from scheduling the latter stages to the former, to
allow a sequence of code like:
FCMP.vector 0.xyzw, ...
branch 0.x
to be scheduled as
vmul.FCMP.vector 0.xyzw
smul r31.w, 0.x
branch 0.x
rather than getting split up into two bundles.
This mitigates a cycle count regression from the copyprop change.
total instructions in shared programs: 1514856 -> 1514834 (<.01%)
instructions in affected programs: 3087 -> 3065 (-0.71%)
helped: 5
HURT: 1
Inconclusive result (value mean confidence interval includes 0).
total bundles in shared programs: 645327 -> 645093 (-0.04%)
bundles in affected programs: 40498 -> 40264 (-0.58%)
helped: 230
HURT: 68
Bundles are helped.
total quadwords in shared programs: 1130554 -> 1130751 (0.02%)
quadwords in affected programs: 75323 -> 75520 (0.26%)
helped: 49
HURT: 231
Quadwords are HURT.
total registers in shared programs: 90559 -> 90563 (<.01%)
registers in affected programs: 119 -> 123 (3.36%)
helped: 5
HURT: 8
Inconclusive result (value mean confidence interval includes 0).
total threads in shared programs: 55590 -> 55594 (<.01%)
threads in affected programs: 4 -> 8 (100.00%)
helped: 4
HURT: 0
Threads are helped.
total spills in shared programs: 1402 -> 1400 (-0.14%)
spills in affected programs: 289 -> 287 (-0.69%)
helped: 1
HURT: 1
total fills in shared programs: 5285 -> 5276 (-0.17%)
fills in affected programs: 448 -> 439 (-2.01%)
helped: 2
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
If we have multiple reads of the same SSA def in the same block, we don't need
to emit multiple copies for it, we can just reuse a copy (OR'ing in the mask,
knowing the source is already fully written since it's SSA). This will prevent
some regressions in moves from the copyprop patch.
There is a bit of a tradeoff here between increased pressure and reduced
instruction count but I'm not too worried. The affect on pressure seems all over
the place -- register use decreases overall, threads increase (great!) but a few
shaders that were *already spilling*, spill a bit worse. I'm not terribly
worried there.
total instructions in shared programs: 1518289 -> 1514856 (-0.23%)
instructions in affected programs: 292854 -> 289421 (-1.17%)
helped: 1557
HURT: 232
Instructions are helped.
total bundles in shared programs: 646903 -> 645327 (-0.24%)
bundles in affected programs: 91872 -> 90296 (-1.72%)
helped: 910
HURT: 256
Bundles are helped.
total quadwords in shared programs: 1133728 -> 1130554 (-0.28%)
quadwords in affected programs: 187170 -> 183996 (-1.70%)
helped: 1399
HURT: 44
Quadwords are helped.
total registers in shared programs: 90640 -> 90559 (-0.09%)
registers in affected programs: 2676 -> 2595 (-3.03%)
helped: 202
HURT: 124
Inconclusive result (%-change mean confidence interval includes 0).
total threads in shared programs: 55561 -> 55590 (0.05%)
threads in affected programs: 50 -> 79 (58.00%)
helped: 23
HURT: 6
Threads are helped.
total spills in shared programs: 1386 -> 1402 (1.15%)
spills in affected programs: 231 -> 247 (6.93%)
helped: 2
HURT: 13
total fills in shared programs: 5159 -> 5285 (2.44%)
fills in affected programs: 1282 -> 1408 (9.83%)
helped: 11
HURT: 16
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
1. Always calculate when asked. This is the sort of optimization that just
introduces bugs. Like one I hit when shuffling register indices around with
the register access changes.
2. Ask before using in RA.
3. Account for precoloured blend inputs.
Small shader-db hit, didn't investigate too much.
total instructions in shared programs: 1518017 -> 1518168 (<.01%)
instructions in affected programs: 2895 -> 3046 (5.22%)
helped: 0
HURT: 24
Instructions are HURT.
total bundles in shared programs: 646756 -> 646782 (<.01%)
bundles in affected programs: 1119 -> 1145 (2.32%)
helped: 1
HURT: 19
Bundles are HURT.
total quadwords in shared programs: 1133694 -> 1133728 (<.01%)
quadwords in affected programs: 1736 -> 1770 (1.96%)
helped: 0
HURT: 20
Quadwords are HURT.
total registers in shared programs: 90596 -> 90612 (0.02%)
registers in affected programs: 108 -> 124 (14.81%)
helped: 0
HURT: 16
Registers are HURT.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Cc: mesa-stable
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
mir_prev_op will point to the last instruction of the block in that case because
the block instruction list is circular. That would cause an invald
write-after-read relationship between the move we insert with the constants and
the CSEL reading them, which DCE "helpfully" optimizes out, leaving a read from
an undefined def. That ends up getting RA'd to an invalid register.
All in all, pretty bad.
Identified due to a new assert fail after the proper temp_count fix.
Affects dEQP-GLES31.functional.separate_shader.random.12.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
If we start with unscheduled IR:
0 = comparison
csel 1, 2, 0
the old code will schedule this as
r31.w = comparison
csel 1, 2, 0
leaving 0 as a dangling source, which can confuse the rest of the compiler.
Instead rewrite this to
r31.w = comparison
csel 1, 2, r31.w
Note the swizzle as already taken care of (i.e. turned to .x for scalar
conditions) by the time we get to scheduling so we can force to .w.
This keeps register allocation from doing stupid things.
total instructions in shared programs: 1518138 -> 1518017 (<.01%)
instructions in affected programs: 37714 -> 37593 (-0.32%)
helped: 48
HURT: 42
Instructions are helped.
total bundles in shared programs: 646877 -> 646756 (-0.02%)
bundles in affected programs: 17024 -> 16903 (-0.71%)
helped: 48
HURT: 42
Bundles are helped.
total registers in shared programs: 90624 -> 90596 (-0.03%)
registers in affected programs: 361 -> 333 (-7.76%)
helped: 31
HURT: 5
Registers are helped.
total threads in shared programs: 55561 -> 55566 (<.01%)
threads in affected programs: 5 -> 10 (100.00%)
helped: 4
HURT: 0
Threads are helped.
total spills in shared programs: 1386 -> 1383 (-0.22%)
spills in affected programs: 19 -> 16 (-15.79%)
helped: 3
HURT: 0
total fills in shared programs: 5159 -> 5077 (-1.59%)
fills in affected programs: 1305 -> 1223 (-6.28%)
helped: 20
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
As an off-shoot of trying to delete modifiers (and nir_register) from NIR, I'd
like to get rid of some of the modifier NIR silliness that Midgard is doing.
The CSEL type selection heuristic at NIR->MIR time is peak backend silly, so
replace it with nir_gather_ssa_types.
Small win on shader-db. I didn't investigate much, but this matches my intution
for how this patch would perform: very small instruction/cycle count
improvements due to slightly better decisions around modifiers, more substantial
space savings due to more float constants getting inlined.
total instructions in shared programs: 1518422 -> 1518414 (<.01%)
instructions in affected programs: 1914 -> 1906 (-0.42%)
helped: 8
HURT: 0
Instructions are helped.
total bundles in shared programs: 646941 -> 646937 (<.01%)
bundles in affected programs: 344 -> 340 (-1.16%)
helped: 4
HURT: 0
Bundles are helped.
total quadwords in shared programs: 1134727 -> 1134324 (-0.04%)
quadwords in affected programs: 66752 -> 66349 (-0.60%)
helped: 351
HURT: 54
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
This is a generic algebraic optimization. Use the generic algebraic optimizer.
The only reason we can't rely on nir_opt_algebraic to do this is because we
generate inot's late in order to optimize some comparisons. But we already have
a pass to clean that up (midgard_nir_clean_inot), it just needs to be extended
to handle more cases.
shader-db is noise:
total bundles in shared programs: 646941 -> 646942 (<.01%)
bundles in affected programs: 100 -> 101 (1.00%)
helped: 0
HURT: 1
total quadwords in shared programs: 1134727 -> 1134726 (<.01%)
quadwords in affected programs: 318 -> 317 (-0.31%)
helped: 2
HURT: 1
total registers in shared programs: 90619 -> 90618 (<.01%)
registers in affected programs: 12 -> 11 (-8.33%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
Create a feedback buffer for each query pool and retrieve the query
results from the buffer instead of a roundtrip call in
vkGetQueryPoolResults.
VK_QUERY_RESULT_WAIT_BIT queries will poll until the queries are
available in the feedback buffer.
Query results in the feedback buffer are always VK_QUERY_RESULT_64_BIT
and if needed converted to what the app requests at
vkGetQueryPoolResults time.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23348>
vkCmdCopyQueryPoolResults cannot be called within a render pass so batch
and defer the query feedback copies until after the render pass.
Secondary command buffers inside render passes also have their query
feedback copies batched when recorded. When the secondary command buffer
is recorded via vkCmdExecuteCommands, it's batch is merged into the
primary command buffer's batch and is defered until the render pass ends.
If multiview is enabled, vkCmdCopyQueryPoolResults needs to copy
additional queries matching the number of bits set in viewMask.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23348>
vkCmdCopyQueryPoolResults cannot be called within a render pass/or
while the render pass is suspended so track when commands are inside
a render pass. Also track whether a secondary command buffer is
considered to be entirely inside a render pass.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23348>
Per spec 1.3.255: "If queries are used while executing a render pass
instance that has multiview enabled, the query uses N consecutive query
indices in the query pool (starting at query) where N is the number of
bits set in the view mask in the subpass the query is used in."
track viewMask so query feedback can copy the correct amount of queries
when multiview is enabled.
viewMask is passed in for vkCmdBeginRendering but for legacy
vkCmdBeginRenderPass/2 they are set by vkCreateRenderPass for each
subpass.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23348>
Add feedback commands to write query results into a coherent buffer to
optimize out roundtrip vkGetQueryPoolResults that poll until a result
is available.
Queries are available after vkCmdEndQuery or vkCmdWriteTimeStamp, so
append a vkCmdCopyQueryPoolResults to copy to query results to our
coherent buffer.
The coherent buffer also needs to be cleared after vkCmdResetQueryPool
so append vkCmdFillBuffer.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23348>
I've been betting support requests by people confused as to why their
builds broke because this option was removed. We can add the option back
with the deprecated flag set so that Meson will give a nice warning, but
builds will continue to work.
fixes: 1dd1147408
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23893>
this fixes the refcount for the separate shader program to not have a leaked ref
and then fixes the owned program to have the expected number of refs
this happened to work some of the time before because there was an arbitrary unref
in replace_separable_prog(), but this shouldn't have been necessary
Fixes: e3b746e3a3 ("zink: use GPL to handle (simple) separate shader objects")
fixes#9274
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23888>
fails:
dEQP-VK.descriptor_indexing.storage_texel_buffer
dEQP-VK.descriptor_indexing.storage_texel_buffer_after_bind
dEQP-VK.descriptor_indexing.storage_texel_buffer_after_bind_in_loop
dEQP-VK.descriptor_indexing.storage_texel_buffer_in_loop
dEQP-VK.descriptor_indexing.storage_texel_buffer_minNonUniform
They seem to be vertex-shader related.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22828>
Since we must not access the pipe_context concurrently, it makes sense
to add a lock for all kinds of quere related operations. This way, we
can safely create pipe resources inside Vulkan entry points that can be
used concurrently.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22828>