In the following cases:
- lower 16 bits are extracted and the shift amount is 16 or more
- lower 8 bits are extracted and the shift amount is 24 or more
the undesireable upper bits are already shifted out, and therefore
there is no need to add SDWA to the v_lshlrev instruction.
Fossil DB stats on Sienna Cichlid with NGGC on:
Totals from 58239 (45.27% of 128647) affected shaders:
CodeSize: 153498624 -> 153265616 (-0.15%); split: -0.15%, +0.00%
Instrs: 29636304 -> 29578064 (-0.20%); split: -0.20%, +0.00%
Latency: 136931496 -> 136876379 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 21134367 -> 21078861 (-0.26%); split: -0.26%, +0.00%
Copies: 2777550 -> 2777548 (-0.00%); split: -0.00%, +0.00%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13121>
When the determinant that we use for calculating triangle area
is NaN, it's not possible to decide the facing of the triangle.
This can happen when a coordinate of one of the triangle's vertices
is INFINITY. It's better to just accept these triangles in the shader
and let the PA deal with them.
Let's do the same for +/- Infinity too.
Though we haven't seen this yet, it may be troublesome as well.
Fixes: 651a3da1b5Closes: #5470
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13299>
We were treating them the same as regular cat2/cat3/cat4 immediates, but
that's not right because cat6 sources are only 8 bits.
Our bindless code was handling this before for bindless resources, and
it was disabled for most other things, so this was mostly harmless, but
fixing it will be necessary for handling ldc offsets.
In addition enable tests for this that were just commented out, and add
a custom test making sure that the immediate source is treated as
unsigned.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
Otherwise, when there are uses in multiple blocks the collect may not
dominate some of the uses.
This is a bugfix, but before it would've mattered only in weird
scenarios with interpolateAt*. When we start moving prefetch textures
into the block before the preamble it will start to matter more, because
it will need to read the barycentrics from a different block than the
bary.f instructions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
There's no reason to make this any different from the other builders,
since it just creates a collect instruction, and in the next commit
we'll need to create a collect in the first block for prefetch textures.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
In order for the load to never straddle the load can't extend past 8
bytes, not 16. For example a vec2 load with align_mul = 8 and
align_offset = 4 can straddle.
Fixes assertion failures when we stop pushing UBOs in the preamble on
a6xx.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
Fix a build error.
In file included from ../src/util/u_queue.h:38,
from ../src/freedreno/drm/freedreno_ringbuffer.h:33,
from ../src/freedreno/ds/fd_pps_driver.h:13,
from ../src/freedreno/ds/fd_pps_driver.cc:7:
../src/util/simple_mtx.h:35:12: fatal error: valgrind.h: No such file or directory
35 | # include <valgrind.h>
| ^~~~~~~~~~~~
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13280>
Let's assume we have a vec2 collect instruction with killed sources that
are non-contiguous and the entire rest of the register file is blocked,
which can happen when our register target is very tight. It's impossible
to just insert move instructions to resolve this, but we can make space
by swapping one of the killed sources with the value next to the other,
assuming it's also scalar.
This commit implements that idea, preventing us from falling back to the
terrible shuffle-everything approach in this case.
total instructions in shared programs: 1566648 -> 1565117 (-0.10%)
instructions in affected programs: 13332 -> 11801 (-11.48%)
helped: 30
HURT: 5
helped stats (abs) min: 6 max: 535 x̄: 51.77 x̃: 25
helped stats (rel) min: 2.67% max: 33.63% x̄: 12.28% x̃: 9.58%
HURT stats (abs) min: 1 max: 6 x̄: 4.40 x̃: 6
HURT stats (rel) min: 0.18% max: 5.13% x̄: 2.41% x̃: 2.13%
95% mean confidence interval for instructions value: -75.05 -12.43
95% mean confidence interval for instructions %-change: -13.18% -7.18%
Instructions are helped.
total mov in shared programs: 77336 -> 76683 (-0.84%)
mov in affected programs: 2135 -> 1482 (-30.59%)
helped: 29
HURT: 5
helped stats (abs) min: 2 max: 227 x̄: 23.31 x̃: 10
helped stats (rel) min: 6.06% max: 72.73% x̄: 31.83% x̃: 30.00%
HURT stats (abs) min: 2 max: 9 x̄: 4.60 x̃: 4
HURT stats (rel) min: 14.29% max: 69.23% x̄: 34.00% x̃: 27.78%
95% mean confidence interval for mov value: -33.21 -5.20
95% mean confidence interval for mov %-change: -32.94% -11.35%
Mov are helped.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13143>
We need the local size in RA for occupancy calculations. Not
initializing these had the unfortunate consequence of
ir3_get_reg_independent_max_waves() returning 0 for compute shaders with
shared variables, disabling the register limiting logic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13143>
There are roughly two cases when it comes to storage images. In the
easy case, we have full hardware support and we can just emit a typed
read/write message in the shader and we're done. In the more complex
cases, we may need to fall back to a typed read with a different format
or even to a raw (SSBO) read.
The hardware has always had basically full support for typed writes all
the way back to Ivy Bridge but typed reads have been harder to come by.
Starting with Skylake, we finally have enough that we at least have a
format of the right bit size but not necessarily the right format so we
can use a typed read but may still have to do an int->unorm or similar
cast in the shader.
Previously, in ANV, we treated lowered images as the default and write-
only as a special case that we can optimize. This flips everything
around and treats the cases where we don't need to do any lowering as
the default "vanilla" case and treats the lowered case as special.
Importantly, this means that read-write access to surfaces where the
native format handles typed writes now use the same surface state as
write-only access and the only thing that uses the lowered surface state
is access read-write access with a format that doesn't support typed
reads. This has the added benefit that now, if someone does a read
without specifying a format, we can default to the vanilla surface and
it will work as long as it's a format that supports typed reads.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13198>