For a long time, our Vulkan WSI code has acted as something of a layer.
The WSI code calls into various Vulkan entrypoints inside the driver to
create images, allocate memory, etc. It then implements the API-facing
interface almost entirely. The only thing the driver has to provide is
little wrappers that wrap around the WSI calls to expose them through
the API.
However, now that we have a common dispatch framework, we can implement
entrypoints directly in the WSI code. As long as the driver uses
vk_instance, vk_physical_device, and vk_device, we can provide common
wrappers for the vast majority of entrypoints. The only exceptions are
vkAcquireNextImage, vkQueuePresent, vkRegisterDeviceEventEXT, and
vkRegisterDisplayEventEXT because those may have to manually poke at
synchronization primitives. We provide wrappers for vkAcquireNextImage
and vkQueuePresent because some drivers can use the default versions.
For now, we're intentionally avoiding any link-time dependencies between
WSI and the common code. We only use VK_FROM_HANDLE and associated
inline helpers and vk_physical_device has a pointer to a wsi_device.
Eventually, we may tie the two together closer, but this lets us get 95%
of the way there without reworking the universe.
Acked-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13234>
In the following cases:
- lower 16 bits are extracted and the shift amount is 16 or more
- lower 8 bits are extracted and the shift amount is 24 or more
the undesireable upper bits are already shifted out, and therefore
there is no need to add SDWA to the v_lshlrev instruction.
Fossil DB stats on Sienna Cichlid with NGGC on:
Totals from 58239 (45.27% of 128647) affected shaders:
CodeSize: 153498624 -> 153265616 (-0.15%); split: -0.15%, +0.00%
Instrs: 29636304 -> 29578064 (-0.20%); split: -0.20%, +0.00%
Latency: 136931496 -> 136876379 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 21134367 -> 21078861 (-0.26%); split: -0.26%, +0.00%
Copies: 2777550 -> 2777548 (-0.00%); split: -0.00%, +0.00%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13121>
When the determinant that we use for calculating triangle area
is NaN, it's not possible to decide the facing of the triangle.
This can happen when a coordinate of one of the triangle's vertices
is INFINITY. It's better to just accept these triangles in the shader
and let the PA deal with them.
Let's do the same for +/- Infinity too.
Though we haven't seen this yet, it may be troublesome as well.
Fixes: 651a3da1b5Closes: #5470
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13299>
We were treating them the same as regular cat2/cat3/cat4 immediates, but
that's not right because cat6 sources are only 8 bits.
Our bindless code was handling this before for bindless resources, and
it was disabled for most other things, so this was mostly harmless, but
fixing it will be necessary for handling ldc offsets.
In addition enable tests for this that were just commented out, and add
a custom test making sure that the immediate source is treated as
unsigned.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
Otherwise, when there are uses in multiple blocks the collect may not
dominate some of the uses.
This is a bugfix, but before it would've mattered only in weird
scenarios with interpolateAt*. When we start moving prefetch textures
into the block before the preamble it will start to matter more, because
it will need to read the barycentrics from a different block than the
bary.f instructions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
There's no reason to make this any different from the other builders,
since it just creates a collect instruction, and in the next commit
we'll need to create a collect in the first block for prefetch textures.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
In order for the load to never straddle the load can't extend past 8
bytes, not 16. For example a vec2 load with align_mul = 8 and
align_offset = 4 can straddle.
Fixes assertion failures when we stop pushing UBOs in the preamble on
a6xx.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
Fix a build error.
In file included from ../src/util/u_queue.h:38,
from ../src/freedreno/drm/freedreno_ringbuffer.h:33,
from ../src/freedreno/ds/fd_pps_driver.h:13,
from ../src/freedreno/ds/fd_pps_driver.cc:7:
../src/util/simple_mtx.h:35:12: fatal error: valgrind.h: No such file or directory
35 | # include <valgrind.h>
| ^~~~~~~~~~~~
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13280>
Let's assume we have a vec2 collect instruction with killed sources that
are non-contiguous and the entire rest of the register file is blocked,
which can happen when our register target is very tight. It's impossible
to just insert move instructions to resolve this, but we can make space
by swapping one of the killed sources with the value next to the other,
assuming it's also scalar.
This commit implements that idea, preventing us from falling back to the
terrible shuffle-everything approach in this case.
total instructions in shared programs: 1566648 -> 1565117 (-0.10%)
instructions in affected programs: 13332 -> 11801 (-11.48%)
helped: 30
HURT: 5
helped stats (abs) min: 6 max: 535 x̄: 51.77 x̃: 25
helped stats (rel) min: 2.67% max: 33.63% x̄: 12.28% x̃: 9.58%
HURT stats (abs) min: 1 max: 6 x̄: 4.40 x̃: 6
HURT stats (rel) min: 0.18% max: 5.13% x̄: 2.41% x̃: 2.13%
95% mean confidence interval for instructions value: -75.05 -12.43
95% mean confidence interval for instructions %-change: -13.18% -7.18%
Instructions are helped.
total mov in shared programs: 77336 -> 76683 (-0.84%)
mov in affected programs: 2135 -> 1482 (-30.59%)
helped: 29
HURT: 5
helped stats (abs) min: 2 max: 227 x̄: 23.31 x̃: 10
helped stats (rel) min: 6.06% max: 72.73% x̄: 31.83% x̃: 30.00%
HURT stats (abs) min: 2 max: 9 x̄: 4.60 x̃: 4
HURT stats (rel) min: 14.29% max: 69.23% x̄: 34.00% x̃: 27.78%
95% mean confidence interval for mov value: -33.21 -5.20
95% mean confidence interval for mov %-change: -32.94% -11.35%
Mov are helped.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13143>
We need the local size in RA for occupancy calculations. Not
initializing these had the unfortunate consequence of
ir3_get_reg_independent_max_waves() returning 0 for compute shaders with
shared variables, disabling the register limiting logic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13143>