This reduces GLSL compile times with the gallium noop driver by 0.6%.
This might decrease register usage and do less code reordering because
nir_lower_io_vars_to_temporaries is no longer called for inputs, which
moved most input loads to the top.
radeonsi+ACO shader-db results are noise.
More uniforms are identified as inlinable.
TOTALS FROM ALL SHADERS (58138):
VGPRs: 2152680 -> 2158032 (0.25 %)
Code Size: 71008908 -> 71064812 (0.08 %) bytes
Max Waves: 916943 -> 916924 (-0.00 %)
Inline Uniforms: 6395 -> 6414 (0.30 %)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This can be used to move input loads to top after we stop using
nir_lower_io_vars_to_temporaries that does it unconditionally.
It's more flexible than what nir_lower_io_vars_to_temporaries was doing,
and can be extended to handle any instructions.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This is a partial replacement for nir_lower_io_vars_to_temporaries.
It supports all input and output loads. It doesn't handle stores.
The motivation is to improve compile times.
The main differences compared to nir_lower_io_vars_to_temporaries are:
- it only lowers indirect loads to temps and doesn't touch direct loads
which improves compile times and removes the need for nir_lower_vars_to_ssa
afterward because indirect temp access can't be lowered to SSA
- it doesn't move all input loads to the top; it only moves those input
loads to the top whose indirect loads are lowered (which improves
register usage because direct loads are not moved)
- it doesn't have to deal with complexities of variables
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This commit ports instruction latency information found in codegen emitter.
Previously every instruction was delayed by 16 cycles even if it was not
necessary.
PixMark Piano is highly affected by instruction latencies and gets a 2.5x boost,
other benchmarks still get better performance.
The other two missing pieces to get feature parity with codegen are
functional unit resource tracking and instruction dual-issue.
Performance measures on a GT770 (with 0f pstate)
Pixmark piano: 519 -> 14526 pts (has rendering issues in both!)
Furmark: 3247 -> 5786 pts
The talos principle (high settings): 30-33 -> 55-60 FPS
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35821>
From Maxwell onward, GPUs can encode at most 15 cycles of delay,
Kepler instead can encode up to 32 cycles.
This patch makes the maximum encodeable delay architecture-dependent.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35821>
Previous code took inspiration from the SM50 encoder where scheduling
instructions are interleaved every 3 instructions and jumps between
scheduling blocks are not permitted.
in Kepler scheduling instructions are interleaved once every 7
instructions, if we disallow jumps inside scheduling blocks we need
to fill the remaining instructions in the block with NOPs.
This lead to 1-instruction basic block generating 6 unnecessary NOPs.
In the new code basic blocks are tightly packed, only inserting padding
NOPs at the end of the function, reducing the emitted code in complex
CFGs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35821>
First, we handle the case where GetMemoryFdKHR fails. This is unlikely
and, if it's a Mesa driver it probably won't stomp the FD but we should
be extra careful. Then, we can close the dma-buf file immediately after
we call drmIoctl() on it, ensuring we don't leak the dma-buf file
descriptor if drmIoctl() fails. If ImportSemaphoreFdKHR() fails, then
we need to clean up the sync file.
Fixes: d4f8ad27f2 ("zink: handle implicit sync for dmabufs")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36048>
Similar to how support for X11's DRI2 protocol was deprecated in 24.2,
begin deprecating EGL_WL_bind_wayland_display (including
eglBindWaylandDisplayWL et al) by moving it behind a legacy-wayland
build option.
This extension was originally created in a pre-dmabuf world, where we
didn't have a universally-accepted way of exchanging buffers between
client and compositor, or even really the ability to describe formats
and modifiers universally.
Since then, the world has settled on dmabuf with DRM FourCC and
modifiers. We've had the zwp_linux_dmabuf_v1 protocol for 10 years now:
both clients and compositors implement this protocol to handle buffer
sharing. Compositors either use EGL_EXT_image_dma_buf_import or the
Vulkan dmabuf extensions to import these into GPU world.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36026>
There is a bug in Weston 10 that's causes instability when we don't have
wl_drm which isn't likely to get fixed in a point release. Most of CI
is fine but the final patch in this MR causes AMD raven to kill weston
part-way through runs, destroying the run. Just update weston to
14.0.1.
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36026>
Set values for VkVideoEncodeQualityLevelPropertiesKHR
and its child value such as VkVideoEncodeH264QualityLevelPropertiesKHR
, VkVideoEncodeH265QualityLevelPropertiesKHR and
VkVideoEncodeAV1QualityLevelPropertiesKHR.
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35767>
It doesn't do anything since IO variables are lowered to intrinsics,
which simplifies and eliminates a lot of variable-specific stuff
like declared but dead builtin varyings and unused components
of builtin varying arrays.
This reduces GLSL compile times by 2.4% with the gallium noop driver.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36023>
nir_opt_varyings reduces the number of varyings. Check against limits after
that, so that old and limited GPUs don't fail linking when nir_opt_varyings
is able to reduce varyings to or below the limit.
The previous code only checked FS inputs, which is glaringly obvious
from the removed var_counts_against_varying_limit function.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36023>
The current wsi_common_get_image api is mainly used for aliased wsi
image binding. However, it's not a convenient api because what's missing
is the swapchain image memory bound, with which we can trivially fix the
VkBindImageMemoryInfo so it works just like a normal binding request
from the driver pov. To be noted, besides the simplification of the
driver codes, this helper is also to prepare for common BindImageMemory2
implementation.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35875>