Since the CCU only gets used for unaligned attachment stores or resolves
with the wrong formats, we can use that space for attachments in many
cases.
This gets two more of vk-5-normal's main renderpass's attachments to fit
in the next gmem_pixels increment, leaving 1 to go. Other renderpasses do
get better gmem_pixels, and a few get better tile sizes as a result, but
the fps increase from those looks to be <.2% at least.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16921>
We now choose between two (equal as of this commit) layouts based on
whether the renderpass's stores will use the CCU space, and assert that we
always know the chosen layout when we go using the gmem offsets.
This required making vkCmdClearAttachments in a secondary take the 3D path
instead of gmem blits, since secondaries only have to be compatible with
the primary's renderpass, rather than equal.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16921>
Wa_14010455700 is dependent on the format and sample count, but our
code to track whether or not it had been applied was only dependent on
the format.
As a result, we failed to enable the workaround when an app used a D16
2xMSAA buffer, then a D16 1xMSAA buffer right afterwards.
Make the workaround tracking code sample-dependent to fix this.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17859>
Wa_14010455700 is dependent on the format and sample count, but our
code to track whether or not it had been applied was only dependent on
the format.
As a result, we failed to enable the workaround when an app used a D16
2xMSAA buffer, then a D16 1xMSAA buffer right afterwards.
Make the workaround tracking code sample-dependent to fix this.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17859>
I'm sorry to whoever wrote this, but
(x - (int) (x < 0)) ^ -((int) (x < 0))
is not an acceptable way to write iabs.
Shader-db results on Intel Tiger Lake with lower_idiv enabled:
total instructions in shared programs: 21122548 -> 21122570 (<.01%)
instructions in affected programs: 2369 -> 2391 (0.93%)
helped: 2
HURT: 8
total cycles in shared programs: 791609360 -> 791608062 (<.01%)
cycles in affected programs: 114106 -> 112808 (-1.14%)
helped: 9
HURT: 1
If we make the Intel back-end less stupid, we get to 9/1 helped/HURT for
instructions as well but that's for a different MR.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17845>
if the tcs was generated, then the prgram was added to the non-tcs cache,
which means deleting it from the tcs+tes cache will fail and then
context_destroy will explode
Fixes: 4123ee3c71 ("zink: invoke descriptor_program_deinit for programs on context destroy")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17866>
This fixes a problem where the destination has a DCC-incompatible view
format and triggers a DCC decompression using a custom u_blitter path, which
is disallowed inside u_blitter due to it being a u_blitter recursion that
always crashes.
This is also better because we'll get the best codepath (u_blitter or
compute) instead of just u_blitter,
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17864>
Now, only one instance of virgl_screen exists for a device
(/dev/dri/cardX), and it is shared by different frontends (eg GLX,
GBM, etc.). There is a problem with this, as follows:
/* Init GLX */
...
glXCreateContext(...);
...
/* GBM */
gbm_fd = open("/dev/dri/card0", O_RDWR);
dev = gbm_create_device(gbm_fd);
bo = gbm_bo_create(dev, ...);
plane_handle = gbm_bo_get_handle_for_plane(bo, ...);
drmPrimeHandleToFD(gbm_fd, handle.u32, flags, &plane_fd);
The above drmPrimeHandleToFD() call will fail with ENOENT.
The reason is that GBM and GLX share the same virgl_screen (file
descriptor), and it is not gbm_fd that is used to create gbm_bo,
but other fd (opened during GLX initialization). Since the scope
of prime handle is limited to drm_file, the above plane_handle is
invalid under gbm_fd.
By canceling the sharing of virgl_screen between different drm_files,
GBM can use the correct fd to create resources, thereby avoiding the
problem of invalid prime handle.
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16738>
This allows the software scoreboarding pass, scheduler, and so on
to handle the individual instructions and handle them, rather than
trusting in the generator to do scoreboarding correctly when expanding
the virtual instruction to multiple actual instructions.
By using SHADER_OPCODE_READ_SR_REG, we also correctly handle the
software scoreboarding workaround when reading DMask/VMask.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17530>
The hwconfig api may change unexpectedly prior to public release of
new platforms. Also, public documentation of the hwconfig api
sometimes lags the release.
For these reasons, warnings about unhandled hwconfig keys are noisy,
likely to occur, and unhelpful to most users. This commit drops those
warnings, in favor of a separate internal process for tracking
hwconfig api changes.
Suggested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17846>
Currently, res->maybe_busy is false by default. If wait immediately
after the resource is created, virgl_drm_resource_wait() will return
directly without checking the actual state of the kernel, which will
cause synchronization problems, such as:
On Guest:
pipe_buffer_create [mesa]
virgl_drm_winsys_resource_create
virtio_gpu_resource_create_ioctl [kernel]
virtio_gpu_fence_alloc
virtio_gpu_object_create
virtio_gpu_cmd_resource_create_3d
VIRTIO_GPU_CMD_RESOURCE_CREATE_3D
virtio_gpu_object_attach
virtio_gpu_cmd_resource_attach_backing
VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING
resource_wait [mesa]
virgl_drm_resource_wait /* return directly without fence waiting */
pipe_buffer_map [mesa]
virgl_drm_resource_map
virtio_gpu_map_ioctl [kernel]
os_mmap
memcpy /* <== here */
On Host (with QEMU):
VIRTIO_GPU_CMD_RESOURCE_CREATE_3D
virgl_cmd_create_resource_3d [qemu]
virgl_renderer_resource_create [virglrenderer]
VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING
virgl_resource_attach_backing [qemu]
virtio_gpu_create_mapping_iov
virgl_renderer_resource_attach_iov [virglrenderer]
virgl_resource_attach_iov
vrend_pipe_resource_attach_iov
vrend_write_to_iovec /* <== here */
virtio_gpu_cleanup_mapping_iov [qemu]
In the example above, there is a race condition between memcpy and
vrend_write_to_iovec.
Signed-off-by: Jiang Feng <jiangfeng@kylinos.cn>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17592>
When we don't want to communicate with minio, e.g. running
lava_job_submitter script locally, MINIO_RESULTS_UPLOAD should be unset.
But this variable is already set by generate-env script, so we need to
remove it from the /set-job-env-vars.sh to avoid declaring it in
unexpected scenarios.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17645>
Is a phi source is an undef, there's no point in copying it or really
caring about it at all. We would just end up inserting a mov from an
undef to a register. Instead, treat phi sources which point to an undef
as if the phi source doesn't exist.
This also prevents them from being included in phi webs which should
reduce the overall interference seen in the shader. Currently, if two
phis share an undef, their phi webs are consdiered to interfere. By
ignoring undefs we can get rid of this false interference and reduce the
size of phi webs. Reducing the number of things being copied by the
parallel copy instructions should also free up the paralle copy
algorithm and reduce the over-all churn of movs.
Shader-db results on Haswell:
total instructions in shared programs: 8156608 -> 8155406 (-0.01%)
instructions in affected programs: 164838 -> 163636 (-0.73%)
Shader-db results on Skylake:
total instructions in shared programs: 18227370 -> 18227359 (<.01%)
instructions in affected programs: 519 -> 508 (-2.12%)
helped: 6
HURT: 0
Shader-db results on Tigerlake:
total instructions in shared programs: 21167987 -> 21168025 (<.01%)
instructions in affected programs: 23701 -> 23739 (0.16%)
helped: 21
HURT: 27
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16817>
Undefs can happen even in real GLSL shaders so it's best to handle them.
Lowering to zero is a perfectly valid implementation. Also, run DCE
because some of the undefs may be dead after from_ssa and there's no
point in processing those in the back-end.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16817>
This was disabled ages ago because it provoked bugs between us and
xserver about context creation attributes, hopefully those servers are
out of circulation by now, let's find out.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17756>
The nir_opt_conditional_discard pass is called anyway and covers
discard/demote/terminate.
iris shader-db:
total instructions in shared programs: 8933422 -> 8933426 (<.01%)
instructions in affected programs: 48 -> 52 (8.33%)
helped: 0
HURT: 4
which is a synmark shader going from 12 to 13 instrs.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17664>
This pattern almost always gets peephole-selected out anyway, but I
noticed it once I removed glsl opt_conditional_discard.
iris shader-db:
total instructions in shared programs: 8933934 -> 8933158 (<.01%)
instructions in affected programs: 75575 -> 74799 (-1.03%)
helped: 179
HURT: 15
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17664>
As it is needed to have V3D_DEBUG defined. For the v3d case, I did it
restoring v3d_process_debug_variable, as it is at v3d_debug.c that
DEBUG_GET_ONCE_FLAGS_OPTION is called.
Fixes: 106b33405e ("vc4/v3d: stop adding NORAST when SHADERDB debug option is used")
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17812>
This is needed because when we switch between GLES and GL on the host,
we have to lower atomics to ssbo, and with that the shaders can't be
pulled from the cache anymore. Likewise when we move the disk image with
a shader cache to a different host, other features might change that
will need lowering. To avoid using stale shaders in this case, merge the
caps into into the shader cache sha.
Fixes: d6db4d2e08
virgl: Add simple disk cache
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17798>
This is a missed format to properly support media interop for Android.
Currently only used when layering GL atop Vulkan on Android, but will
be used directly with Vulkan when the platform default renderer has
switched to skiavk in modern Android.
Test: CtsMediaTestCases and CtsVideoTestCases with angle on venus on anv
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17808>
When the compiler pads a data structure, the padded bytes will not be
initialized. Shader keys are compared with memcmp and unitialized
bytes within the structure breaks this mechanism.
Explicitly pad the structures with members, so the compiler is forced
to initialize them. Add a warning to indicate if a change to
alignment in any of the data structures requires additional padding.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17749>
When the compiler pads a data structure, the padded bytes will not be
initialized. Shader keys are compared with memcmp and unitialized
bytes within the structure breaks this mechanism.
Explicitly pad the structures with members, so the compiler is forced
to initialize them. Add a warning to indicate if a change to
alignment in any of the data structures requires additional padding.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17749>
Right now if we use the option SHADERDB, NORAST is added
automatically. There's no comment justifying it, neither a lot of info
on the commits that added that. But I guess that the purpose is that
SHADERDB option is assumed to be used only to gather shader-db stats,
so setting setting NORAST would allow to get those dumps faster.
But adding debug options automatically can be confusing, as we could
get a behaviour that we were not expecting. At least I needed to check
why using SHADERDB was getting a black screen. And if we want to get
this behaviour, we can easily add manually the NORAST.
Finally, v3dv doesn't support NORAST right now (and we don't have
immediate plans to implement it), so it is somewhat inconsistent to
get different behaviour from the same debug option from the two
drivers.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17788>
Skip gather_push_constants when shared consts are enabled. This makes
sure push_consts is only zero-initialized, and reserved_user_consts is
0. This saves some space in the const file.
This change also adds a few asserts and a comment to
lower_load_push_constant. Because shared consts share the same range
for all stages, we should not apply per-stage offsets in
lower_load_push_constant. It worked because nir_lower_explicit_io
always sets base to 0 for nir_var_mem_push_const and
shader->push_consts.lo was always 0 for all stages.
Fixes: 0c787d57e6 ("tu: increase maxPushConstantsSize to 256.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17777>
Error message on OSX:
../src/glx/tests/fake_glx_screen.cpp:101:20: error: thread-local declaration of '__glX_tls_Context' with dynamic initialization follows declaration with static initialization
thread_local void *__glX_tls_Context = &dummyContext;
^
../src/glx/glxclient.h:655:36: note: previous declaration is here
extern __THREAD_INITIAL_EXEC void *__glX_tls_Context;
Fixes: be00a7c8ac ("glx: using C++11 keyword thread_local")
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17784>
There is non pipe-loader source code depends on it.
After doing this, we found that shared library pipe_swrast depends on libswdri
The error message is:
src/gallium/targets/pipe-loader/pipe_swrast.so.p/pipe_swrast.c.o:pipe_swrast.c:swrast_driver_descriptor: error: undefined reference to 'dri_create_sw_winsys'
src/gallium/targets/pipe-loader/pipe_swrast.so.p/pipe_swrast.c.o:pipe_swrast.c:swrast_driver_descriptor: error: undefined reference to 'kms_dri_create_winsys'
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17784>
Fixes the following EU validation error:
ERROR: Header must be present for all URB messages.
The message header is ignored for URB fence messages, so I doubt that
this actually matters in practice. But we should probably mark it as
present, because you have to send something, and according to the
documentation, there is a message header, it's just ignored.
Fixes: e6a9501aa2 ("intel/fs: Add the URB fence message")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
When this rule started causing issues, I looked it up in the
documentation, and found the rule for 64-bit destinations and
integer DWord multiplication, but there was no mention of floating
point destinations, as the text in brackets suggested. The actual
restriction text had been updated, so this led to some confusion
where I thought the conditions had been changed in newer docs.
However, what's actually going on is that there are two separate
conditions, each listed in separate rows of the table. One lists
64-bit destinations or integer DWord multiplication, and the other
mentions floating-point destinations. In both cases, the actual
restrictions are identical, so we handle them together in the code.
Try to update the comment to avoid future confusion.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
Recently, we started using <1;1,0> register regions for consecutive
channels, rather than the <8;8,1> we've traditionally used, as the
<1;1,0> encoding can be compacted on XeHP. Since then, one of the
EU validator rules has been flagging tons of instructions as errors:
mov(16) g114<1>F g112<1,1,0>UD { align1 1H I@2 compacted };
ERROR: Register Regioning patterns where register data bit locations are changed between source and destination are not supported except for broadcast of a scalar.
Our code for this restriction checked three things:
#1: vstride != width * hstride ||
#2: src_stride != dst_stride ||
#3: subreg != dst_subreg
Destination regions are always linear (no replicated values, nor
any overlapping components), as they only have hstride. Rule #1 is
requiring that the source region be linear as well. Rules #2-3 are
straightforward: the subregister must match (for the first channel to
line up), and the source/destination strides must match (for any
subsequent channels to line up).
Unfortunately, rules #1-2 weren't working when horizontal stride was 0.
In that case, regions are linear if width == 1, and the stride between
consecutive channels is given by vertical stride instead.
So we adjust our src_stride calculation from
src_stride = hstride * type_size;
to:
src_stride = (hstride ? hstride : vstride) * type_size;
and adjust rule #1 to allow hstride == 0 as long as width == 1.
While here, we also update the text of the rule to match the latest
documentation, which apparently clarifies that it's the location of
the LSB of the channel which matters.
Fixes: 3f50dde8b3 ("intel/eu: Teach EU validator about FP/DP pipeline regioning restrictions.")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
When the EU validator encountered an error, it would add an annotation
to the disassembly. Unfortunately, the code to insert an error assumed
that the next instruction would start at (offset + sizeof(brw_inst)),
which is not true if the instruction with an error is compacted.
This could lead to cascading disassembly errors, where we started trying
to decode the next instruction at the wrong offset, and getting lots of
scary looking output:
ERROR: Register Regioning patterns where [...]
(-f0.1.any16h) illegal(*** invalid execution size value 6 ) { align1 $7.src atomic };
(+f0.1.any16h) illegal.sat(*** invalid execution size value 6 ) { align1 $9.src AccWrEnable };
illegal(*** invalid execution size value 6 ) { align1 $11.src };
(+f0.1) illegal.sat(*** invalid execution size value 6 ) { align1 F@2 AccWrEnable };
(+f0.1) illegal.sat(*** invalid execution size value 6 ) { align1 F@2 AccWrEnable };
(+f0.1) illegal.sat(*** invalid execution size value 6 ) { align1 $15.src AccWrEnable };
illegal(*** invalid execution size value 6 ) { align1 $15.src };
(+f0.1) illegal.sat.g.f0.1(*** invalid execution size value 6 ) { align1 $13.src AccWrEnable };
Only the first instruction was actually wrong - the rest are just a
result of starting the disassembler at the wrong offset. Trash ensues!
To fix this, just pass the instruction size in a few layers so we can
record the next offset properly.
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
For some reason the Vulkan spec required that these features must be
supported even though they only affect features that are optional
in Vulkan 1.2 and that we don't support, so enabling them doesn't
have any practical implications for us.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17786>
A simple implementations of breadcrumbs tracking of GPU progress
intended to be the last resort when debugging unrecoverable hangs.
For best results use Vulkan traces to have a predictable place of hang.
Requires compilation with TU_BREADCRUMBS_ENABLED=1. See tu_cs_breadcrumbs.c
for details on how to use this feature.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15452>
It moves the explicit clamping of incoming Z from vertex stages
after interp, to the depth clamp function.
It adds support to the depth clamp function to restrict incoming
Z values to 0..1 range.
It fixes the depth test conversions to allow unrestricted depth
values.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17612>
For unrestricted depth ranges, depth clear values for unorm buffers
need to be explicitly clamped. However this has to happen in the
driver when we know the depth buffer format, not at the API level.
Just add clamps to the non-f32 cases and separate it out.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17612>
GLES always clamps for 32-bit float buffers, GL doesn't require
it but setting this per API causes virgl to fail some tests.
To fix is properly we'd need to introduce a CAP to expose
this between host/guest.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17612>
Some of these are actually bugfixes, some like the drawcall information
are just for autotune so they are just performance fixes. However this
came from an audit into what state is used in CmdEndRenderPass().
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17378>
Emit it at EndRenderPass time, if the renderpass has tessellation. This
avoids all the special handling for secondaries, will work with
suspended/resumed render passes, and will handle secondaries that
contain render passes which will be allowed with dynamic rendering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17378>
The workaround for draws that need a CP_WAIT_FOR_ME didn't work if the
barrier before the draw is in a separate command buffer from the draw.
The barrier would add a pending CP_WAIT_FOR_ME, but it would get dropped
on the floor at the end of the command buffer and the draw wouldn't have
a pending CP_WAIT_FOR_ME so it wouldn't emit one. We don't know in the
barrier if the destination is a draw with the workaround, so we have two
options:
- Emit any pending CP_WAIT_FOR_ME at the end of the command buffer (and
before secondaries) in case there is a workaround draw later. This
will emit an extra CP_WAIT_FOR_ME at the end of the command buffer in
case there is an indirect command barrier.
- Always assume at the beginning of the command buffer that there is a
pending CP_WAIT_FOR_ME. This will emit an extra CP_WAIT_FOR_ME before
the first workaround-requiring draw in the command buffer, in case
there was a barrier earlier.
The only draws requiring a workaround are currently
vkCmdDraw*IndirectCount(), which we assume are rarer than indirect
command barriers, so we implement the second option. This entails
treating it as a cache invalidate.
This fixes some upcoming dynamic rendering CTS tests that do
vkCmdDrawIndirectCount() in a secondary but put the barrier for it in
the primary.
Fixes: 37939e9c54 ("turnip: Fix the lack of WFM before indirect draws")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17378>
The only thing it's used for is to get the image view, and we can't rely
on it existing anyway. With dynamic rendering, we only have the format
of the attachments and sample count, so moving forward we can't rely on
anything other than that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17378>
Note: The format __DRI_IMAGE_FORMAT_ABGR16161616 was already
in the dri2_format_table, but had been hidden from outside view,
for lack of an official DRM fourcc. Since its fourcc has now been
assigned, it is safe to reveal the format.
Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14580>
The DRM fourcc for this format is DRM_FORMAT_ABGR16161616 = 'AB48', not
__DRI_IMAGE_FOURCC_RGBA16161616 = 'RA48'. This should have no outward
effect for clients, since the format does not get revealed by
dri2_query_dma_buf_formats, and is otherwise only used within the
library.
Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14580>
When a flush happens the per-context setup is used to hold the fence
for the last scene sent to the rasterizer. However when multiple
contexts are in use, this fence won't get returned to be blocked on.
Instead move the last fence to the rasterizer object, and return
that instead as it should be valid across contexts.
Fixes gtk4 bugs on llvmpipe since overlapping vertex/fragment.
Fixes: 6bbbe15a78 ("Reinstate: llvmpipe: allow vertex processing and fragment processing in parallel")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17702>
Semaphore waits in a command buffer only affect the first jobs we execute
in each hardware queue since jobs in the same queue are serialized against
each other. Binning syncs in particular, only affect CL jobs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17594>
We only want to cleant BCL barrier flags if we consume a BCL barrier.
For example, if the client records a barrier for an index buffer
it should apply to the next draw call that uses an index buffer
which may not be the current draw call but one coming after it.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17594>
previously this would opportunistically promote barriers to the unordered
cmdbuf only if a renderpass was active or there was no access, which
was the wrong approach
instead, opportunistically promote barriers to the unordered cmdbuf
any time it's possible to do so, which is when one of these conditions is true:
* when there is no access to the resource on the current cmdbuf
* when the only access to the resource is in the unordered cmdbuf
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17667>
Instead of having 2 VkMemoryType pointing to the same VkMemoryHeap, we
have each VkMemoryType with VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT (one
host visible, the other not) point to its own VkMemoryHeap. For the
local heap that is host visible, we'll use the
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag at GEM BO creation.
When the smallbar uAPI is not available we fallback to a single heap
and do not use I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS.
v2: Handle probed_cpu_visible_size == probed_size (Matthew)
v3:
* Jordan: Use region info from devinfo
v4: Also make the vram host visible heap as local (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16739>
The CTS does a lot of 1x1x1 compute shaders (all that stuff like
dEQP-GLES31.functional.shaders.builtin_functions.precision.mul.highp_compute.scalar)
which finish with store_ssbos. Instead of doing the invocation loop in
that case (which LLVM has to later unroll), just emit the single
invocation's store.
Fixes timeouts running
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36, which does
a spectacular number of SSBO stores in a long 1x1x1 compute shader.
Reduces runtime of on llvmpipe from 66s to 29s locally, and virgl from
1:38 to 43s. virgl
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.22
goes down to 7 seconds.
Fixes: #6797
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17730>
Let's be slightly more defensive here. If a texture image doesn't have
an associated pipe_resource allocated, then render_texture() will pass
that along to _mesa_update_renderbuffer_surface(), which will crash on a
NULL pointer dereference. So, if there isn't a pipe_resource, then we
should just skip this altogteher.
Today, this isn't an issue, because each gl_texture_image always
allocates a pipe_resource up front. On a branch of mine, I prototyped
some improvements to the compressed texture fallback handling, where it
would defer resource allocation, examine the source image's block data,
and dynamically select a format based on that, then allocate it later.
With that prototype in place, we saw crashes the Android "My Talking
Tom" series of games, which appear to be attaching ASTC textures to a
framebuffer color attachment. That FBO would be incomplete anyway, as
ASTC textures aren't renderable, but we got into a situation where the
render-to-texture code was crashing due to the lack of pt before it
could properly signal that it was incomplete and bailing.
Technically, we don't need this now, but I figure that being defensive
won't hurt and this would probably save whoever encounters such an issue
in the future a bunch of frustrating debugging.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17508>
If we don't append subpass->self_dep_info last, other __vk_append_struct()
calls will update its pNext chain which lives in the subpass which
should be treated as immutable. This is easily fixable by just making
it the last thing we append to the chain.
Fixes: 7e11cdc77a ("vulkan/render_pass: Pass sample locations to barriers")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17748>
No shader-db changes on any Intel platform
Fossil-db results:
Tiger Lake
Instructions in all programs: 156926440 -> 156926470 (+0.0%)
Instructions hurt: 15
Cycles in all programs: 7513099349 -> 7513099402 (+0.0%)
Cycles hurt: 15
Ice Lake and Skylake had similar results. (Ice Lake shown)
Cycles in all programs: 9099036492 -> 9099036489 (-0.0%)
Cycles helped: 1
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
The changes to fs_visitor::validate() helped track down a place where I
initially forgot to convert a message to the new sources layout. This
had caused a different validation failure in
dEQP-GLES31.functional.tessellation.tesscoord.triangles_equal_spacing,
but this were not detected until after SENDs were lowered.
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19951145 -> 19951133 (<.01%)
instructions in affected programs: 2429 -> 2417 (-0.49%)
helped: 8 / HURT: 0
total cycles in shared programs: 858904152 -> 858862331 (<.01%)
cycles in affected programs: 5702652 -> 5660831 (-0.73%)
helped: 2138 / HURT: 1255
Broadwell
total cycles in shared programs: 904869459 -> 904835501 (<.01%)
cycles in affected programs: 7686744 -> 7652786 (-0.44%)
helped: 2861 / HURT: 2050
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 141442369 -> 141442032 (-0.0%)
Instructions helped: 337
Cycles in all programs: 9099270231 -> 9099036492 (-0.0%)
Cycles helped: 40661
Cycles hurt: 28606
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
The code assumed that if the source was d32s8 then the destination would
also be d32s8, in particular that depth_base_addr/stencil_base_addr
would also be filled out. Move the destination and source handling into
two different ifs with different conditions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17684>
This was missed when we added support for VK_KHR_depth_stencil_resolve.
There is a similar feature where the stencil aspect of a D24S8 can be
copied "tightly" in CopyImageToBuffer, but it used the texture swizzle
and so required the 3d path. To get it to work with the 2D path, which
is required for resolves, we have to instead use the A8_UNORM format,
which works for texture sampling even for tiled images. We also have to
reuse the pre-existing image views because subpass resolves work on
image views rather than images, whereas before the fixup was applied
while creating the image view. This means threading through the
corresponding "opposite" format through setup, src, and dst functions,
doing the fixup there (through some shared helpers), and then getting
every user to specify the right format. As a bonus, we no longer need to
force the 3d path for the CopyImageToBuffer and CopyBufferToImage
special cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17684>
primitive_param takes up 2 vec4's. Remove an align that I don't
understand.
The align upset
Test case 'dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcast_vec4'..
deqp-vk: ../src/freedreno/ir3/ir3_nir.c:1039:
void ir3_setup_const_state(nir_shader *, struct ir3_shader_variant *, struct ir3_const_state *):
Assertion `constoff <= ir3_max_const(v)' failed.
with an older version (android11-tests-dev branch) of deqp-vk. This is
because ir3_nir_opt_preamble uses the function for the worst case but
the function fails to replace the align by the worst case.
No regression with dEQP-VK.*tess*.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17570>
RECT textures used to be required to be supported by drivers. But since
the state-tracker learned how to lower these to 2D textures, some
drivers no longer support them.
While we have lowering in place for this, lowering it involves some
needless overhead. So let's just use a 2D texture instead of a RECT
texture.
Because having two versions and switching between them is more
complicated than it needs to be, let's just always use a 2D texture.
Similarly, let's just always multiply the reciprocal here, so we don't
have to test for PIPE_CAP_TGSI_DIV first.
Cc: mesa-stable
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17707>
this adds a mangled variation on nir_inline_uniforms that enables inlining
from any uniform buffer in order to try inlining every possible load
if the shader is too small or the ssa_alloc delta from inlining is too small,
then inlining is disabled for that shader to avoid pointlessly churning
the same shaders for no gain
with certain types of shaders, the speedup is astronomical
before:
dEQP-VK.graphicsfuzz.cov-int-initialize-from-multiple-large-arrays (4750.76s)
after:
dEQP-VK.graphicsfuzz.cov-int-initialize-from-multiple-large-arrays (0.505s)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17722>
Yes this is a round trip, but X_PresentPixmap is not itself a blocking
operation, it just instructs the server to do the next presentation at
some time. More importantly, if _we_ don't catch the presentation error,
xlib's error queue will, and the calling code is certainly not prepared
to handle errors from Present.
Forcing the round trip here is also a bit more correct semantically.
This is the end of the Vulkan client part of the present queue, and the
X_PresentPixmap request transfers the queue operation to the server, so
we should not return until we are sure the handoff has happened.
Fixes some flakiness with piglit@glx-visuals-* with zink+radv.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17685>
Now there are two paths for push constants.
When it's range is under 128b, we can use shared consts.
When it's over 128b, we can instead do loading data through
regular path, which is same as the previous way.
Now we can satisfy emulations like vkd3d that requires 256b for
its root signatures and we think it fairly maps to push constants
rather than inline uniform blocks that requires one indirection.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
Follow the way blob is doing for PushConstants though it supports only
128b, same as previous.
v1. Rename tu_push_constant_range.count into dwords to redue confusion.
( Danylo Piliaiev <dpiliaiev@igalia.com> )
v2. Enable shared constants only if necessary.
v3. Merge the two draw states TU_DRAW_STATE_SHADER_GEOM_CONST and
TU_DRAW_STATE_FS_CONST as shared constants are used.
Note that this leaves tu_push_constant_range in tu_shader so we could
use it again in the following patch.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
Adds a shared consts base offset and a size of it(dwords) to ir3_compiler
since they might be depending on gpu generations. (Danylo Piliaiev <dpiliaiev@igalia.com> )
Adds a flag to present whether shared consts are enabled to
ir3_shader_options and then it sets to ir3_const_state when creating
an ir3 variant. Although this state is not per-shader state, this is
necessary when figureing out real constlens.
v1. Define a hw quirk for geometry shared const files and use it when
calculating const length.
v2. Don't hardcode when calculating a safe const length.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
According to the observation on a630/a650/a660, max_const_pipeline has
to be 512 when all geometry stages are present. Otherwise a gpu hang
happens. Acoordingly maximum safe size for each stage should be under
(max_const_pipeline / 5 (stages)).
Only when VS and FS stages are present, the limit is 640.
v1. Align max_const_safe to 4 vec4's.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
This is a huge pain because it's an array, meaning that accessing
an entry in the array now depends on the validator version to use
the right element stride.
We could always just store the v1 and downconvert if needed... but
this isn't *that* bad that I felt I had to do it that way.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
This starts actually updating the always-read/never-written
masks while processing the shader. Note that we follow DXC's
lead here and treat "always read" as "sometimes read."
This isn't strictly required, but might help drivers out.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
This version of the validator starts adding usage masks into
the DXIL, which then are expected to match the PSV and signature
data. The usage masks are "correct" meaning that the never-writes
mask no longer includes bits outside of components 0-3.
A future change will actually compute useful masks.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
We're about to lower I/O to scalar, which means we'll end up with
multiple writes to position, and none of them has enough info to
fill in the blanks.
This causes a test that previously crashed on WARP (due to
StoreOutput with an undef not being handled) to fail more
gracefully - but that failure means that the test spends
forever just outputting errors, so explicitly skip it.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
First, preprocess the signatures, strictly based on the variables
in the nir shader. Then, later, after the actual shader contents
have been processed, we emit the metadata.
This lets shader processing rely on the pre-processed data (e.g.
the row -> ID mapping needed for large VS inputs) while also allowing
the signature data to rely on data gathered during the shader traversal
(e.g. which components are actually used).
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
Instead of using the short-lived semantic structure (that's used to
fill out the long-lived signature and PSV data), use the long-lived
ones. This is staging so we can hold off on emitting the metadata
until later.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
Instead of starting a new block when the kcache handling failed,
try to continue scheduling instructions until kcache allocation
fails for all ready instruction.
With that we avoid a CF split withing an LDS fetch/read group.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17678>
When lowering gl_Clipertex the driver_location may no longer correspond
to the array index, so fill the array by counting the array index up
according to outputs that need to be handled by the state setup.
Fixes: 3340c7ce35
r600/sfn: lower CLIPVERTEX to clip planes
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17678>
PAN_MESA_DEBUG=overflow will place objects as close as possible to a
protected region at the end of the buffer, so that overflows segfault.
Caught the bugs in all four of the preceding commits.
v2: memset the BO to 0xbb to catch code expecting zeroed allocations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
Otherwise the indirect draw shader can read uninitialised data for the
stride, and the position varying buffer may be outside the heap BO.
The next commit fixes a bug that masked this one.
Fixes: 2e6d94c198 ("panfrost: Add helpers to support indirect draws")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
create_vertex_elements_state is sometimes called with a too large
num_elements argument, for example with util_blitter, which causes a
buffer overflow.
There is no documentation to forbid this practice, so don't rely on
so->num_elements being correct and instead use the vertex shader
attribute count, which matches the value used to allocate the
descriptors.
Use attributes_read_count rather than attribute_count because the
latter also includes images and PAN_VERTEX_ID/PAN_INSTANCE_ID.
Fixes: 76de3e691c ("panfrost: Merge attribute packing routines")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
nr_images is the trigger for allocating double the number of buffers
for attributes. When there are no images, there is not always enough
space for ALIGN_POT(k, 2) to not move k out of bounds, so don't
execute the line in that case.
Fixes: dc85f65e05 ("panfrost: emit shader image attribute descriptors")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
Like 90a8fb0355.
fossil-db results:
All Skylake and newer Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141442369 -> 141442363 (-0.0%)
Instructions helped: 1
Cycles in all programs: 9099270231 -> 9099270187 (-0.0%)
Cycles helped: 1
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17637>
Before this patch, we were leaking compressed resources in iris_create_surface.
Specifically, when we failed to create an uncompressed ISL surface and view for
a compressed resource, we didn't unreference the resource pointer we referenced
into the pipe_surface.
Fix this by delaying the pipe_surface initialization code to after attempting
to create the uncompressed surface and view.
Cc: 22.1 <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17598>
Before this patch, we were leaking surface states in iris_create_surface.
Specifically, when we failed to create an uncompressed ISL surface and view for
a compressed resource, we didn't free surface states we allocated for it.
Fix this by attempting to create the uncompressed surface and view before we
allocate the surface states.
Cc: 22.1 <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17598>
Now that the old state tracking code is removed, implementation details
no longer need to be leaked out of this single source file. Remove structs,
function declarations, 'd3d12_' prefixes, and add static when possible.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17688>
Most call sites for transitions will only apply transitions to one or two
resources, and don't need to use the bo set, where each call is guaranteed
to insert the bo, only to walk the set immediately afterwards. Instead, they
can just append the barriers to the dynarray directly and skip the bo set.
Draws and dispatches still use the append approach, to accumulate the full
set of state needed for each subresource for the case where a single
[sub]resource is bound to the pipeline in multiple places.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17688>
When a resource is destroyed, we'll need to let the contexts know.
This is guarded by the submit mutex, because we'll already be holding
that for at least one place where we want to iterate this list, and
it's low-frequency enough that re-using it is simpler than adding more
locks and creating confusing lock ordering.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17688>
Otherwise we will mix and match mesa's custom cross slot packing
with arb_enhanced_layouts style packing and we won't correctly
handle the size of the vars needed for the mesa custom packing.
The code was working correctly if the shader interface had both
a matching input and output but when we only had one side of
the interface we were only marking a single slot location as
packed.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: e5122a5543 ("glsl: add a NIR based varying linker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6853
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17550>
When we initialize the graphics pipeline state, we skip VI and FSR state
if they're 100% dynamic. We need to do this if the current set of
dynamic things contains VI/FSR or if the set of dynamic state already in
the vk_graphics_pipeline_state has them dynamic. Look state->dynamic
after we've merged instead of just looking at the dynamic set from the
VkGraphicsPipelineCreateInfo we were passed.
Also, when we validate, we need to assume that VI and FSR exist or else
we'll assert if dynamic VI or FSR are set.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17696>
Test runtime has crept up with more CTS tests and more features. The last
vk_full 1/2 run I tried timed out at:
Pass: 268488, Fail: 2, ExpectedFail: 7, Warn: 1, Skip: 602571, Duration: 1:29:29, Remaining: 45
Rude.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17662>
Allow folding constants/undef sources by sharing more code with the image_store
16bit folding pass.
Allow more than one set of sources because RADV wants two, one for
G16 (ddx/ddy) and one for A16 (all other sources).
Allow folding cube sampling destination conversions on radeonsi/radv because
I think the limitation only applies to sources.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16978>
The script that generates the format tables does not set every pipe_format.
In practice, the length of the format tables is equal to PIPE_FORMAT_COUNT.
I just added the explicit size to future-proof it.
(If the largest valid format is not part of the format tables,
there will be a mismatch between the array length and PIPE_FORMAT_COUNT)
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17490>
Empirically, a successful LAVA boot time should take less than 3
minutes.
LAVA itself is configured to attempt thrice to boot the device,
summing up to 9 minutes.
It is better to retry the boot than cancel the job and re-submit to
avoid the enqueue delay.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17646>
The shared reg usage involved in the subgroup-related macros can cause
trouble for the spiller, and spilling may be implicated in CTS failures
with old versions of the subgroup tests, so let's make sure we get some
coverage. It does seem to catch a couple of failures.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17642>
In debugoptimized builds, DEBUG is not set (and neither is NDEBUG). The
intention of NDEBUG is to disable assertions. As such, list assertions should be
gated on !NDEBUG as opposed to on DEBUG.
But assert() is already disabled in that case, so we don't need our own special
assert (Eric).
This would have caught an assertion failure (due to the wrong iterator used)
sooner for the Valhall compiler.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17408>
It will get turned into SSA and copy-propagated in NIR, no need to walk
the IR collapsing it here.
iris shader-db results appear to be noise:
total instructions in shared programs: 8932195 -> 8932147 (<.01%)
instructions in affected programs: 537 -> 489 (-8.94%)
LOST: 12
GAINED: 11
lost/gained are simd32 switches in unigine, l4d2, portal2, asphalt9.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17613>
The DISPATCH_TASKMESH_INDIRECT_MULTI_ACE packet has a firmware bug,
it hangs the GPU when the draw count is zero.
This commit adds a workaround sequence using COND_EXEC packets
which make sure that this indirect packet is never executed when
the draw count is zero.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
Add a separate flush_bits field for tracking cache
flushes in the ACE internal cmdbuf.
In barriers and image transitions we add these flush bits to ACE.
Create a semaphore in the upload BO which makes it possible
for ACE to wait for GFX for the purpose of synchronization.
This is necessary when a barrier needs to block task shaders.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
This implements NV_mesh_shader draw calls with task shaders.
- On the GFX side:
DISPATCH_TASKMESH_GFX for all draws
- On the ACE side:
DISPATCH_TASKMESH_DIRECT_ACE for direct draws
DISPATCH_TASKMESH_INDIRECT_MULTI_ACE for indirect draws
Additionally, the NV_mesh_shader indirect BO layout is
incompatible with AMD HW, so we add a function that copies
that into a suitable layout.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
This is mainly going to be used by task shaders, because
the HW implementation mismatches the API:
- In the API, task shaders are considered graphics shaders which
are part of a graphics pipeline and the draws are submitted to
a graphics queue.
- The HW requires the driver to dispatch task shaders on
an async compute queue.
When a pipeline is bound that has a task shader, create a
driver-internal ACE (async compute engine) cmdbuf which
we are going to submit to an ACE queue.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
in some cases it becomes desirable to "maybe" stop and start the current
renderpass, such as when updates MAY result in layout changes for attachments
for such cases, avoid splitting the renderpass unless it actually needs to
be split
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17640>
Fix defects reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member sgpr_spill_slots is not
initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member vgpr_spill_slots is not
initialized in this constructor nor in any functions that it calls.
Fixes: 7d34044908 ("aco: refactor VGPR spill/reload lowering")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17583>
this is a weird corner case where glsl permits a zero value, so clamp to 1
and then don't emit any vertices to avoid driver hangs
affects:
dEQP-GL45-ES31.functional.geometry_shading.emit.points_emit_0_end_0
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17639>
Two advantages:
* When using NIR_DEBUG=nir_print_xx, will print outcome only if
there is a change
* We can use NIR_PASS(_, ...) instead of NIR_PASS_V, that has
slightly more validation checks.
This includes:
* v3d_nir_lower_image_load_store
* v3d_nir_lower_io
* v3d_nir_lower_line_smooth
* v3d_nir_lower_load_store_bitsize
* v3d_nir_lower_robust_buffer_access
* v3d_nir_lower_scratch
* v3d_nir_lower_txf_ms
As we are here we also simplify some of them by using the
nir_shader_instructions_pass helper.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>
The trigger for this commit was when we found that we were not calling
nir_metadata_preserve when lowering the layout code. But then I found
that it would be better to just update the code to use
nir_shader_instructions_pass, so we can avoid to manually:
* Initialize the nir_builder
* Call nir_foreach functions (we pass the callback)
* Call nir_metadata_preserve functions (that as mentioned we were not calling)
We also get a nice cleanup of several functions by reducing the number
of parameters (we pass a state struct).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>
Without it we got a metadata assert:
deqp-vk: ../src/compiler/nir/nir_metadata.c:108: nir_metadata_check_validation_flag: Assertion `!(function->impl->valid_metadata & nir_metadata_not_properly_reset)' failed
if we try to use NIR_PASS(_, instead of NIR_PASS_V (that among other
things, do more validations).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>
VK_NULL_HANDLE descriptor set layouts are allowed when creating pipeline
layouts without VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT.
From VUID-VkPipelineLayoutCreateInfo-graphicsPipelineLibrary-06753:
> If graphicsPipelineLibrary is not enabled, elements of pSetLayouts
> must be valid VkDescriptorSetLayout objects
From VUID-VkPipelineLayoutCreateInfo-pSetLayouts-parameter:
> If setLayoutCount is not 0, pSetLayouts must be a valid pointer to an
> array of setLayoutCount valid or VK_NULL_HANDLE VkDescriptorSetLayout
> handles
Signed-off-by: Ricardo Garcia <rgarcia@igalia.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17629>
From the Vulkan spec:
"If poolSizeCount is not 0, pPoolSizes must be a valid pointer to an
array of poolSizeCount valid VkDescriptorPoolSize structures"
So 0 is actually allowed and there is a CTS to check it is handled gracefully.
Fixes:
dEQP-VK.api.descriptor_pool.zero_pool_size_count
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17648>
Previously we would just unroll the loop one extra iteration and let
other optimisation passes clean up the mess. This worked to a degree
but if the loop happened to be nested inside another loop we would
end up with phi chains that would block other passes from being able
to do the cleanup.
With this commit we explicitly clone the variables create by lcsaa
and insert them directly in the last continue branch after we are done
unrolling. With this optimisation passes can recognise both sides
of the if output the same values and can progress further.
Help with the issues described in:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6051
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17611>
LRZ fast-clear works on all gens, however blob disables it on
gen1 and gen2. We also elect to disable fast-clear on these gens
because for close to none gains it adds complexity and seem to work
a bit differently from gen3+. Which creates at least one edge case:
if first draw which uses LRZ fast-clear doesn't lock LRZ direction
the fast-clear value is undefined.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6829
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17599>
We're getting several more 630s in the lab, but need a bit of time to swap
out some broken old ones and stabilize the new ones. Fritz thinks this
should be done in an hour or so, but I want to turn off the CI for main so
that we don't block anyone else.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17641>
We should not scrub raster dedicated states when dynamic state includes
VK_DYNAMIC_STATE_RASTERIZER_DISCARD_ENABLE.
Test:
- dEQP-VK.pipeline.extended_dynamic_state.*_raster
- dEQP-VK.api.pipeline.pipeline_invalid_pointers_unused_structs.graphics
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17582>
zink_kopper_acquire_readback will flush any outstanding clears, this means that
the current clears need to be applied first before calling zink_kopper_acquire_readback.
This was already done for native_blit and native_resolve, also do this for the emulated draw path.
Seen as intermittent failures in cts case GTF-GL33.gtf21.GL2FixedTests.buffer_clear.buffer_clear.
Cc: mesa-stable
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17631>
Spirv spec does not allow the use of OpEmitVertex or OpEndPrimitive when there are multiple streams.
Instead emit the multi-stream version of these with stream set to 0.
This issue was seen when testing cts case KHR-GL46.transform_feedback.draw_xfb_stream_test
Fixes: 35e346f428 ("zink: handle vertex streams")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17513>
When we made depth/stencil dynamic, we lost the optimization. This is
particularly important for cases where the stencil test is enabled but
never writes anything as certain combinations with discard can cause
the stencil write (which doesn't do anything) to get moved late which
can be a measurable perf hit. According to 028e1137e6 ("anv/pipeline:
Be smarter about depth/stencil state", it was a couple percent for DOTA2
on Broadwell back in the day. No idea how it affects current titles.
This may also improve the depth/stncil PMA workarounds on Gen8 and Gen9
since they're now looking at optimized depth/stencil state.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
For one thing, we were deceptively setting it wrong in genX_cmd_buffer.c
and then overwriting it in each of of gfx7_cmd_buffer.c and
gfx8_cmd_buffer.c. Pull it all into genX_cmd_buffer.c so it's no longer
duplicated. Also, stop doing the PATCHLIST conversion in anv_pipeline.c
and just store the number of patch vertices.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
The only reason why we recorded them per-sample-count is because Intel
hardware is weird starting with Broadwell. The API, requires that the
dynamic sample pattern be reset every time the sample count changes so
we only need to record the pattern for the current sample count.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
If the driver can do 16-bit ALU ops, then store RelaxedPrecision phi
values into 16-bit NIR variables with downconverts/upconverts on the way
in/out.
This has no impact on shader-db on freedreno (not that we have a ton of
GLES content there), but it does cause an ANGLE-translated CTS shader on
vulkan to get consistent conversions between two copies of a value, and
avoid a test bug.
Reviewed-by: Emma Anholt <emma@anholt.net>
Closes: #6585
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14018>
This can't do it in the loop because it doesn't easily know what
attributes use a binding.
We could do it in a separate loop, but there's no point, especially since
zink does CmdSetVertexInputEXT() after CmdBindVertexBuffers2().
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: c335a4d70e ("radv: dynamically calculate misaligned_mask for dynamic vertex input")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17521>
We can't do uniform buffer objects and from the hardware
perspective constants (uniforms) and immediates are treated in
the same way. They are uploaded together and fit together into the
(rather low) total constant limit. Therefore, there is actually no
advantage in converting immediates to uniforms, and a whole lot of
disadvantages (less possible optimizations and no inlining).
Fixes the dEQP regressions from https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770
The tests failed because when there is an indirect array access the
compiler inserts a big if ladder and converting the temp to a uniform
means more instructions because the ifs cant be lowered at least
partially to selects. It is particularly visible, because the code
NIR currently emits for the indirect access doesn't really fit the
hardware, see: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6366
No change in my shader-db, so this concerns the tests only.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17576>
This implements the "tortoise and the hare" algorithm for detecting
cycles in graphs. We use the caller's iterator as the hare and our own
internal copy as the tortoise. Conveniently, VkBaseOutStructure (and
VkBaseInStructure which is identical except the pointer type on pNext)
have a pointer we can use for the tortoise and an sType which we can use
for a counter to ensure we only increment the tortose every other loop
iteration.
There are more efficient algorithms than tortoise and hare but they
require allocating memory for something like a hash set of seen nodes.
Since this for debug purposes only, it's ok for it to be a bit
inefficient in the case where it hits the assert. In the usual case of
no loops, it's the same runtime efficiency as the unchecked version
except that it does a tiny bit of math and 50% more pointer chases.
Version 1 worked fine with clang and with GCC 12.1 with -O0 but not with
optimizations. See https://gitlab.freedesktop.org/mesa/mesa/-/issues/6895.
Unfortunately, the first version required modifying a temporary declared
const inside the for loop and that seems to have been the problem. This
version instead has an iterator struct which is managed by an outer for
loop and the inner for loop exists to declare the user's requested
iteration variable and manage the actual iteration. Because the outer
for loop is effectively `for (bool done = false; !done; done = true)`,
it will execute exactly once, regardless of the inner loop, so break and
continue inside the inner loop should work the same as if it's a single
for loop.
The other major difference with the new version is that the code is the
same for debug and release except the half_iter and loop check are gone.
I've verified by hand that this produces virtually identical code to the
old simple iterators on both GCC andl clang with an optimized build.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17630>
Instead of having stencil writes as an out parameter of the optimization
function, we add a new write_enable field for stencil that's equivalent
to the similarly named field for depth. This doesn't mean drivers must
actually support disabling stencil writes independently but the
information may be helpful on some hardware.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17328>
Because the data register of atomic VMEM instructions
is shared between src and dst, it might be necessary
to create live-range splits during RA.
Make the live-range splits explicit in WQM mode.
Totals from 7 (0.01% of 134913) affected shaders: (GFX10.3)
Latency: 17209 -> 17210 (+0.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15347>
When the application uses depth values outside of the [0.0,1.0] range
with VK_EXT_depth_range_unrestricted, or when explicitly disabled.
Otherwise, the driver can clamp to [0.0,1.0] internally for optimal
performance.
From the Vulkan spec "in 28.10.1. Depth Clamping and Range Adjustment":
"If depth clamping is not enabled and zf is not in the range [0, 1]
and either VK_EXT_depth_range_unrestricted is not enabled, or the
depth attachment has a fixed-point format, then zf is undefined
following this step."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16349>
This feature allows shaders to use pointers to buffers which may
not be bound via descriptor sets. Access to these buffers is done
via global intrinsics.
Because the buffers are not accessed through descriptor sets, any
live buffer flagged with VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT_KHR
can be accessed by any shader using global intrinsics, so the driver
needs to make sure all these buffers are mapped by the kernel when
it submits the job for execution.
We handle this by tracking if any draw call or compute dispatch in
a job uses a pipeline that has any such shaders. If so, the job is
flagged as using buffer device address and the kernel submission
for that job will add all live BOs bound to buffers flagged with the
buffer device address usage flag.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>
This adds support for global 64-bit GPU addresses as a pair of
32-bit values. This is useful for platforms with 32-bit GPUs
that want to support VK_KHR_buffer_device_address, which makes
GPU addresses explicitly 64-bit.
With the new format we also add new global intrinsics with 2x32
suffix that consume the new address format.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>
If we have 2 pipelines that consume the same push constant data
but where one of them only uses direct access and the other has
indirect access, a draw with the first pipeline would clear the
dirty flag without updating the UBO and by the time we bind and
draw with the second pipeline we won't upload the constants either
because the first draw cleared the dirty flag.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17536>
Since we allocate this ourselves we can immediately add it to the
job at the time we allocate it.
This also fixes a bug we introduced when we implemented inline
uniforms because since that commit, if we had an inline uniform
buffer at index 1 which happend to have indirect access we would
track it in slot 0 instead of slot 1, potentially overwriting
the push constant buffer reference.
Fixes: ea3223e7a4 ('v3dv: implement VK_EXT_inline_uniform_block')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17536>
We have code in there to allocate various segments of MAX_PUSH_CONSTANTS_SIZE
to handle the case of various draw calls in the same command buffer requiring
different push constants, so we are implicitly expecting it to be larger than
this. In fact, this only works now because when we allocate a BO we are always
at least allocating a full page, so the least we ever allocate is 4096 bytes,
so be explicit about it to avoid confusion.
Also, since we were always mapping MAX_PUSH_CONSTANTS_SIZE and the mapping
always starts at the beginning of the BO, it looks like after the first copy
when the resource offset is not zero, we would be writing outside the mapped
range. Always map the full size of the BO instead to ensure this doesn't
happen.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17536>
Those fields have confusing names. They hold non dynamic state,
specified at pipeline creation that we copy into the dynamic state of
the command buffer whenever we bind a pipeline. This non dynamic state
might get picked up in the dynamically emitted instructions of the 3D
pipeline because our HW packets are not exactly splitted like the
Vulkan API.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17601>
This implements the "tortoise and the hare" algorithm for detecting
cycles in graphs. We use the caller's iterator as the hare and our own
internal copy as the tortoise. Conveniently, VkBaseOutStructure (and
VkBaseInStructure which is identical except the pointer type on pNext)
have a pointer we can use for the tortoise and an sType which we can use
for a counter to ensure we only increment the tortose every other loop
iteration.
There are more efficient algorithms than tortoise and hare but they
require allocating memory for something like a hash set of seen nodes.
Since this for debug purposes only, it's ok for it to be a bit
inefficient in the case where it hits the assert. In the usual case of
no loops, it's the same runtime efficiency as the unchecked version
except that it does a tiny bit of math and 50% more pointer chases.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17596>
In addition to hopefully generating shorter code, this optimizes out a
comparison of a mediump-cast value in
dEQP-GLES2.functional.shaders.algorithm.rgb_to_hsl_fragment passed
through ANGLE, and allows the test to pass. We believe it to be a
test bug, but emitting better code like apparently everyone else does
is also a fine result.
No change on GLES gfxbench shaders.
Fixes: #6585
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17546>
dri3_glx.c includes xshmfence and glxcmds.c includes xf86vm, neither of
which are listed as dependencies of the glx lib in the meson.build file.
Consequently, those files would fail to compile on machines that did not
have xshmfence and xf86vm installed globally. This commit rectifies the
issue by adding the missing dependencies
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17585>
GL drivers have an implicit default of "in bounds" for unwritten clipdistance
values, but some (layered) drivers have to deal with api mismatch which
prevents that implicit value from being used after the shader
gets mangled by various compiler passes
to avoid issues here, write out all members of the clipdistance array
with zeroes at the very start of the shader such that any values the
shader actually writes will naturally overwrite the implicit zero and
any unwritten values will now be written as zero
fixes#6845
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17498>
this can get called from multiple threads with the recent llvmpipe
overlapping rendering changes, so make sure to lock around the
map/unmapping so they can't race.
This should fixes some crashes seen with kwin.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Tested-by: Adam Williamson (Fedora)
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17531>
When doing 10bit encoding in ffmpeg it uses
VaDeriveImage, and that could result in missing
mapping the chroma buffer of the input frame.
This WA to disallow ffmpeg using VaDeriveImage
function, so that VaCreateImage and VaPutImage can
be used and WA the chroma buffer mapping issue.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17472>
When using a static callable stack, the required scratch has already
been allocated.
Dynamic stacks are located at the end of scratch memory
and are allocated on demand using radv_set_rt_stack_size.
Static stacks live at the start of scratch memory and are allocated in
create_rt_shader by setting scratch_size.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17579>
in the scenario where:
* at least 1 color buffer was bound and a depth buffer was bound
* no color clear was enabled
* a zs clear was enabled
* the zs clear was never flushed
* the zs clear needs a renderpass
* the fb state changes
the color buffer(s) would be unbound, following which the depth buffer unbind
would trigger a renderpass, which would utilize the just-unbound color buffers,
which have no batch tracking, thus creating a case where the surface was destroyed
while it was still in use
cc: mesa-stable
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17366>
formats like GL_RGB10_A2UI can be cleared with out of range values,
so to ensure consistent driver behavior, pre-clamp to the valid range
affects:
KHR-GL46.direct_state_access.renderbuffers_storage_multisample
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17366>
We don't need to go grubbing around in the ARB program when we can use the
right variable type at prog_to_nir time. This does leave
fp->system_values_read/inputs_read as they were, but I don't see anywhere
that that matters (the NIR will have its info gathered appropriately, and
other lowering may also cause mismatch between the gl_program and the
NIR).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17528>
This uses APIs that are not available on Win7. Since this is a build-time
configuration, and since we can't use the SDK version as an indicator
(since you can support Win7 via new SDKs), a new option is added to allow
disabling it, to maintain Win7 support if desired.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17431>
Previously we suballocate only for host visible memory type to reduce
the kvm mem slot usage. That is no longer an issue given the limit has
been raised. However, we should still suballocate to make layering
clients performant. So we just suballocate regardless of mem type.
This change also increases the allowed suballocation size request from
64K to 128K, which makes layering clients happier.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17497>
get_slot_components() returns the total number of output components
for arrays for initial evaluation phase, but during the packed->inlined
conversion the arrayed size must be normalized to the slot's component count
in order to effectively catch and inline the array
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17404>
We used to do it for every queue, which was duplicate work as pstate is
per-device. It could also cause trouble when multiple hw_ctx are created as
the call will succeed for only one of them and the rest will return -EBUSY.
Simplify and fix this by only setting for the first non-null hw_ctx.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17541>
while some (tg4) sample ops can use different bit sizes in spirv, most
cannot, and all the shader variables are always emitted as 32bit, so
ensure the 32bit type is always what's being used for sampling
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17427>
It's everyone's favourite day, infrastructure maintenance Friday.
This includes manual disables for a618-vk and zink-anv-tgl, because
apparently the disable-on-variable rules don't carry through to those
jobs for ... some reason.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17553>
When the shader state is destroyed before the async shader compile is
done, we get a use after free in the async thread, as the shader state
it is operating on is gone. Fix this by dropping any pending job from
the async queue or wait for it to finish before destroying the state
by calling util_queue_drop_job.
Also call util_queue_fence_destroy, which would have caught this issue
by asserting that the queue_fence is in signalled state when the
shader state is destroyed.
Fixes: 1141ed5859 ("etnaviv: async shader compile")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17534>
Instead of trying to compact the surface state table to get rid of any
unused render targets, emit MAX(1, colorAttachmentCount) surface states
always. This ensures that secondaries will always match with primaries
when we go to do the copy since there's no rule requiring the secondary
to have VK_FORMAT_UNDEFINED when the primary has a NULL image view.
Fixes: 3501a3f9ed ("anv: Convert to 100% dynamic rendering")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17543>
On Linux, the static glapi path sees libGL.so implementing the static
glapi, and the drivers (libgallium_dri.so) updating/reading the TLS
vars.
On Windows, to allow libgallium_wgl.dll to be a full ICD, it's
responsible for implementing the actual static glapi. However, before
this change, OpenGL32.dll was also implementing the static glapi,
meaning that GL API calls from OpenGL32.dll didn't route to the driver
correctly because the TLS vars were never actually set - the driver set
its copy, and OpenGL32.dll read its own copy.
Now, always build a bridge and static version of glapi when not using
shared. The bridge version is linked into OpenGL32.dll, and the static
version is linked into the driver on Windows. GLES only builds with
shared glapi - but after this, shared glapi is not really needed on
Windows for GLES, since the driver has all of the data.
Fixes: f36921ef ("wgl: Refactor drivers to a libgallium_wgl.dll")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6560
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16713>
The sampler views array needs to be dimensioned by
PIPE_MAX_SHADER_SAMPLER_VIEWS, not PIPE_MAX_SAMPLERS.
This fixes out-of-bounds array writes when using more than 32
textures in a shader.
Also add some assertions to check array indexing elsewhere.
And change loop limits to be based on ARRAY_SIZE().
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
For the linear rendering fast path, we need to know whether the
texcoord argument to tex instructions comes directly from an FS input
(swizzling OK, but no arithmetic, etc). Use the input register info
to fill in the tex_info object.
This is part of a fix for some linear rendering bugs.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
We were saving the address of the constants[] and nir_constants[]
arrays in the jit structure. But those arrays went out of scope
before use.
This patch moves the constants[] array to the function scope and
consolidates the TGSI/NIR paths.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
Unlike image writes, buffer writes may access the same memory in
different ways, which we've seen in the past can cause problems. Use an
incoherent access to force flush/invalidate between accesses to the same
buffer, unless we know the barrier applies to images only.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17193>
Certain player has hard coded max_transform_hierarchy_depth_inter and
max_transform_hierarchy_depth_intra values set through VA-API, which
doesn't work on radeon HW. Until properly fixing it on player side,
temporarily adding this workaround to use calculated values instead.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17489>
Remove the previous compile-time early-ZS implementation and replace it with the
decoupled early-ZS implementation. This uses more efficient settings in some
cases (depth/stencil tests always passes or do not write), and fixes the
settings used in another case (alpha-to-coverage enabled with an otherwise
early-ZS shader.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Closes: #6206
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
If we know ahead-of-time that depth/stencil testing will always pass, it's
better to use weak_early than force_early. However, if depth/stencil testing
could fail (discarding pixels), we'd rather use force_early. Determine which
case we're in at CSO create time.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
The new early-ZS helpers are pure functions, leaf nodes of the call graph, and
implemented with a different algorithm from the "oracle" table of correct values
for various combinations of states. Further, incorrect settings often still pass
CTS while causing game bugs or inefficiencies. That combination makes the
helpers an excellent candidate for unit tests. Add some.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
Bifrost (and Valhall) separate early-ZS configuration into two fields: when does
the depth/stencil buffer update happen? and when are pixels killed by the
depth/stencil tests? The driver separately configures these to occur early
(before the shader executes) or late (after the ATEST instruction executes at
the end of the shader). Early tests are generally more efficient, but various
combinations of API state and fragment shader properties can require late
updates and/or late kills for correctness. Determining how to configure these
fields is nontrivial.
Our current implementation (on Bifrost) configures these fields at fragment
shader compile time and bakes the settings into the RSD. This is both wrong
(using early testing when late testing is required) and suboptimal (using late
testing when early testing would suffice). We need to defer this configuration
until draw time, when we know rasterizer and Z/S state.
Reclassifying at draw time (as we currently do on Valhall) would be expensive,
especially with the extra terms added in here. To cope, decouple the shader
classification from the draw-time configuration. Since there are only a few bits
of draw state involved, this implementation just calculates all possible states.
Then the draw time classification is just indexing into a lookup table.
The actual algorithm used to classify is written with correctness and clarity in
mind. Unlike the current classification algorithm (which tries to match what the
DDK does, poorly), this algorithm embeds its proofs of correctness.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
In theory this wait is required for correct behaviour of discarded threads with
ATEST. Mesa usually waits before the instruction after ATEST, so this wait will
get optimized out by va_merge_flow, but as our scheduler gets more sophisticated
this could become an issue.
Let's stay on the safe side and insert the recommended wait.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
In theory, ATEST can take any combination of registers for inputs.
Experimentally, however, ATEST requires the coverage mask in R60. This avoids
regressing the following dEQP tests, which write their coverage mask with
pixel-frequency-shading but without writing to the depth/stencil buffer.
dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_pixel.*
This issue is known to affect both Mali-G52 (v7) and Mali-G57 (v9). I am unsure
if this is a silicon bug or just an obscure implementation detail.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
We don't really want to base this on a late nir_gather_info for two
reasons:
1) The Vulkan spec says that if a sample-qualified input, SampleID, or
SamplePosition are in the entry-point's interface, you get
per-sample dispatch. This means we really should gather this
information before dead-code has a chance to delete anything.
2) We want to be able to add nir_intrinsic_load_sample_pos intrinsics
as part of lowering passes without causing per-sample interpolation.
This means nir_gather_info needs to stop gathering it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
We don't really want to base this on a late nir_gather_info for two
reasons:
1) The GL spec says that any static use of a sample-qualified input,
gl_SampleID, or gl_SamplePosition causes per-sample dispatch. This
means we really should gather this information before dead-code has
a chance to delete anything.
2) We want to be able to add nir_intrinsic_load_sample_pos intrinsics
as part of lowering passes without causing per-sample interpolation.
This means nir_gather_info needs to stop gathering it.
For 1, this doesn't actually get us quite there as GLSL IR may have
deleted something already. However, it does get us closer.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
On Intel, we have to do this because we can't ask for the per-sample
barycentrics without setting the per-sample dispatch bit or the GPU will
hang. However, nothing we're doing in this pass is Intel-specific and
it may be a useful optimization for someone else so we may as well make
it a generic NIR pass. This version actually does a bit more than the
current brw_nir_demote_sample_qualifiers() pass as it also handles
pre-nir_lower_io interp_dref_at* as well as a couple system values which
we can easily constant-fold.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
Pandecode is not thread-safe (for a large number of reasons) and does not even
try to be. This is a problem when tracing (or just using PAN_MESA_DEBUG=sync)
multithreaded applications. The most common symptom of the problem are assertion
failures deep in the red-black tree implementation, which is not thread-safe.
Just protect the whole thing by a "in pandecode?" mutex, since this is not
performance sensitive code and we don't really care about the extra
serialization incurred. As pandecode does not recurse into itself, we may simply
lock at the beginning and unlock at the end of each entrypoint in pandecode,
which is thread-safe regardless of how pandecode is used. A few entrypoints are
refactored to avoid early returns to keep the lock/unlock calls in obvious
visual pairs.
Fixes flakes when running the CL CTS with PAN_MESA_DEBUG=sync like we would in
CI (e.g: events.event_flush)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Tested-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17409>
The physical tile buffer size (and hence the maximum available tilebuffer size)
are implementation-defined. Track this information on the device so we can
correctly select tile sizes, instead of hardcoding the value for Midgard.
Implementation values are pulled from the "Tile bits/pixel" row of the public
Mali data sheet [1]. That row lists the maximum number of bits available for a
pixel given the maximum tile size and pipelining. For currently supported
hardware (v9 and older), that maximum tile size is 16x16. So those values should
be multiplied by (16 * 16 * 2) / 8 to get the physical size in bytes.
This may improve Bifrost/Valhall performance on workloads using multiple render
targets. It also gets us ready for the dazzling array of tile sizes available
with v10.
[1] https://developer.arm.com/documentation/102849/latest/
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17432>
Separate out "calculating the size of each pixel", "selecting a tile size", and
"calculating the colour buffer allocation". Then implement the middle (selecting
a tile size) with a simple constant time expression, rather than a loop. There's
a bit of related clean up in here.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17432>
The if statement we insert would insert a new block before the end block
(and remove the old pre-end-block). If the new block ended up later in
the HT due to its pointer's hash value, you'd emit another copy of the if
statement after the last one. I saw this happen up to 4 times in testing.
The worst case would be if all those additions and removals ended up
reallocating the HT, at which point we might use-after-free.
Fixes inconsistent shader-db results with geometry shaders.
Cc: mesa-stable.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17501>
va_pack asserts a large number of invariants about the instruction being packed.
If one of these fails (due to an invalid instruction), it's helpful to inspect
the failing instruction, as it may not be apparent in a large shader. Pass the
instruction through with all the assertions in va_pack for easier debugging.
Now assertion failures when packing are easier to debug:
Invalid invariant lo.type == hi.type:
= STORE.i32.flow2.wls br5, br2, wls_ptr[1], byte_offset:0
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17224>
When we assert out due to certain invalid encoding, it's helpful to know what
instruction is causing the failure, since it may not be obvious from the
assembly for large shaders. Now we get nice errors when failing:
Invalid opcode:
br0 = VAR_TEX.f32.flow8.store.skip.lod_mode.center , texture_index:0, varying_index:0
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17224>
Apparently, the point coord origin within a batch can change with Gallium
(seemingly even with GLES? where that's impossible at an API level?) but that
doesn't matter if we're not drawing points. This might have to do with internal
Gallium CSOs (e.g. u_blitter). Issue noticed in SuperTuxKart, which was getting
state change flushes. Performance on one level on a Valhall GPU improved from
26fps to 29fps.
Fixes: 3641dfe436 ("panfrost: Flip point coords in hardware")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17430>
b6a30b72ab ("panfrost: Implement provoking vertices on Valhall") added an
assertion that every draw selects a particular provoking vertex. The intent was
to ensure provoking vertex selection actually happened. Unfortunately, the
assertion is too strong, as the provoking vertex is irrelevant for some (most)
draws. For those, we don't *want* to commit to a particular provoking vertex for
those to avoid flushing.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17430>
driFetchDrawable is only ever called from the MakeCurrent path, which
means it has to handle the case of pre-GLX-1.3 Windows being named as
the drawable. When it finds the drawable in the hash, it increments its
refcount before returning it, so for a GLXWindow it would be 2 on first
return, one from glXCreateWindow and one from glXMakeCurrent. But when
it does not find the drawable and creates one for the naked Window, the
reference count on first return would only be 1. As a result, if this
context was then ever bound to a different drawable, the old Window's
DRI drawable state (like the back buffer) would be destroyed.
Fixes piglit's glx-multi-window-single-context and glx-make-current for
a variety of drivers.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6713
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17479>
Texture descriptors currently waste a massive ammount of memory, as every
one is allocated via a separate BO. As the allocation granularity of the
kernel is 4KB and the descriptor is only 256B, 93.75% of the allocated
memory is wasted.
Add a simple suballocator for the texture descriptors, to allocate multiple
ones out of a single kernel BO. This isn't perfect, as freed slots in the
suballocated resource are not reused, but worst-case we end up with the
same waste as we had before. This also potentially improves efficiency at
the kernel side, as this reduces the number of BOs needed for the sampler
views in each submit.
As the BO is now used by multiple descriptors, avoid syncing with the GPU
via the cpu_prep/fini calls, as to not introduce stalls between pending
rendering and new descriptors being filled. This is safe, as each
suballocation slot is only used once, so newly filled slots are certainly
not in use by the GPU.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17448>
The dummy texture descriptor and the dummy render target relocs are not ever
changed by a context operation, so we can save some space by moving them to
the screen and potentially share them and the BOs backing them between
multiple contexts.
Also don't hold two pointers to the same BO, one in the reloc and one raw,
but always just use the reloc one.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17448>
libc++ does not yet implement the c++17 monotonic_buffer_resource,
so emulate it by doing normal allocations that are cleaned up
when the resource is destroyed.
v2: - Use C include and version without namespace aligned_alloc
- add include for sstream needed with clang++ (maurossi)
Closes: #6836
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17452>
The compiler has improved significantly since we found this issue
and this is no longer required.
Notice that because we are increasing the number of samplers
supported beyond what we can loop unroll (currently capped at 16),
some piglit tests that test the maximum number of samplers supported
start to fail because they use indirect indexing on a sampler array
and we don't support that (previously the indirect indexing was
removed by loop unrolling). This is a bug in tests which the
GLSL linker detects, failing to compile the shaders.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17509>
SIGSEGV is used by Vulkan API trace layers to track user changes in
device memory mapped to user space. Now with drivers such as Zink, GLES
applications are translated into Vulkan API calls and therefore it is
possible to be tracked by Vulkan api trace layers.
Blocking SIGSEGV hinders one of the memory tracking mechanisms used by
such layers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17273>
With dynamic vertex bindings the vertex format lookups are a lot
more frequent (vs being baked in the pipeline). Add a simple lookup
cache using a dynamic array to keep track of the hw values, and
avoid repeated translation.
This also reduces the memset to just the bitfields since all
the others will be overwritten.
Seen in perf traces gputest gimark with zink on radv.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15846>
While SPIR-V's OpKill is block terminating, the converted discard
intrinsic is not block terminating. This can lead to issues where
instruction could be placed after discard.
This patch adds an extra pass that drops all instructions after discard
before we convert discards.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17474>
From looking at the CTS,
VK_QUERY_TYPE_ACCELERATION_STRUCTURE_SIZE_KHR
refers to the serialization size and not to the
actual, current size.
Fixes the following CTS:
dEQP-VK.ray_tracing_pipeline.acceleration_structures.query_pool_results.cpu.buffer.size
dEQP-VK.ray_tracing_pipeline.acceleration_structures.query_pool_results.cpu.memory.size
dEQP-VK.ray_tracing_pipeline.acceleration_structures.query_pool_results.gpu.buffer.size
dEQP-VK.ray_tracing_pipeline.acceleration_structures.query_pool_results.gpu.memory.size
Fixes: 5d56c2c ("radv: Add accel struct queries for maintenance1")
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17444>
This intrinsic is only produced when the compiler is instructed
to handle layer id as a system value, which we don't use. Also,
we have been supporting layered rendering for a while and passing
all the relevant tests which would've failed if we were hitting
this lowering.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17483>
When using the texel buffer copy path to copy a buffer we need to
sample from the buffer and for that we need a texture shader state
record where we specify the base offset of the texture (the buffer).
If the copy operation has a start offset we can't add that offset
to the base address of the buffer because the texture state record
requires the base pointer to be 64-byte aligned, so it would only
work for offsets that are multiple of 64B. Instead, we pass the
offset (in elements) to the shader and we use that to shift the
indices into the buffer when selecting the source texel to copy.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17482>
This hasn't actually been exported for a while. I think I probably broke
this in
commit 63a6b719d9
Author: Adam Jackson <ajax@redhat.com>
Date: Tue Dec 5 11:10:09 2017 -0500
glx: GLX_MESA_multithread_makecurrent is direct-only
in which I made it no longer default to having client support, but
failed to instruct dri{2,3,sw} to enable it. In any case, it was never
widely used, there is no EGL equivalent, and we've had zero complaints
about it getting nerfed.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17473>
The GL extension, EXT_texture_compression_astc_decode_mode, enables
applications to specify the desired decoding precision when decoding
non-sRGB ASTC textures. The options for the channels are FP16 (the
default), UNORM8, and RGB9_E5.
The ASTC LDR decoder outputs to UNORM8 by doing the following
conversions: UNORM16 -> FP16 -> UNORM8. This doesn't seem to be defined
by any specification and is costly according to perf profiles. To
conform to the decode mode spec (and for better performance), we convert
UNORM16 to UNORM8 by simply storing/keeping the top 8 bits.
In a texture upload microbenchmark, this decreases the upload time for
textures in the linear color space by about 34%.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17195>
The ASTC extension specs state that a vector of UNORM8 values are
returned when decoding sRGB ASTC textures. For the alpha channel
however, they don't seem to specify how to get there from the UNORM16
produced after interpolation (or returned from a void-extent block).
The ASTC decoder in the VK-GL-CTS project treats the alpha channel like
the RGB channels and simply uses the top 8 bits of the UNORM16. For
better performance, we choose to do the same.
In a texture upload microbenchmark, this decreases the upload time for
textures in the sRGB color space by about 13%.
Ref: https://gitlab.khronos.org/egl/DataFormat/-/merge_requests/32
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17195>
Because clear colors are stored as 4 32bit component values, there is
an issue if you try to format instance :
- clearing in R16G16_UNORM
- draw in R32_UINT
Clear will use 2 components of the clear color in dword0 & dword1.
While draw will use only one component of dword0.
This change uses the mutable format information to track whether clear
colors can be non-zero for fast clears.
With :
- non mutable formats, we can fast clear with any color on Gfx > 8
- mutable formats with incompatible component sizes, we can only
fast clear with 0 color
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5930
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17329>
Over-estimating latency can cause us to delay the critical paths of
the shader unnecessarily, producing larger QPU programs that take more
time to execute as a result (and it also adds register pressure) so
striking a balance is important. The thread switching model in V3D
is quite effective at hiding latency and usuallly we just need to
hint it to delay TMU instructions a little bit to find the best
compromise for performance.
The new latency numbers have been chosen empirically by testing
V3DV with Sponza and a few UE4 samples.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17451>
Based on empirical testing with Sponza and a few UE4 samples this is
consistently slightly benefitial for performance.
The most likely reason why this helps is that thrsw is probably
already quite effective at hiding latency and we are already trying
to hide latency at NIR scheduling and also via TMU pipelining, so
piling up on this when scheduling QPU typically ends up providing no
benefit at all for latency and is instead possibly preventing us to
unblock critical paths in the shader that depend on the TMU result,
requiring us to execute more cycles to complete the program.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17451>
This workaround looks actually broken. We added it in the past
because otherwise the game would just report 3GiB of video memory
(ie. size of GTT on SD). Though, with this workaround enabled, the
game explodes in memory easily.
One theory is that because we fake integrated GPUs as discrete GPUS,
and because we report 6GiB of VRAM (ie. driver redistributes memory
for small carveout), the game thinks there is 6GiB of VRAM only and
then keep allocating stuff.
People reported that the memory explosion is gone without this
workaround applied and I confirmed this myself.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17421>
This pass tries to move register usage closer to SSA, and for large
shaders this means we can overflow the register index, which only has
RC_REGISTER_INDEX_BITS size. This creates invalid code and leads to
crash at a later stage. Limit the pool of available registers to
RC_REGISTER_MAX_INDEX, currently is was two times the number of
shader instructions.
This means we'll fail the compile right away if we wanted more than
RC_REGISTER_MAX_INDEX temps, but when we've got that many we're
already well past how many instructions we can support anyway.
CC: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6017
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17393>
The set of supported vector sizes in NIR has holes in it. For example, we
support vec5 and vec8, but not vec6 or vec7. However, this pass did not take
that into account, and would happily shrink a vec8 down to a vec7, causing NIR
validation to fail. Instead, the pass should round up to the next supported
vector size.
Fixes NIR validation fail in OpenCL's test_basic hiloeo subtest.
v2: Clamp -> round rename.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17194>
the EXT_external_object spec originally was underspecified with regards
to this function, leaving room for synchronization errors where:
* app calls SignalSemaphoreEXT to signal a semaphore
* mesa defers pipe_context::fence_server_signal with threaded context
* driver defers gpu submission
* SignalSemaphoreEXT has long since returned, app submits vk cmdbuf waiting on semaphore
* spec violation / device lost
to prevent this, the spec is being changed to:
1) require an implicit flush when calling SignalSemaphoreEXT
2) require that this implicit flush also forces GPU submission before SignalSemaphoreEXT returns
all affected drivers have been updated
fixes#6568
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17376>
The hardware doesn't support 3D textures. We had been lying about 3D
texture level support in the past so that we got GL 2.1, but now reporting
levels==0 doesn't disable GL 2.1 (since we don't check for GL2 extensions
any more). But, by not lying, we now fix the majority of the remaining
GLES2 deqp failures.
This regresses a few desktop GL piglits which get GL errors that they
notice instead of what would be silent rendering failures on 3D texturing
operations.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17350>
This will be used for vc4, where incorrectly exposing 3D textures accounts
for most of the GLES2 conformance failures it has. This leaves
EXT_texture3d exposed in the (already non-conformant) GL2.1 support it
exposes, which has always been a best-effort thing.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17350>
The dri2_allocate_buffer() can be called with arbitrary height, however
the struct pipe_resource .height0 member is uint16_t. Check height for
maximum size to avoid overflow. Note that .width0 is unsigned int, so
it does not have the same issue.
The uint16 limit comes from commit:
e6428092f5 ("gallium: decrease the size of pipe_resource - 64 -> 48 bytes")
The overflow can be triggered e.g. by requesting large BO:
```
gbm_bo_create(dev, 1, 640*480*4, GBM_FORMAT_R8, GBM_BO_USE_LINEAR);
```
Signed-off-by: Marek Vasut <marex@denx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16513>
I mistakenly applied .gl-rules to the non-freedreno perf jobs, which
caused them to be incorrectly run pre-merge when core GL files changed.
Pull the freedreno core GL performance job rules out, explain a bit more
what is going on, and use it from iris and virgl performance testing.
This also drops running freedreno performance when core vulkan files
change -- freedreno perf testing doesn't have any turnip usage, nor does
it watch for turnip file changes.
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17386>
If you accidentally re-included your test job core definition after your
driver-specific ruleset, you'd end up running the driver job on every
source code change. This had happened with a630_gles_asan: it included
.baremetal-test-arm64-asan (and thus .baremetal-test) after including
.a630-test, to override .baremetal-test-arm64's depednencies to use asan
artifacts instead.
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17386>
... and explain what they're doing, compared to the test rules in
test-source-dep.yml.
Unfortunately, we can't really pull them into test-source-dep.yml with
other source deps, because of various '&'-'*' references.
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17386>
If we didn't EmitPrimitive() then the shadow (old) outputs would not
get copied to the emit temps (to eventually be copied back to the real
outputs. This isn't so bad except that means the realy vertex_flags
output has an undefined value.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17341>
Before rebasing on top of Ken's split-SEND optimization (see !17018),
this commit just caused some scheduling changes in various tessellation
and geometry shaders. These changes were caused by the addition of real
latency information for the URB messages.
With the addition of the split-SEND optimization, the changes
are... staggering. All of the shaders helped for spills and fills are
vertex shaders from Batman Arkham Origins. What surprises me is that
these shaders account for such a high percentage of the spills and fills
in fossil-db. 85%?!?
v2: Use FIXED_GRF instead of BRW_GENERAL_REGISTER_FILE in an assertion.
Suggested by Ken.
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20013625 -> 19954020 (-0.30%)
instructions in affected programs: 4007157 -> 3947552 (-1.49%)
helped: 31161
HURT: 0
helped stats (abs) min: 1 max: 400 x̄: 1.91 x̃: 2
helped stats (rel) min: 0.08% max: 59.70% x̄: 2.20% x̃: 1.83%
95% mean confidence interval for instructions value: -1.97 -1.86
95% mean confidence interval for instructions %-change: -2.22% -2.18%
Instructions are helped.
total cycles in shared programs: 859337569 -> 858636788 (-0.08%)
cycles in affected programs: 74168298 -> 73467517 (-0.94%)
helped: 13812
HURT: 16846
helped stats (abs) min: 1 max: 291078 x̄: 82.83 x̃: 4
helped stats (rel) min: <.01% max: 37.09% x̄: 3.47% x̃: 2.02%
HURT stats (abs) min: 1 max: 1543 x̄: 26.31 x̃: 14
HURT stats (rel) min: <.01% max: 77.97% x̄: 4.11% x̃: 2.58%
95% mean confidence interval for cycles value: -55.10 9.39
95% mean confidence interval for cycles %-change: 0.62% 0.77%
Inconclusive result (value mean confidence interval includes 0).
Broadwell
total cycles in shared programs: 904844939 -> 904832320 (<.01%)
cycles in affected programs: 525360 -> 512741 (-2.40%)
helped: 215
HURT: 4
helped stats (abs) min: 4 max: 1018 x̄: 60.16 x̃: 39
helped stats (rel) min: 0.14% max: 15.85% x̄: 2.16% x̃: 2.04%
HURT stats (abs) min: 79 max: 79 x̄: 79.00 x̃: 79
HURT stats (rel) min: 1.31% max: 1.57% x̄: 1.43% x̃: 1.43%
95% mean confidence interval for cycles value: -75.02 -40.22
95% mean confidence interval for cycles %-change: -2.37% -1.81%
Cycles are helped.
No shader-db changes on any older Intel platforms.
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 142622800 -> 141461114 (-0.8%)
Instructions helped: 197186
Cycles in all programs: 9101223846 -> 9099440025 (-0.0%)
Cycles helped: 37963
Cycles hurt: 151233
Spills in all programs: 98829 -> 13695 (-86.1%)
Spills helped: 2159
Fills in all programs: 128142 -> 18400 (-85.6%)
Fills helped: 2159
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
The lowering is currently fake. It just changes the opcode from the
_LOGICAL version to the non-_LOGICAL version.
v2: Remove some rebase cruft. 's/gfx8_//;s/simd8_/' in
brw_instruction_name. Both suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
If these checks had been in place previously, some bugs
that... eh-hem... practically took down the Intel CI would have been
caught earlier. *blush*
v2: Update to account for split sends.
v3: Add some more Gfx version checks. Remove the redundant "src0 is a
GRF" check. Both suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
An argument could be made that all stage-specific opcodes for vec4
stages should be prefixed with VEC4_ like the stage-agnostic opcodes.
I'll leave those additional sed jobs for another day.
egrep -lr '(VS|GS|TCS)_OPCODE_URB_WRITE' src |\
while read f; do
sed --in-place 's/\(VS\|GS\|TCS\)_OPCODE_URB_WRITE/VEC4_\1_OPCODE_URB_WRITE/g' $f
done
egrep -lr 'T.S_OPCODE[_A-Z]*URB_OFFSETS' src |\
while read f; do
sed --in-place 's/\(T.S_OPCODE[_A-Z]*URB_OFFSETS\)/VEC4_\1/g' $f
done
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
We should be explicit that we are cancelling jobs once the script finds
some log messages that are linked with known issues. That means the
script preemptively retried the job without giving chances to recover.
Adds magenta color to cancelled jobs.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17389>
Implement a log-based retry hint for R8152 issue described in #6681,
which is based on detecting these two consecutive lines:
```
r8152 <USB> eth0: Tx status -71
nfs: server <IP> not responding, still trying
```
Where <IP> and <USB> could be any IP and USB addresses, respectfully.
This commit is a temporary fix since it requires a section-aware log
follower, implemented in !16323. When the cited MR is merged, one will
make a proper fix on top of that.
Closes: #6681
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17389>
if a texture barrier occurs while clears are pending, these clears should
show up if the fb attachments are read in shaders, so trigger a renderpass
to flush out the clears
cc: mesa-stable
fixes#6766
fixes (radv):
dEQP-GLES3.functional.draw_buffers_indexed.overwrite_common.common_advanced_blend_eq_buffer_advanced_blend_eq
dEQP-GLES3.functional.draw_buffers_indexed.overwrite_common.common_blend_eq_buffer_advanced_blend_eq
dEQP-GLES3.functional.draw_buffers_indexed.overwrite_common.common_separate_blend_eq_buffer_advanced_blend_eq
dEQP-GLES3.functional.draw_buffers_indexed.overwrite_indexed.common_advanced_blend_eq_buffer_advanced_blend_eq
dEQP-GLES3.functional.draw_buffers_indexed.overwrite_indexed.common_advanced_blend_eq_buffer_blend_eq
dEQP-GLES3.functional.draw_buffers_indexed.overwrite_indexed.common_advanced_blend_eq_buffer_separate_blend_eq
Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17363>
The Vulkan specification states:
> Query commands, for the same query and submitted to the same queue,
> execute in their entirety in submission order, relative to each other. In
> effect there is an implicit execution dependency from each such query
> command to all query commands previously submitted to the same queue.
Fixes dEQP-VK.query_pool.statistics_query.reset_after_copy.*
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17400>
Fixed:
- VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT was able to stop prim counter
when pipeline stats query is running.
- This may have happened when prim gen query was in secondary cmdbuf.
- VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT counting geometry in each tile.
- VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT counting geometry in each tile
when pipeline stats query is started inside prim gen query and inside
a renderpass.
The matter of VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT and pipeline stats
interaction is solved by tracking whether pipeline stats query is
running both on CPU (for non secondary cmdbuf case) and on GPU (for
secondary cmdbuf).
Note, prim gen query is not allowed with secondary command buffers, so
only pipeline stats query is tracked on gpu.
See https://gitlab.khronos.org/vulkan/vulkan/-/issues/3142
Counting geometry per each tile is solved by:
- Conditionally executing START/STOP_PRIMITIVE_CTRS to not run in tiling
pass. Solves the case when prim gen query is inside a renderpass.
- Stop prim counters before executing `draw_cs` and restarting them
afterwards. Solves prim gen query being outside a renderpass.
Fixes GL CTS tests with Zink + `TU_DEBUG=gmem`:
GTF-GL46.gtf30.GL3Tests.transform_feedback.transform_feedback_max_separate
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_basic
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_framebuffer
GTF-GL46.gtf40.GL3Tests.transform_feedback3.transform_feedback3_streams_overflow
GTF-GL46.gtf40.GL3Tests.transform_feedback3.transform_feedback3_streams_queried
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_states
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6602
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17163>
For a5xx+ renamed:
- RST_PIX_CNT -> START_FRAGMENT_CTRS
- RST_VTX_CNT -> STOP_FRAGMENT_CTRS
- TILE_FLUSH -> START_COMPUTE_CTRS
- STAT_EVENT -> STOP_COMPUTE_CTRS
I'm not sure about a5xx itself but I'll take a chance of it being
similar to a6xx in this regard.
Knowing this emit_begin_stat_query/emit_end_stat_query can now emit
only events that are needed for the pool's flags.
Also primitive generated query clearly doesn't need fragment and
compute counters.
Passes tests:
dEQP-VK.query_pool.statistics_query.*
dEQP-VK.transform_feedback.primitives_generated_query.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17163>
D3D uses an int which seems like it'd support value between 0 and 100,
but in reality it only accepts values of exactly 0, or 100. The space
is left in case future values were to be added, so that comparisons
would work (e.g. MEDIUM_HIGH < HIGH).
Treat higher than 0.5 to be HIGH, and anything less to be NORMAL.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17406>
Flushing the batch midframe (splitting a renderpass) is expensive on a tiler, as
it requires the GPU to flush the framebuffer contents to main memory and read
them back. Clearing the framebuffer should not trigger a flush. Apps expect
clears to be (almost) free, flushing for a clear is at the very least unexpected
behaviour.
The only reason we previously flushed is to ensure we could always use a "fast"
clear. But a slow clear is a heck of a lot faster than a flush ;-) Instead of
flushing, we should clear with a draw (via u_blitter) in case a fast clear isn't
possible.
This fixes pathological performance for applications that rely on partial clears
within a frame. This issue was identified with Inochi2D, which repeatedly clears
the stencil buffer midframe, in order to implement masking efficiently with the
stencil buffer. In total, the all-important workload of rendering Asahi Lina is
improved from 17fps to 29fps on a panfrost device.
Fixes: c138ca80d2 ("panfrost: Make sure a clear does not re-use a pre-existing batch")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17112>
The last few frames of the trace are expensive (in terms of GPU time) and are
close to hitting the timeout. With the next commit, they do hit the timeout due
to using a larger batch. Nevertheless the next commit should be an overall perf
improvement on average, so this remove to unblock CI.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17112>
Otherwise the list 'next' changing will cause the assertion in
list_for_each_entry to be hit.
This was not hit before because list_assert is defined for debug
builds but not debugoptimized.
Fixes: 5067a26f44 ("pan/bi: Use flow control lowering on Valhall")
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>
I wrote fiction about dreaming that these were supported but after
waking finding that they were not. Somehow I came to consider that
fiction as fact and I never thought to test if they did work.
While reverse engineering the polygon list format, I found that these
were supported after all.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>
This maps to CL_DEVICE_MAX_COMPUTE_UNITS, which is defined as:
The number of parallel compute cores on the OpenCL device.
Since all supported Malis are unified shader cores, the number of parallel
compute cores is simply the number of shader cores.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>
Whereas the compiler needs to know the warp size for lowering divergent
indirects, the driver needs to know it to report the subgroup size. Move the
Bifrost-specific helper to common and add the trivial implementation for
Midgard.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>
To query the core count, the hardware has a SHADERS_PRESENT register containing
a mask of shader cores connected. The core count equals the number of 1-bits,
regardless of placement. This value is useful for public consumption (like
in clinfo).
However, internally we are interested in the range of core IDs.
We usually query core count to determine how many cores to allocate various
per-core buffers for (performance counters, occlusion queries, and the stack).
In each case, the hardware writes at the index of its core ID, so we have to
allocate enough for entire range of core IDs. If the core mask is
discontiguous, this necessarily overallocates.
Rename the existing core_count to core_id_range, better reflecting its
definition and purpose, and repurpose core_count for the actual core count.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>
Float conversions with explicit rounding modes are required for OpenCL,
as well as for Vulkan with the VK_KHR_16bit_storage extension (mandatory
in Vulkan 1.1). Since the hardware conversion instructions allow
configuring the round mode, this is easy to support :-)
Fixes test_half.vstore_half_rtz.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17262>
This got messed up when scalarizing the IR. Fix the definition of the opcode to
return (instead of break, asserting out) and to respect the swizzle (instead of
failing validation). Noticed when bringing up OpenCL on Valhall.
Fixes: 5febeae58e ("pan/bi: Emit collect and split")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17222>
this is for drivers (like freedreno) which need the format in the sampler
state in order to accurately handle border colors
when set, drivers MAY receive a format in the sampler state if the frontend
supports it (e.g., nine does not), and the cso sampler cache will include
the format member of the struct
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17189>
EOT messages need to use g112-g127 for their sources. With the new
opt_split_sends pass, we may be constructing an EOT message from two
different registers, and be able to copy propagate the original values
into those SENDs.
This can cause problems if we copy propagate from a large register
(say an RGBA value which is 4 GRFs in SIMD8 or 8 GRFs in SIMD16), in a
situation where the SEND only read a subset of that (say the alpha value
out of an RGBA texturing result). g112-127 can only hold 16 registers
worth of data, and sometimes we can only use g112-126. So, we can't
propagate if the GRFs in question are larger than 15 GRFs.
Fixes a shader validation failure in Alan Wake. Thanks to Ian Romanick
for catching this!
shader-db on Icelake shows that only SIMD32 programs are affected, and
the results are pretty negligable:
total instructions in shared programs: 19615228 -> 19615269 (<.01%)
instructions in affected programs: 10702 -> 10743 (0.38%)
helped: 1 / HURT: 43 / largest change: +/- 2 instructions
total cycles in shared programs: 852001706 -> 852001566 (<.01%)
cycles in affected programs: 767098 -> 766958 (-0.02%)
helped: 68 / HURT: 64 / largest change: +/- 774 cycles
GAINED: 2 / LOST: 0
Fixes: 589b03d02f ("intel/fs: Opportunistically split SEND message payloads")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6803
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17390>
This is a rewite of the NIR backend. it adds some optimization
and a scheduler.
v2: - replace some magic numbers by constants
- make sure constructor is always used with new
- use default initialization in more places
(changes suggested by Filip Gawin)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17076>
D3D12 requires the offset to be 512B-aligned and row pitch to be
256B-aligned for copy commands. There will need to be a fallback
written eventually because Vulkan has no such requirements but these
will remain the optimal limits as they allow using the D3D12 copy
commands directly.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17388>
We can't unconditionally support separable shader objects on the host,
so submit the property only if the shader is actually separable, the
host knows about the property, and supports SSO.
Without support for SSOs, the host can still compile and link the shaders,
it needs to do more work on interface matching though.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17344>
Most drivers don't care about the property, and virgl should only handle
it if the host supports it.
This is a partial revert of b634030, i.e. we keep the definition of the
property, but we don't set it only based on the shader info.
Fixes: b634030542
tgsi: Add SEPARABLE_PROGRAM property
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17344>
We can produce slightly better code for these in the backend, so
do that. For this we need to:
1. Fix our implementation of uadd_carry (which wasn't used) to return
an integer instead of a boolean value.
2. Add an implementation of usub_borrow.
Notice these are only used in Vulkan. In GL these instructions are
always unconditionally lowered by the state tracker in GLSL IR so
we never get to see them in the backend.
Shader-db stats from a collection of Vulkan samples:
total instructions in shared programs: 122351 -> 122345 (<.01%)
instructions in affected programs: 196 -> 190 (-3.06%)
helped: 2
HURT: 0
total uniforms in shared programs: 18670 -> 18672 (0.01%)
uniforms in affected programs: 59 -> 61 (3.39%)
helped: 0
HURT: 2
total max-temps in shared programs: 13145 -> 13147 (0.02%)
max-temps in affected programs: 27 -> 29 (7.41%)
helped: 0
HURT: 2
total inst-and-stalls in shared programs: 123052 -> 123046 (<.01%)
inst-and-stalls in affected programs: 197 -> 191 (-3.05%)
helped: 2
HURT: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17372>
These opcodes where fixed to return an integer instead of a boolean
value some time ago but the documentation for them was not updated
and still talked about a boolean result.
Fixes: b0d4ee520 ('nir/opcodes: Fix up uadd_carry and usub_borrow')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17372>
In some jobs, such as
https://gitlab.freedesktop.org/gallo/mesa/-/jobs/24904100, the kmsg is
interleaved with stderr/stdout in serial console, making it difficult to
confidently find the log messages to detect when the DUT is booting,
when the DUT is running etc.
Luckily, LAVA sends redundant messages about their signals. We can use
them to mitigate the chance of missing an interleaved message by being
more open to different messages, using the regex on both `debug` and
`target` LAVA log levels.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
There are some leftovers in the jobs logs after the result log line.
Only print until the init-stage2.sh output, to raise the chance to check
for the test script results at the first glance in the Gitlab logs.
Extra changes:
- Add `hung` status for jobs considered hanging in the Gitlab
- print them after the retry loop
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
test_full_yaml_log is a test that will look for a LAVA log YAML file at
`/tmp/log.yaml` and consume it as it was a realtime CI job.
It is useful for debugging issues related with LAVA.
Let's keep it skipped by default, to avoid introducing entire logs into
the codebase.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
Currently, the submitter consider that every new log that comes from the
DUT console is a signal that the device is healthy, but maybe that is
not the case, since in some kernel hangs/failures, no output is
presented except from some kernel messages.
This commit bypass the heartbeat when the LogFollower detect a kernel
message. Any log line that does follow the kmsg pattern will make the
job labeled as healthy again.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
- Create LogFollower to capture LAVA log and process it adding some
- GitlabSection and color treatment to it
- Break logs further, make new gitlab sections between testcases
- Implement LogFollower as ContextManager to deal with incomplete LAVA
jobs.
- Use template method to simplify gitlab log sections management
- Fix sections timestamps
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
It is optional and is needed only when a layer has physical device
extensions that may be unknown to the loader.
This simplifies the layer a bit, but more importantly, it works around a
bug in the loader when there is another layer in the layer chain that
wraps VkInstance.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16307>
There's been several requests to push the release back a bit to allow
more time in the 22.2 cycle, so we'll bump by two weeks. This adds an
extra 22.1 release to the schedule as well, due to the move.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17384>
The currently available D3D12 API headers have incorrect C function
prototypes for these functions when compiling for non-Windows platforms.
Future changes here will move these helpers into the DirectX-Headers
project, but:
* The process of getting a fix into the headers is still ongoing
* I'd prefer to avoid taking an immediate dependency on just-published
headers again
So, for now add some helpers to work around this problem in Dozen
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17340>
WSL doesn't have DXGI, but it does have DXCore. DXCore also has a nice
property that it filters to only D3D12-capable adapters. We can rely
on DXCore as a first option even for Windows, because we'll be able
to let the Vulkan loader do preference sorting, instead of having to
rely on DXGI to do it for us.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17340>
Per the Vulkan spec, the device UUID should be identical between reboots.
It should also uniquely identify different instances of the same device,
e.g. 2 identical GPUs connected to different PCI ports, but D3D doesn't
currently expose a way to do both of these things. Prefer persistence
over uniqueness here.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17340>
As long as we left-shift the unsigned version, this has no undefined
behavior and is fewer instructions. The only tricky bit is that a right
shift of a negative number is technically implementation-defined (not
undefined) behavior in C. However, if it's ever anything other than an
arithmatic right-shift, there's lots of other places where Mesa will
break today.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17214>
Also move it to the end of the switch as is more conventional. For some
reason, later patches in the series make ANV fail to build because GCC
stops detecting the assert(!"str") as not returning.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17214>
Now that usage flags can be specified even when using the modifier path for
allocation and frontends like GBM and EGL wayland do this properly, we can
drop the assumption that all resources allocated through the modifier
enabled path need to be SCANOUT capable.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17364>
Use a typeless format when VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT is
set, so we can cast to compatible types at least. Still doesn't
work when formats are of the same size but from incompatible
types (like R32_FLOAT and RGBA8_UNORM), which Vulkan considers
as compatible while D3D12 doesn't, but it gets us closer to what
the Vulkan API wants.
D3D12_FEATURE_DATA_D3D12_OPTIONS12::RelaxedFormatCastingSupported
should address the remaining limitations.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17368>
D3D12 supports fomat casting through optional features. Let's
add a helper to query whether 2 formats are compatible or not.
The compatibility depends on the formats+usage pair
(CopyTextureRegion() is less strict than the texture sampling
logic).
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17368>
TGSI has no legitimate[1] notion of linked shaders, which means tgsi_to_nir
should conservatively assume everything all shaders are separable. This requires
setting nir->info.separate_shader to warn drivers that shader CSOs might be
mixed and matched. Otherwise, the driver might enable optimizations that
are invalid for separate shaders, causing issues when the shaders are
later treated as separable.
This will fix varying linking with u_blitter's shaders on Panfrost (Bifrost and
older), when util_blitter_clear is used with Panfrost.
[1] There was a TGSI property added recently to forward
nir->info.separate_shader up to virglrenderer, but it's not actually used for
anything in virglrenderer and I am still struggling to understand what the use
case would be. My gut says we should revert b634030542 ("tgsi: Add
SEPARABLE_PROGRAM property"), but I'm not interested in fighting that yak right
now. Notably, the u_blitter and hud shaders are separable but are not marked
with this property.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17282>
New tests, dEQP line rasterization test fix that lets Intel pass.
Clears out bogus xfails from 1.3.2.0 uprev on a630, which I suspect were
"we lost the device twice on a full run once, and those fails got pasted
in without checking if it happened a full run again" (since we haven't
seen them in other full run attempts).
Also clears out the a630 vk asan xfails (essentially all tests run) by
turning off leak detection which was just catching leaks in vkcts.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17304>
If a shader ends with a workgroup barrier, it must wait for slot #7 at the end
to finish the barrier. After inserting flow control, we get:
BARRIER
NOP.wait
NOP.end
Currently, the flow control pass assumes that .end implies all other control
flow, and will merge this down to
BARRIER.end
However, this is incorrect. Slot #7 is no longer waited on. In theory, this
cannot affect the correctness of the shader. In practice, the hardware checks
that all barriers are reached. Terminating without waiting on slot #7 first
raises an INSTR_BARRIER_FAULT. We need to weaken the flow control merging
slightly to avoid this incorrect merge, instead emitting:
BARRIER.wait
NOP.end
Of course, all of these cases are inefficient: terminal barriers shouldn't be
emitted in the first place. I wrote out an optimization for this. We can merge
it if we find a workload that it actually helps.
Fixes test_half.vstore_half.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17264>
Before, we needed CP post-sched to copy-propagate references to NIR
registers produced by out-of-ssa. Now that we're in SSA, this pass ends
up not doing anything useful, and actually gets in the way by occasionally
creating a cycle in the DAG.
The entire shader-db impact is:
instructions HURT: shaders/closed/steam/tropico-5/78.shader_test FRAG: 238 -> 242 (1.68%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17320>
Turns out this was likely a vkd3d-proton issue because it can no
longer be reproduced since it switched to dynamic rendering by default.
AMDGPU-PRO was also affected by the same issue at that time.
According to Hans-Kristian, some bugs related to that have also been
fixed at the same time.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17296>
For unknown reasons, COPY_DATA can hang on GFX10+ while it doesn't
hang on GFX9. Adding PFP_SYNC_ME before/after the COPY_DATA doesn't
fix the hang either.
Using a LOAD_CONTEXT_REG_INDEX packet shouldn't be needed unless the
driver supports preemption (shadow memory) which RADV doesn't support.
I don't have a real explanation but PFP_SYNC_ME+LOAD_CONTEXT_REG_INDEX
fixes a GPU hang with Space Engineers (game uses a bunch of consecutive
calls to vkCmdDrawIndirectByteCountEXT without anything in-between).
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5838
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17290>
Here we move the nir_get_single_slot_attribs_mask() call that sets the
inputs_read mask after NIR optimisations have finished and after
st_nir_assign_vs_in_locations() has been called.
Besides fixing a bug where the mappings would be missaligned if
further NIR optimisations resulted in less inputs being read, it
also allows us to drop an additional nir gather info call.
Fixes: 0909a57b63 ("radeonsi/nir: Set vs_inputs_dual_locations and let NIR do the remap")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6240
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17326>
While a resource might be shared across different contexts all synchronization
of commands is the resposibility of the user (OpenGL spec Chapter 5 "Shared
Objects and Multiple Contexts").
Currently etnaviv tries to be extremely helpful by flushing foreign contexts
when using a resource that is still pending there. This introduces a lot of
issues, as context flushes can now happen at basically any time and also
introduces a lot of overhead due to the needed tracking and locking. Get rid
of all this cross-context tracking and flushing.
The only real requirement here is that we need to track pending resources
without mutating the state of the etna_resource, as this might be shared
across multiple contexts and thus be used by multiple threads concurrently.
Introduce a hash table to track the current pending resources and their
states in the local context.
A side-effect of this change is that we don't need to keep all used
resources referenced until context flush time, as we don't need to mutate
them anymore to track the status. This allows to free some of them a bit
earlier. Note that this introduces a small possibility of a new resource
matching the key of a already destroyed resource still in the
pending_resources hashtable, but even if we get such a collision the worst
outcome is a not strictly necessary flush of the local context being
performed, which is acceptable if it doesn't happen very often.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14466>
This might be called from multiple threads at the same time. To avoid
taking a global lock just to guard against the fairly low chance of
multiple threads calling this on the same BO at the same time, we allow
for the threads to race. All threads will set up a mapping, but only
the first thread is able to set the map member of the etna_bo, all other
threads just roll back and use the mapping set up by the winning thread.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14466>
The mmap offset is the only information we currently get from
DRM_ETNAVIV_GEM_INFO and there is no point in storing this
offset after the mapping has been established. Reduce the
shared mutable state on the etna_bo by inlining fetching the
offset into etna_bo_map.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14466>
Currently the buffer index hash is only used if the BO is used in
multiple streams and the current index is cached on the BO. This
introduces some shared state on the BO, which necessitates the use
of a lock to keep this state consistent across threads, which
negates some of the benefits of caching the index.
Always use the hash to keep track of the submit BOs, to get rid
of the shared state and simplify the code.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14466>
The LLVM backend is not officially supported by the RADV developers,
but it has been useful early during bring-up, or later when users are
experiencing what looks like a compiler bug. It is thus beneficial to
keep it working.
However, maintaining the vkcts expectations for every platform requires
more work and machine time than what we would like to commit to. This
is why we agreed that we would only keep LLVM tested on the latest
family of Radeon GPUs.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16841>
Most CI tests are currently running on LLVM 11 (released over 2 years
ago), which predates some of the GPUs we have in CI and prevents
testing RADV's LLVM backend.
LLVM 13 is known to work for RADV, released almost 8 months ago, and
is already available in most distributions. Fedora 36 is even already
on LLVM 14.
So this commit updates x86 testing on llvm 13.
v2:
- store the llvm apt repo key locally (Michel Dänzer)
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17188>
This avoids using a VGPR and uses two SGPRs
instead since we only need to store 2 bits.
Quake II RTX:
Totals from 7 (0.46% of 1513) affected shaders:
CodeSize: 229364 -> 229148 (-0.09%); split: -0.12%, +0.02%
Instrs: 41937 -> 41879 (-0.14%)
Latency: 977374 -> 976723 (-0.07%)
InvThroughput: 651582 -> 651148 (-0.07%)
Copies: 5064 -> 5033 (-0.61%)
PreSGPRs: 430 -> 433 (+0.70%)
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17293>
It was previously done dzn_cmd_buffer_flush_transition_barriers(),
leaving the queue+flush case unhandled. Let's fix that by moving
this piece of code to dzn_cmd_buffer_exec_transition_barriers().
Fixes: 35356b1173 ("dzn: Cache and pack transition barriers")
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17314>
While we've taken advantage of split-sends in select situations, there
are many other cases (such as sampler messages, framebuffer writes, and
URB writes) that have never received that treatment, and continued to
use monolithic send payloads.
This commit introduces a new optimization pass which detects SEND
messages with a single payload, finds an adjacent LOAD_PAYLOAD that
produces that payload, splits it two, and updates the SEND to use both
of the new smaller payloads.
In places where we manually used split SENDS, we rely on underlying
knowledge of the message to determine a natural split point. For
example, header and data, or address and value.
In this pass, we instead infer a natural split point by looking at the
source registers. Often times, consecutive LOAD_PAYLOAD sources may
already be grouped together in a contiguous block, such as a texture
coordinate. Then, there is another bit of data, such as a LOD, that
may come from elsewhere. We look for the point where the source list
switches VGRFs, and split it there. (If there is a message header, we
choose to split there, as it will naturally come from elsewhere.)
This not only reduces the payload sizes, alleviating register pressure,
but it means that we may be able to eliminate some payload construction
altogether, if we have a contiguous block already and some extra data
being tacked on to one side or the other.
shader-db results for Icelake are:
total instructions in shared programs: 19602513 -> 19369255 (-1.19%)
instructions in affected programs: 6085404 -> 5852146 (-3.83%)
helped: 23650 / HURT: 15
helped stats (abs) min: 1 max: 1344 x̄: 9.87 x̃: 3
helped stats (rel) min: 0.03% max: 35.71% x̄: 3.78% x̃: 2.15%
HURT stats (abs) min: 1 max: 44 x̄: 7.20 x̃: 2
HURT stats (rel) min: 1.04% max: 20.00% x̄: 4.13% x̃: 2.00%
95% mean confidence interval for instructions value: -10.16 -9.55
95% mean confidence interval for instructions %-change: -3.84% -3.72%
Instructions are helped.
total cycles in shared programs: 848180368 -> 842208063 (-0.70%)
cycles in affected programs: 599931746 -> 593959441 (-1.00%)
helped: 22114 / HURT: 13053
helped stats (abs) min: 1 max: 482486 x̄: 580.94 x̃: 22
helped stats (rel) min: <.01% max: 78.92% x̄: 4.76% x̃: 0.75%
HURT stats (abs) min: 1 max: 94022 x̄: 526.67 x̃: 22
HURT stats (rel) min: <.01% max: 188.99% x̄: 4.52% x̃: 0.61%
95% mean confidence interval for cycles value: -222.87 -116.79
95% mean confidence interval for cycles %-change: -1.44% -1.20%
Cycles are helped.
total spills in shared programs: 8387 -> 6569 (-21.68%)
spills in affected programs: 5110 -> 3292 (-35.58%)
helped: 359 / HURT: 3
total fills in shared programs: 11833 -> 8218 (-30.55%)
fills in affected programs: 8635 -> 5020 (-41.86%)
helped: 358 / HURT: 3
LOST: 1 SIMD16 shader, 659 SIMD32 shaders
GAINED: 65 SIMD16 shaders, 959 SIMD32 shaders
Total CPU time (seconds): 1505.48 -> 1474.08 (-2.09%)
Examining these results: the few shaders where spills/fills increased
were already spilling significantly, and were only slightly hurt. The
applications affected were also helped in countless other shaders, and
other shaders stopped spilling altogether or had 50% reductions. Many
SIMD16 shaders were gained, and overall we gain more SIMD32, though many
close to the register pressure line go back and forth.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>
SEND messages with EOT need to use g112-g127 for their sources so that
the hardware is able to launch new threads while old ones are finishing
without worrying about register overlap when pushing payloads. For the
newer split-send messages, this applies to both source registers.
Our special case for this in the register allocator was only considering
the first source. This wasn't a problem because we hadn't ever tried to
use split-sends with EOT before. However, my new optimization pass is
going to introduce some shortly, so we'll need to handle them properly.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>
We had been using thread_local index -> opcode_desc tables to avoid
plumbing through a storage location throughout all the code. But now
we have done so with the new brw_isa_info structure. So we can just
store the tables there, and initialize it with the compiler.
This fixes crashes in gtk4-demo on iris, and should help with some
programs on zink as well. Something was going wrong with the
thread_local variables not being set up correctly. While we might be
able to work around that issue, there's really no advantage to storing
these lookup tables in TLS (beyond it being simpler to do originally).
So let's simply stop doing so.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6728
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6229
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
This patch creates a new header file, brw_isa_info.h, which will
contains all the functions related to opcode encoding on various
generations. Opcode numbers may have different meanings on different
hardware, so we remap them between an enum we can easily work with
and the hardware encoding.
We move the brw_inst setters and getters to brw_inst.h.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
src/mesa/main includes are for Mesa's OpenGL implementation, and the
compiler is used in Vulkan drivers and other tools. We really only
needed one #define, which is that we offer 32 samplers. It probably
makes more sense to have our own defined limit for that rather than
importing a project-wide value which theoretically could be adjusted,
so swap MAX_SAMPLERS for a new BRW_MAX_SAMPLERS and call it a day.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
Gallium drivers shouldn't be including src/mesa/main headers, but we're
picking up a rogue main/config.h via the compiler, so this code I ported
over from i965 kept compiling. Use the PIPE_* defines instead so that
we can stop including that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
Gallium drivers shouldn't be including src/mesa/main headers, but we're
picking up a rogue main/config.h via the compiler, so this code I ported
over from i965 kept compiling. Use the PIPE_* defines instead so that
we can stop including that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
VUID-VkViewport-minDepth-01234 specifies that depth must be in the range [0.0, 1.0],
so the viewport must always be clamped to this range
this affects texture clears using u_blitter, as this expects to be able
to use the GL range of [-1.0, 1.0], so pass the depth value as though it's
been de-converted back to a GL z coordinate to account for viewport transform
cc: mesa-stable
fixes#6757
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17319>
This drops the mesa/gallium lists from some build rules, since zink common
rules brings them in already. If we do more driver common rules, we might
end up with those core lists appearing in the yaml multiple times, but
that seems like a small price to pay for not being able to forget some.
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17287>
nir_ine_imm(b, nir_iand_imm(b, x, mask), 0) and
nir_i2b(b, nir_iand_imm(b, x, mask)) are common
patterns which become quite messy when they are
part of a larger expression. Clang-format does
not improve things either and we can end up with
some rather interesting looking code.
(RADV ray tracing pipeline and query lowering)
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17242>
Change NGG GS emit vertex code to emit combined shared stores,
also change the export vertex code to emit combined shared loads.
This results in more optimal code generation, ie. fewer LDS
instructions are generated.
GS vertices are stored using an odd stride to minimize the chance
of bank conflicts, which means that unfortunately
we still can't use an alignment higher than 4 here,
so the best we can get are some ds_read2_b32 instructions.
Fossil DB stats on Navi 21 (formerly Sienna Cichlid):
Totals from 135 (0.10% of 128653) affected shaders:
VGPRs: 6416 -> 6512 (+1.50%)
CodeSize: 529436 -> 503792 (-4.84%)
MaxWaves: 2952 -> 2924 (-0.95%)
Instrs: 93384 -> 90176 (-3.44%)
Latency: 290283 -> 293611 (+1.15%); split: -0.36%, +1.50%
InvThroughput: 81218 -> 82598 (+1.70%)
Copies: 6603 -> 6606 (+0.05%)
PreVGPRs: 5037 -> 5076 (+0.77%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11425>
In order to reduce moves when coalescing multiple registers into a
larger register, RA will try to coalesce MERGE instructions with their
definitions. For example, for something like this in GLSL:
uint a = ...;
uint b = ...;
uint64 x = packUint2x32(a, b);
The compiler will try to coalesce x with a and b, in the same way as
something like:
uint a = ...;
uint b = ...;
...
uint x = phi(a, b);
with the crucial difference that the definitions of a and b only clobber
part of the register, instead of the whole thing. This information is
carried through the compound flag and compMask bitmask. If compound is
set, then the value has been coalesced in such a way that not all the
defs clobber the entire register. The compMask bitmask describes which
subregister each def clobbers, although it does it in a slightly
convoluted way. It's an invariant that once compound is set on one def,
it must be set for all the defs in a given coalesced value.
In more detail, the constraints pass will first create extra moves:
uint a = ...;
uint b = ...;
uint a' = a;
uint b' = b;
uint64 x = packUint2x32(a', b');
and then RA will merge values involved in MERGE/SPLIT instructions,
merging x with a' and b' and making the combined value compound -- this
is relatively simple, and will always succeed since we just created a'
and b', so they never interfere with x, and x has no other definitions,
since we haven't started coalescing moves yet. Basically, we just replaced
the MERGE instruction with an equivalent sequence of partial writes to the
destination. The tricky part comes when we try to merge a' with a
and b' with b. We need to transfer the compound information from a' to a
and b' to b, which copyCompound() does, but we also need to transfer it
to any defs coalesced with a and b, which the code failed to do. Similarly,
if x is the argument to a phi instruction, then when we try to merge it
with other arguments to the same phi by coalescing moves, we'd have
problems guaranteeing that all the other merged defs stay up-to-date.
One tricky part of fixing this is that in order to properly propagate
the information from a' to a, we need to do it before the defs for a and
a' are merged in coalesceValues(), since we need to know which defs are
merged with a but not a' -- after coalesceValues() returns, all the defs
have been combined, so we don't know which is which. I took the approach
of calling copyCompound() inside coalesceValues(), instead of
afterwards.
v2: (mhenning) This now loops over mergedDefs in copyCompound, to update
it for changes made in bcf6a9ec
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Karol Herbst <kherbst@redhat.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17115>
If there is a z24 unorm depth buffer, but it's the hw is using
a z32, the border color needs to be clamped appropriately.
This creates a second sampler with the clamped border color,
and uses it if needed. The checks might need some tightening up.
Fixes: zink on radv
dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth
dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth_uint_stencil_sample_depth
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17305>
D3D12 wants Width, Height and Depth to be aligned on the block width,
height and depth size. But Vulkan allows the width, height or depth to
be unaligned at the image boundary if image.{width,height,depth} is
not aligned.
Let's explicitly align things in the copy paths.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17250>
D3D12_RESOURCE_STATE_DEPTH_READ only provides access for fixed-function
depth/stencil test. If we want the shaders to be able to read the
depth/stencil attachment, we need to combine
D3D12_RESOURCE_STATE_DEPTH_READ and
D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17250>
Sometimes there's no obvious mappings between a VkImageLayout and
a D3D12_RESOURCE_STATE, so let's just provide a helper that takes
before/after resource states instead of old/new layouts, and use it
to fix the resolve case.
Fixes: 35356b1173 ("dzn: Cache and pack transition barriers")
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17250>
Some Vulkan -> D3D12 API mismatches force us to do behind-the-scene
transitions to get resources in the right state when executing
non-native operations. In this case, caching the transition until the
resource is actually used might save us unneeded transitions.
The packing aspect is mostly useful to limit the ExecuteBarriers()
call overhead. Right now we do per-resource packing, and any hole
in the subresource range will trigger several ExecuteBarriers()
calls. This can be improved by collecting barriers in a separate
array, and flushing the collected transition barriers just before
executing the operation using the subresources pointed by those
barriers. While not impossible, it'd be more verbose than what we
have right now, so I'm not entirely convinced it's worth it.
Caching could be improved to avoid any unnecessary flush when we do
blit or copy operations and transition the resources back to their
original state, since the user might decide to transition the image to
a new layout just after that. But doing that would require keeping
track of all resources used by dispatch/draw operations, which in turn
implies keeping info about which of the descriptor set resources are
used by the graphics/compute pipelines. Not sure the it's worth the
extra complexity given D3D12 enhanced barriers are just around the
corner, and those map pretty nicely to the vulkan barrier+image-layout
model.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17274>
When the stencil aspect of a depth+stencil texture is sampled, it's
actually integer. Also fixup st_translate_color() that assumed it was
float. This fixes the border color on zink+turnip.
v2: Add "|| texBaseFormat == GL_STENCIL_INDEX" to catch the case where
S8 is emulated as D24S8.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17177>
On a650+ we use the new Z24UINT_S8UINT format to sample the stencil
aspect of D24S8, which returns stencil in the second component and also
uses the second integer component for the border color. However Vulkan
mandates that the first component is used for the stencil border color.
There's no workaround we know of, so we have to fall back to the old
behavior where there is a workaround. If we know the format, we can
fixup the border color ourselves though.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17177>
From the API's point of view, border color replacement looks like this:
--------------------
| API Border Color |
--------------------
|
----------- | ---------------- ----------
| API fmt |-------->| User Swizzle |----->| Shader |
----------- ---------------- ----------
From the HW point of view, it looks like this:
-------------------
| HW Border Color |
-------------------
|
---------- ----------- | --------------- ----------
| HW fmt |-----| HW swap |------>| Tex Swizzle |----->| Shader |
---------- ----------- --------------- ----------
When the HW fmt + HW swap isn't enough to represent an API format, we
need to prepend our own swizzle to the the user's swizzle, so the tex
swizzle is a "format swizzle" composed with the user swizzle. However,
we don't want this format swizzle to be applied to border colors, so
there's a workaround in freedreno which is meant to undo the format
swizzle so that the HW border color with the format swizzle applied
equals the API border color, and everything is ok. However, on a6xx at
least it was incorrectly undoing the entire tex swizzle. This broke
border color with a user swizzle, because it was now effectively not
getting applied for the border color. It also made it seem like the user
swizzle is required for the workaround, which would have implications
for VK_EXT_border_color_swizzle with turnip, but it's not.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17177>
LAVA job logs have an ongoing problem of message interleaving with kmsg.
So any kernel dumps and LAVA signals (which are being printed in kmsg)
will have a chance to clutter the pattern matching for `hwci: mesa:
(pass|fail)` line.
v2:
- Add an 1 second sleep before exiting the test script, to give enough
time to print the result message without conflicting with LAVA ENDTC
signal from kmsg
Closes: #6714
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17175>
Early depth test is broken when the linear render target mode is used
and depth is written from the PE stage. It seems RA and PE disagree
about the cache layout in that case, so the RA sees unwritten/invalid
depth cache lines leading to random depth test fails. Early test works
fine if depth is written from the RA stage.
To work around this issue, detect the combination of linear RT, early
test and late write and switch to late test in that case.
Fixes: 53445284a4 ("etnaviv: add linear PE support")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17215>
On-GPU LRZ direction tracking allows LRZ to support secondary cmdbufs,
reusing LRZ between renderpasses, and in future to support LRZ when
VK_KHR_dynamic_rendering is used.
With on-gpu tracking we have to be careful keeping LRZ state in sync
with underlying depth image, which means we should invalidate LRZ
when underlying image is changed or the view of image is different
from previous renderpass.
All of this resulted in LRZ logic being thinly spread through the code,
making it hard to understand. So most of it was moved to tu_lrz.c.
For more details on past and new LRZ features see comment at the
top of tu_lrz.c.
Note about blob:
- Blob is much more happy to do LRZ_FLUSH, it flushes at the start
of the renderpass, after binning, and at the end of the renderpass.
- Blob seem not to care about changes in depth image done via
vkCmdCopyImage.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6347
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16251>
since !17208 there are 2 paths for disk_cache_get_function_identifier
on mingw: DETECT_OS_WINDOWS or HAVE_DLADDR (if dlfcn shims is present)
../src/util/disk_cache_os.c:47:1: error: redefinition of 'disk_cache_get_function_identifier'
47 | disk_cache_get_function_identifier(void *ptr, struct mesa_sha1 *ctx)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../src/util/disk_cache_os.c:36:
../src/util/disk_cache.h:121:1: note: previous definition of 'disk_cache_get_function_identifier' with type '_Bool(void *, struct _SHA1_CTX *)'
121 | disk_cache_get_function_identifier(void *ptr, struct mesa_sha1 *ctx)
here we disable the dladdr path from meson for consistency with msvc
fixes: 2dcbe8727
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17261>
We are already caching DXIL shaders individually, but that forces us
to retrieve the NIR shader, do the linking and binding translation
steps, to finally query the cache for each DXIL shader. This pipeline
caching is about skipping those steps when we can.
Note that a graphics/compute pipeline cache entry doesn't contain the
DXIL shaders, but hashes to retrieve those shaders from the same cache.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17140>
Binding translation has an impact on the final DXIL shader, and this
binding translation depends on the pipeline layout. Let's extend the
adjust_var_bindings() pass to caculate a hash we can then use in the
DXIL shader hash caculation.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17140>
DXIL shaders depend on the vulkan -> d3d12 binding translation done in
adjust_var_bindings(). In order to maximize our chances to re-use those
shaders, we need to hash the binding translation information and take
this hash into account when computing the DXIL shader hash.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17140>
The limitations of current approach for Gallium trace diffing via
'tracediff.sh' are becoming all too apparent, as we are first dumping
both trace to text and performing plain line-based sdiff on them.
This obviously loses the context of calls and sometimes results in
confusing diffs when subsequent calls are similar enough. It also
prevents us from formatting the diff output in ways that would
benefit readability.
In attempt to rectify the situation, reimplement the diffing completely
in Python, using difflib and adding the necessary plumbing into the trace
model objects etc.
Signed-off-by: Matti Hamalainen <ccr@tnsp.org>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17135>
Previously tracediff.sh used postprocessing sed-script to remove unwanted
calls from the dump output. Instead of that, add option to parse.py to
ignore a list of calls at parsing phase. Currently this list is hardcoded
in parse.py.
Also clean up the trace model code and pointer tracking a bit to avoid
static state in Pointer class.
Signed-off-by: Matti Hamalainen <ccr@tnsp.org>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17135>
The postponed spill is predicated using the condition from the
last write, but this is only correct if the register was only
written once in the TMU sequence, or if it is always written with
the same predication.
While we could try to track whether this is the case or not, it
would make the postponed spill path even more complex than it
already is, so let's just avoid predicating these. We are already
discouraging TMU spilling of registers in the middle of TMU
sequences, so this should not be a very common case.
Cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17201>
If we are spilling a register that is used in the middle of a TMU
sequence, we postpone the spill until the TMU sequence finishes,
at which point we inject the spill and rewrite the original
instruction to write to the new temp.
However, this doesn't work if the register is written multiple
times during the TMU sequence. In that scenario, we need to ensure
that all writes are rewritten to use the new temp, not just the last
one.
Cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17201>
Utgard supports MSAA 4x, so wire it up.
RSW bits were already REd by Luc, the only remaining part was storing
non-resolved buffers, reloading them (including for depth/stencil) and
doing MSAA resolve.
To store non-resolved buffer we need to set mrt_pitch and mrt_bits
registers in WB, and to resolve non-resolved buffer we need to reload
it into individual samples and then write out with mrt_bits = 0, it's
now done by lima blitter.
We also need to do resolve on transfer_map() of multi-sampled buffers,
so utilize u_transfer_helper for that.
As a side fix, it turns out that our wb_reg definition wasn't correct,
'zero' isn't always zero, it's set if we need to swap channels, and
it goes before mrt_bits. mrt_bits actually enables multiple MRTs,
so this commit renames 'zero' to 'flags' and changes its position.
If mrt_bits == 0 and MSAA is enabled, GPU does resolve
in place, to expose this functionality we set PIPE_CAP_SURFACE_SAMPLE_COUNT.
Fixes dEQP-GLES2.functional.multisample.*
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13963>
We have a lot of spilling coverage in a618 pre-merge, don't do it all (~2
minutes) here. Also, force-gmem touch testing should probably test less than
the default run does!
This should help make up for having added the tu-zink run.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17125>
ralloc is not thread-safe. While a given context can only be accessed from a
single thread at once, multiple contexts can be created against the same screen
at once. The ralloc allocations against the shared screens will race. Depending
on the result of the race, the same block of memory can be returned as the two
new contexts in two different threads, causing a use-after-free when the context
is freed later.
We free the context explicitly when it's destroyed anyway. If screens are
getting destroyed without the contexts getting destroyed first, that's a state
tracker bug, not a Panfrost one.
This matches what Iris does.
Fixes crash in test_integer_ops.int_math on Panfrost.
Fixes: 0fcf73bc2d ("panfrost: Move to use ralloc for some allocations")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17234>
We want to release the drawables of the context we're coming from, but
we were releasing them from the context we're switching to. This is
probably not a big deal normally because both contexts are likely to be
on the same display, which is all that driReleaseDrawables is really
sensitive to. But if the contexts are on different Displays this would
go quite wrong.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17176>
It's not needed and causes issues for mesh code (it doesn't
mark the output as per-primitive, which confuses brw_compute_mue_map)
Fixes many tests matching:
dEQP-VK.fragment_shading_rate.dynamic_rendering.*.ms
Fixes: 1542ab70eb ("anv: handle primitive shading rate for mesh")
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16196>
Otherwise passes which expect offsets to be in bytes (like
brw_nir_lower_mem_access_bit_sizes, called from brw_postprocess_nir)
may produce incorrect results.
Fixes 64-bit load/stores in task/mesh shaders.
Fixes: c36ae42e4c ("intel/compiler: Use nir_var_mem_task_payload")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16196>
depthClampEnable is actually the case we support properly.
!depthClampEnable requires extra work to make sure the
depth clamping that's forced by D3D12 is inactive (setting the
viewport depth range to [0,1] and dealing with the actual range
at the shader level), and clamp the depth value read by the
fragment shader in that case. This will be addressed separately.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17231>
Figured this while debugging a GPU hang with a simple CTS test. This
is to make sure data written by the CP are coherent on the CPU.
This also explains spurious GPU hang reports generated for Hitman 3
that made no sense without it. Now it's clear that this game hangs
after a DRAW_INDEX_INDIRECT packet.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17183>
The driver can't assume sample positions at (0.5, 0.5) when user
sample locations are used.
This doesn't fix anything in practice because NGGC is only enabled by
default on GFX10.3 and that extension is currently disabled on GFX10+,
but I would like to expose it at some point.
This fixes dEQP-VK.pipeline.*.sample_locations_ext.verify_location.*
(when the extension is enabled locally).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17228>
Create nir passthrough shader with explicit input/output and vertex
output count so that it can be handled by compiler same as user tcs.
The drawback is we create more si_shader_selector with different
input/output and vertex output count which was handled by compiler
backend before.
As fixed function tcs can be handled like user tcs, we don't need
the dedicated fixed_func_tcs_shader state either.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
LOCAL_PREBUILT_MODULE_FILE is the only variable that allows
specifying the absolute path to a prebuilt.
That happens when OUT_DIR_COMMON_BASE is set.
Since it does not have multilib variants, define two separate
libraries for multilib
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16674>
We tracked this down with the HW teams back in 2020 and there's now a
documented workaround. Comments from the HW team say this applies all
the way through XeHP but we're not sure beyond that.
This is a bug that we hit but the Windows drivers didn't because Jason
decided to allocate our memory structures from the top end of the VMA
range explicitly to catch bugs like this, while Windows allocates from
zero and up, so they would need to allocate more than 2GB of dynamic
state before running into it.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4880
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17216>
Instead of making drivers dive into the render pass and framebuffer
themselves, provide a helper that constructs a VkRenderingInfo for a
render pass resume that they can use instead. This should reduce code
duplication between driver implementations of BeginRendering and
BeginCommandBuffer.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16953>
get_dword_size() is misleading, its name implies it's returning
a size in dwords, but it's actually returning a size in bytes.
This led to a wrong size passed to emit_cbv(). Instead of fixing
get_dword_size(), let's inline the code in emit_ubo_var().
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17230>
Additionally improve the style of the build script.
- Instead of using platform build tools, use the CMake wrappers
- Instead of build all targets, then manually stripping, use the
CMake helpers.
Contributed by Andres Gomez.
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17184>
Debian's Wine 5.0 has shown some problems when running Vulkan tests in
the past. Let's install the stable version provided by WineHQ (7.0 at
this time).
v2:
- Remove OBS repository for Wine since it is unused (previously, it
was providing libfaudio0) (Daniel).
v3:
- Add WineHQ's repository GPG key (Michel).
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org> [v2]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17184>
update_barriers has steadily grown more and more complex when the original
idea was for it to be a small function to handle deferred jit barriers and
simplify sync in patterns like bind_ubo -> draw -> buffer_subdata -> draw
instead, track the pending barrier info at bind time so that the stages and
access are already updated by the time draw/compute are reached
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17192>
this avoids the scenario where the full bo size isn't accounted for because
no variable for the block has been created
cc: mesa-stable
affects:
KHR-GL33.shaders.uniform_block.random.all_per_block_buffers.3
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17217>
only the viewMask parameter of VkPipelineRenderingCreateInfoKHR can
be accessed in the fragment stage, so for pipeline libraries it should
be assumed that zs attachments exist for the purpose of copying dynamic
state values, and then these dynamic states will naturally be pruned
during final pipeline construction if the attachments turn out to not
be present
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17219>
[Alyssa: Add a default CPU implementation of pipe->clear_buffer(). This hook is
mandatory for OpenCL support. Even though this implementation isn't optimal by
any means, having a conformant default available in core will lower the barrier
of entry to OpenCL support.]
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16044>
r600g is the only user of util_blitter_copy_buffer in tree, which implements
buffer copies with streamout. This path for r600g was added in 8ac9801669
("r600g: accelerate buffer copying"), a commit from 2012. At that point there
was no DMA path for buffer copies. Since then, a DMA path has been added,
conditional only on the kernel version -- not the hardware. It appears the
required kernel support has been mainline for at least 4 years now. Mesa 22.2
doesn't need to provide optimal performance on an old kernel -- for performance,
a DMA-capable kernel should be used, and for compatability, the CPU fallback
(used for unaligned buffers as it is) is still available. Remove the streamout
path "in the middle" that appears ~unused today.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17142>
This really, really helps on platforms where fabs() isn't free. A great
many shaders use a * frsq(fabs(fdot(a, a))) to normalize a vector.
Since the result of the fdot must be non-negative, the fabs can be
eliminated by an existing algebraic rule.
shader-db results:
r300 (run on R420 - X800XL)
total instructions in shared programs: 1369807 -> 1368550 (-0.09%)
instructions in affected programs: 59986 -> 58729 (-2.10%)
helped: 609
HURT: 0
total vinst in shared programs: 512899 -> 512861 (<.01%)
vinst in affected programs: 1522 -> 1484 (-2.50%)
helped: 36
HURT: 0
total sinst in shared programs: 260690 -> 260570 (-0.05%)
sinst in affected programs: 1419 -> 1299 (-8.46%)
helped: 120
HURT: 0
total consts in shared programs: 957295 -> 957230 (<.01%)
consts in affected programs: 849 -> 784 (-7.66%)
helped: 65
HURT: 0
LOST: 0
GAINED: 3
The 3 gained shaders are all vertex shaders from XCom: Enemy Unknown.
I'm guessing that game is never going to run on my X800XL. :)
i915
total instructions in shared programs: 791121 -> 780843 (-1.30%)
instructions in affected programs: 220170 -> 209892 (-4.67%)
helped: 2085
HURT: 0
total temps in shared programs: 47765 -> 47766 (<.01%)
temps in affected programs: 9 -> 10 (11.11%)
helped: 0
HURT: 1
total const in shared programs: 93048 -> 92983 (-0.07%)
const in affected programs: 784 -> 719 (-8.29%)
helped: 65
HURT: 0
LOST: 0
GAINED: 36
Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 16702250 -> 16697908 (-0.03%)
instructions in affected programs: 119277 -> 114935 (-3.64%)
helped: 1065
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 4.08 x̃: 4
helped stats (rel) min: 0.48% max: 10.17% x̄: 3.66% x̃: 3.94%
95% mean confidence interval for instructions value: -4.26 -3.89
95% mean confidence interval for instructions %-change: -3.76% -3.56%
Instructions are helped.
total cycles in shared programs: 880772068 -> 880734134 (<.01%)
cycles in affected programs: 2134456 -> 2096522 (-1.78%)
helped: 941
HURT: 324
helped stats (abs) min: 2 max: 2180 x̄: 123.06 x̃: 44
helped stats (rel) min: 0.04% max: 49.96% x̄: 7.08% x̃: 3.81%
HURT stats (abs) min: 2 max: 2098 x̄: 240.33 x̃: 35
HURT stats (rel) min: 0.04% max: 77.07% x̄: 12.34% x̃: 3.00%
95% mean confidence interval for cycles value: -47.93 -12.04
95% mean confidence interval for cycles %-change: -2.87% -1.34%
Cycles are helped.
No shader-db changes on any other Intel platform.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17181>
This change involves two enums:
* rogue_texstate.xml: All COMPRESSED_* members of FORMAT are moved
to FORMAT_COMPRESSED (without the prefix). A second field is added
to IMAGE_WORD0 (texformat_compressed) which overlaps with the
original (texformat), and
* rogue_pbestate.xml: REG_WORD0_LINESTRIDE was not a real enum; it's
removed entirely. It only has value when feature
pbe_stride_align_1pixel is present, so a FIXME comment was added to
this effect.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17204>
Fix defect reported by Coverity Scan.
Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking rop_reads_dst suggests that it may be
null, but it has already been dereferenced on all paths leading to the
check.
Fixes: 94be0dd0b8 ("tu: Implement extendedDynamicState2LogicOp")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17099>
SSBO access works very differently from UBO access. Straddling
loads/stores isn't an issue, loads/stores instead must be aligned to the
element size and can have up to 4 components.
We support 16-bit access with SSBOs on a650+, and sometimes the
vectorizer tries to create a misaligned 32-bit access when combining
32-bit and 16-bit accesses. The UBO-focused logic didn't reject this,
which is now fixed. This fixes a number of VK-CTS regressions on a650+.
Fixes: bf49d4a084 ("freedreno/ir3: Enable load/store vectorization for SSBO access, too.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17040>
`x86_build-base-wine.sh` are usd to install wine and xvfb
`x86_build-mingw-patch.sh` are used to pull packages from msys2 that can be directly used.
`x86_build-mingw-source-deps.sh` are used to building llvm, libclc, clang, spirv-tools and directx-headers from source
xvfb are used to enable wgl tests on debian
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16084>
error message:
```
../../src/gallium/winsys/d3d12/wgl/d3d12_wgl_framebuffer.cpp:231:42: error: no matching function for call to 'operator new(sizetype, d3d12_wgl_framebuffer*&)'
231 | new (fb) struct d3d12_wgl_framebuffer();
| ^
<built-in>: note: candidate: 'void* operator new(long long unsigned int)'
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16084>
Because we may compile mesa with both rtti=enabled and rtti=disabled because of LLVM
Fixes errors:
../../src/gallium/drivers/d3d12/d3d12_video_enc_h264.cpp:777:7: error: 'dynamic_cast' not permitted with '-fno-rtti'
777 | dynamic_cast<d3d12_video_bitstream_builder_h264 *>(pD3D12Enc->m_upBitstreamBuilder.get());
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16084>
We need to push loop nesting to handle this correctly -- at the end of
the innermost loop, the correct nesting is 1 (from the if), not 0.
Fixes assertion failure in
dEQP-GLES2.functional.shaders.struct.local.dynamic_loop_nested_struct_array_fragment,UnexpectedPass
dEQP-GLES2.functional.shaders.struct.local.dynamic_loop_nested_struct_array_vertex,UnexpectedPass
dEQP-GLES2.functional.shaders.struct.uniform.dynamic_loop_nested_struct_array_fragment,UnexpectedPass
dEQP-GLES2.functional.shaders.struct.uniform.dynamic_loop_nested_struct_array_vertex,UnexpectedPass
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17128>
We had it set up for arm64 asan already, do it for everyone else too. In
cleaning up the duplication, this fixes a pasteo in rpi3 which had the
"artifacts: false" on the wrong job, causing it to do a slow download of
the mesa build from gitlab.
Doing this required also moving the ".use-debian/arm_test" in as well, so
that its "needs:" didn't overwrite ours if it appeared after us in the
consumer's "extends:"
Should save about 20 seconds on rpi3 jobs.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17146>
This is not one of the valid stages in the top level "stages:"
declaration, you're supposed to get your stage from your
test-source-dep.yml include. Avoids issues when .baremetal-test gets
included after your test-source-dep.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17146>
For example, the proof for this pattern
(('bcsel', ('flt', 'a@32', 0), 'b@32', 'c@32'), ('fcsel_ge', a, c, b)),
would be
bcsel(a < 0, b, c)
bcsel(!(a < 0), c, b)
bcsel(a >= 0, c, b)
fcsel_ge(a, c, b)
However, !(a < 0) => (a >= 0) is well known to produce different
results if `a` is NaN.
Instead of that replacement, use this replacement:
bcsel(a < 0, b, c)
bcsel(-0 < -a, b, c)
bcsel(0 < -a, b, c)
fcsel_gt(-a, b, c)
This is NaN-safe and exact.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Fixes: 0f5b3c37c5 ("nir: Add opcodes for fused comp + csel and optimizations")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17048>
min/max pointsize clamping affects the value that must be used,
meaning that it may not be 1.0
in the case where clamping changes the value from 1.0, ensure the shader
export path is used if attenuation isn't enabled
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17145>
The RADEON_GEM_USERPTR_ANONONLY flag is hardcoded here which excludes
shared memory pages. DRM is actually capable of supporting shared file-
backed memory, but only if it's read-only. This mutability intent has to
be conveyed through the stack, so a flags argument is added to the winsys
regime to pass RADEON_FLAG_READ_ONLY.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16115>
This needs to be accurate so that when we split and then schedule a new
a0.x/a1.x/p0.x write we will eventually make progress. It wasn't taking
the kill_path into account which could create an infinite loop as we
keep scheduling writes whose uses are blocked because they are memory
instructions not on the kill_path.
Closes: #6413
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16635>
wezterm in fullscreen 4k was exceeding the xcb max request size
on the put image with llvmpipe. This fixes it to send sub-images,
the Xlib put image used in glx does this internally, but not
the xcb one, so just do it in sections here.
Cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17155>
Some LAVA jobs emit lots of messages "Listened to connection for
namespace 'common' for up to 1s" in a row at the end of the logs, making
difficult to see the result of the test script.
This commit removes those lines until a proper solution is deployed on
the LAVA side.
Closes: #6116
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17151>
This advertises ARB_gpu_shader5 on Valhall, which should be working now. On the
GLES3.1 side, this notably adds support for sample variables and dynamic offsets
for texture gathers, both of which should now be working.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
This is the instruction that the hardware actually supports. Do the rename, use
the more specific accurate model in the IR, and rework the Valhall texturing
code to emit MKVEC.v2i8 instead of MKVEC.v4i8.
Will fix:
dEQP-GLES31.functional.texture.gather.offset_dynamic.implementation_offset.*
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
Valhall does not have Bifrost's 4-source MKVEC.v4i8. Instead, it has a (somewhat
limtied) 3-source MKVEC.v2i8. The full MKVEC.v4i8 may be lowered to a pair of
MKVEC.v2i8 instructions.
For good code quality on both Bifrost and Valhall, we need to model both
instructions in their full generality.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
This avoids needless variation from Bifrost. While at it, fix the opcode
definition: there are no abs/neg/swizzle modifiers on the signed integer source,
and there's no clamp. However, there are round and infinity modes, like on
Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
We generate FADD_RSCALE.f32 in our sample variables implementations. Valhall
doesn't have a dedicated FADD_RSCALE.f32 implementation, it should be aliased to
FMA_RSCALE.f32. Handle that alias in isel lowering. This will fix:
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.*
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
When lowering vars to scratch, we need to be careful with alignment on Valhall,
where packed TLS access must not straddle a 16-byte boundary. Fixes regressions
when enabling indirect access to temps on Valhall.
Fixes: 6761dbf891 ("panfrost: Use packed TLS on Valhall")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
This pass is super easy to unit test, so we have no excuse not to test
thoroughly. va_mark_last only inserts annotations in a shader without any
annotations, so our test cases are simply annotated shaders. The CASE macro just
has to compare the case against the case with the annotations stripped and added
back with va_mark_last.
In retrospect, I should have used that technique for the flow control insertion
tests too.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
On Valhall, register reads may be marked as "last" [1]. Setting the last flag
promises the hardware that the value of the register is no longer required. This
may enable hardware optimizations. In particular, it may permit the hardware to
avoid register file writes if a write to the marked register is still in the
forwarding buffer. This may improve power efficiency.
In principle, this is trivial: run liveness analysis and mark killed sources,
like we would in an SSA-based register allocator. In practice, there are a few
wrinkles to avoid hazards around staging registers and 64-bit register pairs,
requiring some additional data flow analysis and fix ups. However, nothing here
is particularly "hard", and all the ideas are already in use for the Bifrost
scheduler and the Bifrost/Valhall scoreboard analyses.
[1] In Mesa's compiler, this is called discard for historical reasons.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
This helps "contain the crazy" and avoids special casing BLEND in compiler
passes. The Valhall instruction is roughly the same as its Bifrost counterpart,
as long as we fix up the source order (as we already do for bitwise operations)
everything works out.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
Post-RA liveness relies on the caller updating the live variable with the
results of bi_postra_liveness_ins. It is not automatic, as with regular
liveness. This means ignoring the result of bi_postra_liveness_ins is surely an
error. Mark it as MUST_CHECK to catch that error at compile time.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
For some unknown reason, waiting for general slots (at least for memory stores)
doesn't work properly on a BARRIER instruction. We need to wait for all general
slots right before issuing the BARRIER in addition to the general wait on the
BARRIER itself. I don't know if this is a hardware bug or some hideous
gate-saving quirk, but I observe the Mali-G78 DDK using the same workaround,
which implies this really is necessary.
Fixes rare flakes in:
dEQP-GLES31.functional.compute.shared_var.work_group_size.float_128_1_1
Note that the flakes from that test are extremely timing dependent. Without this
change, that test is racy but we almost always win the race. Reproducing the
issue reliably requires high system load (e.g. running the CTS in the
background) and simultaneously running that test a large number of times.
Minimal shader-db impact. In particular, no cycle count regressions.
total instructions in shared programs: 2699419 -> 2699458 (<.01%)
instructions in affected programs: 22014 -> 22053 (0.18%)
helped: 2
HURT: 25
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
HURT stats (abs) min: 1.0 max: 3.0 x̄: 1.64 x̃: 1
HURT stats (rel) min: 0.07% max: 2.82% x̄: 0.69% x̃: 0.49%
95% mean confidence interval for instructions value: 1.01 1.87
95% mean confidence interval for instructions %-change: 0.38% 0.88%
Instructions are HURT.
total cvt in shared programs: 14468.81 -> 14469.42 (<.01%)
cvt in affected programs: 221.33 -> 221.94 (0.28%)
helped: 2
HURT: 25
helped stats (abs) min: 0.015625 max: 0.015625 x̄: 0.02 x̃: 0
helped stats (rel) min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18%
HURT stats (abs) min: 0.015625 max: 0.046875 x̄: 0.03 x̃: 0
HURT stats (rel) min: 0.10% max: 4.44% x̄: 1.06% x̃: 0.79%
95% mean confidence interval for cvt value: 0.02 0.03
95% mean confidence interval for cvt %-change: 0.57% 1.36%
Cvt are HURT.
total quadwords in shared programs: 1462496 -> 1462528 (<.01%)
quadwords in affected programs: 4632 -> 4664 (0.69%)
helped: 0
HURT: 4
HURT stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
HURT stats (rel) min: 0.35% max: 7.69% x̄: 4.03% x̃: 4.03%
95% mean confidence interval for quadwords value: 8.00 8.00
95% mean confidence interval for quadwords %-change: -2.71% 10.76%
Inconclusive result (%-change mean confidence interval includes 0).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
This should help with "marge got stuck for an hour and all I got was this
failed job with no results/" when a system intermittently wedges.
This replaces the BM_POE_TIMEOUT ("did we get something on serial in the
last 3 minutes?") that rpi had, in favor of checking that the whole test
job gets through in 20 minutes.
Acked-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17096>
If the SerialBuffers can just feed the same line queue, then we don't need
the extra threads reading line queues into a new merged line queue.
Less python threading code is always better. Plus, now we can pass args
to SerialBuffer.lines() for timeout/phase.
Acked-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17096>
Every few weeks we'd get a flake where the script noticed the devcore
right as we were wrapping up, and then tar would exit because the file it
was tarring changed underneath it. Just kill devcores before we do that
-- even if we kill it while it's working, it's so rare that it probably
won't bother anyone.
Fixes: #5867
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16820>
In the following case :
vkCmdBindPipeline(compute_pipeline);
vkCmdDispatch(...);
vkCmdBindPipeline(graphics_pipeline);
vkCmdBindIndexBuffer(buffer)
vkCmdDraw(...);
We're emitting the 3DSTATE_INDEX_BUFFER instruction while the HW is
still in GPGPU mode, because we're dealing the pipeline selection to
vkCmdDraw().
Found while debugging Age Of Empire 4, HW is hung on
3DSTATE_INDEX_BUFFER instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17153>
Per Vulkan spec only "Draw" commands should be counted towards
occlusion query.
Apparently RB_SAMPLE_COUNT_CONTROL::UNK0 bool controls whether
sample counting is enabled, so we could use it to disable
sample counting for 3d blits which are sometimes used for
clear/copy/blit/gmem-store/resolve operations.
Fixes GL CTS tests running through Zink:
dEQP-GLES3.functional.occlusion_query.depth_clear
dEQP-GLES3.functional.occlusion_query.depth_clear_stencil_clear
dEQP-GLES3.functional.occlusion_query.scissor_depth_clear_stencil_clear
dEQP-GLES3.functional.occlusion_query.scissor_stencil_clear
dEQP-GLES3.functional.occlusion_query.stencil_clear
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6559
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17138>
r300 and r400 have strict rules with swizzles, so we
will need to convert swizzle back.
Operating on 0, 1, H in this case unnecessarily makes
rest of r300 overly complicated.
(also it's not currently able to handle this)
helps with:
deqp-gles2@functional@shaders@random@exponential@fragment@24
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17117>
Starting with Valhall, the provoking vertex state is specified per-framebuffer
(batch) instead of per-draw. We use the pan_tristate infrastructure to translate
between desktop OpenGL's per-draw semantics to Valhall's per-framebuffer
semantic. This is notably not required for GLES or Vulkan.
If the provoking vertex is unset when the tiler context is generated, it could
be set (incompatibly) later in the batch, and the tiler context's provoking
vertex field would no longer match the framebuffer's. That would violate a
hardware invariant. To ensure that doesn't happen, we make sure to set provoking
vertexes *before* generating the tiler context so it can't change after.
Fixes arb-provoking-vertex-render on Valhall.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17068>
If we need to change batch state (currently just point coord origins), not only
do we need to flush the old batch, but also set the desired state on the new
batch. That second step was missing. Fix that so this mechanism works as
intended.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Fixes: 3641dfe436 ("panfrost: Flip point coords in hardware")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17068>
Even with the fixes in the rest of the series, arb-provoking-vertex-render is
still failing on Valhall for a single subcase (involving QUADS). It seems likely
that QUADS support is broken on Valhall, given it's not used in any of the APIs
for which Arm ships drivers.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17068>
This commit extends the graphics hard coding infrastructure to
allow the independent hard coding of stages, i.e. hard code fragment
stage and vertex stage separately instead of having to hard code
everything.
It also extends the infrastructure to allow per device hard coding.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17106>
To avoid dangling BO references if it's destroyed without being
previously unbound. This is under Vulkan spec clarification but it
looks like the "global" BO fix is simple enough to workaround the
issue for now.
This fixes a CPU hang with Halo Infinite because the kernel rejects
a submission (invalid BO handle found).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17085>
there's no point in updating blend state for this when it does nothing,
so skip updates for this functionality
in the future, I expect zink will export this conditionally based on
the underlying driver, and some sort of functionality will be implemented
to do something
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17043>
This reverts commit 4824396572.
The newest version of b2c has changed the mount point of the cache
device, which is has proven to be working very well in my local
testing, but ends up confusing podman when the cache device had been
initialized using an older version of b2c. The end result is that we
end up using tmpfs to run our jobs, and some machines just run out of
RAM before the end of the job...
We'll be fixing this issue, cut a new b2c release, then re-introduce
in Mesa.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17120>
We used to use WZYX and apply swizzles. Because swizzles apply for
border colors as well, the gallium driver un-swizzled the border colors
to cancel out swizzles. That did not work for turnip because turnip
advertises customBorderColorWithoutFormat and does not know when to
un-swizzle.
This change replaces WZYX by XYZW and removes the swizzles.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6516
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16647>
Whether a sync object is used cannot depend on where the batch is
submitted from, remove the in_sync and out_sync fields from
panfrost_batch_submit.
Always use an output syncobj, this is required for glFinish to work
correctly. This could be skipped for batches which another batch
depends on, but because of the existence of empty batches which emit
no job, doing so is not trivial.
Never use an input syncobj. There appears to be no point to this, the
kernel driver does implicit sync anyway.
Fixes "seconds per frame" rendering with Neverball; previously, every
batch was submitted with out_sync=0, so DRI's frame throttling could
do nothing. New jobs would keep getting submitted until more than a
thousand were queued in the kernel, which increased rendering latency
for the compositor far beyond acceptable levels.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16966>
PAN_BIND_SHARED_MASK is all binding flags that mean that a resource
might be shared and accessible by other contexts.
Don't replace the usage of this pattern in panfrost_should_afbc and
panfrost_should_tile in case a new binding is introduced that not all
layouts can support.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16966>
When building with Android.mk we are ending-up with:
gcc ..... -std=gnu99 .... -std=c11 .... target.o
^^^^^^^^^^ ^^^^^^^^
| |
_______________^_____ _____^___________
AOSP/KATI GENERATED MESON GENERATED
Some compilers uses first -std=gnu99 option and ignores second,
which results:
c99 implicit declaration of function static_assert()
This patch filters-out the first '-std=gnu99' from the cflags obtained
from AOSP/KATI dummy target output to avoid such kind of errors.
Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17078>
YV12 is a 3-plane format with minigbm. This change mitigates the crash
of testGLViewLargerHeightDecodeAccuracy[4], and successfully creates the
host side image and imports the memory.
vk_format(1000156002) drm_fourcc(842094169)
offsets(0, 27648, 34560, 0)
strides(256, 128, 128, 0)
modifier(0)
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16913>
If multisample is enabled and alpha testing happens, the
branch can jump out of the fragment shader before the other
samples are generated. Just don't take the branch optimisation
post alpha test if multisample is enabled.
This should fix some rendering bugs in kicad with multisample
enabled.
Cc: mesa-stable
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17049>
This was an optimization done a while ago that doesn't seem to be having
much of an impact anymore, and on the other hand, causes all sorts of
breakage with queries, as many of our HW counters don't get incremented
when rasterization is disabled.
This fixes a bunch of issues Zink has with ANV, but more importantly, it
fixes upcoming CTS tests:
dEQP-VK.transform_feedback.primitives_generated_query.*.empty_frag.*
dEQP-VK.transform_feedback.primitives_generated_query.*.no_attachment.*
dEQP-VK.transform_feedback.primitives_generated_query.*.color_write_disable_*
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17038>
A single compiled_shaders instance could be reused by several
pipelines, but strings from disasm info could be stolen only once.
So now we have to copy them.
Fixes crashes when using RenderDoc.
Fixes: 05329d7f9a
("tu: Implement pipeline caching with shared Vulkan cache")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17053>
Snapshots are processed asynchronously by INTEL_MEASURE, but snapshot
memory is allocated and associated with an iris batch. Provide a
callback that will free snapshot memory after a batch is fully
processed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16571>
The caller may have passed ownership of intel_measure_batch structures
to intel_measure until they are ready to be gathered. The caller
needs a notification when rendering is complete and snapshots have
been processed, so it can free the resources that measure the batch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16571>
Re-allocating the buffer object for snapshots carries a heavy penalty
at run-time. When resetting a command buffer, the buffer object that
is allocated for snapshots may be re-used directly on subsequent
renders.
Stale snapshot data will persist in the buffer object. To verify that
rendering is complete, zero the final timestamp value and check that
it has been written before gathering data.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16571>
this is a little tricky since it's possible to need a truly unreasonable
number of sparse binds in one go, so the current implementation will
daisy-chain through the sparse submits such that:
* sparse bind A creates signal semaphore X
* sparse bind B waits on X, creates signal semaphore Y
* sparse bind C waits on Y, creates signal semaphore Z
thus, any number of sparse binds will produce exactly one semaphore
that can be waited on by the following submit
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17066>
These are normally only set once because it's constant across the entire
renderpass, but they're trashed by the 3d store path because it needs to
store to CCU instead of GMEM. Therefore we need to save/restore them. Do
it in a way compatible with #5181.
Fixes: b157a5d ("tu: Implement non-aligned multisample GMEM STORE_OP_STORE")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17058>
Even though image views for attachments must use the identity swizzle,
there are cases where we have to add in our own swizzle, in particular
for D24S8 when the view is depth-only/stencil-only. Therefore we have to
reset it to the identity, similar to what we do with input attachments.
Fixes: b157a5d ("tu: Implement non-aligned multisample GMEM STORE_OP_STORE")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17058>
this maps directly to the vulkan api and allows removal of timeline
wrapping code
the consequence of this is a ~0.26% reduction in drawoverhead performance
on base cases (n=1000), but the simplification and deletions seem worth it
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17045>
Using the integrated adapter when none is specified uses less power by
default and doesn't break scenarios on Optimus systems (for example, on
Surface Books, detaching the screen gets prohibited because the GPU on
the performance base is in use)
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17005>
This is enough for zink to expose ARB_texture_buffer_object_rgb32 and
therefore GL 4.0. We could enable sampled images with a few more
workarounds, but the blob doesn't bother and there isn't any need at the
moment.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16980>
Currently, the LAVA job submitter is employing a temporary solution for
the bash escape code mangling in the LAVA jobs. Until the issue is not
fixed on the LAVA side, the submitter will replace the wrong characters
with the fixed ones.
This commit improves the regex pattern to comprehend the scenarios of
color codes with font formatting and background color information, such
as: `echo -e "\e[1;41;39mRed background with white bold text color\e[0m"`
Fixes: #5503
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17046>
Fixes following errors:
../src/microsoft/spirv_to_dxil/dxil_spirv_nir.c
In file included from ../src/compiler/nir/nir_builder.h:365,
from ../src/microsoft/compiler/dxil_nir.h:29,
from ../src/microsoft/spirv_to_dxil/dxil_spirv_nir.c:28:
../src/microsoft/spirv_to_dxil/dxil_spirv_nir.c: In function 'dxil_spirv_nir_passes':
src/compiler/nir/nir_builder_opcodes.h:1321:11: error: 'dyn_yz_flip_mask' may be used uninitialized in this function [-Werror=maybe-uninitialized]
1321 | return nir_build_alu2(build, nir_op_iand, src0, src1);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/microsoft/spirv_to_dxil/dxil_spirv_nir.c:290:59: note: 'dyn_yz_flip_mask' was declared here
290 | nir_ssa_def *y_flip_mask = NULL, *z_flip_mask = NULL, *dyn_yz_flip_mask;
| ^~~~~~~~~~~~~~~~
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16671>
.wideLines = false, which forbids the user to set the line width
to something different than 1. We're thus safe to claim support
for dynamic line width and do nothing in CmdSetLineWidth() other
than checking the value passed is 1.0f.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16971>
We were completely ignoring the primitive-restart case in the
index-rewrite logic used to emulate triangle fans. Unfortunately, this
case is way more complicated than a regular index rewrite:
- we need to skip all primitive-restart entries when turning the triangle
fan into a triangle list, which implies serializing the index buffer
rewrite procedure (at least I didn't find any clever way to parallelize
things)
- the number of triangles can no longer be extrapolated from the number
of indices in the original index buffer, thus forcing us to lower
direct indexed draws into indirect draws and patching the indexCount
value when the new index buffer is forged
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16971>
Iterating over a util_sparse_array is very expensive; replace this
with a standard dynarray.
Using the sparse 'nodearray' datastructure instead was tested, but
found to be slower in some cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16988>
Because this impacts most of the registers in the BLEND draw state, we
make the entire draw state dynamic so that it all gets re-emitted when
the logicOp changes. This also lays the groundwork for
VK_EXT_color_write_enable.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16936>
There were a few problems with this:
- It wasn't considering logic op at all, which is another source of
reading from the destination.
- It was conditioned on the blend_enable_mask, so it was missing the
case where there's no blending but some of the outputs were masked
out.
- It wasn't considering attachments with less than 4 components (for
example, normals in a typical deferred rendering setup) and would
always consider them partially written unless the user added extra
unnecessary components.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16936>
Compiling code should be done in docker containers using --isolation=process for efficiency,
but installing dependencies will fail when run that way. Split the "install" dependencies from
the "build" dependencies, so the former can be run in a container using --isolation=hyperv,
and the latter can use --isolation=process.
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16934>
as @Venemo discovered, zs layouts were being incorrectly set to readonly
in the case where the attachment was only used for an explicit clear,
so ensure that gets taken into account
cc: mesa-stable
fixes (radv):
dEQP-GLES31.functional.stencil_texturing.render.depth24_stencil8_clear
dEQP-GLES31.functional.stencil_texturing.render.depth32f_stencil8_clear
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17033>
each sampler is 1 driver location, so use the base variable
Fixes: 2d745904ca ("zink: add a gently mangled version of the d3d12 cubemap -> array compiler pass")
fixes:
dEQP-GL45-ES31.functional.shaders.opaque_type_indexing.sampler.const_expression.*.samplercubearray
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17008>
Due to how alignas is defined, it itsn't allowed to use it on a struct,
it needs to be used on the first member instead. So move the declaration
in those cases.
This still leaves the ALIGN16 macro using compiler-specific directives,
because it's a lot of work to untangle the above. This probably deserves
its own MR.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16908>
DXIL metadata strings and function names have a limited size. Truncate
the name when they don't fit. This is a quick&dirty workaround since it
doesn't address the problem for all kind of strings, and doesn't ensure
there's no collision in the function names after the truncation. That's
not an issue right now because I don't think we have implementations
keeping more than one function (the entrypoint), but it might be a
problem at some point.
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16961>
Right now we had two methods that tries to optimize the nir shader,
nir_optimize and st_nir_opts. The latter is being used when we are
linking, but again, it has basically the same purpose that
nir_optimize.
So this commit adds more lowerings to nir_optimize_nir, add some extra
comments on the method, and replaces st_nir_opts with nir_optimize.
Ideally we would like to just use the already existing
v3d_optimize_nir that we have at the backend But:
* Using it leads to some regressions on Vulkan CTS tests, due some
lowerings that are already there.
* We would need to move to the backend some additional
lowerings/optimizations that are used on the Vulkan
frontend. That would require to check that we are not getting any
regression or performance drop on OpenGL
So for now we are keeping a Vulkan specific nir_optimize method.
Additionally this fixes the following test:
dEQP-VK.graphicsfuzz.cov-loop-condition-clamp-vec-of-ones
Shaderdb stats, using some well known Vulkan apps (ue4 demos, Quake3e,
etc):
total instructions in shared programs: 124974 -> 125108 (0.11%)
instructions in affected programs: 50328 -> 50462 (0.27%)
helped: 4
HURT: 79
total uniforms in shared programs: 19019 -> 19020 (<.01%)
uniforms in affected programs: 60 -> 61 (1.67%)
helped: 0
HURT: 1
total max-temps in shared programs: 13438 -> 13444 (0.04%)
max-temps in affected programs: 85 -> 91 (7.06%)
helped: 0
HURT: 2
total inst-and-stalls in shared programs: 125715 -> 125849 (0.11%)
inst-and-stalls in affected programs: 50429 -> 50563 (0.27%)
helped: 4
HURT: 79
total nops in shared programs: 8203 -> 8204 (0.01%)
nops in affected programs: 732 -> 733 (0.14%)
helped: 7
HURT: 9
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16986>
That is what most others Vulkan drivers do (radv, anv, turnip at
least).
The origin of this change cames from a CTS test where the loop
unrolling converted a ubo index defined inside a loop from constant to
non constant. That is not desiderable on any driver, but a problem on
v3dv, as v3dv doesn't support that case.
Although we initially tried to fix it on the loop unroll, we discarded
that approach, and focused on the existing nir lowerings/optimizations
as this was not happening with other drivers.
We noted that in other drivers this case of a ubo index going from
const to non-const were also happening with nir_lower_explicit_io, but
in that case it was able to be converted back to a const on following
lowerings. The only difference with other drivers is that we were
calling it before the first nir optimization loop.
So this change helps with fixing the following CTS test (for that we
also need to run additional lowerings, which we do in a later patch):
dEQP-VK.graphicsfuzz.cov-loop-condition-clamp-vec-of-ones
You can get further details on the following issue and RFC merge
request, specially the merge request:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6051https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15391
We also made some shaderdb stats with our usual Vulkan apps (ue4
demos, quake3, etc):
Total instructions in shared programs: 125014 -> 124974 (-0.03%)
instructions in affected programs: 7544 -> 7504 (-0.53%)
helped: 7
HURT: 4
total uniforms in shared programs: 19026 -> 19019 (-0.04%)
uniforms in affected programs: 514 -> 507 (-1.36%)
helped: 5
HURT: 0
total max-temps in shared programs: 13430 -> 13438 (0.06%)
max-temps in affected programs: 270 -> 278 (2.96%)
helped: 0
HURT: 8
total sfu-stalls in shared programs: 739 -> 741 (0.27%)
sfu-stalls in affected programs: 30 -> 32 (6.67%)
helped: 0
HURT: 2
total inst-and-stalls in shared programs: 125753 -> 125715 (-0.03%)
inst-and-stalls in affected programs: 7685 -> 7647 (-0.49%)
helped: 7
HURT: 4
total nops in shared programs: 8228 -> 8203 (-0.30%)
nops in affected programs: 546 -> 521 (-4.58%)
helped: 9
HURT: 2
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16986>
Since we only consume barriers at the beginning of a new job, if
a command buffer ends with a barrier we will not handle it. Fix
this by emitting a noop job in that case to consume it. Ideally,
we could do better and check the pending barrier state to fine
tune the noop job so we don't wait on all queues, but for now
this fixes flakyness with some CTS pipeline barrier tests that
started to show up after we optimized binning sync barriers. It
is likely that the additional sync we had before that change was
enough to prevent the problem from showing up.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17020>
When we switched to using structs to track barrier state we made a mistake
and started to overwrite barrier state in primary command buffers with
the pending state from secondary command buffers executed inside them, when we
should've been merging the state instead.
Fixes flakyness with some CTS barrier tests.
Fixes: f7ce42636c ('v3dv: use an explicit struct type to track barrier state')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17020>
This is not safe because it may skip regenerating the flags for the
loop condition in the loop continue block and these flags may be
stomped in the loop body by other conditionals.
Fixes: 9909fe6ba ('broadcom/compiler: Skip bool_to_cond where possible')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17020>
Until now, we have been extracing the install.tar image on the gitlab
runner before sharing it to the test machine through a MINIO bucket.
It turns out that hardlinks and symlinks do not get shared, so let's
just extract the tarball directly on the test machine to fix the issue.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16968>
We are already doing the 0*anything = 0 by default and we are also
using the DX versions of math ops like RCP. It looks like R300 and
R400 can't do IEEE math anyway (but its hard to tell without docs).
For R500 we can do IEEE math, but testing showed that some apps
are dependent on the DX behavior, so considering we only advertise
GLSL 1.20 where this is left ot the driver, just keep the curent
status and expose PIPE_CAP_LEGACY_MATH_RULES so that nine can stop
emiting math workarounds.
Also fixes two Xnine tests.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17007>
If we have a combined Z/S image, the image has depth, so we proceed down the
depth path, which does not set clear.s even though there's *also* a stencil
component. Unify the control flow to fix this.
Fixes (among others):
dEQP-VK.api.image_clearing.core.clear_depth_stencil_image.single_layer.d24_unorm_s8_uint_multiple_subresourcerange
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>
Rather than generating shaders to clear depth and stencil attachments, run the
rasterizer without a shader and configure the depth/stencil hardware to do the
clear. These settings are known to be efficient on Valhall, presumably the
depth/stencil pipeline on Bifrost is similar enough that it is also the
efficient way there. It's certainly much simpler.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>
On Bifrost and newer, blend descriptors are decoupled from render target. That
means we can always use a clear shader reading from blend_descriptor_0 and
specify the desired render target in the sole blend descriptor we pass.
Likewise on Bifrost and newer we don't need blend descriptors when we don't
blend, which is the case for the Z/S clears.
This reduces the number of shaders compiled on startup from 468 to 426.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>
usually inlining is optimal for cpu drivers since the majority of
time is spent in the shaders, and any amount of reduction to shader code
will be optimal
if, however, the shaders are still really big after inlining, this improvement
will be negated by the insane amount of time spent doing stupid llvm optimizer
passes, so check post-inline size to see whether it exceeds a size threshold
lavapipe release build - 1700% improvement
* spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec4-index-rd-after-barrier
before: 142.15s user 0.42s system 99% cpu 2:23.14 total
after: 8.60s user 0.07s system 99% cpu 8.677 total
fixes#6647
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16977>
The hardware writes one CRC per (effective) tile, the tile size of the CRC
buffer is the same as the configured effective tile size. However, all our CRC
infrastructure assumes 16x16 tiles. In case CRC is used with smaller tiles,
buffer overflows and incorrect rendering are all possible. Don't use CRC at
smaller tile sizes. Note disabling CRC correctly invalidates any bound CRC
buffers.
Fixes: 2e97d7c835 ("panfrost: Transaction elimination support")
Closes: #6332
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16983>
It has a single user -- in a section of code that only runs for MFBD GPUs and
that has already decided whether to use CRCs -- so inlining it simplifies its
definition greatly and may avoid redeciding the CRC setting.
[Note for mesa-stable maintainers: This is not a bug fix but is marked for
backport so the next patch applies cleanly.]
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16983>
info.fs.sidefx considers discard() to be a side effect. That definition is...
dubious at best. It certainly isn't the definition needed for forward pixel
kill. The only reason pixels couldn't be killed by FPK is if the shader has side
effects in the sense of writing to memory. Use that more precise condition so
FPK works more often.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Closes: #5607
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16984>
This fixes many tests in following groups on DG2:
dEQP-VK.memory_model.*
dEQP-VK.fragment_shader_interlock.*
v2: use memory scope and setup descriptor also
for barriers without defined scope (Curro),
use local scope and flush type none with
NIR_SCOPE_NONE scope, cleanups (Lionel)
v3: use LSC_FENCE_THREADGROUP for NIR_SCOPE_WORKGROUP,
remove default case (Curro), use eviction if scope
was not defined, use LSC_FENCE_GPU scope for vertex
stage
v4: use LSC_FENCE_TILE independent of stage for device
scope (Curro)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15743>
ac_llvm_add_target_dep_function_attr has no effect if the function is
inlined.
amdgpu-gds-size determines m0 for ds_sub_u32 gds, which hangs if it's 0.
This helps both gfx10 and gfx11, though it will only be used by gfx11
after we enable streamout.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
This skips the preamble for following IBs if the queue receives IBs from
the same context back-to-back. This eliminates VGT_FLUSH (for tess and
legacy GS) and PS_PARTIAL_FLUSH (for gfx11) in those cases if the preamble
contains them.
v2: only use this on gfx10+ due to stability issues on Stoney and limited
testing
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
If the first time the preamble is written, one of the rings
isn't allocated, we wouldn't write the RING_SIZE to the preamble.
Later, when the preamble gets updated after the ring allocation,
the new RING_SIZE packet would overwrite other packets.
To prevent this, always write the RING_SIZE (the alternative would
be to write NOP packets).
This fix "*ERROR* Illegal register access in command stream" hangs
I observed on GFX8.
Fixes: 32c7805ccc ("radeonsi: merge all preamble states into one")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16962>
Sample shading has similiar definitions in Vulkan and OpenGL, and they
both require unique associated data. While the definition for Vulkan
might change, we should stick to the current definition until the change
takes place and until apps (i.e., ANGLE) are updated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16700>
As far as I can't tell, there's no native operation doing this
equivalent of fquantize2f16. Let's lower this operation to
if (val < MIN_FLOAT16)
return -INFINITY;
else if (val > MAX_FLOAT16)
return -INFINITY;
else if (fabs(val) < SMALLER_NORMALIZED_FLOAT16)
return 0;
else
return val;
which matches the definition of OpQuantizeToF16:
"
If Value is an infinity, the result is the same infinity.
If Value is a NaN, the result is a NaN, but not necessarily the same NaN.
If Value is positive with a magnitude too large to represent as a 16-bit
floating-point value, the result is positive infinity. If Value is negative
with a magnitude too large to represent as a 16-bit floating-point value,
the result is negative infinity. If the magnitude of Value is too small to
represent as a normalized 16-bit floating-point value, the result may be
either +0 or -0.
"
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16959>
this was correct for 64bit loads and manually converted 32bit loads (e.g., bindless),
but it was broken for the case where 64bit was not supported, as the offset wasn't
being correctly adjusted
break out the offset division to hopefully make this a little clearer
Fixes: 150d6ee97e ("zink: move all 64-32bit shader load rewriting to nir pass")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16669>
all of ntv requires scalarized io since the offsets are now array indices
instead of byte offsets, so enforce scalarization here to avoid breaking
the universe
Fixes: 150d6ee97e ("zink: move all 64-32bit shader load rewriting to nir pass")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16669>
zink can't use lower_io, so this all has to be done manually and in
excruciating depth and detail
fixes (tu):
KHR-Single-GL46.arrays_of_arrays_gl.InteractionFunctionCalls2
KHR-GL46.gpu_shader_fp64.fp64.named_uniform_blocks
KHR-GL46.gpu_shader_fp64.fp64.varyings
KHR-GL46.vertex_attrib_binding.advanced-bindingUpdate
KHR-Single-GL46.enhanced_layouts.varying_array_components
KHR-Single-GL46.enhanced_layouts.varying_array_locations
KHR-Single-GL46.enhanced_layouts.varying_components
KHR-Single-GL46.enhanced_layouts.varying_locations
KHR-Single-GL46.enhanced_layouts.xfb_explicit_location
dEQP-GLES3.functional.transform_feedback.basic_types.interleaved.lines.highp_mat3x4
dEQP-GLES3.functional.transform_feedback.basic_types.interleaved.lines.mediump_vec3
dEQP-GLES3.functional.transform_feedback.basic_types.interleaved.points.mediump_vec4
dEQP-GLES3.functional.transform_feedback.basic_types.separate.points.highp_vec3
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16669>
I left this semi-unfinished back when I discovered that I could blast
out xfb values inline with variable declarations, but this is not viable
for all scenarios, so it has to work and it has to be able to pass cts
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16669>
fs_visitor::assign_curb_setup() maps UNIFORM registers to HW regs,
and contains the following assert:
assert(inst->src[i].stride == 0);
emit_a64_oword_block_header's striding tricks run afoul of this
restriction, by producing stride 1 values on a 64-bit UNIFORM source.
Work around this by copying the UNIFORM value to a VGRF first.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16938>
Instead of attempting to signal based on the memory object, use the new
DMA_BUF_IOCTL_EXPORT_SYNC_FILE to get a sync_file for the dma-buf and
use that to signal the semaphore or fence. Because this happens before
we transfer ownership back to the driver, the resulting sync_file should
only contain dma_fences from the compositor and/or display and shouldn't
be mixed up with the driver in any way. This gives us a real semaphore
and fence (as opposed to the dummy objects we've used int the past)
without over-synchronization.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
Instead of treating the blit submit specially in the buffer_blit_queue
case, treat the dummy submit as special. This lets us keep all the
handling of special-queue blits together. It also means that the
wsi_memory_signal_submit_info gets chained into the final submit which
is what we want if we're to rely on it for implicit sync. If we chain
it into the dummy submit, we'll implicit sync on all work previous to
the blit but not the blit. This won't work if X11 or a Wayland
compositor is depending on that to synchronize the linear copy.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
The WSI code is about to start querying for available semaphore handle
types via GetPhysicalDeviceExternalSemaphoreProperties in wsi_init().
For drivers that use vk_sync, supported_sync_types needs to be
initialized before GetPhysicalDeviceExternalSemaphoreProperties is
called. Really, wsi_init() should be the very last step of physical
device setup.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
Also, stop setting wsi_device::signal_semaphore/fence_with_memory
because those cause the WSI code to call the function we just dropped.
Since the core WSI code is now setting dummy syncs by default, we don't
need any of this anymore.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
With the old value, anv didn't think that the hardware supported 48-bit
addresses, and hit this assert:
assert(device->supports_48bit_addresses == !device->use_relocations);
The new value of 1ull << 48 is the one reported on my Icelake machine.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16933>
With the following structures :
struct StructA
{
uint64_t value0;
uint8_t value1;
};
struct TopStruct
{
struct StructA a;
uint8_t value3;
};
Currently offsetof(struct TopStruct, value3) = 9. While the same code
on the CPU gives offsetof(struct TopStruct, value3) = 16.
This is impacting OpenCL kernels we're trying to use to build
acceleration structures.
v2: Add comment/link to some description of the alignment/size
computation
Cc: mesa-stable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16940>
This was origally added for scons, which is gone, but keep it around since
enough people use "build" for their meson builds that you probably
shouldn't add anything to git under that name. The qualifying '/' is
needed because we have a .gitlab-ci/build/ directory where we do check in
code.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16827>
As the default option for msvc 2019 does support designated initializers
```
../src/util/tests/timespec_test.cpp(302): error C7555: use of designated initializers requires at least '/std:c++20'
../src/util/tests/timespec_test.cpp(303): error C7555: use of designated initializers requires at least '/std:c++20'
../src/util/tests/timespec_test.cpp(312): error C7555: use of designated initializers requires at least '/std:c++20'
../src/util/tests/timespec_test.cpp(313): error C7555: use of designated initializers requires at least '/std:c++20'
```
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15497>
Create c11/time.h instead of put timespec_get in `c11/threads.h`
Creating impl folder is used to avoid `#include <time.h>` point the c11/time.h file
Detecting if `struct timespec` present with meson
Define TIME_UTC in `c11/time.h` instead `c11/threads.h`
Define `struct timespec` in `c11/time.h` when not present.
Implement timespec_get in c11/impl/time.c instead threads.h
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15497>
the 'notemplates' debug mode is somewhat misleading since there's no
uncached+notemplates mechanism, meaning that if the descriptor cache
explodes it'll still use templates for updating in the fallback path
Fixes: 4e3768914d ("zink: add ZINK_DESCRIPTORS env var to explicitly set a mode")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16927>
If stencil is constant across the resource, then it can be treated as
if it was cleared.
Improves performance in applications which create a stencil buffer but
do not use it. Originally the same was done for depth to help some 2D
applications, but that gave mixed results so the patch was dropped.
v2: Don't do anything if a fragment job wouldn't be needed otherwise.
v3: Set stencil_value when a batch is cleared (Alyssa)
v4: Handle clears when the stencil is already known (Alyssa)
v5: Make sure shared resources are not used (Alyssa)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16646>
State tracker only declares one shared memory for GLSL and all
references to the shared memory are with index 0.
So fix the shared memory index in the declaration to use 0 index.
Fixes spec@arb_compute_shader@execution@shared*
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16930>
This changes the trace rays logic to always use
VkTraceRaysIndirectCommand2KHR and implements
vkCmdTraceRaysIndirect2KHR. I renamed the
load_sbt_amd to sbt_base_amd and moved the SBT
load lowering from ACO to NIR.
Note that we can not just upload one pointer to
all the trace parameters because that would
be incompatible with traceRaysIndirect.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16430>
Implements the new
VK_QUERY_TYPE_ACCELERATION_STRUCTURE_SERIALIZATION_BOTTOM_LEVEL_POINTERS_KHR
and
VK_QUERY_TYPE_ACCELERATION_STRUCTURE_SIZE_KHR
query types.
The documentation is a bit lacking for now but
the fist type probably refers to the instance
count and the second type refers to the
acceleration structure size which we already
store in radv_acceleration_structure. To support
size queries, this commit adds a size member
to the acceleration structure header.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16430>
vtbl->transfer_map may return staging buffer and not real one and it
exposes a problem in MSAA resolve path, since u_transfer_helper does
blit from a resource that is still mapped and it's not flushed yet.
Add explicit flush_region() for a temporary transfer before doing flush
for MSAA resolve.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16923>
We've discussed this at length and have agreed that Midgard + Vulkan is DOA, but
have let the code linger. Now it's getting in the way of forward progress for
PanVK... That means it's time to drop the code paths and commit t to not
supporting it.
Midgard is only *barely* Vulkan 1.0 capable, Arm's driver was mainly
experimental. Today, there are no known workloads today for hardware of that
class, given the relatively weak CPU and GPU, Linux, and arm64. Even with a
perfect Vulkan driver, FEX + DXVK on RK3399 won't be performant.
There is a risk here: in the future, 2D workloads (like desktop compositors)
might hard depend on Vulkan. It seems this is bound to happen but about a decade
out. I worry about contributing to hardware obsolescence due to missing Vulkan
drivers, however such a change would obsolete far more than Midgard v5...
There's plenty of GL2 hardware that's still alive and well, for one. It doesn't
look like Utgard will be going anywhere, even then.
For the record: I think depending on Vulkan for 2D workloads is a bad idea. It's
unfortunately on brand for some compositors.
Getting conformant Vulkan 1.0 on Midgard would be a massive amount of work on
top of conformant Bifrost/Valhall PanVK, and the performance would make it
useless for interesting 3D workloads -- especially by 2025 standards.
If there's a retrocomputing urge in the future to build a Midgard + Vulkan
driver, that could happen later. But it would be a lot more work than reverting
this commit. The compiler would need significant work to be appropriate for
anything newer than OpenGL ES 3.0, even dEQP-GLES31 tortures it pretty bad.
Support for non-32bit types is lacklustre. Piles of basic shader features in
Vulkan 1.0 are missing or broken in the Midgard compiler. Even if you got
everything working, basic extensions like subgroup ops are architecturally
impossible to implement.
On the core driver side, we would need support for indirect draws -- on Vulkan,
stalling and doing it on the CPU is a nonoption. In fact, the indirect draw code
is needed for plain indexed draws in Vulkan, meaning Zink + PanVK can be
expected to have terrible performance on anything older than Valhall. (As far as
workloads to justify building a Vulkan driver, Zink/ANGLE are the worst
examples. The existing GL driver works well and is not much work to maintain. If
it were, sticking it in Amber branch would still be less work than trying to
build a competent Vulkan driver for that hardware.)
Where does PanVK fit in? Android, for one. High end Valhall devices might run
FEX + DXVK acceptably. For whatever it's worth, Valhall is the first Mali
hardware that can support Vulkan properly, even Bifrost Vulkan is a slow mess
that you wouldn't want to use for anything if you had another option.
In theory Arm ships Vulkan drivers for this class of hardware. In practice,
Arm's drivers have long sucked on Linux, assuming you could get your hands on a
build. It didn't take much for Panfrost to win the Linux/Mali market.
The highest end Midgard getting wide use with Panfrost is the RK3399 with the
Mali-T860, as in the Pinebook Pro. Even by today's standards, RK3399 is showing
its limits. It seems unlikely that its users in 10 years from now will also be
using Vulkan-required 2030 desktop environment eye candy. Graphically, the
nicest experience on RK3399 is sway or weston, with GLES2 renderers.
Realistically, sway won't go Vulkan-only for a long-time.
Making ourselves crazy trying to support Midgard poorly in PanVK seems like
letting perfect (Vulkan support) be the enemy of good (Vulkan support). In that
light, future developers making core 2D software Vulkan-only (forcing software
rasterization instead of using the hardware OpenGL) are doing a lot more
e-wasting than us simply not providing Midgard Vulkan drivers because we don't
have the resources to do so, and keeping the broken code in-tree will just get
in the way of forward progress for shipping PanVK at all.
There are good reasons, after all, that turnip starts with a6xx.
(If proper Vulkan support only began with Valhall, will we support Bifrost
long term? Unclear. There are some good arguments on both sides here.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16915>
Vertex shaders are allowed to define input variables pointing to the
same location but a different, or even variables that overlap other
variables, as long as only one of them is used in a shader invocation.
One way to support that case would be to merge overlapping variables,
but we can also declare one input element per variable, and make those
point to the same input slot/offset. The only limitation with the
second approach is the maximum number of VS input registers, meaning
that only (32 - num_sysvals) input variables can be defined.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16221>
Now that dxil_spirv_nir.h exposes an helper to run the
DXIL-SPIRV specific passes, we can handle the varying linking
on our side and tell nir_to_dxil() we don't want automatic
varying index/register assignment, which should fix a bunch
of compiler errors.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16221>
Letting the compiler decide which slot should be used for varyings when
it doesn't know about the varyings written/read by the previous/next
stage doesn't work well. So let's the caller decide when it wants
automatic index/register assignment through a dedicated parameter,
instead of assuming Vulkan users always want that.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16221>
Linking should be done in reverse order, starting from the last
pipeline stage and going backward, so we can eliminate outputs from the
previous stage that are never used by the next stage, and possibly
kill some instructions and input variables too.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16221>
Right now, nir_to_dxil() assumes driver_location on inputs will be
contiguous, which is true for GL, and also true for Vulkan shaders
with the current implementation. But we are trying to delegate
the varying linking step to Dozen, and that means the driver will
assign the driver_location field.
For everything except vertex shaders this works fine, because we
are in control of the ID we assign to each variable, and can make
sure no holes exists in this assignment, but vertex inputs expect
the index value (which is directly extracted from the
driver_location field) to match the input index defined at pipeline
creation time. The compiler has a hack to treat Vulkan differently
and extract the index from the var->data.location field instead,
but that's a bit confusing.
Moreover, the input_mappings[] array is already indexed with
the var->data.driver_location field in the input load emission
path, so it makes sense to index it with the same field when
emitting signatures.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16221>
Extract NIR passes out of spirv_to_dxil() so we can re-use them
without separately and do the varying linking in Dozen. This way
we will also be able to use vk_shader_module_to_nir() which
takes care of the SPIRV -> NIR translation, plus a bunch of
common lowering passes.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16221>
Tell the state tracker our point coordinates have a lower left origin
instead of an upper left origin, and remove our point coordinate
flipping code. Saves an instruction in any shader that reads
gl_PointCoord.y
Note: the OpenGL blob also emits an "fadd $y', ^y.neg, 1.0" to flip
point coordinates, so this isn't just a Metal weirdness.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16829>
When lower_wpos_pntc is used, the state tracker inserts code to
transform gl_PointCoord.y according to a uniform, to account for
API-requested point coordinate origin and framebuffer orientation. With
the transformation, driver-supplied point coordinates are expected to
have an upper left origin.
If the hardware point coordinate supports (only) a lower left origin,
the backend has to use lower_wpos_pntc and then lower *again* to flip
back. This ends up transforming twice, which is wasteful:
a = load point coord Y with lower left origin
a' = 1.0 - a
a'' = uniform_transform(a')
However, lower_wpos_pntc is quite capable of transforming for a lower
left origin too, it just needs to flip the transformation. Add a CAP
specifying the point coordinate origin convention, rather than assuming
upper-left. This simplifies the Asahi code greatly.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16829>
The performance counter layout depends on the number of L2 blocks and the number
of shader cores. It doesn't make a ton of sense to hardcode these into the XML
files. Instead, let's make the coutner offsets in the XML files relative to the
categories (blocks), so we can calculate the offsets of the categories
themselves at runtime based on the computed layout. This fixes performance
counters on Mali-G57 as implemented on MT8192.
There is little code change here, mainly churn from changing the XML definition.
Postprocessing for the XML to make it suitable for Mesa uses Antonio Caggiano's
https://gitlab.freedesktop.org/panfrost/hwc-helper tool.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16803>
Otherwise, load_push_constant won't work properly. This could probably be made
to work if we tried hard enough, but we still don't want reordering for internal
(meta) shaders which are layed out deliberately.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>
Bifrost supports "fast access uniforms" loaded from a single contiguous buffer.
This maps directly to Vulkan push constants, with some caveats:
* No indirect access. Indirects need to be lowered to a UBO pull.
* Strict alignment requirements. These will be met in practice.
Implement the NIR intrinsic and map it to the native hardware construct.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>
To be able to sum drawcall cost and render pass cost, the units of costs
are changed to bytes. With that, tu_autotune_use_bypass can make
decisions by comparing the costs of sysmem rendering and gmem rendering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16733>
This is generally PEP-484 stuff, but there is one functional change.
The base class Node needed to have an add() method to allow typed
dynamic dispatch. This could have been decorated @abstractmethod, but
that would require an error-raising implementation on all leaf-type
nodes. Instead, I added a base implementation that just errors out with
information from the subclass instance.
As a simple optimization, subclass implementations of add() (instead of
raising the same (or similar) error) now call super().add() in the
case of invalid child nodes.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16884>
The size must be the size of the total object, not the size
of the resource.
For instance, when using a single object for a multi-plane
format, the size of each plane should be equal to the size
of the underlying object to match libva's documentation:
/** Total size of this object (may include regions which are
* not part of the surface). */
uint32_t size;
Fixes: 13b79266e4 ("frontend/va: Setting the size of VADRMPRIMESurfaceDescriptor")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16813>
Fixes -amdgpu-atomic-optimizations=true option.
From CommandLine.h:
/// Reset all command line options to a state that looks as if they have
/// never appeared on the command line. This is useful for being able to parse
/// a command line multiple times (especially useful for writing tests).
void ResetAllOptionOccurrences();
/// Reset the command line parser back to its initial state. This
/// removes
/// all options, categories, and subcommands and returns the parser to a state
/// where no options are supported.
void ResetCommandLineParser();
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Fixes: 7e2874dc93 ("ac: reset LLVM command line parser")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16850>
NIR has fdiv, and all the NIR backends have to have lower_fdiv set
appropriately already since various passes (format conversions,
tgsi_to_nir, nir_fast_normalize(), etc.) might generate one.
This causes softpipe and llvmpipe to now do actual divides, since
lower_fdiv is not set there. Note that llvmpipe's rcp implementation is a
divide of 1.0 by x, so now we're going to be just doing div(x, y) instead
of mul(x, div(1.0, y)).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16823>
The drivers not setting it were:
- nv30, which gets lowering using NIR's lower_fsat flag.
- r300, which gets lowering using NIR's lower_fsat flag.
- a2xx, which has was getting it optimized back to fsat anyway.
This drops the check for the cap from gallium nine. While nine does have
a non-nir path, I think it's safe to assume that if you have SM3
texturing, you can do fsat.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16823>
As per the comment this was meant to tidy things up after varying
linking but varying linking has been moved into a nir based linker
so this extra call is no longer needed.
This optimisation pass is still called in the regular glsl ir
optimisation loop.
No shader-db change on Iris (BDW).
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16880>
We have to use a 3D draw to make it possible (so it goes through the
binner's visibility calcs), but hopefully the increased overhead for apps
with non-skippable rendering balances against skipping in others.
The real motivation is to get draw-time state out of tile load setup.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16826>
When !fb->binning but fb->binning_possible, we can just set the VSC
per-tile visibility reg to all visible in the "whoops, we'd rather not bin
but we had to anyway for XFB" case. This gets that EndRenderPass state out
of tile_load_cs/store_cs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16826>
Everything required for conformant OpenGL ES 3.1 support on Valhall (v9) is now
upstream -- all that's left is to enable implementations! Add the GPU ID for the
Mali-G57 implemented in the MediaTek MT8192 system-on-chip.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16890>
Use an internal geometry shader to handle input primitives. Do full
accurate culling and clipping in the shader and output hit result and
min/max depth to a SSBO for final being written to select buffer.
With multiple result slots in SSBO we can left multiple draws on the
fly and wait them done when buffer is full or exit GL_SELECT mode.
This provides quicker selection response compared to software based
solution. Tested on Discovery Studio 2020: some complex model needs
1~2s selection response time originally, now it's almost selected
immidiately.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15765>
When glthread not enabled, CurrentClientDispatch and CurrentServerDispatch
should be same. This does not cause problems before because OutsideBeginEnd
and BeginEnd have same BeginEnd entries, so when
CurrentServerDispatch==OutsideBeginEnd
CurrentClientDispatch==BeginEnd
will call into same BeginEnd _mesa_* functions.
But we'll add another dispatch table to replace BeginEnd when HW GL_SELECT
mode, so this needs to be fixed. Otherwise some function like _mesa_Rectf
which always call with CurrentServerDispatch will go into wrong entries.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15765>
HW code path will not flush vertex whenever name stack change.
It will save the current name stack and write to select buffer
only when no space left or exit select mode.
This let us submit multi draws from different name stack at
once instead of submit draws for a single name stack then
wait it finish before submit next one.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15765>
Like other optimizations, breaking this pass may not affect functional
correctness. It's also dead simple to unit test the pass, so we have no excuse
not to. Add unit tests for the functionality we currently support, since we just
extended it and want to make sure everything still works.
This includes tests for use of modifiers to get more small constants. There are
lots of subtle gotchas there, so let's add lots of unit tests to make sure we
got it right.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16862>
Some blitter operations, like clear, doesn't require to save all the
states.
This is particular important because, besides saving time, the blitter
operation restores the state required for the operation, and if we saved
more states than those, these ones won't be restored and will be leak.
So this also fixes some leaks when running CTS tests.
CC: mesa-stable
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16837>
CSEL executes on the conversion unit (CVT), while MUX executes on the special
function unit (SFU). Throughput on CVT is 4x higher than SFU, so this is
(almost) always an optimization.
The "real" MUX is still used for unusual cases, like 8-bit and bitselect.
Note that it's easier for us to use MUX everywhere for the IR. This is an easy
fixup to get better codegen on Valhall without touching the core Bifrost code.
shader-db is a bit of a toss up: register pressure and instruction count are
hurt in some cases due to restrictions on FAU access. In particular, a shader
that muxes between two uniforms needs an extra move due to extra constant
(zero). However, in terms of throughput this is still a win: 2 CVT instructions
(MOV + CSEL) have 2x throughput to 1 SFU instruction (MUX). The MOV has
opportunities for CSE, but that can hurt pressure in turn. Overall, cycles are
helped substantially.
total instructions in shared programs: 2728438 -> 2731597 (0.12%)
instructions in affected programs: 414391 -> 417550 (0.76%)
helped: 87
HURT: 1063
helped stats (abs) min: 1.0 max: 6.0 x̄: 5.17 x̃: 6
helped stats (rel) min: 0.19% max: 15.79% x̄: 4.12% x̃: 4.11%
HURT stats (abs) min: 1.0 max: 56.0 x̄: 3.40 x̃: 2
HURT stats (rel) min: 0.11% max: 23.43% x̄: 1.15% x̃: 0.63%
95% mean confidence interval for instructions value: 2.47 3.03
95% mean confidence interval for instructions %-change: 0.61% 0.90%
Instructions are HURT.
total cycles in shared programs: 142103 -> 142015.75 (-0.06%)
cycles in affected programs: 1263.45 -> 1176.20 (-6.91%)
helped: 281
HURT: 176
helped stats (abs) min: 0.015625 max: 2.234375 x̄: 0.50 x̃: 0
helped stats (rel) min: 0.71% max: 54.17% x̄: 16.93% x̃: 15.31%
HURT stats (abs) min: 0.015625 max: 30.0 x̄: 0.30 x̃: 0
HURT stats (rel) min: 0.84% max: 120.00% x̄: 7.16% x̃: 5.00%
95% mean confidence interval for cycles value: -0.33 -0.05
95% mean confidence interval for cycles %-change: -9.08% -6.22%
Cycles are helped.
total cvt in shared programs: 13983.34 -> 14891.70 (6.50%)
cvt in affected programs: 7498.36 -> 8406.72 (12.11%)
helped: 71
HURT: 4711
helped stats (abs) min: 0.0625 max: 0.0625 x̄: 0.06 x̃: 0
helped stats (rel) min: 5.41% max: 40.00% x̄: 10.23% x̃: 9.30%
HURT stats (abs) min: 0.015625 max: 2.640625 x̄: 0.19 x̃: 0
HURT stats (rel) min: 0.18% max: 141.18% x̄: 16.21% x̃: 9.52%
95% mean confidence interval for cvt value: 0.18 0.20
95% mean confidence interval for cvt %-change: 15.21% 16.42%
Cvt are HURT.
total sfu in shared programs: 11320.44 -> 7882.56 (-30.37%)
sfu in affected programs: 7618.50 -> 4180.62 (-45.13%)
helped: 4782
HURT: 0
helped stats (abs) min: 0.0625 max: 10.5625 x̄: 0.72 x̃: 0
helped stats (rel) min: 1.34% max: 100.00% x̄: 41.91% x̃: 37.50%
95% mean confidence interval for sfu value: -0.75 -0.68
95% mean confidence interval for sfu %-change: -42.68% -41.14%
Sfu are helped.
total ls in shared programs: 129660 -> 129690 (0.02%)
ls in affected programs: 25 -> 55 (120.00%)
helped: 0
HURT: 1
total quadwords in shared programs: 1482728 -> 1484128 (0.09%)
quadwords in affected programs: 58624 -> 60024 (2.39%)
helped: 24
HURT: 195
helped stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
helped stats (rel) min: 3.70% max: 20.00% x̄: 10.34% x̃: 10.00%
HURT stats (abs) min: 8.0 max: 24.0 x̄: 8.16 x̃: 8
HURT stats (rel) min: 1.41% max: 50.00% x̄: 4.84% x̃: 2.56%
95% mean confidence interval for quadwords value: 5.70 7.09
95% mean confidence interval for quadwords %-change: 2.22% 4.14%
Quadwords are HURT.
total spills in shared programs: 125 -> 127 (1.60%)
spills in affected programs: 0 -> 2
helped: 0
HURT: 1
total fills in shared programs: 800 -> 828 (3.50%)
fills in affected programs: 0 -> 28
helped: 0
HURT: 1
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>
It's portable, and useful to both Bifrost and Valhall, in the clause scheduler
and in an instruction selection respectively. Move it from the Bifrost clause
scheduler to common code so we can share the benefits.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>
On Valhall, we always* use memory-allocated IDVS, which does not require min/max
indices. As such, we do not want to calculate min/max indices, as this is quite
slow. Skip this step.
* except for blit shaders, which don't use an index buffer anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16867>
It's unnecessary and breaks the empty shader optimizations. Noticed while
inspecting a trace from dEQP-GLES3.functional.color_clear.masked_scissored_rgb,
which does not produce any varyings other than gl_Position in its vertex shader
and hence should omit the varying shader.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16868>
Top level acceleration structures need the bottom
6 bits to store the root ids of instances. If we
don't require that alignment, more "advanced"
allocators like VMA may sub allocate a buffer
which can lead to the 6 getting lost.
Fixes the Khronos ray tracing Vulkan samples.
Closes: #6598
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16870>
according to the spec, atomic counters can be bound at any offset divisible by 4,
which means that any driver that uses the ssbo lowering pass and doesn't have
a min offset align of 4 is potentially broken
to handle this, use a statevar to inject the misaligned remainder of the offset
into the shader as a uniform. for well-aligned counter binds, the uniform offset
will be 0
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16749>
This pass will merge instructions like these
MOV output[0].x, temp[5].x___;
MOV output[0].yzw, none._001;
into
MOV output[0].xyzw, temp[5].x001;
It is currently very careful with control flow and dependency
tracking, so there is still room for improvements.
Shader-db stats with RV530:
total instructions in shared programs: 132486 -> 132256 (-0.17%)
instructions in affected programs: 6186 -> 5956 (-3.72%)
helped: 65
HURT: 0
total temps in shared programs: 18035 -> 18014 (-0.12%)
temps in affected programs: 295 -> 274 (-7.12%)
helped: 22
HURT: 1
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16657>
This fixes the "Rewrite of inst X failed Can't allocate source
for Inst X src_type=X new_index=X new_mask=X" errors.
The compiler is quite strict when rewriting registers during
the pair allocation and checks that all of the reads of it are
initialized. However the spec doesn't enfore that, and
specifically with control flow depending on user input we can't
really know...
In the following example temp[4].x is written only in one branch,
that might or might not be taken, but this is enough to keep the
compiler happy:
IF aluresult.x___;
MAD temp[4].x, src0.1__, src0.111, src0.000
ENDIF;
src0.xyz = temp[4], src0.w = temp[4]
MAD color[0].xyz, src0.xyz, src0.111, src0.000
MAD color[0].w, src0.w, src0.1, src0.0
After switch to ntt, more IFs are converted to CMP, and the color
write looks like this. Please note that the CMP here is not TGSI
opcode but rather our US_OP_RGB_CMP: src2 >= 0 ? src0 : src1
src0.xyz = temp[4], src0.w = temp[4], src1.xyz = temp[3], src1.w = temp[12], src2.xyz = temp[2]
CMP color[0].xyz, src0.xyz, src1.xyz, -src2.xxx
CMP color[0].w, src0.w, src1.w, -src2.x
At this point temp[4].x is undefined. Now when compiler tries to
allocate register for temp[4] at some previous instruction, it will
find out that it is used as a source in the final CMP and bail out.
Instead of increasing the complexitty even more trying to account for
this, just get rid of the check completelly.
Fixes:
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec2_dynamic_subscript_write_component_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec2_dynamic_subscript_write_direct_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec2_dynamic_subscript_write_dynamic_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec2_dynamic_subscript_write_static_loop_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec2_dynamic_subscript_write_static_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec3_dynamic_subscript_write_component_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec3_dynamic_subscript_write_direct_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec3_dynamic_subscript_write_dynamic_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec3_dynamic_subscript_write_static_loop_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec3_dynamic_subscript_write_static_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec4_dynamic_subscript_write_component_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec4_dynamic_subscript_write_direct_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec4_dynamic_subscript_write_dynamic_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec4_dynamic_subscript_write_static_loop_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec4_dynamic_subscript_write_static_subscript_read_fragment,Fail
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16657>
It doesn't make sense and was not working anyway. This was spotted
by Filip Gawin in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13978
however the fix there was IMO just papering over the problem.
I don't believe that this could manifest as a real issues, because
when all of the swizzles were constant the file would be set to
RC_FILE_NONE already. So in theory this could lead to an issue only
in the close to impossible circumstance that the out of bounds memory
read by constant->u.Immediate[swz] would end with the same exact value
as another inlineable constant in different channel. However in some
circumstances it would lead to following valgrind warnings:
Conditional jump or move depends on uninitialised value(s)
at 0x5D4E690: ieee_754_to_r300_float (radeon_inline_literals.c:61)
by 0x5D4E690: rc_inline_literals (radeon_inline_literals.c:133)
by 0x5D3877A: rc_run_compiler_passes (radeon_compiler.c:436)
by 0x5D38821: rc_run_compiler (radeon_compiler.c:458)
by 0x5D4AF63: r3xx_compile_fragment_program (r3xx_fragprog.c:139)
by 0x5D48377: r300_translate_fragment_shader (r300_fs.c:499)
by 0x5D491B0: r300_pick_fragment_shader (r300_fs.c:601)
by 0x5D2BFEE: r300_create_fs_state (r300_state.c:1072)
by 0x57DDC36: st_create_nir_shader (st_program.c:538)
by 0x57DF10E: st_create_fp_variant (st_program.c:1056)
by 0x57E057C: st_get_fp_variant (st_program.c:1102)
by 0x57E0AB1: st_precompile_shader_variant (st_program.c:1287)
by 0x57E0AB1: st_finalize_program (st_program.c:1333)
by 0x57CB6F3: st_link_nir (st_glsl_to_nir.cpp:958)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16657>
When there are multiple MOVs with the same destination in loop
in different branches and some readers after the loop, we would
now errorneously copy propagate the last MOV, like in the following
snippet:
BGNLOOP;
...
IF temp[3].x___;
MOV temp[2], const[1].yxxy;
BRK;
ENDIF;
IF temp[4].x___;
MOV temp[2], const[1].xyxy;
BRK;
ENDIF;
...
MOV temp[2], const[1].xyxy;
ENDLOOP;
ADD_SAT temp[0], temp[2], temp[1];
into:
BGNLOOP;
...
IF temp[3].x___;
MOV temp[2], const[1].yxxy;
BRK;
ENDIF;
IF temp[3].y___;
MOV temp[2], const[1].xyxy;
BRK;
ENDIF;
...
ENDLOOP;
ADD_SAT temp[0], const[1].xyxy, temp[1];
We need the copy propagate just for simple cleanups after ttn,
anything more complex should have been handled already in NIR.
So just bail out if any of the readers is after the loop.
No changes in shader-db.
Fixes few piglit tests when loop unrolling is disabled:
spec@glsl-1.10@execution@vs-loop-complex-unroll
spec@glsl-1.10@execution@vs-loop-complex-unroll-nested-break
spec@glsl-1.10@execution@vs-loop-complex-unroll-with-else-break
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6467
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16657>
NIR loop unrolling is already enabled so just let it do its job.
Shader-db results (nv120):
total gpr in shared programs: 893490 -> 893898 (0.05%)
gpr in affected programs: 15338 -> 15746 (2.66%)
total instructions in shared programs: 6243205 -> 6237068 (-0.10%)
instructions in affected programs: 71160 -> 65023 (-8.62%)
total bytes in shared programs: 66729616 -> 66664760 (-0.10%)
bytes in affected programs: 759328 -> 694472 (-8.54%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
NIR loop unrolling is already enabled so just let it do its job.
Shader-db results (nv92):
total gpr in shared programs: 734638 -> 735037 (0.05%)
gpr in affected programs: 11058 -> 11457 (3.61%)
total instructions in shared programs: 6073415 -> 6073398 (<.01%)
instructions in affected programs: 10079 -> 10062 (-0.17%)
total bytes in shared programs: 41837432 -> 41838872 (<.01%)
bytes in affected programs: 252504 -> 253944 (0.57%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
NIR loop unrolling is already enabled so just let it do its job.
Shader-db results (nv40):
total instructions in shared programs: 17446532 -> 17446068 (<.01%)
instructions in affected programs: 15532 -> 15068 (-2.99%)
total gpr in shared programs: 82658 -> 82801 (0.17%)
gpr in affected programs: 1680 -> 1823 (8.51%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
Otherwise we will later hit:
gpir_error("nir_ssa_undef_instr is not supported\n");
Unfortunatly this causes a piglit failure due to increased register
pressure in an unrealistic shader but since not doing this can
result in hitting the not supported error in more relistic shaders
this seems the right thing to do for now.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
Force unroll setting based on GLSL IR settings:
case PIPE_SHADER_CAP_INDIRECT_INPUT_ADDR:
case PIPE_SHADER_CAP_INDIRECT_OUTPUT_ADDR:
case PIPE_SHADER_CAP_INDIRECT_TEMP_ADDR:
case PIPE_SHADER_CAP_INDIRECT_CONST_ADDR:
/* a2xx compiler doesn't handle indirect: */
return is_ir3(screen) ? 1 : 0;
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
The NIR unroller is already enabled so just allow it to do its job.
We add a new failure here because llvmpipe fails to handle a
shader that is no longer unrolled.
Previously GLSL IR could unroll the loop because it only had a
single break. However once lower_returns passes over the shader
it ends up with more than 2 breaks making it no longer possible
to unroll. This is a disadvantage of doing the unrolling in NIR
however in practice we don't see shaders in the wild with multiple
returns inside loops.
Being unable to handle this loop is an existing bug with llvmpipe
exposed by the loop no longer being unrolled.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
We now have infrastructure in place to generate variants of vertex shaders
specialized for transform feedback. All that's left is launching these
compute-like kernels before the IDVS job, implementing both the
transform feedback and the regular rasterization pipeline. This implements
transform feedback on Valhall, passing the relevant GLES3.1 tests.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Valhall has no architectural support for transform feedback. So if a vertex
shader uses transform feedback, we need to split the shader into two: a pure
vertex stage and a compute-like transform feedback stage. This splitting
resembles the splitting we do for IDVS.
When compiling a vertex shader that uses transform feedback on Bifrost, also
compile the transform feedback variant. That variant (marked by internal=true)
will get its stores lowered by the NIR pass introduced earlier in this series.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
In both GL and VK, the driver may choose not to support vertex shaders with side
effects (SSBOs, atomics, images). Supporting this opens a can of worms for IDVS.
Neither freedreno nor the (Vulkan?) DDK advertise support, for this reason.
Apps should not be using this anti-feature anyway.
Stop advertising support.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Add a simple NIR-based implementation of transform feedback, appropriate for
OpenGL ES 3.1 class hardware (compute but no geometry or tessellation shaders).
Stores to varyings that will be captured are replaced by stores to transform
feedback buffers and some addressing math. This allows implementing the semantic
of transform feedback in a compute-like stage.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Calling this directly in the linker code allows us to place it between
the varying linker and uniform linker calls which allows for better
optimisation/removal of uniforms.
Also in a later patch it allows us to insert a new nir based
lower_const_arrays_to_uniforms() call after the gl_nir_link_opts()
call. This is important because it allows the linking opts to
move constant arrays to later stages if possible before
lower_const_arrays_to_uniforms() turns them into uniforms.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6541
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770>
The remap tables are used with the GL API so there is no need to
add hidden uniforms to them. Also when we switch to lowering some
constant arrays to uniforms in NIR in a following patch there
will no longer be enough room in the tables as we assign their
size in the GLSL IR linker not the NIR linker currently.
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770>
Doing this in NIR should give better results, but also allows us to
stop calling more GLSL IR optimisations passes.
v2: Skip 8bit and 16bit type that would require further processing
I believe this is an existing bug in the GLSL IR pass also.
v3: rebuild constant initialisers as we want to call this pass
after nir has already lowered them and performed optimisations.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770>
In mingw's `<unknwn.h>`, it's defeind __REQUIRED_RPCNDR_H_VERSION__ to 475,
so that gcc/mingw won't raise compiling error that because directx/d3d12.h
define __REQUIRED_RPCNDR_H_VERSION__ to 500, but the maximal supported __REQUIRED_RPCNDR_H_VERSION__ in mingw
are 475.
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16767>
In mingw's `<unknwn.h>`, it's defeind __REQUIRED_RPCNDR_H_VERSION__ to 475,
so that gcc/mingw won't raise compiling error that because directx/d3d12.h
define __REQUIRED_RPCNDR_H_VERSION__ to 500, but the maximal supported __REQUIRED_RPCNDR_H_VERSION__ in mingw
are 475.
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16767>
...until an entry is added to VERSIONS in zink_device_info.py.
Now that we stop obtaining feature structs of promoted extensions when the VK
impl is new enough, it's possible for things to break when someone updates
vk.xml, and extensions get promoted but the codegen is not updated to include
e.g. info->feats13 and info->props13 (or newer) in zink_device_info
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16265>
(compilation structs refer to VkPhysicalDeviceVulkanXYFeatures/Properties, as
they consist of all features added by extensions promoted in VK X.Y)
The spec prohibits this, so instead we map the fields in the compilation
structs onto the fields in extension structs.
Some extensions' feature/properties structs are not promoted, and the spec
allows including both compilation structs and extension structs in such cases.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16265>
st_texture_release_all_sampler_views uses the validate_mutex,
but st_get_texture_sampler_view_from_stobj didn't.
Since they both modify stObj->view we could have threadA in
st_get_texture_sampler_view_from_stobj with a non-NULL sv,
so expecting sv->view to be non-NULL, while threadB was in
st_texture_release_all_sampler_views clearing sv->view.
It's also needed to protect st_sampler_view::private_refcount,
which is supposed to be used from the owning context thread,
but can also be used by any context in st_texture_release_all_sampler_views.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6088
Fixes: ef5d427413 ("st/mesa: add a mechanism to bypass atomics when binding sampler views")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16779>
Right now, we just consider the size of the accessed portion of the
push constant array, but it doesn't necessarily reflect the size
of the UBO we should declare.
Fixes: de1e941c59 ("microsoft/spirv_to_dxil: Lower push constant loads to UBO loads")
Reviewed-by: Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16703>
Since we now require C11 and C++14, we can use the standard
static_asserts from the standard library instead of rolling our own
compiler-specific versions.
To avoid needing scopes around usage in switch cases, keep the
while-wrapping from before. This means it still can't be used outside of
functions, but that should be fine; we should probably just use
static_assert directly in those cases anyway.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16670>
This macro kinda complements util_is_power_of_two_*, but is implemented
as a macro. This means that it can expand to a constant integral
expression, and thus be used in static_assert.
Because we don't really need the added complexity, this doesn't handle
zero correctly. But that's OK, because the call-sites will.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16670>
dirty is a variable, and C standard doesn't guarantee that the compiler
can see that it's always passed a constant. So this isn't appropriate to
use for static_assert, which we're about to convert STATIC_ASSERT to
expand to.
If we really want to do this compile-time, we need to make
fd_context_dirty a macro instead. But it seems reaonable to just use a
run-time assert instead here.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16670>
Improve: 91fe0b5629 ("radv: Delete lots of sync code.")
As cnd_monotonic.h are include `util/os_time.h`, radv_debug.c and radv_debug.c needs `util/os_time.h`
So include in these files directly.
The compiling errors are:
```
../src/amd/vulkan/radv_debug.c:707:12: error: implicit declaration of function 'os_localtime' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
timep = os_localtime(&raw_time, &result);
../src/amd/vulkan/radv_device.c:97:11: error: implicit declaration of function 'os_time_get_nano' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
return os_time_get_nano();
^
../../src/amd/vulkan/radv_pipeline.c: In function 'radv_create_shaders':
../../src/amd/vulkan/radv_pipeline.c:4119:29: error: implicit declaration of function 'os_time_get_nano' [-Werror=implicit-function-declaration]
4119 | int64_t pipeline_start = os_time_get_nano();
```
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16536>
this one's a little different from other dynamic states in that it
isn't expected to change much, so I've kept it outside of draw handling
since it can be trivially applied elsewhere with no chance of impacting perf
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16838>
Don't compute it based on devinfo->has_64bit_float. Othwerwise we may
end up emitting 64bit-int (Q) instructions on platforms with 64bit
floats but not 64bit integers.
Right now, the only platforms where has_64bit_int is different from
has_64bit_float are the platforms that use GFX7_FEATURES.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15835>
This expression accidentally performs a 32-bit sign-extension when
processing the second half of the expression (the low 16 bits).
Consider -7W, which is represented as 0xfff9fff9 in our encoding (the
16-bit word is replicated to both halves of the 32-bit dword).
Tigerlake's compaction stores the low 11-bits of an immediate as-is,
and replicates the 12th bit. So here, compacted_imm will be 0xff9.
( (int)(0xff9 << 20) >> 4) |
((short)(0xff9 << 4) >> 4))
0xfff90000 | (0xff90 >> 4)
0xfff90000 | 0xfffffff9 ...oops...
0xfffffff9
By casting the second line of the expression to unsigned short, we
prevent the sign-extension when it combines both parts, so we get:
0xfff90000 | 0x0000fff9
0xfff9fff9
Fixes: 12d3b11908 ("intel/compiler: Add instruction compaction support on Gen12")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16833>
Because we lower SPLIT and COLLECT before RA, we need to consider offsets when
determining the dimensions of vectors, in order to align properly. Lowering
COLLECT post-RA would avoid this special case.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>
Speeds up compiling shaders/skia/781.shader_test in shader-db by 8x
(Icecream95).
...At least it did before I extended to support register allocation of vec8. On
Valhall, texture instructions require up to 8 consecutive registers. To handle
this, provide for vec8 register allocation. Liveness was already (accidentally?)
vec8. The increased memory requirement is acceptable given that the interference
matrix is now stored sparsely (Alyssa).
Icecream95 reports the vec8 changes hurt RA performance by about 1% on average.
I consider this acceptable for now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>
This is an array which can either be sparse or dense, and was designed
to be used to track liveness and interference information.
Either a sparse array with sorted indices or dense array is used.
Other data structures were tried, such as red-black trees or hash
tables, but they were slower. When used for storing constraints, the
indices do not have to be sorted as duplicating elements is okay, but
the speedup from that was not enough to justify the extra complexity.
v2: Add a comment about how to potentially speed it up. But it seems
fast enough even without this change.
v3: Use a custom struct rather than relying on util_dynarray.
v4: Split out functions only used for liveness analysis, rather than the simpler
data structure needed for the register interference matrix. If we need to
optimize liveness, that can follow on after. Also make it for vec8 (Alyssa).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>
For the NIR XFB gathering as well as all the Vulkan drivers, buffer
strides in nir_xfb_info are in bytes. When Marek started using
nir_xfb_info for GLSL on radeonsi, he copied directly from the GLSL
struct which has strides in dwords. This inconsistency didn't show up
until I went through and started us using the NIR passes for GL drivers
directly without going through the GLSL structs. We could change the
nir_xfb_buffer_info field to be in dwords to be consistent with
shader_info but that would mean changing all the Vulkan drivers but, for
now, it's easier to always use bytes in nir_xfb_info.
Fixes: 2a22885a45 ("st,nir: Use nir_shader::xfb_info in nir_lower_io_passes")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16819>
Since this logic was written, we've moved to require C11, so this can
now be simplified. First of all, we no longer need to set
__STDC_VERSION__ for C code at all, because the issue that MSVC doesn't
set __STDC_VERSION__ for C99 is longer a concern. Second, we can make
the C++ check unconditional.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16812>
dEQP-VK.glsl.builtin.precision_double.determinant.compute.mat3 was
failing because of a CTS bug, which got fixed in the latest update for
all our CI machines.
This commit assumes this got fixed for all families, even the ones we
did not try to run on.
Fixes: 836ce97f5e (ci: bump VK-GL-CTS to 1.3.2.0)
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16810>
We don't need to go as far as the fake front thing when MSAA is being used, because the
swapchain (single-sampled) is already decoupled from the app render buffers. But we do
need to direct the frontbuffer flush to the single-sampled back buffer, and then present
the back buffer. We also need to swap the buffers when we do this, so the next blit
targets the former front buffer.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16227>
ce1a381e57 ("turnip: enable VK_KHR_16bit_storage on A650") determined
that the type of the instr decided the type of the value being stored in
the ".b" case. But it would be surprising if image stores had the type
determine the coordinates' precision instead of the value's, and once we
turned on image instruction precision lowering we ran into asserts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16616>
A 1D texture operation may need to do a mov to turn a reference to a
channel of an SSA value into a scalar value to be passed as the texture
coordinate (since texture srcs can't do swizzles). Seen in
amnesia-the-dark-descent/low/46.shader_test() for example, where a 1D
texture is used to remap each of r,g,b from a previous texture result.
Besides, the nir_op_is_vec() case will (perhaps surprisingly) look through
a mov, anyway.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16616>
Triggered a BIR validation error, which made debugging a breeze. That validation
pass (dimensionality checks) gets a lot of use, it seems :-)
Fixes:
dEQP-VK.ssbo.layout.2_level_array.std430.row_major_mat4x2_comp_access_store_cols
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16724>
Map a canonical format (a hardware-independent pipe_format) to a compression
mode (Valhall-specific hardware enum defined in GenXML). To be used for packing
plane descriptors and render target descriptors when AFBC is in use on Valhall.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
For callers that have a device object, it's easy to pass dev->arch instead of
dev. But this requires callers to have a reference to the device, which is
tricky for callers that only have the arch via PAN_ARCH. Pass dev->arch instead
of dev to accommodate them.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
These do not convey any additional information, and fail to account for
shrinking. In particular, a 64-bit writemask with .keephi would fail to
disassemble and instead trip the assertion, since that would be the ZW
components. Just delete the broken code.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16798>
Logically at the same part of the compile pipeline as clause scheduling on
Bifrost. Lots of similarities, too. Now that we generate flow control only as a
late pass, various hacks in the compiler are no longer necessary and are
dropped.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
Test that we correctly track the scoreboard, helper invocations, reconvergence,
and ends and insert NOPs to effect this expected flow control.
As the pass inserts NOPs but does not otherwise modify the shader, this is easy
to test with well-defined behaviour of the pass.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
On Bifrost, to terminate helper threads we set the td bit on the clause. On
Valhall, we need to use the .discard flow control. Extend the flow control NOP
insertion to insert NOP.discard where necessary to terminate helper threads.
This should reduce wasted work in fragment shaders.
This requires fairly involved data flow analysis, but the handling here should
be optimal.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
The current helper termination analysis code is hardwired for clauses, so it
won't work for Valhall. However, the bulk of it is dataflow analysis which is
portable between Bifrost and Valhall. Export the interesting bits so we can
reuse them on Valhall.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
We need to insert dependencies for varyings and memory access. Currently, the
Bifrost scoreboarding pass just treats these as barriers, but this is too heavy
handed. Extend the scoreboard data structure so we can do better.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
This is already allowed by the gallium driver, which uses the same code
for image layout and image views, so everything Just Works and the tests
pass. radv doesn't enable the sampler feature, but I don't see any
reason it wouldn't work and the tests pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16806>
Fixes tests matching:
dEQP-VK.pipeline.extended_dynamic_state.cmd_buffer_start.*unused_ms
These tests bind mesh pipeline, immediately after that bind non-mesh
pipeline and expect that binding mesh pipeline was a no-op.
v2: do it in one place & add comment (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16811>
The Polaris10 in CI has been getting insta-hangs when starting dEQP.
Let's give it 5 attempts to get its act together, as it won't affect
the run time dramatically (max 5 minutes), but will provide more
reliable results for developers.
Tracking of hangs (and many other issues) is done through scrapping the
execution logs, processing them to find these issues, then pushing the
data to influxdb. This allows us to plot the failure rate over time,
and see if the situation is getting better or worse.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16793>
This function allows to only scalarize instructions down to a desired
vectorization width. nir_lower_alu_to_scalar() was changed to use the
new function with a width of 1.
Swizzles outside vectorization width are considered and reduce
the target width.
This prevents ending up with code like
vec2 16 ssa_2 = iadd ssa_0.xz, ssa_1.xz
which requires to emit shuffle code in backends
and usually is not beneficial.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>
The callback allows to request different vectorization factors
per instruction depending on e.g. bitsize or opcode.
This patch also removes using the vectorize_vec2_16bit option
from nir_opt_vectorize().
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>
NIR already has the necessary lowering, and the GLSL lowering violates
GLSL IR validation rules. Once quadop lowering was turned off, the IR
validation at the end of the compile path on DEBUG builds caught the
problem.
In order to move the lowering to NIR, though, we need to make sure that
drivers supporting these functions actually have the lowering flag set.
xfails added for t860, where apparently this tickles a variety of existing
64-bit bugs in the backend.
Fixes: #6461
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16437>
It looks like atomics are slow on compressed surfaces so when enabling
compression for storage images that can be possibly used for atomic
operation hinders performance. Lets just disable compression in this
scenario.
v2: Reword comment (Ken)
Allow mutable with 16/32/64 bits (Ken)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14712>
The vertex and fragment shaders are named foo.sh, while the geometry
shaders are named foo.txt. Since .sh can easily be confused with shell
scripts, let's rename the vertex and fragment shaders to .txt to match
the geometry shaders.
These tests aren't hooked up to run automatically, so there's no need to
update references to the file names.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16791>
Until now we would only disable EZ globally if we had a depth or stencil
load operation or if we had no draw calls at all, but even if we have draw
calls if all of them disable EZ we should also us the global disable.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16794>
When we have a draw call that is incompatible with EZ we should only
disable EZ for the remaining of the job in the case that both of the
following conditions are met:
1. The cause for the incompatibility is an incompatible depth test
direction.
2. The pipeline does Z writes.
Otherwise it is enough to disable EZ temporarily only for draw calls
with the incompatible pipeline.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16794>
Fix defect reported by Coverity Scan.
Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking pD3D12VideoBuffer suggests that it may
be null, but it has already been dereferenced on all paths leading to
the check.
Fixes: d8206f6286 ("d3d12: Add video decode implementation of pipe_video_codec")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16714>
this yields sizable perf improvements in some cases
KHR-GL46.copy_image.functional timing (zink+anv-icl):
before
MESA_LOADER_DRIVER_OVERRIDE=zink ./glcts -n 74.77s user 76.44s system 33% cpu 7:32.38 total
after
MESA_LOADER_DRIVER_OVERRIDE=zink ./glcts -n 69.95s user 68.84s system 33% cpu 6:51.54 total
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16545>
programs have different usages of descriptors, so forcing a recalc on program
change ensures that the right hash values are always set, especially for compact
sets where there's more descriptors going into each hash value
this can (ideally) be optimized later to check for matching interfaces between old
program and new program to avoid recalc if both programs have identical descriptor
usage for a given set
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16755>
this avoids looking at irrelevant 3d pixelstore params like
GL_PACK_IMAGE_HEIGHT when they don't apply, which will cause the storage
buffer to be incorrectly sized and break the operation
Fixes: e7b9561959 ("gallium: implement compute pbo download")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16728>
Instead of SDWA v_mov_b32/v_xor_b32, we can use a combination of
v_add_u16/v_sub_u16 (add/sub swap, similar to xor swap) and v_perm_b32
with a literal.
I don't know yet if GFX11 adds any new instructions which makes this
easier, but this approach should have full functionality.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16595>
Blending configuration needs to be adapted in case the RT format does
not have an alpha channel. This is handled so far correctly.
But when we have two RT, one with alpha and other without it, we need
to split the blend configuration, so one is adapted and the other not.
Otherwise we would be changing the blend config for the wrong RT.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16747>
During certain control-flow manipulation passes, we go out-of-SSA
temporarily in certain areas of the code to make control-flow
manipulation easier. This can result in registers being in phi sources
temporarily. If two sub-passes run before we get a chance to do
clean-up, we can end up doing some out-of-SSA and then a bit more
out-of-SSA and trigger this case. It's easy enough to handle.
Fixes: a620f66872 ("nir: Add a couple quick-and-dirty out-of-SSA helpers")
Fixes: 79a987ad2a ("nir/opt_if: also merge break statements with ones after the branch")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6370
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16111>
This isn't really necessary because the API doesn't allow MSAA and
mipmapping at the same time but people forget that pretty often so it's
good to have it as documentation if nothing else.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14129>
GLSL says that indexing in a samplers array should be done with
a dynamically uniform value.
If the app doesn't obey this constraint, it'll get incorrect
results.
But there's one case where the app might be correct and yet
get incorrect result: if 2 consecutive draws are used, with no
state changes in between, the hardware might run them in the
same wave and the app will get incorrect result as-if it was
using non-dynamically uniform index.
To prevent this, this commit takes advantage of the divergence
analysis pass - if the index is marked as divergent, then it will
have the same effect as using ACCESS_NON_UNIFORM in SPIR-V.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16709>
In addition to pushing it to the current latest stable, the v5.17 kernel
for mesa CI pulls a patch to address a regression in drm that affects at
least the lima jobs.
The dtb for sc7180-trogdor-lazor-limozeen-nots is also updated since the
old one no longer exists in v5.17.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16641>
Until know when we consumed a barrier we would implement it by
setting the serialize flag on a job, which would cause it to
be serialized across all hardware queues (CL, CSD, TFU). However,
now that we track the source(s) of the barrier, we can restrict this
to only the relevant queue(s) instead (multisync path only).
It should be noted that we can implement transfers via TFU or CL
jobs, so if the source of a barrier is a transfer, we currently
synchronize against both the TFU and the CL queues, however, we
may be able to more effectively track this in the future to
restrict this to just one of the queues.
Also, for secondary command buffers we are taking the easy way
out and always synchronize against all queues, but we should
be able to do the same for secondaries without too much effort.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16743>
Until now we have been tracking the dstStageMask of barriers (where they
are consumed) but not where they are produced (the srcStageMask). With
this change we extend our barrier state to keep track of this as well.
This allows the driver to have better knowledge of the intended barrier
semantics so it can limit the amount of synchronization it does only
to the source stages involved with a barrier. We will do this in a
later patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16743>
Until now, we have always consumed barriers with the next GPU job
recorded into the command buffer after the barrier even if the job
was not the target of the barrier itself. This works based on the
idea that when we consume a barrier in a job we serialize it against
all queues, so effectively we are ensuring that whatever came
before it has completed, so if the barrier was intended for an
even later job, it would have served its purpose anyway.
It should be noted that CL jobs are special because they are actually
split in two different queues: the binning queue and the render
queue, with a dependency between them to ensure render runs after
binning. With our current implementation, if we have 3 jobs (A, B,
C) and we have a barrier after job A which is intended to block job C
on A's completion, with our implementation we would instead block
B on A's completion. If C is a CL job, and the barrier was targetting
the binning stage then we can have the following scenarios:
1. If B) is a CL job, it will consume the barrier at its binning
stage, so we know that B's binning will not start until A has
completed. Then C's binning will not start until B's binning
has completed, and thus, will not start until A has completed,
as intended.
2. If B) is not a CL job, it will consume the barrier and will not
start until A has completed, however, C's binning job will be
submitted to the binning queue without any sync requirements
and since B did not put any jobs in the binning queue it will
start as soon as A's binning has completed, but not A's render,
which would be incorrect.
Further, since a981ac0539 we now skip consumming BCL barriers if
a job does not have draw calls that can be affected by them. In the
same scenarios as before, now case 1) would also be problematic,
since B may skip the binning sync in that case and start immediately,
and since C's binning would be allowe to start immediately after B's
binning, there is no guarantee that this doesn't happen in parallel
with A's render.
With this patch we fix this situation by tracking the intended
consumer of each barrier: graphics, compute or transfer, and we make
sure to consume them only with jobs that match those semantics.
This fixes flakyness in dEQP-VK.device_group.*
Fixes: a981ac0539 ('v3dv: skip binning sync if binning shaders don't access external resources')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16743>
Fix defects reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member m_numPkrLog2 is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member m_numSaLog2 is not initialized in this constructor nor in any functions that it calls.
Fixes: 4fdf42b3c2 ("amd: import gfx11 addrlib")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16679>
The previous logic would just set the block to the instructions
original location if we couldn't evict it from a loop.
For now we only push const loads to a later block inside ifs
but we can add more heuristics later. This change helps a
hand full of shaders but also stops a CTS regression caused
by excess spilling after a series I'm working on to disable
more of the GLSL IR optimisation passes.
Shader-db results iris (BDW):
total instructions in shared programs: 17529759 -> 17529749 (<.01%)
instructions in affected programs: 15929 -> 15919 (-0.06%)
helped: 5
HURT: 2
helped stats (abs) min: 1 max: 5 x̄: 2.40 x̃: 2
helped stats (rel) min: 0.06% max: 0.15% x̄: 0.11% x̃: 0.12%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for instructions value: -3.34 0.49
95% mean confidence interval for instructions %-change: -0.14% 0.02%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 861109994 -> 861099681 (<.01%)
cycles in affected programs: 7027698 -> 7017385 (-0.15%)
helped: 95
HURT: 72
helped stats (abs) min: 1 max: 7995 x̄: 138.54 x̃: 9
helped stats (rel) min: <.01% max: 15.96% x̄: 0.54% x̃: 0.11%
HURT stats (abs) min: 1 max: 474 x̄: 39.56 x̃: 12
HURT stats (rel) min: <.01% max: 1.17% x̄: 0.20% x̃: 0.11%
95% mean confidence interval for cycles value: -159.05 35.54
95% mean confidence interval for cycles %-change: -0.45% 0.01%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 17606 -> 17605 (<.01%)
spills in affected programs: 323 -> 322 (-0.31%)
helped: 1
HURT: 0
total fills in shared programs: 22599 -> 22598 (<.01%)
fills in affected programs: 1348 -> 1347 (-0.07%)
helped: 1
HURT: 0
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14940>
LD_VAR_BUF instructions on Valhall take a source format, indicating the
in-memory format of the varying independent from the register format, which we
still model within the compiler for compatibility with Bifrost. (Prior to
Valhall, source format is specified in the attribute descriptor as a physical
pixel format.)
Model this information, allowing us to generate fp16 LD_VAR_BUF instructions
correctly on Valhall.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16752>
Fixes a vector dimension validation failure in
dEQP-GLES3.functional.shaders.indexing.varying_array.vec4_static_write_dynamic_read
after we enable fp16 varyings.
No shader-db changes, as we don't yet support fp16 varyings.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16752>
As fusing VAR_TEX is an optimization, it's helpful to have unit tests since
functional tests won't check that the optimization triggers when expected.
Originally written when I was touching the VAR_TEX code. Those changes have
since been dropped by the unit test remains useful.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16752>
The register precolouring logic assumes that coverage masks are always in R60,
so spilling them causes incorrect results. We could do better. Fixes on Valhall:
dEQP-GLES3.functional.ubo.random.all_per_block_buffers.28
Fixes: 3df5446cbd ("pan/bi: Simplify register precolouring in the IR")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16748>
On Valhall, the driver should set this flag if the hardware may rotate
primitives. This happens if:
1. The rasterization of lines does not matter, AND
2. The provoking vertex does not matter.
The first condition we may satisfy by checking for LINES and the second by
checking for flat shading. Otherwise, we should set this flag to allow
optimizations. This may be more efficient for tiling.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16748>
These basically correspond to the alpha_zero_nop and alpha_one_store flags we
already compute and set. Except those flags don't exist on Valhall, so these
need to be used instead (on Bifrost, in addition .. unclear why the duplication
on Bifrost).
Set these flags when we can. Ostensibly this is for performance (neglible
improvement on glmark2 score), but mostly I want to get us using the hardware
optimally.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16748>
Sometimes you can end up with tex instructions that have sampler deref srcs, even though
they don't need them, e.g. a txs. In this case, still fix up those derefs in the sampler
splitting pass rather than leaving them pointing to a typed sampler.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16639>
There is also a hypothetical scenario where
transformData is 0 and transformOffset is not 0
and we end up reading from transformOffset because
transform_addr is not 0.
VkAccelerationStructureBuildRangeInfoKHR spec:
If VkAccelerationStructureGeometryTrianglesDataKHR::transformData is not NULL, a single VkTransformMatrixKHR structure is consumed from VkAccelerationStructureGeometryTrianglesDataKHR::transformData, at an offset of transformOffset. This matrix describes a transformation from the space in which the vertices for all triangles in this geometry are described to the space in which the acceleration structure is defined.
Which I think means, that we should ignore
transformOffset if transformData is NULL.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16719>
VkAccelerationStructureBuildRangeInfoKHR spec:
If the geometry uses indices, primitiveCount × 3 indices are consumed from VkAccelerationStructureGeometryTrianglesDataKHR::indexData, starting at an offset of primitiveOffset. The value of firstVertex is added to the index values before fetching vertices.
If the geometry does not use indices, primitiveCount × 3 vertices are consumed from VkAccelerationStructureGeometryTrianglesDataKHR::vertexData, starting at an offset of primitiveOffset + VkAccelerationStructureGeometryTrianglesDataKHR::vertexStride × firstVertex.
Meaning: We always add firstVertex * vertexStride
to the vertex address and add primitiveOffset
either to the vertex address or the index address,
depending on wether indices are used.
Also add missing handling with instances.
Fixes: 0dad88b ("radv: Implement device-side BVH building.")
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16719>
Currently dpb_enc referneces not updated properly when index 0, as
we are skipping clearing that ref.
This patch will fix this for index 0. So that when ever we set
non_referenced flag, that is not used as ref and not pushed to DPB.
This is helping in SVC encoding.
Signed-off-by: SureshGuttula <suresh.guttula@amd.com>
Reviewed-by: Thong Thai <thong.thai@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16745>
For cases with lots of very small primitives, this may improve
performance because we're not executing those dead channels all the
time.
Shader-db reports no instruction or cycle-count changes. However, by
hacking up the driver to report when this optimization triggers, it
appears to affect about 10% of shader-db.
v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now,
because using VMask on those platforms allows us to perform the
eliminate_find_live_channel() optimization. However, XeHP doesn't
seem to have packed fragment shader dispatch, so we lose that
optimization regardless, and there's no reason not to avoid vmask.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>
It turns out that we need a fragment shader for streamout. Whh? From
Lionel's reading of simulator sources, it seems the streamout unit is
looking at enabled next stages. It'll generate output to the clipper in
the following cases :
- 3DSTATE_STREAMOUT::ForceRendering = ON
- PS enabled
- Stencil test enabled
- depth test enabled
- depth write enabled
- some other depth/hiz clear condition
Forcing rendering without a PS seems like a recipe for hangs so it's
probably better to just enable the PS in this case.
Fixes: 36ee2fd61c ("anv: Implement the basic form of VK_EXT_transform_feedback")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>
Starting with Ivy Bridge, we implement alpha-to-coverage by writting
gl_SampleMask with a pattern based on alpha. This will show up in
wm_prog_data::uses_omask so we don't need to look at the key.
Fixes: 36ee2fd61c ("anv: Implement the basic form of VK_EXT_transform_feedback")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>
Add unit tests ensuring the optimization applies in all the cases we care about,
as functional integration tests (CTS and Piglit) won't test this. Also add unit
tests for a few cases where we specifically cannot fuse, in case these cases are
missed by the tests.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16725>
1. Lowers NV_mesh_shader TASK_COUNT output to launch_mesh_workgroups.
2. Removes all code after launch_mesh_workgroups, enforcing the
fact that it's a terminating instruction.
3. Ensures that task shaders always have at least one
launch_mesh_workgroups instruction, so the backend doesn't
need to implement a special case when the shader doesn't have it.
4. Optionally, implements task_payload using shared memory when
task_payload atomics are used.
This is useful when the backend is otherwise not capable of
handling the same atomic features as it can for shared memory.
If this is used, the backend only has to implement the basic
load/store operations for task_payload.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16720>
VK_KHR_format_feature_flags2 is mostly about define a new 64-bit
VkFormatFeatureFlagBits2KHR format feature flag type, as 29 bits of
the 32-bit VkFormatFeatureFlagBits are already in use.
So all the bits from VkFormatFeatureFlagBits are being replicated, and
most of the work here consist on switch to the new flags.
From the new (not replicated from VkFormatFeatureFlagBits) flag bits,
we don't support
VK_FORMAT_FEATURE_2_STORAGE_READ_WITHOUT_FORMAT_BIT_KHR or
VK_FORMAT_FEATURE_2_STORAGE_WRITE_WITHOUT_FORMAT_BIT_KHR, as right now
we require the format on the shader for doing the read and stores.
We use now VK_FORMAT_FEATURE_2_SAMPLED_IMAGE_DEPTH_COMPARISON_BIT_KHR,
but only applying it for depth formats.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16718>
On GPUs that support AFBC with tiled headers, try to use tiled headers instead
of linear headers. This should be a bit more efficient for the caches.
Additionally, on Mali, tiled headers are tied to solid colour blocks, so this
has the effect of enabling AFBC with solid colour blocks where supported.
Unfortunately, results are disappointing. Mali-G52:
-btexture from 856fps to 859fps
-bdesktop from 292fps to 294fps
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
The header size is the header stride times the number of rows in the header
(number of tiles of superblocks). We already calculate the header stride, so
eliminate the separate header size calculation.
Delete the old header size calculation. It has no notion of wide blocks, let
alone tiled AFBC headers.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
We want to add more tracepoints for intel drivers. Having them all
enabled at the same time can be both costly and unreadable.
This allows a driver to specify an environment variable and values to
enable/disable tracepoints.
v2: s/TRACEPOINTS_ENABLES/TRACEPOINTS_TOGGLES/ (Danylo)
s/config_name/toggle_name/
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16717>
Otherwise we can fail to allocate tied operands if we spill the tied operand.
Seen in shaders/android/com.miHoYo.GenshinImpact/16.shader_test with a
particularly bad scheduling causing excessive spilling.
No shader-db changes.
Fixes: bc17288697 ("pan/bi: Lower split/collect before RA")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16378>
According to the Vulkan spec 21.4 "Conditional Rendering",
only clearing attachments with vkCmdClearAttachments is subject to
conditional rendering.
Subpass clear and vkCmdClearColorImage / vkCmdClearDepthStencilImage
should always be executed even if it happens in a
conditional rendering block.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16654>
In some scenario ac_init_llvm_target may be called twice,
but the LLVM library won't have been unloaded between
the 2 calls, leading to a LLVM warning being printed.
Example pseudo-code to trigger this for radeonsi:
gbm_create_device();
eglInitialize();
eglTerminate();
gbm_device_destroy();
gbm_create_device();
eglInitialize();
eglTerminate();
gbm_device_destroy();
To avoid the warning message from LLVM, clear the command line
parser state before calling LLVMParseCommandLineOptions.
This might fix https://gitlab.freedesktop.org/mesa/mesa/-/issues/5960
This is done only on LLVM 12+ because it seems to break some apps
on LLVM 11 (there has been some work post LLVM 11 release to refactor
CommandLine.cpp, see 42f588f39c5c and the following commits).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16587>
This method isn't part of the C API but we can still use it and
avoid getting error messages from the command line parser:
mesa: for the [...]: may only occur zero or one times
We could call it at the beginning of ac_init_llvm_target but
this may hide some real bugs so let drivers call it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16587>
As it could happens that when a bo is reused from the cache, it is
being mapped with a smaller size that needed. So let's just unmap it,
and let be remapped with the needed size.
Even if we could try to be smarter when deciding when to unmap or not,
to avoid uneeded re-mappings later, it is also true that doing the
unmap would help to reduce the memory used.
This fixes an assert when running the following tests in a row (same
deqp-vk execution):
dEQP-VK.pipeline.creation_feedback.graphics_tests.vertex_stage_fragment_stage
dEQP-VK.pipeline.executable_properties.graphics.vertex_stage_geometry_stage_fragment_stage
dEQP-VK.pipeline.executable_properties.graphics.vertex_stage_fragment_stage_internal_representations
That hits the following assertion:
assert(qpu_bo && qpu_bo->map_size >= variant->assembly_offset +
variant->qpu_insts_size);
at v3dv_pipeline.c, pipeline_get_qpu.
v2: use just one call to v3dv_bo_unmap (move call at v3dv_bo_free,
replace call at bo_free for assert) (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16678>
Originally, we had virtual opcodes for scratch access, and let the
generator count spills/fills separately from other sends. Later, we
started using the generic SHADER_OPCODE_SEND for spills/fills on some
generations of hardware, and simply detected stateless messages there.
But then we started using stateless messages for other things:
- anv uses stateless messages for the buffer device address feature.
- nir_opt_large_constants generates stateless messages.
- XeHP curbe setup can generate stateless messages.
So counting stateless messages is not accurate. Instead, we move the
spill/fill accounting to the register allocator, as it generates such
things, as well as the load/store_scratch intrinsic handling, as those
are basically spill/fills, just at a higher level.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16691>
Unify the "create regular resource" and "create scanout resource" code paths.
They only differ in how the backing memory is allocated. This unifies the layout
code as well, avoiding hacks for AFBC.
What I really care about is that, if we're creating the resource, we choose the
layout first with pan_image_layout_init and allocate that layout.
pan_image_layout_init is a common, extensively tested (including unit tested)
helper written for correctness with a deep understanding of the hardware.
By contrast, we currently guess the layout with some hacks specific to AFBC,
allocate our guess, and then then tell pan_image_layout_init to use the layout
we guessed and pray everything works out. (It does work out, but it's all kinds
of wrong, in terms of layering violation. If that really is the way to go, I can
add the required routines to the layout code. But that doesn't seem right.)
All of this is motivated by extending the layout code to handle AFBC with other
superblock sizes or tiled headers without having to pile on extra hacks in this
WSI path.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16334>
It's meaningful for this intrinsic and so does not add noise to the
lowering pass.
(Although dual-source writes must be to RT 0, depth and stencil
writes, which store_combined_output_pan is also used for, can still be
done with MRT enabled.)
Fixes: 5c168f09eb ("nir: Eliminate store_combined_output_pan BASE")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16685>
LAVA is mangling the escape codes from ANSI during log fetching from the
target device, making the gitlab section markers from deqp, for example,
to not work, inputting noise into the log.
This commit makes the simplest fix which is to replace the mangled
characters to the fixed ones.
This approach is error-prone, since it may unwittingly replace a genuine
log that resembles the mangled escape code. But this solution should
suffice until we get a proper fix from LAVA team itself.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16520>
LAVA is mangling the escape codes from ANSI during log fetching from the
target device, making the colored lines from deqp, for example, to not
work, inputting noise into the log.
This commit makes the most straightforward fix which is to replace the
mangled characters to the fixed ones.
This approach is error-prone since it may unwittingly replace a genuine
log that resembles the mangled escape code. But this solution should
suffice until we get a proper fix from LAVA developers itself.
Fixes: #5503
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16520>
The branch operation supports comparisons, so it may be possible to
merge a previous comparison operation with the branch operation.
There are several restrictions to do it at this stage, but it may save
instructions in many cases.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16163>
Looks like MSVC doesn't like VLAs:
src/compiler/spirv/spirv_to_nir.c(3879): error C2057: expected constant expression
src/compiler/spirv/spirv_to_nir.c(3879): error C2466: cannot allocate an array of constant size 0
src/compiler/spirv/spirv_to_nir.c(3879): error C2133: 'srcs': unknown size
so let's use a static array size.
Fixes: 87d7431198 ("spirv: Use nir_vec_scalars() to simplify matrix transpose.")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16632>
D3D12 fences are capable of handling binary operations, but the
current dzn_sync implementation doesn't match vk_sync expectations
when sync objects are used to back semaphores. In that case, the wait
operation is supposed to set the sync object back to an unsignaled
state after the wait succeeded, but there's no way of knowing what
the sync object is used for, and this implicit-reset behavior is not
expected on fence objects, which also use the sync primitive.
That means we currently have a semaphore implementation that works
only once, and, as soon as the semaphore object has been signaled it
stays in a signaled state until it's destroyed.
We could extend the sync framework to pass an
implicit-reset-after-wait flag, but, given no one else seems to
need that, it's probably simpler to drop the binary sync
capability and rely on the binary-on-top-of-timeline emulation provided
by the core.
Fixes: a012b21964 ("microsoft: Initial vulkan-on-12 driver")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16629>
Instead of using a union in radv_pipeline, this introduces new
structures for graphics, compute and library pipelines which inherit
from radv_pipeline. This will ease graphics pipeline libary implem.
There is still no radv_raytracing_pipeline because RADV actually
uses a compute pipeline for everything but it could be introduced
later when necessary.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16603>
Add a library that wraps the key IOKit entrypoints used in the macOS
UABI for AGX. Our wrapped routines print information about the kernel
calls made and dump work submitted to the GPU using agxdecode. This code
has two major use cases:
1. Debugging Mesa, particularly around the undocumented macOS
user-kernel interface. Logs from Mesa may compared to Metal to check
that the UABI is being used correcrly.
2. Reverse-engineering the hardware, using this as glue to get at the
"interesting" GPU memory.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16512>
This is pretty simple now that the hardware is understood. The hardware
interfaces parallels that of scissors, so the scissor path is reused
with minor modifications to accommodate the new functionality.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16512>
Aka occlusion queries. There is an annoying limitation in the hardware
(reflected in Metal) that only a single buffer may be bound per render
pass, with the per-draw settings merely specifying an offset.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16512>
This uses a subset of the depth/stencil infrastructure we built out to
support writing back tiled, uncompressed Z32F depth buffers to memory.
Texturing from this format is already supported.
This gets glmark2 -bshadow working.
v2: Fix partial renders
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16512>
In copy_format, we treat snorm as unorm to avoid clamping. But snorm
and unorm are UBWC incompatible for special values such as all 0's or
all 1's. Disable UBWC for snorm.
For reference, I dumped the first byte of an UBWC blocks and it was
color UNORM SNORM
all black 0x01 0x31
all white 0x0d 0x11
@flto clarified that bit 4 is unset for fast clear encoded blocks. It
looks like fast clear is not used for SNORM.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6480
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16534>
eglCreatePlatformWindowSurface[EXT] and
eglCreatePlatformPixmapSurface[EXT] should be passed (xcb_window_t *)
and (xcb_pixmap_t *), so we must dereference these types before using
them as drawables. We already do something similar with X11 drawable
pointers.
Signed-off-by: Jeffrey Knockel <jeff@jeffreyknockel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16269>
The value should be at the bottom 24 bits, not at the top.
dEQP-VK.pipeline.sampler.* still passes. This fixes most of
dEQP-GLES31.functional.texture_border_clamp.formats.*depth* on angle.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16570>
The struct is returned from a function, so in debug builds the address
may change after returning, and pointers to patched_s will be broken.
Pass the pointer to the patched stencil view as a parameter to
pan_preload_get_views to avoid this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16343>
If the sample mask is being written it means we want to discard some of the
samples generated so we should not be promoting the fragment shader to
do early tests, since that would not take into account the sample mask
written from the shader.
Fixes:
dEQP-VK.fragment_operations.early_fragment.sample_count_early_fragment_tests_depth_samples_4
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16626>
GPUs with the LINEAR_PE feature bit have the ability to render into linear
buffers. While this decreases PE cache effectiveness and is thus slower than
rendering into a (super-)tiled buffer, it's still preferable for cases where
we would need a blit to get into linear otherwise, i.e. when importing a
linear buffer or when linear is forced on allocation by usage flags or
modifiers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16615>
The blob only switches to the 3 single buffer state when required, which seems
to be the case when any color or ZS target is <= 16bpp. Using 2 as the single
buffer state gives a very small 1-2% performance improvement on fillrate
constrained rendering, so it likely affects some PE cache setting.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16615>
Some glcts tests implement tons of tests because they verify
every possible combination of format/swizzle/target/...
They take a long time to execute and aren't possible to run
using multiple processes.
The proper way to fix it would be to split them in vk-gl-cts,
as is already done for some of them (eg es31fTextureGatherTests.cpp).
In the meantime, not running them makes glcts run almost
10 times faster.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16580>
Since PIPE_CAP_TGSI_TEXCOORD is now set in SVGA vgpu10 driver,
we need to add a new parameter need_texcoord_semantic to
tgsi_add_point_sprite and tgsi_add_aa_point
to allow setting texcoords using tgsi texcoord semantic.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16598>
Now that our IR is much more strongly typed, and RA code quality depends on
correct typing, add a validation pass to make sure we didn't screw it up. This
pass found a massive number of bugs in early versions of this series.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
We tightened the rules around preloading substantially and take advantage of the
rules in RA. The safe helpers it introduced should ensure the rules are
followed, but just in case, add a validation pass to check our work. This pass
found (multiple) bugs in early versions of this series.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
In the current IR, any register may be preloaded by reading it anywhere, and any
register may be precoloured by writing it anywhere. This is convenient for
instruction selection, but requires the register allocator to do considerable
gymnastics to ensure it doesn't clobber precoloured registers. It also breaks
the purity of our SSA representation, which complicates optimization passes
(e.g. copyprop).
Let's trade some instruction selection complexity for simplifying register
allocation by constraining how register precolouring works. Under the new model:
* Registers may only be preloaded at the start of the program.
* Precoloured destinations are handled explicitly by RA.
Internally, a stronger invariant is placed for preloading: registers may only be
preloaded by MOV.i32 instructions at the beginning of the block, and these moves
must be unique. These invariants ensure RA can trivially coalesce the moves.
A bi_preload helper is added as a safe version of bi_register respecting these
invariants, allowing a smooth transition for instruction selection.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
Move can take in a vector and write a scalar, depending on the swizzle. We need
to handle this case. Split out mov and pack_32_2x16 so we can specify correct
behaviour for both. Also drop unused 1-bit boolean stuff which obscured the fix.
Fixes: 76cea8e27b ("panfrost: Fix pack_32_2x16 implementation")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
"Revisiting Out-of-SSA Translation for Correctness, Code Quality, and
Efficiency" discusses "value-based interference": two variables interfere if and
only if there exists a point in the program where they are both live *with
different values*. In particular, the source and destination of a move do not
interfere a priori, because they have the same value at that point in the
program. (If a later instruction overwrites one, the required interference will
be added there).
We can use this idea to avoid some extra interferences, avoiding a regression in
moves from split/collect.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
If we don't lower phis to scalar, when we go out of SSA, we can get vector
nir_registers. In particular, we can get code like:
r0 = vec2 r0.y, r0.x
This code looks like a move, but is in fact a swap. The trivial lowering of vec2
would not work -- the following fails to swap correctly:
r0.x = r0.y
r0.y = r0.x
Currently, we generate temporaries to handle these cases. It's easy to move the
complexity to NIR, though, and we'll want to scalarize phis for SSA-based RA
anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
Minor ISA detail missed in the Bifrost scheduler. I hit this in an early version
of this series (where a move feeding into a blend shader return was not
coalesced). Let's get it fixed in the scheduler.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
This source size information will be consumed by the 64-bit lowering pass, so
ensure it's accurate. That means marking 32-bit and 64-bit sources explicitly on
message passing where it wouldn't match up with the type size suffix of the
instruction.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
This can happen if an object is serialized whose object type isn't in
the pipeline cache import ops. In this case, we generate a raw data
object and plan to turn it into the right object type later.
Fixes: d35e78bb85 ("vulkan/pipeline_cache: Implement deserialize for raw objects")
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16584>
In f99ac7f2de ("v3dv: Don't use color aspects for depth/stencil
images"), we stopped using color aspects for depth/stencil images in a
bunch of cases. This causes us to trigger an assert in
copy_buffer_to_image_shader where it assumes 16-bit is always color but
now it can also be D16_UNORM. The assert isn't protecting us from
anything we weren't already doing before so we can just loosen it a bit.
Fixes: f99ac7f2de ("v3dv: Don't use color aspects for depth/stencil images")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16592>
When deqp-runner times out, it kills the deqp process, which in our case
is the previous invocation of our shell script, so the crosvm and socat
cleanup never happened. crosvm has a way to cleanly shut down a previous
crosvm invocation, so let's just use that and do our cleanup when we need
to.
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16485>
Each shader stage has its own "early preamble" flag.
Early preamble is likely an optimization to hide some of latency
when loading UBOs into consts in the preamble.
Early preamble has the following limitations:
- Only shared, a1, and consts regs could be used
(accessing other regs would result in GPU fault);
- No cat5/cat6, only stc/ldc variants are working;
- Values writen to shared regs are not accessible by the rest
of the shader;
- Instructions before shps are also considered to be a part of
early preamble.
Note, for all shaders from d3d11 games blob produced preambles
compatible with early preamble mode.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15901>
This reverts commit 68ef895674.
When trying out transcode_astc=true with BPTC on Asphalt 9, we observed
very poor image quality - to the point that basic UI icons were blocky,
and buttons with a black border had smeared pixels on the edges. Using
DXT5 had no such issues.
I originally suspected there was a bug in the BPTC encoder, but I now
believe the issue is deeper than that. The commit that introduced the
encoder, 17cde55c53, says:
"The compressor is written from scratch and takes a very simple
approach. It always uses a single mode of the BPTC format (4 for
unorm and 3 for half-floats) and picks the two endpoints by dividing
the texels into those which have more or less than the average
luminance of the block and then calculating an average color of the
texels within each division.
It's probably not really sensible to try to use BPTC compression at
runtime because for example with the Nvidia offline compression tool
it can take in the order of an hour to compress a full-screen image.
With that in mind I don't think it's worth having a proper compressor
in Mesa and this approach gives reasonable results for a usage that
is basically a corner case."
In other words, the reason our BPTC compressor was so fast is that it
only implements one of the modes and does a low quality approximation.
This honestly should probably be improved somewhat, but the original
use case was for online-compression, the uncommon but mandatory OpenGL
feature where you can supply uncompressed data and trust the driver to
compress it for you (at unknown and uncontrolled quality and speed).
Unfortunately, the compressor as it stands is simply not usable for
transcoding ASTC data where we want to preserve the underlying image
quality as much as possible.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16566>
When caching NIR, it's cached as a raw object because we cache the
serialized NIR. When it's then loaded from the disk cache later, we
fail to deserialize it because raw objects are a special case. There
are two callers of vk_pipeline_cache_object_deserialize(), one of which
has a special case for raw objects and the other is called only when
we've checked that it isn't a raw object. The special cases are
pointless; raw objects should deserialize themselves.
Fixes: 591da98779 ("vulkan: Add a common VkPipelineCache implementation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16281>
without dynamic vertex input, pipeline vertex state must be recalculated
if buffer strides change or the enabled buffer mask changes in order
to accurately handle dynamic state stride VUs
cc: mesa-stable
fixes:
spec@!opengl 1.1@array-stride
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16563>
this is one of those cases where some bizarro format is being created
for sampling only, but gallium blasts out all the bind flags at once
trust that we're not going to do anything too crazy and let surface
usage pruning handle the rest
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16563>
the idea here is that if a resource is intended to be used solely as a rendertarget
and can't be blitted to or drawn to, then resource creation should fail
but if the resource might be e.g., a texture, then it can probably hit the subdata
path and be fine
Fixes: 37ac8647fc ("zink: reject resource creation if format features don't match attachment")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16563>
gallium can't directly support vertex attribute instance rate zero, since
the instance rate is also used to determine if the data is per-vertex or
per-instance in the first place (hence divisor zero meaning the data is
per vertex).
While it's an optional feature for VK_EXT_vertex_attribute_divisor, some
apps require it to work (it's a standard d3d10 feature and widely
supported), hence translate it away as MAX_UINT32 divisor instead (which
at this point probably makes more sense than to change the gallium
interface), which should work all the same.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16526>
If this == NULL, we'll get a crash which is pretty much the same thing
when it comes to ease of debugging. This fixes a giant pile of warnings
with GCC 12 of the form:
../src/compiler/glsl/ir.h: In member function ‘ir_dereference* ir_instruction::as_dereference()’:
../src/util/macros.h:149:30: warning: ‘nonnull’ argument ‘this’ compared to NULL [-Wnonnull-compare]
149 | #define assume(expr) ((expr) ? ((void) 0) \
| ~~~~~~~~^~~~~~~~~~~~~~
150 | : (assert(!"assumption failed"), \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
151 | __builtin_unreachable()))
| ~~~~~~~~~~~~~~~~~~~~~~~~~
../src/compiler/glsl/ir.h:150:7: note: in expansion of macro ‘assume’
150 | assume(this != NULL); \
| ^~~~~~
../src/compiler/glsl/ir.h:160:4: note: in expansion of macro ‘AS_BASE’
160 | AS_BASE(dereference)
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16558>
The spec says that VkGraphicsPipelineCreateInfo::pTessellationState is
ignored and may be an invalid pointer in some cases. When ignored,
patch the pCreateInfo with `pTessellationState = NULL`, so the encoder
doesn't attempt to encode an invalid pointer.
Tested in Borealis, with debug build of venus, with a minimal test app
that sets `.pTesselationState = 0x17`. Pre-patch, the app crashes;
post-patch, the app works.
Signed-off-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16284>
We already had a little workaround for v3dv where, for some if its meta
ops, it had to bind a depth/stenicil image as color. Instead of
special-casing binding depth/stencil as color, let's flip on the
drier_internal flag and get rid of most of the checks in that case.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16376>
XeHP supports a 20:5 pointer format, so the offset can legitimately
be more than UINT16_MAX. Likewise, with 256B binding table mode on
Icelake/Tigerlake, we might have 18:8 pointers that exceed UINT16_MAX.
Thanks to Felix DeGrood for catching this!
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16538>
VVL is great, but there's actually cases where it doesn't catch critical
spirv errors, so add in our own validation pass to make sure things are
okay
this is especially useful for running on nvidia, as their compiler will
either crash on or silently drop illegal instructions
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16462>
depth and stencil states should only be set if the corresponding attachment
is present, otherwise they should be ignored. this is different from
ignoring the entire VkPipelineDepthStencilStateCreateInfo struct, as
it's possible that only depth or only stencil may be present
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16457>
This will allow us to disable the GLSL IR loop unroller in a
following patch and rely on the NIR loop unroller instead.
This allows the piglit test spec@!opengl 2.0@max-samplers border
to pass on the v3d rpi4 driver.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>
Some drivers don't support these indirects and therefore require
loop unrolling if a shader uses a loop induction variable to
access a sampler array.
Here we add a new nir shader compiler option that drivers can set,
this will be the equivalent of the EmitNoIndirectSampler setting
used in the GLSL IR unrolling pass.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>
SO was always enabled before this change. That meant, after a call to
tu_CmdBindTransformFeedbackBuffersEXT to emit VPC_SO_BUFFER_SIZE, any
draw call (from the same render pass, in a different render pass, or in
a different cmdbuf) could potentially cause writes to the SO buffers
regardless of whether the draw is inside xfb begin/end or not.
I choose to emit VPC_SO_DISABLE instead of using stateobjs like
freedreno does only because it is simpler. It is not clear to me which
is more efficient to HW.
This also fixes double SO writes for gmem rendering. While
tu6_tile_render_begin was careful to disable SO for the draw pass,
tu6_emit_tile_select re-enabled it.
dEQP-VK.transform_feedback.* still passes. It fixes
dEQP-GLES3.functional.transform_feedback.* on angle.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16502>
It seems fine to advertise msaa in sampledImageIntegerSampleCounts.
dEQP-VK.rasterization.rasterization_order_attachment_access.format_integer.*
goes from NotSupported to Pass for more test cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16487>
Fixes: 3e85a0c90b ("ac/surface: define gfx11 modifiers")
../../src/amd/common/ac_surface.c: In function 'ac_get_supported_modifiers':
../../src/amd/common/ac_surface.c:421:47: error: 'AMD_FMT_MOD_TILE_GFX11_256K_R_X' undeclared (first use in this function); did you mean 'AMD_FMT_MOD_TILE_GFX9_64K_R_X'?
421 | unsigned swizzle_r_x = num_pipes > 16 ? AMD_FMT_MOD_TILE_GFX11_256K_R_X :
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| AMD_FMT_MOD_TILE_GFX9_64K_R_X
../../src/amd/common/ac_surface.c:421:47: note: each undeclared identifier is reported only once for each function it appears in
In file included from ../../src/amd/common/ac_surface.c:31:
../../src/amd/common/ac_surface.c:424:61: error: 'AMD_FMT_MOD_TILE_VER_GFX11' undeclared (first use in this function); did you mean 'AMD_FMT_MOD_TILE_VER_GFX10'?
424 | AMD_FMT_MOD_SET(TILE_VERSION, AMD_FMT_MOD_TILE_VER_GFX11) |
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/amd/common/ac_drm_fourcc.h:75:21: note: in definition of macro 'AMD_FMT_MOD_SET'
75 | ((uint64_t)(value) << AMD_FMT_MOD_##field##_SHIFT)
| ^~~~~
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16373>
Fixes: b261ac1ab5 ("ac/gpu_info: print all IP versions reported by the kernel")
```
../../src/amd/common/ac_gpu_info.c
../../src/amd/common/ac_gpu_info.c: In function 'ac_query_gpu_info':
../../src/amd/common/ac_gpu_info.c:545:44: error: 'struct drm_amdgpu_info_hw_ip' has no member named 'hw_ip_version_major'
545 | info->ip[ip_type].ver_major = ip_info.hw_ip_version_major;
| ^
../../src/amd/common/ac_gpu_info.c:546:44: error: 'struct drm_amdgpu_info_hw_ip' has no member named 'hw_ip_version_minor'
546 | info->ip[ip_type].ver_minor = ip_info.hw_ip_version_minor;
| ^
```
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16373>
As of ca63a5ed3e ("glsl: fix interpolateAtXxx(some_vec[idx], ...) with
dynamic idx"), this lowering pass does two things. It converts
ir_binop_vector_extract to an if-ladder to select the dynamically
indexed component, and it extracts a ir_binop_vector_extract from the
source of an interpolateAt function and applies to the result instead.
This change adds a flag to disable the former behavior. The latter is
still useful, but NIR has better (and soon even better) ways of doing
the former.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16440>
It has been disabled in December 2021 due to unreliability,
and never got re-enabled.
VEGA10 is disabled because it currently fails:
Replay of parallel-rdp/uber_subgroup.foz failed
Fossilize ERROR: Compute pipeline crashed or hung, hash: 520406f40241abf8. Rerun with: --compute-pipeline-range 4 5.
Suggested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16455>
This allows glamor to successfully compile its shaders on the GC400.
When running glamor using the GC400, Xorg reports that the compiled
shaders exceed the maximum allowed instructions because the value
reported from the kernel is halved.
Xserver[314]: etna_draw_vbo:318: compiled shaders are not okay
$ cat /sys/kernel/debug/dri/128/gpu | grep instruction_count
instruction_count: 256
However, the spec for the Unified vertex-fragment shader explicitly
lists 256 as the maximum number of instructions for each shader
("256 for vertex shaders; 256 for fragment shaders").
Signed-off-by: Kyle Russell <bkylerussell@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16383>
Unlike spirv glsl varyings might not have explicit locations set.
nir_shader_gather_info() was once only called at the end of linking
but these days it even gets called in NIR optimisation loops via
nir_opt_phi_precision.
In the following patches we implement a NIR version of the GLSL
varying linker which means we will have varyings with no location
set when nir_shader_gather_info() gets called the first few times,
and temp values set only for the purpose of removing unmatched
varyings between shaders for some calls after that.
Here rather than asserting we simply abort the io info gathering,
when we hit these values.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15731>
This patch refactors common shader compilation code into a helper function
which will call the corresponding shader translation function according to
the shader IR type.
It also adds a function pointer for getting dummy shader for different
shader stages.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16501>
This patch adds a new structure svga_shader_info which includes
shader info that is accessed outside of the shader translation
code. That's why it cannot be TGSI specific as we will later also
support NIR. This shader info structure, however, is derived
from the TGSI shader info or the NIR shader info.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16501>
This is similar to ir3_shader_get_variant(), but always compiles the
variant from scratch and returns a variant that's owned by the user
rather than the shader. We'll need this because when variants are stored
in the Vulkan pipeline cache they will outlive their shader.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16147>
The shader won't be available for deserialized variants, so we need to
include all the info we need for compiling variants to be in the
variant. Most of the things we dug out of the shader were various bits
from nir_shader_info which we move into ir3_shader_variant.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16147>
Besides transparent border color, the hardware has special support for
white and black borders when clamping to border is enabled, which should
be more efficient.
v2:
- Rename enumerations (Iago)
- Add comment to border_color_variant field (Iago)
- Refactor the sampler variant code selection (Iago)
v3:
- Add comment for case of not clamping to border (Iago).
- Generate only required sampler states (Iago).
v4:
- Use one array element for standard border color set (Iago).
v5:
- Fix variant setting for transparent black borders (Iago).
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16422>
Previously this check was skipped for pools with
VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT unset, but after
96a240e1 we need to check this otherwise we risk overflowing
radv_descriptor_pool::entries into the host memory base
This fixes a crash to desktop when launching Dota 2, which overallocates
descriptor sets and expects an error to allocate another descriptor pool
Fixes: 96a240e176 ("radv: fix memory leak of descriptor set layout")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16490>
Instead of storing them in a linked list, put them in an array
in si_shader_selector. The keys are also stored separatly, to
avoid pointer chasing when searching a variant in si_shader_select_with_key.
This main point here is to simplify the code by storing everything
in the selector instead of splitting the list storage between the selector
and the shaders; this shouldn't affect performance in a meaningful way.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16273>
The original code waiting for the queue to be full before adding more
threads. This makes the thread count grow slowly, especially if the
queue also uses UTIL_QUEUE_INIT_RESIZE_IF_FULL.
This commit changes this behavior: now a new thread is spawned if we're
adding a job to a non-empty queue because this means that the existing
threads fail to process jobs faster than they're queued.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16273>
From the Vulkan spec:
"VUID-VkGraphicsPipelineCreateInfo-PrimitiveId-06264
If the pipeline is being created with pre-rasterization shader
state, it includes a mesh shader and the fragment shader code
reads from an input variable that is decorated with PrimitiveId,
then the mesh shader code must write to a matching output variable,
decorated with PrimitiveId, in all execution paths"
So, if PS uses PrimitiveID, MS must export it (like GS).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16438>
I am not super happy about doing this, as this means we will
potentially be introducing regressions because CI would have papered
over a rare race condition... But the situation is that this is already
happening, and improving the reliability will likely come from tracking
the occurence rate of hangs in an external system.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16472>
This fixes 7 loop piglit tests when loop unrolling is disabled.
The problem is that we were trying to be smart with breaks and
tried to save one predicate instruction for endif in some cases.
This worked for simple loops but brought problems for more complex
shaders, instead just switch to standard VE_PRED_SNEQ_PUSH
ME_PRED_SET_POP combo everywhere.
Shader-db results on RV530 show three hurt glmark tests, however
I believe the simplification should be worth it.
total instructions in shared programs: 123715 -> 123718 (<.01%)
instructions in affected programs: 54 -> 57 (5.56%)
total predicate in shared programs: 118 -> 121 (2.54%)
predicate in affected programs: 6 -> 9 (50.00%)
total temps in shared programs: 17304 -> 17307 (0.02%)
temps in affected programs: 12 -> 15 (25.00%)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6468
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16470>
With this option enabled range of input values for fsin and fcos is
limited to [-2*pi : 2*pi] by calculating the reminder after 2*pi modulo
division. This helps to improve calculation precision for large input
arguments on Intel.
-v2: Add limit_trig_input_range option to prog_key to update shader
cache (Lionel)
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16388>
Currently, the LAVA job submitter fetches the job results from the LAVA
XMLRPC call, but that is not necessary, as the job result is easily
found in the logs. E.g. the bare-metal and poe jobs uses that log to set
the final job status of their runs.
Another reason for the change is that the LAVA signals are not reliable
in some devices with one serial port, causing some troubles in a618
recently. So, if one signal fails to be sent/received, the job will
ultimately fail even when the hwci script has been successful.
Fixes: #6435
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16425>
this existed in order to reset query pools in advance so they
would never overflow. now the pools are reset every time the query
is started, so this behavior is no longer necessary
fixes#6475
Fixes: 57dd05616f ("zink/query: rewrite the query handling code to pass validation.")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16484>
Previously it would fail, and then we'd fall back to the transfer path
for things like readpix. But it would spam logcat w/ bo_mmap fail
messages. Since gralloc allocated buffers for GPU usage are allocate
without _USE_MAPPABLE, let's just assume we can't map imported bo's.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16477>
We no longer emit STATE_BASE_ADDRESS in every batch on XeHP, so the
decoder might not know what the various base addresses are if it's only
looking at a single batch. Fortunately, they also never change, so we
can just emit them once here.
On earlier platforms, initializing them here should be harmless. We'll
emit STATE_BASE_ADDRESS if we change them, which will update these.
Thanks to Iván Briano for catching this.
Fixes: 8831cb38aa ("anv: Stop updating STATE_BASE_ADDRESS on XeHP")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16287>
All we were doing was copying panvk_descriptor structs around which
don't actually contain data that's used by anything interesting. We
need to copy the actual data arround. Annoyingly, that means we need a
descriptor copy function per descriptor type. Woo!
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
The new design is based on the ANV code which I massively cleaned up
some time ago. Each descriptor type has a write function and they have
consistent prototypes. This makes it all much easier to read and figure
out what's going on. It also makes it easier to make changes going
forward because you aren't re-plumbing function arguments if you ever
change the type of data in any given descriptor type. You just change
the write function.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
Instead of storing SSBO pointers in the very limited sysval space, store
them in the UBO we've attached to the descriptor set. This gives us a
virtually unlimited number of SSBOs. Dynamic SSBOs still live in the
sysval space so we can update them as part of vkCmdBindDescriptorSets().
Also, the new code (based on the code in ANV) loads those SSBO addresses
in a way that never chases the deref chain back to the variable so we
should now be able to handle all of variable pointers. The code as
written in this patch is a bit overly generic because it switches on
address modes a bit more than panvk needs but we ended up needing all
that flexibility in ANV so we may as well leave hooks for it in panvk.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
The original intention was to put all the non-dynamic UBOs first
followed by all the dynamic ones. However, we got the calculations
wrong and, once you went above one descriptor set, things start stomping
each other.
Also, the whole strategy is a bit busted. Vulkan pipeline layout
compatability rules say that it's ok to create a pipeline with one
layout and then bind with another so long as the bottom N descriptor set
layouts match and the pipeline uses at most N descriptors. This means
that, while it's safe to have each subsequent set add onto a given pool
of descriptors, if you're going to combine two of those pools, you need
to be careful that the position of descriptors in set N only depends on
the layouts of sets M <= N. The easy way to do this is to interleve
where we do the UBOs for set 0 then dynamic for set 0 then UBOs for set
1 then dynamic for set 1, etc.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
In theory, this may cost us a tiny bit of descriptor space but in
practice, given that the viewport transform is a sysval, we'll always
need it for 3D and given that SSBO pointers live there, we'll basically
always need it for compute. It also makes a lot of things simpler.
We're about to start using the sysval UBO directly in our descriptor set
code and knowing the index up-front is really nice.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
PanVK uses fewer sysvals than the GLES driver, as some data that would
be a data in GLES is instead part of the descriptor set or the pipeline
state in Vulkan. Therefore, it is simpler and more efficient to use a
flat, fixed layout provided by the driver for our sysvals, rather than
the compiler choosing a layout.
This commit switches to a flat sysval layout.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
In 3559efb9bf ("panfrost: Allow passing an explicit UBO index for the
sysval UBO"), an explicit UBO index was added and it was implicitly
assumed that it would be > num_ubos. This was convenient because it
meant 0, the default for designated initializers, implicitly meant
compiler-assigned. However, we're about to move the sysval UBO to 0
which breaks this assumption. Also, we don't want the back-end
compiler to even look at num_ubos since it's meaningless in Vulkan.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
Later in the series, we will map descriptor sets to driver-internal
buffers bound as UBOs. These buffers will contain various internal data,
like buffer and texture sizes. Resource access will be lowered to pull
from this UBO in the shader. To prepare, create a backing buffer when
creating descriptor set and emit a UBO record so we can bind it.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
Compute memory item demotion invokes a device to host transfer unconditionally,
but there are at least two cases where this is not necessary:
1. The item is mapped for discarding with PIPE_MAP_DISCARD_RANGE (e.g.
CL_MAP_WRITE_INVALIDATE_REGION).
2. The item cannot be written to by the device.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16116>
The compute memory pool forced a defragmentation (a left-packing relocation)
of items prior to promoting (adding) items to the tail end of the pool.
This patch instead makes an initial pass over the fragmented pool intent on
promoting items back to where they may have been recently demoted, filling
in the gaps first before conducting the defragmentation (if at all).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16117>
Resources returned by r600_buffer_from_user_memory() are not compatible
with the evergreen compute memory pool, though they're added to it anyway.
This results in a segfault reproducible from Clover when the user passes
CL_MEM_USE_HOST_PTR.
This patch allows user_ptr resources to participate in the compute global
memory pool as intended. The result appears to finally allow for zero-copy
DMA out of userspace for anonymous pages.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16114>
When NGG is active, the GS invocation counter is always incremented, even
if there's no explicit GS.
Implementing the counter manually fixes it:
* in emit_gs_epilogue for the legacy path
* in gfx10_ngg_gs_emit_prologue for the ngg path
This fixes piglit's arb_query_buffer_object-qbo test.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15861>
Statistics only work in non-NGG mode. If screen->use_ngg is true, we can't
know if the draw will actually use NGG or not, so this commit switch
to a shader based implementation of this counter.
To avoid modifying si_query, the shader implementation behaves like the hw
one: it uses the same buffer size and offset.
The emulation path activation in the shader is controlled by vs_state_bit[31].
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15861>
DISABLE_INSTANCE_PACKING needs to be enabled when stats queries are
active to fix incorrect results.
We need to emit this for indexed and non-indexed draws.
Based on PAL's waDisableInstancePacking.
This fixes:
KHR-GL46.pipeline_statistics_query_tests_ARB.functional_primitives_vertices_submitted_and_clipping_input_output_primitives
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15861>
the problem here is that this returns a vec2 instead of a vec5, which
throws all the existing calculations off
given that the shader is (still) expecting a vec2 return from this,
and there's no way to sanely rewrite with nir to be valid for both
sampler types as well as spirv translation, just pad out to a vec2
here and be done with it
Fixes: 73ef54e342 ("zink: handle residency return value from sparse texture instructions")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16456>
Remove the conditional break statements associated with all
terminators that are associated with a fixed iteration count,
except for the one associated with the limiting terminator.
This logic matches similiar functionality that exists in the
old GLSL IR unrolling code.
This change helps a piglit test pass on the r300 driver once
we switch off the old GLSL IR unrolling code.
Shader-db results IRIS (BDW):
total instructions in shared programs: 17538619 -> 17538595 (<.01%)
instructions in affected programs: 216 -> 192 (-11.11%)
helped: 3
HURT: 0
helped stats (abs) min: 7 max: 10 x̄: 8.00 x̃: 7
helped stats (rel) min: 10.00% max: 12.07% x̄: 11.38% x̃: 12.07%
total cycles in shared programs: 858674910 -> 858672810 (<.01%)
cycles in affected programs: 79540 -> 77440 (-2.64%)
helped: 3
HURT: 0
helped stats (abs) min: 620 max: 800 x̄: 700.00 x̃: 680
helped stats (rel) min: 2.45% max: 2.83% x̄: 2.63% x̃: 2.62%
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16399>
Coming in from Vulkan or OpenCL, it's possible for nr_samplers to be
zero if all we ever use is texelFetch(). Annoyingly, samplers and
sampler views are handled by the same function. Fortunately, it will
work if some of the samplers or sampler views are missing so we can just
pass the maximum.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16435>
First, make all key_size functions take nr_samplers and nr_sampler_views
separately so we ensure both get passed in. Second, rework the offset
helpers to take MAX(nr_samplers, nr_sampler_views) so we get the image
param offset correct if nr_samplers < nr_sampler_views. While we're
here, also re-order the size calculations to be in the same order as the
things land in memory.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16435>
For doing performance investigation, I often find it useful to have a "are
we tripping over any of our performance TODOs?" flag, so add it and use it
in a few of the TODOs.
This also greatly cleans up the deqp-vk logs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16316>
Mostly the same as for compute shaders, but with a few extras:
task_ring_offsets:
Same as what ring_offsets is to graphics shaders.
Contains an address that points to a buffer that contains
the ring buffer descriptors.
task_ring_entry:
Index that can be used to address the draw and payload rings.
draw_id:
Same meaning as in graphics shaders.
task_ib_addr/task_ib_stride:
Indirect buffer address and stride from the draw calls.
These are used to emulate the firstTask feature of NV_mesh_shader.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
Task shaders store their output payload to VRAM where mesh
shaders read from. There are two ring buffers:
1. Draw ring: this is where mesh dispatch sizes and
the ready bit are stored.
2. Payload ring: this is where the optional payload
is stored (up to 16K per task workgroup).
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929>
SM5 requires swizzles for 64 bits alu source to be either .xyzw,
.xyxy, .zwxy, or .zwzw. If the swizzles are not in the valid pattern,
move the source according to the specified swizzle to a temporary register
first.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16464>
Because we pull the RT from the variable location and use that to look
up formats, we need a constant RT index. To deal with arrays (possibly
of arrays), we would either need to handle array derefs (we don't today)
or we need to require the variables to be split into one variable per
RT. Given that we have to lower indirect derefs anyway (to get constant
indices), we may as well require the client to split output variables by
calling nir_lower_io_arrays_to_elements_no_indirect().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16309>
This takes the buffer ptrs out of the struct, so they can be memset
separately and optimises the memset to be minimal for them.
Removing them from the struct avoids having to loop to clear them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16393>
Fix defect reported by Coverity Scan.
Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking desc suggests that it may be
null, but it has already been dereferenced on all paths leading to
the check.
Fixes: 2f83dce059 ("radeonsi: don't report R64_*INT as a sampler format because it doesn't work")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16380>
If the secondary command buffer executed used push constants on a
different set of stages than the primary is using, we may end up not
reallocating them for the primary, getting misrender artifacts at best,
or a nice GPU hang at worst.
Fixes the tests from a CTS from the future:
dEQP-VK.dynamic_rendering.random.*
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16439>
Sampler views and samplers may not be the same limit; in fact one is 32
while the other is 128. The sampler_buffers field is tracking sampler
views (yes, naming is confusing) so we should use the right limit.
Fixes: e9c41b3214 ("gallium/u_threaded: add buffer lists - tracking of buffers referenced by tc")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15988>
If two instructions in a single bundle both write to a spilt
destination, then we need to reuse the fill and spill instructions,
otherwise the value will be overwritten.
This and the rest of this set of Midgard bug fixes were found from a
vertex shader in Firefox WebRender that is used when a video is
clipped, for example by setting the border-radius CSS property.
CC: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16382>
Check the bytemask against 0xFFFF rather than 0xF so that the fill is
skipped for a .xyzw write rather than a .x write.
Set the mask on the store to 0xF when doing a read so that all
components are written back.
Fixes: 31d26ebf1b ("pan/mdg: Fill from TLS before spilling non-SSA nodes")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16382>
If a value is written in a vector CSEL but then written again by other
instructions, it still needs full alignment, so set min_alignment
using MAX2 to avoid ever reducing it.
Fixes: 1798f6bfc3 ("pan/midgard: Fix masks/alignment for 64-bit loads")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16382>
The following sequence:
glEnable(GL_DEBUG_OUTPUT_KHR);
glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS_KHR);
glDebugMessageCallbackKHR(my_callback, NULL);
Will cause the 2nd call to be ignored - but since the callback
function used by _mesa_update_debug_callback is always the
same (_debug_message), this means we'll keep using it, causing
"my_callback" to be called from driver-internal threads.
So instead of skipping the 2nd call, make sure we pass the
information to the driver.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5206
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16300>
Also move the preamble update into this function, as that is only needed
by this code path and not needed for empty submits.
With this change, the goal is to make radv_queue_submit easier to
read and understand. This prepares it for future work when we'll
add the capability to submit to multiple queues at the same time.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16358>
this compacts all buffers in the shader into an array that can be
used in a single descriptor, thus handling the case of indirect indexing
while also turning constant indexing into indirect (with const offsets)
since there's no sane way to distinguish
a "proper" implementation of this would be to skip gl_nir_lower_buffers
and nir_lower_explicit_io altogether and retain the derefs, but that would
require a ton of legwork of other nir passes which only operate on the
explicit io intrinsics
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15906>
nvidia can't do this, but also nothing uses it, so I've gone ahead and
done the bare minimum here to make cts pass
I think the work to do the shader rewrites should be easy, but without a test
case, I see no point in spending the time for it
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16100>
this is enough to fix CTS
affects/fixes:
dEQP-GLES3.functional.polygon_offset.default_render_with_units
dEQP-GLES3.functional.polygon_offset.fixed16_render_with_units
dEQP-GLES3.functional.polygon_offset.fixed24_render_with_units
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16195>
gfx11 passes scratch base address using
SPI_GFX_SCRATCH_BASE_LO and _HI registers. Make the
code changes to support the same.
v5: remove type cast from 64bit to 32bit (Marek Olšák)
v4: combine scratch_memory and scratch_state atom (Marek Olšák)
v3: skip shader relocs for gfx11
v2: make atom for scratch_memory (Indrajit)
Signed-off-by: Yogesh mohan marimuthu <yogesh.mohanmarimuthu@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16328>
In the SVGA create shader functions, the original shader IR could have been
deleted after pipe_shader_state_to_tgsi_tokens() if it is to be converted
from NIR to TGSI. So to avoid accessing the deleted NIR IR,
set the shader state type to TGSI before passing the shader
state to the draw function.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16365>
use multiple contexts and submit in a round robin scheme to make
use of all the available jpeg engines simultaneously. During mjpeg
decode context need not be same across frames as they are discrete
jpeg images.
V2: number of ctx to be equal to number of engines and fix indent (Leo)
V3: decide ctx count in create_decoder, don't add a video param (Boyuan)
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16355>
As a reminder primitive_restart should always be checked before any access to restart_index.
It seems that restart_index is only initialized when primitive_restart is set to a non-zero
value. This patch is equivalent to the previous code but written in a way that the compiler
will test primitive_restart first before trying to read restart_index.
With commit ad864a7c15:
Conditional jump or move depends on uninitialised value(s)
at 0xD33F1EC: panfrost_is_implicit_prim_restart (pan_cmdstream.c:2907)
by 0xD33F1EC: panfrost_emit_primitive (pan_cmdstream.c:3073)
by 0xD33F1EC: panfrost_draw_emit_tiler (pan_cmdstream.c:3440)
by 0xD33F1EC: panfrost_direct_draw (pan_cmdstream.c:3595)
by 0xD340467: panfrost_draw_vbo (pan_cmdstream.c:3889)
by 0xD219119: u_vbuf_draw_vbo (u_vbuf.c:1498)
by 0xD1C81F9: cso_multi_draw (cso_context.c:1644)
by 0xCFBA19B: _mesa_draw_arrays.part.11 (draw.c:1324)
by 0xCFBADA1: _mesa_draw_arrays (draw.c:1295)
by 0xCFBADA1: _mesa_DrawArrays (draw.c:1533)
by 0xD32EB: gl_vao_draw_data (in /usr/local/bin/mpv)
Uninitialised value was created by a stack allocation
at 0xCFBA14E: _mesa_draw_arrays.part.11 (draw.c:1289)
With mesa-22.1.0-rc4:
Conditional jump or move depends on uninitialised value(s)
at 0xD36369C: panfrost_is_implicit_prim_restart (pan_cmdstream.c:2895)
by 0xD36369C: panfrost_draw_emit_tiler (pan_cmdstream.c:3023)
by 0xD36369C: panfrost_direct_draw (pan_cmdstream.c:3215)
by 0xD3649BF: panfrost_draw_vbo (pan_cmdstream.c:3494)
by 0xD23DE7D: u_vbuf_draw_vbo (u_vbuf.c:1498)
by 0xD1ECBD1: cso_multi_draw (cso_context.c:1644)
by 0xCFD60FF: _mesa_draw_arrays.part.11 (draw.c:1324)
by 0xCFD6D11: _mesa_draw_arrays (draw.c:1295)
by 0xCFD6D11: _mesa_DrawArrays (draw.c:1533)
by 0xD32EB: gl_vao_draw_data (in /usr/local/bin/mpv)
Uninitialised value was created by a stack allocation
at 0xCFD60B2: _mesa_draw_arrays.part.11 (draw.c:1289)
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Alyssa Rosenzweig alyssa@collabora.com
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16389>
Due to an upstream bug, macOS can't link empty static libraries.
There is open Meson bug about this use case [1], though arguably the issue
is macOS's implementation of ar. Of course, the functionality is mostly
useless.
The removal of GLSL-to-TGSI trivialized a static library, causing
linking to fail. This commit garbage collects the useless library.
This fixes the build on macOS:
FAILED: src/mesa/state_tracker/tests/libmesa_st_test_common.a
rm -f src/mesa/state_tracker/tests/libmesa_st_test_common.a && ar csr src/mesa/state_tracker/tests/libmesa_st_test_common.a
ar: no archive members specified
usage: ar -d [-TLsv] archive file ...
ar -m [-TLsv] archive file ...
ar -m [-abiTLsv] position archive file ...
ar -p [-TLsv] archive [file ...]
ar -q [-cTLsv] archive file ...
ar -r [-cuTLsv] archive file ...
ar -r [-abciuTLsv] position archive file ...
ar -t [-TLsv] archive [file ...]
ar -x [-ouTLsv] archive [file ...]
[1] https://github.com/mesonbuild/meson/issues/3735
Fixes: 214c774ba6 ("mesa/st: Remove st_glsl_to_tgsi.")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16385>
Typically we free them when we upload the QPU code from the variant
to the assembly BO in the pipeline, however, if there is an error
during pipeline compilation that may not happen and we would leak
the QPU code from the variants.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16370>
We can output the final NIR form (which we store in the pipeline
stage) and the final QPU (which we can retrive from the assembly BO).
We should be careful not to fetch the shaders from the cache when
VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR is present,
since we don't store NIR shader in the pipeline shader data that is
cached, so a cache hit would leave us without the NIR shader. The spec
already contemplates this scenario:
"Enabling this flag must not affect the final compiled pipeline but
may disable pipeline caching or otherwise affect pipeline creation
time."
We also prevent disposing of the pipeline stages the variants when this
flag is requested to ensure this information is available later when
calling vkGetPipelineExecutableInternalRepresentationsKHR.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16370>
This is actually required by Vulkan 1.2 and to expose the extension,
so let's conform to this requirement, we don't really care since
image layouts are not relevant to our current implementation.
Fixes: 1442d77bc5 ('v3dv: trivially implement VK_KHR_separate_depth_stencil_layouts')
Fixes: dEQP-VK.info.device_mandatory_features
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16398>
When moving code into the main block or loop blocks, put the code into
its own :
if(true) { ... }
block so that we avoid break/continue/return issues.
v2: Also take care of the main block with return instructions
v3: Make deletion more obvious with dummy if blocks (Jason)
v4: Fixup assert for loops (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8dfb240b1f ("nir: Add raytracing shader call lowering pass.")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16036>
LLVM 15 ripped out the legacy coroutine passes. This means moving
to the new pass manager is the best option to move forward and is
long overdue.
I've tried to recreate the same set of passes in the new pass mgr
as the old, but I expect some tweaking may be needed to confirm this.
Acked-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16289>
Layering clients, e.g. angle and zink, use wide sets of buffer usage
flags because they don't know what a resource will be used for in the
majority cases, which is on the other hand making it easier for layering
to optimize resource management.
This change adds a super-set usage to the buffer cache entries, that
will mostly ensure no cache-miss for non-sparsed buffer usages. Since
that involves usage bits from extensions, we'll mask out those disabled
ones upon querying but will use the static cache create info for
checking cache hit for code simplicity.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16379>
in a sequence where a driver saves 0 sampler/views before calling
u_blitter, the previous state of having 0 sampler/views bound would
not be restored as expected, resulting in stale sampler/views which
could affect behavior before new sampler/views were bound
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16178>
The NIR path through the frontend is effectively the only one maintained
for a quite a while now. We can see that effect with !15540, where the
TGSI generation path was regressed to assertion fail on real-world
shaders, and nobody noticed until I came along trying to test the
NIR-to-TGSI transition.
We already have a nir_to_tgsi() call for translating NIR representation
for ARB programs into TGSI before handing them off to the driver. This
change makes that path get taken for GLSL programs as well.
This is the minimum change to get all the drivers on NIR from GLSL, to
give a simple commit to bisect too. The dead code removal comes next.
Now every driver benefits from shared NIR optimizations for GLSL, and we
can start retiring GLSL optimizations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8044>
it's been flaking with "2022-05-05 16:29:49.055151: [0m[31mERROR - Failure
getting run results: parsing results: Reading from dEQP: timed out waiting
for fd to be ready (See \"//results/c32.r1.log\")" and a pile of missings
since the brief "whoops, HW CI failed to listen to the test exit code"
regression.
The only ways I know of to hit this case would be:
1) The deqp binary abruptly wedges on its own. This happens with NFS
failures sometimes, but the rest of the run went fine and we never got the
kernel complaining about NFS, so that seems unlikely.
2) The stderr pipe filled up before stdout was completed, and deqp got
wedged trying to output stderr (happens sometimes when you do like
NIR_DEBUG=print in your run).
Both of these seem unlikely, given that we've got a big .qpa file that
made it all the way to writing out test case durations at the end of the
run before abruptly terminating. Why didn't we have at least some of the
test results parsed?
The next deqp-runner release we integrate will solve #2, and cleans up
these error paths a bunch, so I'm hoping we get more information soon.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16350>
Required to avoid tiler heap out-of-memory condition on Valhall on tests
including:
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1200x1200_drawcount_8
This test passes on Bifrost without the fix because varyings are only allocated
from the tiler heap on Valhall.
Minimal perf or memory usage impacted is expected, as even old versions of
panfrost.ko support growable memory.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16330>
Until now we have been enabling binning sync if we found a barrier
involving geometry stages (a bcl barrier), however, if the actual
binning shaders involved with the job don't access any external
buffers or images there is no reason to sync at the binning stage.
In this patch we don't immediately consume the bcl barrier flag from
the command buffer state when we create a new job. Instead, we check
this state when we are about to emit a draw call by checking if the
shaders involved with binning may access external resources, such as
vertex buffers, UBOs, or textures. If none of the draw calls in the
job use binning shaders that access external resources then we never
enable binning sync for the job.
It is possible that a binning shader uses resources that are not
synchronized through a barrier though, so we keep track of the
access masks used with barriers for both buffers and images separately
to better identify if the binning shader is affected by the barrier.
If a serialized job never consumes the bcl barrier flag because none
of its draw calls ever required a bcl sync, then the flag will be
cleared when the job is finished.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16322>
For some days, the CI was bypassing LAVA and bare-metal jobs due to an
issue in the init-stage2.sh script. After the fix the neverball trace on
panfrost-t860 is producing a different image, due to a bugfix in
Mesa itself driver. This commit updates the neverball trace on that
device.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16325>
After a LAVA job submitter rework, the init-stage2.sh was changed to be
compatible with new LAVA job definitions, but the result from the script
represented by `HWCI_TEST_SCRIPT` variable is obfuscated by the `set -e`
command. So when the test script fails, `set` will override the exit
code and the jobs will pass when they should fail.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16325>
Most of the time, this is a non-issue because the WSI back-end closes
them as part of handing them to the window-system and sets fds[*] to -1.
The one exception here was Wayland which was closing them but leaving
fds[*] pointing to bogus file descriptors. Having wsi_destroy_image
close them makes clean-up easier and more reliable.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16333>
LLVM is transitioning to "opaque pointers", and as such deprecates
LLVMBuildGEP, LLVMBuildCall, LLVMBuildLoad, replacing them with
LLVMBuildGEP2, LLVMBuildCall2, LLVMBuildLoad2 respectivelly.
These new functions were added in LLVM 8.0; so for LLVM before 8.0 we
simply forward to the non-opaque-pointer variants.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15893>
We create one XFIXES region per swapchain image. If the QueuePresent
comes in with a list of rectangles, we push them into the region and
pass it to xcb_present_pixmap.
The extension is technically just a hint. We still fall back to the
unhinted "update the whole image" path if the update region has more
than an arbitrary number of rects, or if we're stuck using plain
PutImage instead of ShmPutImage.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16218>
We take a slight liberty here by allowing 0 to mean either MAILBOX or
IMMEDIATE, since Wayland (at least) doesn't have a true IMMEDIATE mode
at least MAILBOX won't throttle to vblank.
This only correctly handles intervals of 0 or 1 at the moment.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15800>
If the window is destroyed from underneath us while we happen to be in
xcb_wait_for_special_event, there's no recovery. The special event will
never match because the XID is no longer valid, and Present doesn't have
an in-band DestroyNotify. We're going to work around this by using the
poll API instead. If we get an event we short-circuit back to the top of
the "wait for available image" loop, so we drain the whole special event
queue before any other logic. Which means if we run out of special
events (and the connection and swapchain are still valid) that we
_don't_ have enough images available, so to hurry along any events that
the X server hasn't flushed out yet we call GetGeometry on the
swapchain's window. As a side effect this verifies that the window is
still alive.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15800>
Valhall introduces hardware-allocated varyings. Instead of allocating varying
descriptors on the CPU with a slot based interface, the driver just tells the
hardware how many bytes to allocate per vertex and loads/stores with byte
offsets. This is much nicer!
However, this requires us to rework our linking code to account for separable
shaders. With separable shaders, we can't rely on driver_location matching
between stages, and unlike on Midgard, we can't resolve the differences with
curated command stream descriptors. However, we *can* rely on slots matching. So
we should "just" determine the byte offsets based on the slot, and then
separable shaders work.
For GLES, it really is that easy.
For desktop GL, it's not -- desktop GL brings unpredictable extra varyings like
COL1 and TEX2. Allocating space for all of these unconditionally would hamper
performance. To cope, we key fragment shaders to the set of non-GLES varyings
written by the linked vertex shader. Then we may define an efficient ABI, where
only apps only pay for what they use.
Fixes various tests in dEQP-GLES31.functional.separate_shader.random.* on
Valhall.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16310>
Packed TLS has cache-locality benefits on Valhall, compared to Bifrost's flat
TLS. Valhall does support flat TLS, but requires extra arithmetic in the shader
for correct results. At least until we get to generic pointers (and maybe even
then), we can use packed TLS. So just use packed TLS always for proper spilling.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16314>
The set of blend shaders is closed. They are completely internal. As such, we
know that the registers we reserve for them suffice, and we don't permit
register spilling. Refusing to support spilling in blend shaders simplifies a
number of parts of the compiler. Add a check that we don't try to spill anyway,
which will silently fail.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16314>
BIFROST_MESA_DEBUG=spill now restricts the register file to 1/4 its usual size,
useful for testing register spilling (e.g. running CTS) as well as debugging
spilling on small shaders.
Note blend shaders are exempt, as we don't allow blend shaders to spill.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16314>
Fixes failures on tests like this when the on-disk-cache is enabled:
dEQP-VK.binding_model.descriptor_copy.compute.uniform_buffer_0
We only found them when running full CTS runs. What happens is that we
got a hit from the on-disk shader cache, for several tests using the
same shaders. But some tests seems to be using a uniform buffer, and
others a inline buffer. Right now inline buffers leads to some changes
on the final nir shader, and generated assembly, compared with uniform
buffers. So we got a wrong shader. Fortunately we only got an assert
instead of weird behaviour.
With this commit we include the pipeline layout on the pipeline sha1,
so those two cases would get different sha1. FWIW, this is what other
drivers are already doing.
Surprisingly that didn't cause a problem before.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16313>
If we are calling pipeline_cache_upload_shared_data with
from_disk_cache, that means that we had used the disk-cache to found
that entry. And that should only happens if we didn't find the entry
on the cache. So on that case we can skip to search for it.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16313>
From the GL 4.6 spec: "If the value of any non-ignored component of the
offset vector operand is outside implementation-dependent limits, the
results of the texture lookup are undefined." We shouldn't assertion
fail, then.
GLSL-to-NIR shouldn't be enforcing limits on TG4 offsets, since it doesn't
for non-TG4 tex_src_offset either. Leave it up to the driver to handle
it.
Fixes a crash in a piglit test on nouveau that supplies a negative random
number up to 10,000 as the first coordinate for some reason. Other NIR
drivers lowered TG4 explicit offsets to tex_src_offset, so they weren't
affected.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16261>
This is deterministic, unlike a set. Note we need the extra dereferencing to
keep the macro safe, simple, and standards compliant:
1. Nesting two for-loops would cause break/continue to fail.
2. Declaring variables outside the loop would pollute the namespace.
3. Declaring an anonymous struct is not conformant and doesn't compile in clang.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16279>
On Gfx7 we can only give the sample location for a given multisample
number. This means everytime the multisampling value changes, we have
to re-emit the locations. It's fine because it's also where
(3DSTATE_MULTISAMPLE) the number of samples is stored.
On Gfx8+ though, 3DSTATE_MULTISAMPLE only holds the number of samples
and all the sample locations for all number of samples are located in
3DSTATE_SAMPLE_PATTERN. So to be more effecient there, we need to
track the locations for all sample numbers and compare new values with
the relevant sample count when touching the dynamic state.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16220>
Row stride is defined in terms of header blocks for AFBC. Usually,
afbc.row_stride is used for AFBC images and row_stride for non-AFBC images;
however, the nonsense non-AFBC stride leaked into the UABI. So handle that in
the legacy conversion path and use a unified row stride (equal to
afbc.row_stride for AFBC images and row_stride otherwise).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16201>
Line strides don't make sense for linear images, so use row strides instead in
the API. Then update the layout code accordingly.
Note: we need to preserve the old UABI (bug for bug compatibility), so we still
use legacy strides externally. But now we use row strides internally, which is
better than using both everywhere.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16201>
This function modifies the NIR shader, so when using NIR_DEBUG=print and
nir_viewer all the variants are displayed using the original name.
With this change, we get the original shader (eg: GLSL3), and then each
variant gets its own name (GLSL3-xxxxxxxx) - and is displayed in its
own tab in nir_viewer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16247>
It could happen that at the time of a live-range split,
a phi was not yet placed in the new instruction vector,
and thus, instead of renamed, a new phi was created.
Fixes: dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_i8vec2
Cc: mesa-stable
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16248>
If wsi_configure_native_image() fails, it will call
wsi_destroy_image_info() itself, so let's try to not call it again from
wsi_wl_swapchain_destroy().
Fixes the CTS tests:
dEQP-VK.wsi.wayland.swapchain.simulate_oom.*
Fixes: b626a5be43 ("vulkan/wsi/wayland: Split image creation")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16257>
Ken performed some tests with shader-db to evaluate the effects
```
Across all 145,848 shaders generated, the results were:
Total bytes compacted before: 3,326,224
Total bytes compacted after: 60,963,280
```
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15399>
Apparently we were only keying off the presence of a real stipple pattern
being set, and completely ignoring when the app does glDisable().
Add in the actual enable bit as an additional discriminator to determine
if we should be doing polygon stippling.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16254>
On Bifrost, this is very easy: there's an RSD bit to Y-flip gl_PointCoord. It
should map perfectly to the Gallium bit. With this change, we no longer use
lower_pntc_ytransform on Bifrost, saving a bit of ALU when reading point
coordinates.
On Valhall, this is quite hard: the bit is in the framebuffer descriptor now!
That means it can't be changed in a batch. This is expected to be ok: on GLES
and VK, the origin is controlled only by the framebuffer orientation. It's a
bigger problem on big GL, where GL_POINT_SPRITE_COORD_ORIGIN can be set freely.
To cope, a tri-state data structure is used for the state tracking. This has a
failure case on Valhall: every draw toggling the coord origin. However, the
intention of the ORIGIN state bit is smoothing over coordinate system
differences; it should never /actually/ change once set. Until we see an app
doing something so stupid, I don't think we should worry about.
We need all the Valhall tri-state infrastructure for handling provoking vertices
on big GL anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16173>
Since we don't export the relevant CAP, the state tracker calls
nir_lower_clip_vs for us. However, for some reason we're still responsible for
calling nir_lower_clip_fs. Now that we have sane shader key infrastructure,
let's do so.
Fixes the floor rendering wrong in the title screen of Neverball.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16173>
Rather than clever key compare/initialize code, let's make the shader keys plain
old data. This makes it easier both to extend and to optimize the shader keys.
Keys are compared with a simple memcmp(). I considered a hash table but I don't
think we have enough variants (or large enough keys) to justify the overhead.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16173>
In Vulkan, it's possible to create a pipeline with no fragment shader that's
still expected to rasterize. This is useful for depth/stencil side effects, and
is closely related to the "fragment shader required" optimization we do in the
GLES driver. Refactor the RSD emit code to handle this case.
Fixes dEQP-VK.pipeline.stencil.nocolor.*
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16204>
The "fragment shader required?" computed state is about fragment shader side
effects. There may be no fragment shader required but depth/stencil side effects
meaning that rasterization is nonoptional. What actually gates rasterization is
the rasterizer discard bit. Use that instead.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16204>
Otherwise wide lines break. The alternative approach is to eliminate the points
writes when not drawing points since we do have topology information at compile
time. I'm admittedly stuck in my GL mindset. That's the approach we'll need for
Valhall anyway.
Fixes dEQP-VK.rasterization.interpolation.basic.lines_wide
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16204>
The VAR_TEX definition in ISA.xml only has a field for texture_index,
so trying to read sampler_index will return zero; read from
texture_index instead, and rename other fields for consistency.
The texture and sampler indices must be equal for VAR_TEX to be used,
so either name could be used for the field.
Fixes the wrong textures being used in Thief.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6219
Fixes: eb1479bda2 ("pan/bi: Support message preloading")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16255>
Discrete platforms don't have LLC, but on those, we mmap our buffers
with WC. So we shouldn't need to clflush there.
Anv already had a boolean field on the physical device to know whether
we need to use clflush(), based off the memory heaps available. So use
that instead.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15780>
981bd8cbe2 moved outputs removing handling to NIR, but instead of
applying it only to the last stage before the FS this now applies
it to both the GS and the VS.
This commit fixes this by clearing the kill_outputs field for
the VS when using a ES-GS shader.
Fixes: 981bd8cbe2 ("radeonsi: apply key.ge.opt.kill_{outputs,pointsize,clipdistance} in NIR")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16249>
Layout transitions are not relevant to us, we only care about barriers
that involve a sync point between read/write actions on the image across
GPU jobs.
Image transitions from undefined layout can only happen before the image
is ever used by the GPU, which means they are never relevant to our
implementation.
This improves performance in vkQuake.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16235>
Lifted from ir3. Algorithm is the same; the data structures and interface are
lightly modified to decouple from ir3's IR.
Sequentializing parallel copies after RA is tricky. ir3's implementation works
well enough, so I use that one.
Original implementation by Connor Abbott.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16268>
Rather than using builder magic (implicitly lowered on emit), add actual pseudo
operations (explicitly lowered before encoding). In theory this is slower, I
doubt it matters. This makes the instruction aliases first-class for IR prining
and machine inspection, which will make optimization passes easier to write.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16268>
Lifted from Bifrost. Add some basic optimizer tests (they pass!) to show the
compiler is ready to be unit tested. Given we can't have hardware CI for Asahi
yet -- and dEQP is still pretty janky -- unit testing should prove quite useful.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16268>
This changes the intel_device_info calculation to call an additional
DRM query requesting the geometry topology from the kernel, which may
differ from the result of the current topology query on XeHP+
platforms with compute-only and 3D-only DSSes. This seems more
reliable than the current guesswork done in intel_device_info.c trying
to figure out which DSSes are available for the render CS.
Cc: 22.1 <mesa-stable>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14143>
The glsl-to-tgsi code generation and GLSL IR linker is is going away
(!8044), so we need to make the call on whether to use nir-to-tgsi (See
!15932 and !15541), or switch over to the NIR code generator. The NIR
backend should reduce the compile time regression while providing more
direct control over the IR we receive than going through NTT, while still
providing the optimization that NIR-to-TGSI was bringing us.
nv92 shader-db:
total local in shared programs: 2048 -> 1988 (-2.93%)
local in affected programs: 2048 -> 1988 (-2.93%)
total gpr in shared programs: 688468 -> 724705 (5.26%)
gpr in affected programs: 437159 -> 473396 (8.29%)
total instructions in shared programs: 6115978 -> 5874401 (-3.95%)
instructions in affected programs: 5038041 -> 4796464 (-4.80%)
total loops in shared programs: 1361 -> 835 (-38.65%)
loops in affected programs: 538 -> 12 (-97.77%)
total bytes in shared programs: 42389752 -> 40480416 (-4.50%)
bytes in affected programs: 36311616 -> 34402280 (-5.26%)
LOST: 0
GAINED: 1 (pixmark-piano)
nv120 shader-db:
total local in shared programs: 4416 -> 1988 (-54.98%)
local in affected programs: 4416 -> 1988 (-54.98%)
total gpr in shared programs: 870534 -> 893490 (2.64%)
gpr in affected programs: 564210 -> 587166 (4.07%)
total instructions in shared programs: 6379402 -> 6243210 (-2.13%)
instructions in affected programs: 5430790 -> 5294598 (-2.51%)
total bytes in shared programs: 68184224 -> 66729672 (-2.13%)
bytes in affected programs: 58013544 -> 56558992 (-2.51%)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15949>
nvc0 aligns to 0x10 in setting up its rogram header, but nv50 TLS
allocation expects the incoming value to be aligned already (like TGSI
always did). Avoids regression in
KHR-GL33.shaders.arrays.declaration.dynamic_expression_array_access_* with
the nir backend.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15949>
The nir_move/sink caused instructions to sink interleaved into the output
stores at the end of the shader. nouveau's RA doesn't track liveness of
FS outputs in registers after the export instruction, so they could end up
overwritten. To work around it, after normal NIR move/sink, move the
output stores back to the end of the shader.
Fixes: b1fa2068b8 ("nouveau/nir: Enable nir_opt_move/sink.")
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15949>
The ARB_shader_objects spec says the following:
> The error INVALID_VALUE is generated by any command that takes one or
> more handles as input, and one or more of these handles are not an
> object handle generated by OpenGL.
And a long, long time ago, we used do to just that for
glDeleteObjectARB... Until 9ac9605de1, all the way back in February 2006,
where the error condition was removed without explanation.
Let's restore it, because it should really be there.
This was noticed by running the tests that are in the mesa-demos
repository, that actually tested this condition.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16211>
This option has no meaningful effect any more other than pointlessly
renaming the the library. Let's introduce a new default value called
"unspecified", and complain if it's set to anything else.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16213>
With NGG GS, the hardware can't know the number of generated primitives
and we have to increment it manually from the shader using a plain GDS
atomic operation.
Though this had a serious problem (see this old TODO) if the bound
pipeline was using legacy GS because the query implementation was
relying on NGG GS. Another situation is if we had one draw with NGG GS,
followed by one draw with legacy (or the opposite) the query result
would have been broken.
The solution is to allocate two 64-bit values for storing the begin/end
values if the query pool is supposed to need GDS and accumulate the
result with the number of generated primitives generated by the hw.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15892>
When HW binning is used tile loads/stores could be skipped
if there is no geometry in the tile.
Loads could be skipped when:
- The attachment won't be resolved, otherwise if load is skipped
there would be holes in the resolved attachment;
- There is no vkCmdClearAttachments afterwards since it is likely
a partial clear done via 2d blit (2d blit doesn't produce geometry).
Stores could be skipped when:
- The attachment was not cleared, which may happen by load_op or
vkCmdClearAttachments;
- When store is not a resolve.
I chose to predicate each load/store separately to allow them to be
skipped when only some attachments are cleared or resolved.
Gmem loads are moved into separate cs because whether to emit
CP_COND_REG_EXEC depends on HW binning being enabled and usage of
vkCmdClearAttachments.
CP_COND_REG_EXEC predicate could be changed during draw_cs only
by perf query, in such case the predicate should be re-emitted.
(At the moment it is always re-emitted before stores)
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15974>
The input is an array so moving it to a single temporary value doesn't
seem to make much sense. I also don't see any piglit regressions when
not moving the value to a temporary.
Fixes: bc912bace1
virgl: Add workarounds for virglrenderer input/sv signedness bugs.
v2: remove unused enum for SAMPLEMASK (Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15997>
Casts shouldn't change the bit pattern of the deref and you have to cast
again after you're done with the ALU anyway so we can ignore casts on
ALU sources. This means we can actually start constant folding NULL
checks even if there are annoying casts in the way.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15673>
This controls the whole lowering of "make tex ops with implicit
derivatives on non-implicit-derivative stages be tex ops with an explicit
lod of 0 instead", but it's really hard to describe that in a git commit
summary.
All existing callers get it added except:
- nir_to_tgsi which didn't want it.
- nouveau, which didn't want it (fixes regressions in shadowcube and
shadow2darray with NIR, since the shading languages don't expose txl of
those sampler types and thus it's not supported in HW)
- optional lowering passes in mesa/st (lower_rect, YUV lowering, etc)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>
This will only work if all contexts have been destroyed. If the app
attempts to re-create one context, while another outstanding context
exists and is still in the removed state, then the screen is not
recovered and the new context will fail to create.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15002>
is_pixmap is defined in kopper_allocate_textures() as being (!window && x11),
which is very different from this check, which determines whether the drawable
is a window
so rename it to keep things consistent
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16190>
Force default device if MESA_VK_DEVICE_SELECT_FORCE_DEFAULT_DEVICE
environment variable set. This will not give multiple device
options to app. There are apps that selects gpu to use based on its
own criteria, this patch can force default behaviour for these apps
by giving only one gpu device to select from.
v2: return 0 if no physical device present (Mihai Preda)
v3: document environment variables (Mihai Preda)(Marek Olšák)
Signed-off-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15585>
Rarely the jobs.logs RPC call can return corrupted data, such as
mal-formed YAML data. As this is expected and very rare to occur, let's
retry this RPC call several times to give it a chance to fix itself.
Retrying would not swallow the log lines since we keep track of how many
log lines each job has.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Move exceptions to its own file.
Create MesaCITimeoutError and MesaCIRetryError with specific exception
data for better exception classification.
Avoid the use of `fatal_err` in favor of raising a proper exception.
Make _call_proxy exception handling exhaustive, add missing
ResponseError treatment.
Also, detect JobError during job result parsing. So when a LAVA timeout error
happens, it is probably cause by some boot/network issues with a
specific device, we can retry the same job in other device with the same
device_type.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
During development, we may want to test lava_job_submitter.py locally,
sometimes one can find what one is been looking for before the LAV job
is done. Let's respond to SIGINT signal and cancel the LAVA job, as we
can't follow what is happening anymore via script.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
A normal boot on LAVA device would take less than 1 minute. Sometimes
the depthcharge freezes and LAVA waits until the current timeout (25
minutes) to kick in and fail the job.
With this lower timeout value, the job could fail faster and eventually
LAVA will be able to retry it.
Furthermore, set a default timeout for all actions to fix an undesired
behavior with some LAVA job subactions, such as depthcharge-action.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Make jwt-file argument optional, this means that this submitter could
deal with more generic use cases, such as running it locally, since the
MINIO JWT is a sensitive information, which is not really required to
test if the submitter is working well.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
LAVA can filter which test suite to show the results from, let's list
all testcases possible in the mesa test suite, to be able to divide more
complex jobs into test_cases.
Another advantage is that the test case can vary its name.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Any daemon executed in init-stage2.sh may interfere with LAVA signals,
since any threaded output to console may clutter the signals, which are
based on the log output.
E.g: This job
https://gitlab.freedesktop.org/gallo/mesa/-/jobs/20779120#L2102
has failed because capture-devcoredump.sh was alive and emitting kernel
messages to the console during the LAVA signal handling, mangling the
output and making the LAVA to fail to check the results of the job.
Another problem is that CONFIG_DEBUG_STACK_USAGE Kconfig is enabled.
This causes process exit to dump a `RESULT=[ 246.756067] lava-test-case
(156) used greatest stack depth: ... bytes left` kernel message to the
logs corrupting LAVA signal message. Empirically, it happens one in
every 280 jobs. To solve that, compose the lava-test-case custom script
with a short sleep to give time for kernel to dump the message clearly
and a exit command to keep the return code from init-stage2.sh script.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
The exit code is automatically parsed to fail/pass the job, so this
commit removes the `hwci.*pass|fail` regex and printings.
Add mesa-job-name parameter to get the CI_JOB_NAME easily to serve as
test name.
Besides, there is the treatment for the mesa job naeme, as LAVA does not
like whitespace character inside the test case/suite name, since it
interprets it as a LAVA signal parameter and it can make the job fail
when the script looks for the results from the LAVA RPC.
And the slash character seems to break gitlab log sectioning, so
removing every character after the first whitespace.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
By default, LAVA emit signals as specific lines of message to the
console serial for subsequent parsing. However, this is prone to be
interleaved with the dmesg messages, particularly with debug messages
that can happen just after the process exit, such as the ones set by
CONFIG_STACK_DEBUG_USAGE Kconfig.
Setting the `lava-signal` to `kmsg` will make those special messages to
be dumped to /dev/kmsg, where the each line is printed atomically
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Start to differentiate between the different types of LAVA log message;
the only visible change right now is that we make warnings and errors
bold and red to match bare-metal, but it comes in useful later as we
will use the results markers to watch us step through the different
stages of job execution.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Less free-form passing stuff around, and also makes it easier to
implement log-based following in future.
The new class has:
- job log polling: This allows us to get rid of some more function-local
state; the job now contains where we are, and the timeout etc is
localised within the thing polling it.
- has-started detection into job class
- heartbeat logic to update the job instance state with the start time
when the submitter begins to track the logs from the LAVA device
Besides:
- Split LAVA jobs and Mesa CI policy
- Update unit tests with LAVAJob class
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
We rate-limit LAVA API calls as they are standard polling calls rather
than blocking for changes. However when we sleep after making the calls
rather than before, we can block when we want to exit - e.g. after
getting the final logs, we will still sleep even though we can drop out.
Fix this by moving the calls to before the API calls, rather than after.
This means that the first calls (when we're waiting to be scheduled, or
haven't got our first log lines yet), will be delayed compared to
previously, but that's not going to slow us down as even in the best
case we won't be executing in a device within the first 15 seconds.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Without this being atomically incremented and decremented, I observed
this assert triggering in debug builds:
src/vulkan/wsi/wsi_common_x11.c:x11_present_to_x11_dri3():
assert(chain->sent_image_count <= chain->base.image_count);
I think this was happening since,
src/vulkan/wsi/wsi_common_x11.c:x11_handle_dri3_present_event()
which decrements chain->sent_image_count may be run in a separate
thread.
Fixes: d0bc1ad377 ("vulkan/wsi/x11: add sent image counter")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15908>
This is needed for the VIRTGPU_WAIT ioctl to work.
TODO we could perhaps limit this, since it is not needed for residency,
but only fencing. Ie. we could omit cmdstream, and probably anything
that has FD_BO_NOMAP flag.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16086>
This syncs up with the protocol of what eventually landed in virglrender.
1) Move all static params to capset to avoid having to query host
(reduce synchronous round trips at startup)
2) Use res_id instead of host_handle.. costs extra hashtable lookups in
host during submit, but this lets us (with userspace allocated IOVA)
make bo alloc and import completely async.
3) Require userspace allocated IOVA to simplify the protocol and not
have to deal with GEM_NEW/GEM_INFO potentially being synchronous.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16086>
If supported by host virglrenderer and host kernel, use userspace
allocated GPU virtual addresses. This lets us avoid stalling on
waiting for response from host kernel until we need to know the
host handle (which is usually not until submit time).
Handling the async response from host to get host_handle is done
thru the submit_queue, so that in the submit path (hot) we do not
need any additional synchronization to know that the host_handle
is valid.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16086>
We use nir_assign_io_var_locations() which compacts the varyings and
eliminates any unused input slots. We need to do the same thing when
processing pVertexAttributeDescriptions[] or else we'll end up with
mismatches between the shader and the state setup code.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16183>
Due to both Lavapipe on Windows and Dozen, we need to support MSVC in
the shared Vulkan code. So let's make sure we compile with the
compatibility flags for it.
Techinically speaking, we also need this in the wsi subdir, because we
also compile wsi_common_win32.c with MSVC. But wsi_common_wayland.c
contains void-pointer arithmetic, causing compiler errors if we do.
Fixing that properly is a bit more involved, because Meson doesn't love
passing different compiler arguments per source-file. The alternative is
to remove the void-pointer arithmetic, but that seems a bit pointless as
this code will never be compiled on MSVC.
So, let's leave that one out for now. We can probably do better in the
future, but this gets us a step further.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6386
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16162>
On older GPUs a color tile was always 64 Byte. On new GPUs with
CACHE128B256BPERLINE support the tile size is either 128 Byte or
256 Byte depending on the TS mode. Add a helper to return the
color tile size and use in in places that use hard-coded tile
size values or do their own calculation.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
With access to HALTI5 GPUs with and without DEC400 compression it's
obvious that the previous compression state setup only worked when
DEC400 was present. Properly set up the compression state bits.
This is only the second part of the fix, first part is moving the
compression state to the correct bit location, which has already
happened via the import of new rnndb headers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
We used the number of pipes to determine which state registers to use
for the RS pipe address configuration, as the dual pipe GPUs were the
first one where the new states were used. This isn't correct though,
as now there are single pipe GPUs which also use the new state
addresses.
There actually is a feature flag telling us to use the new RS pipe
address states, use it. As this feature flag is not available on early
GPUs using the new base address (mostly because we don't have HWDB
entries for them), still check for more than a single pipe as an
additional clue to use new states.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
We used the number of pipes to determine which state registers to use
for the PE pipe address configuration, as the dual pipe GPUs were the
first one where those new states were used. Now there are some new
single pipe GPUs where this logic breaks. HALTI0 added the new PE
address states and all GPUs with at least this feature level are using
the new states exclusively, even if they only have a single PE pipe.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
When the divisor is 0, the compiler should generate a different VS
prolog instead of re-using a previous prolog that uses nontrivial
divisors. This is because divisor == 0 and divisor > 1 should use
a different path to guarantee that the index is correctly computed.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16009>
Because for fragment shaders we still use the variables, and
lower_io_to_vector may leave dead variables that duplicate inputs
that are now vectorized, we have to call this pass, because otherwise
we will may hit the assertion
src/gallium/auxiliary/tgsi/tgsi_ureg.c:318:
ureg_DECL_fs_input_centroid_layout:
Assertion `(ureg->input[i].usage_mask & usage_mask) == 0'
This is relevant for
spec@arb_enhanced_layouts@execution@component-layout@*
on r600/ntt
Fixes: a4840e15ab
r600: Use nir-to-tgsi instead of TGSI when the NIR debug opt is disabled
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16130>
There's below AHB VU on the image view:
VUID-VkImageViewCreateInfo-image-02399
If image has an external format, format must be VK_FORMAT_UNDEFINED
This is well hidden and completely missed from the original venus ahb
implementation.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16159>
The GLSL lowering of half float packing involves software conversion
to half-float; instead, use the lowering in NIR.
Both Midgard and Bifrost are already set to lower the instructions to
bit operations, but change mdg_should_scalarize so that the lowerable
split variants of the pack/unpack instructions are generated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16175>
due to desync between the frontend and the driver, the size that the
depth buffer was created with may not match the size of the swapchain if
the window is being resized very quickly, so just go ahead and clobber
the existing depth buffer with a series of very illegal internal object
replacements to make everything match up
do not try at home.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16151>
Rather than using it as a catch-all initialize, use it to fill in derived from
fields from a partially initialized image_layout. This is easier to understand
and, more importantly, easier to unit test.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15991>
Midgard has multiple Surface Descriptor formats selectable in the texture
descriptor. Previously, we have used both the "64-bit surface descriptor" and
the "64-bit surface descriptor with 32-bit line stride and 32-bit layer stride".
A delicate routine tried to guess what stride the hardware will use if we don't
specify it explicitly, and omit the stride if it matches. Unfortunately, that
routine is broken in at least two ways:
* Textures with ASTC must always specify an explicit stride. Failing to do so
(like we were doing) is invalid.
* It applies even for interleaved textures. The comment above the function
saying otherwise is incorrect. (TODO: double check this)
Bifrost onwards always specify the strides explicitly. Let's just do that and
unify the gens. What is lost from doing this? A ludicrously trivial amount of
memory and texture descriptor cache space. 8 bytes per layer*level per texture,
in fact. Compared to the size of the textures being addressed, the memory usage
is trivial. The texture descriptor cache size maybe matters more. But given
Arm's hardware people went this direction for Bifrost and stuck to it, I doubt
it matters much.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15991>
Before we used GenXML, pan_texture mixed layout code with texture descriptor
packing code. For the most part, the layout code is generation-independent; the
pack code is not. We introduced an anti-pattern where the file was compiled N+1
times: N times for each PAN_ARCH value, and an extra time with no PAN_ARCH
value. And then the contents of the file changed completely depending on
PAN_ARCH. This is a pretty weird construction.
Let's instead split off the layout file from the descriptor file, compile the
layout file once, and compile the descriptor file per-gen.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15991>
Most of the time when the logging code is invoked, it means we're
already in an edge case. It should be as robust as possible, otherwise
we risk making hard to debug things even harder. To that end, instead
of blowing up if passed a NULL object on the list, handle it as
gracefully as we can.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16107>
Instead of being globbed into the RSD, Valhall uses minimal shader program
descriptors. For IDVS, we need separate descriptors for position and varying
shaders. It's actually worse -- we need separate descriptors for drawing points
and drawing lines/triangles in order to skip over the gl_PointSize write. Adapt
prepare_shader to upload all these descriptors.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16035>
Either as uniform remap table entries on Bifrost, or as simple buffer
descriptors on Valhall. The underlying hardware is different (and there are
compiler changes for load_ubo handling), but the high level UBO upload logic
does not have to care about that.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16035>
On Valhall, the fragment shader differs based on whether IDVS or the legacy
geometry flow is used be. In particular, varyings are accessed differently.
We use the legacy geometry flow for blitting on all GPUs, so indicate this in
the shader inputs.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16035>
Required to query texture features on Valhall. It's technically the same as
previous Malis (except for narrow ASTC), but conceptually it's different as
plane descriptors have superseded indexed pixel formats for block compressed
textures.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16035>
My understanding of the signal masks is that they control what stages
must complete before the semaphore is signaled. Using 0 theoretically
means the semaphore could be signaled immediately without waiting on
anything. Use ~0 instead to say it depends on everything.
Fixes: 97f0a4494b ("vulkan: implement legacy entrypoints on top of VK_KHR_synchronization2")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16145>
Without this we might choose 8 or 16 width, while the app assumes 32.
With subgroup operations it may cause wrong calculations and thus bugs.
Examples of such games are Aperture Desk Job and DOOM Eternal.
v2: Make it a driconf option instead of applying unconditionally, move
from brw_required_dispatch_width to brw_compile_cs
v3: Rename allow_assuming_full_subgroups -> assume_full_subgroups.
Include assume_full_subgroups value in anv_pipeline_hash_compute().
v4: Move actual workaround code from brw_fs.c -> anv_pipeline.c.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6171
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15708>
SVGAv3 changes the PCI id due to differences in how PCI configuration
is handled - removal of VRAM and FIFO PCI resources, switch to MMIO
registers and MSI/MSI-X IRQ support but the 3D commands remain largely
the same.
This enables 3D/graphics acceleration support on SVGAv3.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16061>
SVGA device always supports direct maps which are preferable in all cases
because they avoid temporary surfaces and extra transfers. Furthermore
DMA transfers on devices with GB objects have undefined timing semantics.
Also the DMA transfers can not work on SVGAv3 because the device lacks
VRAM to be able to perform them.
Fix the last paths still using DMA transfers to make sure they're never
used on GB enabled configs. This fixes gnome-shell startup on SVGAv3.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Michael Banack <banackm@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16061>
Flushing the command queue before mapping a resource is not enough
to guaruantee that the mapped content is not stale. We have to finish
to make sure that the gb readback actually updated the guest surface.
This fixes races in direct maps (map reads raced with gb readbacks)
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16061>
svga used to use vmx backdoor directly to send logs to the host.
This functionality has been implemented in vmwgfx 2.17, but
to make sure we still work with old kernels the functionality
to use the backdoor directly has been kept.
There's no reason to port that code to arm since vmwgfx
implements it and arm64 (or other new platforms) would
depend on vmwgfx versions a lot newer than 2.17, so everywhere
but on x86/x64 it's fine to assume vmwgfx always support the host
logging ioctls.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16061>
Models a double-ended queue of elements from an a list. Based on NIR's worklist
data structure. This is useful in most backend compilers for data flow analysis.
Using this data structure has several advantages for backends:
* Simplicity, avoids open-coding a worklist data structure.
* Performance, the data structure is lighter weight than e.g sets
* Correctness, e.g. sets are nondeterministic and can cause random bugs.
Using a worklist approach at all is good for performance of liveness analysis
to avoid performing excess walks over the IR.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16046>
Change ASSERT_EQ to EXPECT_EQ to avoid aborting before freeing memory.
Fix defects reported by Coverity Scan.
Resource leak (RESOURCE_LEAK)
leaked_storage: Variable tiled going out of scope leaks the storage it points to.
leaked_storage: Variable linear going out of scope leaks the storage it points to.
leaked_storage: Variable ref going out of scope leaks the storage it points to.
Fixes: bb6c14a697 ("panfrost: Unit test u-interleaved tiling routines")
Suggested-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16127>
Implement as f2f32(f2f16(x)) with the conversions in flush-to-zero mode.
Accessing flush-to-zero mode on Bifrost is nontrivial: it is specified
per-clause, rather than per-instruction. I've opted to pipe support for ftz
clauses through the scheduler. This solution has two nice properties:
* It uses the native hardware for flushing subnormals, avoiding extra lowering.
* It's "smart" about scheduling around FTZ requirements, meaning we get good
code generated even for a shader that e.g. quantizes a vector.
With an unrelated scheduler fix, the *V2F32_TO_V2F16/+F16_TO_F32 operation fits
in a single tuple, minimizing the overhead of the special FTZ clause.
We'll have to do something a bit different for Valhall (FLUSH.f32), but we'll
worry about when we actually have PanVK brought up on Valhall.
Fixes dEQP-VK.spirv_assembly.instruction.compute.opquantize.*
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16123>
alignof(void) is a non-standard GCC extension, and it doesn't compile on
MSVC. But since the Windows CI has been disabled due to stability
issues, a breakage snuk in nevertheless.
Since alignof(char) works the same as alignof(void), let's pass char
instead of void here. That hides the GCC weirdness without doing any
functional changes.
Fixes: 591da98779 ("vulkan: Add a common VkPipelineCache implementation")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16134>
Previously, if the shader was already cached, the pipe->link_shader hook
wouldn't be called, and the gallium driver wouldn't know that shaders
were being linked.
This helps VirGL, because sometimes the guest shader cache can be hit,
while the host shader cache would be missed. VirGL uses this hook to
make the host immediately link shaders, instead of lazily linking them
when a draw call happens, which can degrade performance.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15927>
With mutable descriptor types, we can end up in a situation where a
binding can be, for instance, both a UBO and an acceleration
structure.
While we can promote the UBO to a binding table entry and the shader
can use it, this isn't true of acceleration structures that have no
surface state. In that case just skip the entry. The shader is already
compiled to use the descriptor entry.
In the non mutable case, the entry will not be created by
anv_nir_apply_pipeline_layout.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 63e91148b7 ("anv: Enable VK_VALVE_mutable_descriptor_type")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15969>
Variables that are only used for assertions are considered unused in release
builds. Don't treat this as an error, since we build with -Werror even for
release in CI. This causes reasonable code to build and pass tests locally (and
therefore to be queued for merge by reasonable developers), but later fail in CI
due to a variable used only as an assertion. This pattern is common enough we
have an ASSERTED macro to workaround the behaviour, but failing a CI run to
have the author go back and add in the ASSERTED and re-queue later is a recipe
for frustration, wasted time, and wasted CI bandwidth.
Disable this behaviour to reduce CI friction.
In my view, sprinkling in ASSERTED clutters the code, rather than helps; I find
CI's insistence on doing so actively counterproductive. Developers are free to
continue doing so after this change. But this way CI won't fail merge requests
over it. After all, CI enforces policy, and we shouldn't have "mark variables
only used for assertions as ASSERTED" as policy. Let's pick our battles wisely
and improve CI's signal-to-noise ratio.
As an added benefit, this eliminates a class of defects where ASSERTED is used
incorrectly, e.g:
c91e3c6a42 ("util: Should not use ASSERTED in util_thread_get_time_nano")
3e22fc27af ("zink: remove incorrect ASSERTED macro")
0d08ce287b ("pan/bi: Remove dated ASSERTED properties")
Note that actual unused variables will be caught by debug builds. It is expected
that developers do debug builds locally before ramming code through CI, so that
should be caught.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15582>
This may be needed by ACO, but it doesn't do anything for LLVM yet other
than making the initial LLVM IR smaller.
It will be needed by a future commit, which rewrites ac_optimize_vs_outputs
in NIR, which relies on NIR matching the shader key.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14414>
Instead of having two different helpers, delete the pipeline_cache ones.
Also, instead of manually handling the cache == NULL case in every
vkCreateFooPipelines call, handle it inside the helpers. This means
that BLORP can use them too by passing cache=NULL.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13184>
This is partly copied+pasted from ANV but is mostly new code with lots
of reference counting bugs fixed (I hope!). The new cache caches
"object" which derive from a base vk_pipeline_class_object struct. It
uses a kernel-style "ops" interface for virtual methods on these objects
to allow for easy destruction (when the reference count hits zero) as
well as serialization an deserialization interfaces. This should allow
drivers to cache basically whatever they want without having to think
too hard about the details.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13184>
Unfortunately it's not as simple as passing the indirect draw count
buffer to ExecuteIndirect. The compute job that populate the execute
buffer also needs to know the number of entries that need to be
patched. Instead of transitioning the indirect count buffer from
GENERIC_READ to INDIRECT_ARGUMENT we just keep at as a read-only
resource and copy the draw_count value to the exec buffer in the
compute job.
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15914>
The key is created on stack, so as soon as the function returns this key
is lost, so the inserted key in the hashtable is invalid.
Rather, insert a duplicated version on heap.
This fixes a stack-buffer-overflow when running some Vulkan CTS tests.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16083>
The Iris code that deals with implicit tracking is protected by
bufmgr->bo_deps_lock. Before this patch, we hold this lock during
update_batch_syncobjs() but don't keep it held until we actually
submit the batch in the execbuf ioctl. This can lead to the following
race condition:
- Context C1 generates a batch B1 that signals syncobj S1.
- Context C2 generates a batch B2 that depends on something that B1
from C1 is using, so we mark B2 as having to wait syncobj S1.
- C2 calls submit_batch() before C1 does it.
- The Kernel detects it was told to wait on syncobj S1 that was
never even submitted, so it returns EINVAL to the execbuf ioctl.
- We run abort() at the end of _iris_batch_flush().
- If DEBUG is defined, we also print:
iris: Failed to submit batchbuffer: Invalid argument
I couldn't figure out a way to reproduce this issue with real
workloads, but I was able to write a small reproducer to trigger this.
Basically it's a little GL program that has lots of contexts running
in different threads submitting compute shaders that keep using the
same SSBOs. I'll submit this as a piglit test. Edit: Tapani found a
dEQP test case which fails intermintently without this fix, so I'm not
sure a new Piglit is worth it now.
The solution itself is quite simple: just keep bo_deps_lock held all
the way from update_batch_syncobjs() until ioctl(). In order to make
that easier we just call update_batch_syncobjs() a little later. We
have to drop the lock as soon as the ioctl returns because removing
the references on the buffers would trigger other functions to try to
grab the lock again, leading to deadlocks.
Thanks to Kenneth Graunke for pointing out this issue.
This has also been confirmed to fix a dEQP test that was giving
intermittent failures:
dEQP-EGL.functional.sharing.gles2.multithread.random.images.copyteximage2d.12
v2: Move decode_batch() out, just to be safe (Jason).
v3: Do it all after assembling validation_list (Ken).
Cc: mesa-stable
Fixes: 89a34cb845 ("iris: switch to explicit busy tracking")
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14964>
With nir-to-tgsi, virgl saw a case where a previous declaration of array
.x and scalar .y (turning into an array with .xy) ended up being a
declaration of scalar .x and array .y (turning into a scalar with .xy).
Make sure we extend the declared array length as well.
One might think that the fix would be to union the .first/.last between
the two declarations being merged, but note that ureg_DECL_output() passes
in the current nr_output_regs as the index, so the .last would end up
getting extended for those callers (such as nir_to_tgsi fs outputs) every
time you merged.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13295>
We can't declare input arrays because mesa/st lowers NIR VS output
declarations to elements no matter what, and virgl has depended on
matching array sizes of declarations between producers and consumers. So,
we have to lower it away (which is fine because hardware drivers will
generally be lowering anyway).
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13295>
NTT can't convert local 64 variables of type vec3 or vec4, therefore,
they need to be split into vec2 + double or vec2 + vec2.
At the same time deal splitting the phi nodes.
The pass goes into the global namespace because it is also useful
for r600.
v2: only lower function_temps (Emma) and handle array of arrays (Jason)
v3: - remove bool parameter in function to merge
vec3 and vec4
- simplify load/store_deref_(array|var)
- use a pointer hash table
- simplify PHI splitting (all Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15945>
Vangogh is the codename of the custom APU found in the Steam Deck.
As it only has 4 Zen2 cores and a 15W TDP, don't expect fast runs
until we get more of them in CI :)
Just like the Renoir and one of the NAVI10 runners, this VanGogh
runner is hosted in my CI farm, behind a couple of UPSes.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Charlie Turner <cturner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15555>
The `IMAGE_UNDER_TEST` variable set in `.b2c-test` got broken with
the merge of 7d474c1 (ci: Move most stuff out of root .gitlab-ci.yml).
During the shuffling, the `MESA_BASE_TAG` and `MESA_IMAGE_TAG`
variables were dropped, leading to `IMAGE_UNDER_TEST` being an
unexisting container.
To make this issue less likely to happen in the future, this patch
drops the code duplication that led to `IMAGE_UNDER_TEST` to be
the same as `MESA_IMAGE` and instead re-uses .use-debian/x86_test-vk
to generate `MESA_IMAGE`, which we then use verbatim in
`IMAGE_UNDER_TEST`.
The renaming is `MESA_IMAGE` into `IMAGE_UNDER_TEST` there to make the
distinction clear between the image run by gitlab-runner (what is
usually called `MESA_IMAGE` but we instead hardcode to valve-infra's
trigger container), and the image we are running on the test machines.
Fixes: 7d474c1 (ci: Move most stuff out of root .gitlab-ci.yml)
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Charlie Turner <cturner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15555>
src/gallium/auxiliary/tgsi/tgsi_scan.c:287: scan_src_operand: Assertion `info->sampler_targets[index] == target' failed.
assert was being triggered by
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_multisampled_to_singlesampled_blit
using the stencil fallback with zink.
Fixes: f05dfddeb1 ("u_blitter: fix stencil blit fallback for crocus.")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16069>
When setting the dst framebuffer width height, it might be silly
to constrain this beyond the dst resource, but at least constrain
it correctly to take account of x/y offsets.
This fixes some uses of this as a fallback for zink with
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_stencil_blit
Fixes: b4c07a8a87 ("gallium/util: allow scaling blits for stencil-fallback")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16069>
Split the write desc helpers in two halves, one taking a descriptor
offset directly, and the other one taking a descriptor set pointer.
This will allow us to pre-calculate descriptor offsets when creating
a descriptor_update template and speed up a bit the write step in that
case.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15691>
Let's be consistent with other helpers taking a dzn_descriptor_set_ptr
object and prefix all such functions with dzn_descriptor_set_ptr_.
We also rename dzn_descriptor_set_ptr_get_desc_vk_type() into
dzn_descriptor_set_ptr_get_vk_type() to shorten it a bit.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15691>
GL spec states that the stride for indirect multidraws:
* cannot be negative
* can be zero
* must be a multiple of 4
some drivers can't support strides which are not a multiple of the
size of the indirect struct being used, however, so rewrite those to
direct draws
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15963>
Towards the renderer, venus better uses VK_EXT_image_drm_format_modifier
to force linear with tiling modifier and mod_linear. Doing so won't make
any difference on the mesa implementations we care about given we have
required VK_EXT_image_drm_format_modifier for wsi support.
A lucky side effect of this is to allow common wsi to work with host
implementations not supporting dma_buf export.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15993>
this was an attempt to minimize the number of xfb barriers being emitted,
but really xfb barriers need to always be emitted in order for xfb to work
cc: mesa-stable
fixes (nv):
KHR-GL46.texture_view.reference_counting
KHR-GL46.transform_feedback_overflow_query_ARB.multiple-streams-multiple-buffers-per-stream
KHR-GL46.transform_feedback_overflow_query_ARB.multiple-streams-one-buffer-per-stream
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16065>
this was well-documented, but ultimately wrong: the synchronization
being used was for binding streamout buffers (not counter buffers) as
vertex buffers, which was already handled just fine in the normal
vertex buffer binding
drawing from streamout ONLY uses the counter buffer, which means
the counter buffer needs to be synchronized for reading
cc: mesa-stable
fixes (nv):
KHR-GL46.transform_feedback.draw_xfb_feedbackk_test
KHR-GL46.transform_feedback.draw_xfb_instanced_test
KHR-GL46.transform_feedback.draw_xfb_stream_instanced_test
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16065>
I had some workarounds in ALU op emits trying to fix up when we were asked
to store to unsupported channels when the ALU op had 64bit srcs (so only
vec2 supported) but a 32-bit dest with a >vec2 writemask.
Those workarounds had some bugs breaking 64-bit uniform initializer tests
on virgl, and also set up too wide of a writemask such that they triggered
assertion failures on nvc0. We can avoid the need for those workarounds
at emit time by just having nir_lower_vec_to_movs not generate unsupported
writemasks in the first place.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15934>
Marge jobs are failing at their 1 hour timeout regularly because windows
CI lacks capacity. In the job I looked at, this test took 18 minutes,
which is surely contributing to the load. Cut it down to get us some hope
of getting MRs through that run windows jobs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16062>
There's no hardware instructions for them until then. These chips don't
expose the extension provinding the GLSL builtins for operations like
bfrev, but NIR can recognize the construct and optimize it to
bitfield_reverse, which pre-nvc0 would then fail to codegen. Prevents a
regression when moving to nir-to-tgsi. Other lower_bitfield flags are set
as well for when someone comes along and adds optimizations for them too.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
Also adjust the lowering pass to handle wide SSBO loads that we now emit
for the nir case.
This improves generated code quality since memoryopt can't
merge SSBO loads that end up predicated on a bounds check.
This also happens to fix a few test cases, only because the simpler generated
IR is less likely to trigger other compiler bugs. Eg on kepler with
NV50_PROG_USE_NIR=1, this fixes
arb_gpu_shader_fp64-fs-non-uniform-control-flow-ubo
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
For !8044 I'm working on getting all drivers to accept NIR. The NIR
compiler in the driver is apparently not quite ready, so use NIR-to-TGSI
instead. This is a net win in testcases working on my RV770 and Turks
cards (especially in some important piglit tests involving YUV dma-buf
decode), though it's not regression-free.
shader-db (R600):
total dw in shared programs: 8553412 -> 8358918 (-2.27%)
dw in affected programs: 7476702 -> 7282208 (-2.60%)
total gprs in shared programs: 217286 -> 213217 (-1.87%)
gprs in affected programs: 72747 -> 68678 (-5.59%)
total loops in shared programs: 398 -> 330 (-17.09%)
loops in affected programs: 68 -> 0
total cf in shared programs: 558835 -> 332768 (-40.45%)
cf in affected programs: 420475 -> 194408 (-53.76%)
shader-db (Turks):
total dw in shared programs: 14104598 -> 13556782 (-3.88%)
dw in affected programs: 12161972 -> 11614156 (-4.50%)
total gprs in shared programs: 321068 -> 313690 (-2.30%)
gprs in affected programs: 114899 -> 107521 (-6.42%)
total loops in shared programs: 736 -> 651 (-11.55%)
loops in affected programs: 111 -> 26 (-76.58%)
total cf in shared programs: 925771 -> 581226 (-37.22%)
cf in affected programs: 678600 -> 334055 (-50.77%)
total stack in shared programs: 27853 -> 27855 (<.01%)
stack in affected programs: 5 -> 7 (40.00%)
glmark2 terrain: 0.137649% +/- 0.0511938% (n=6)
glmark2 jellyfish: no change (n=8)
unigine valley (extreme) 5.36 -> 5.45 (n=1 it takes so long to run)
unigine heaven (basic) 16.13 -> 16.15 (n=1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14319>
KHR-GL33.shaders.indexing.tmp_array.vertexid regressed with the switch to
NIR-to-TGSI because the shader got optimized enough to emit a read just
after writing to the array (the kind of situation where a non-rel write
would have been followed by a PV/PS read). The R600 and EG docs say you
always need to do this, but apparently some hardware gives you the right
answer anyway so we don't flag it on all of them.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14319>
We are underprovisioned for Windows, almost certainly not running wide
enough on the insufficient number of slots we do have, and there are
also indications that the machine itself is having physical issues.
Disable it until it's fixed.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16055>
util_cpu_detect is an anti-pattern: it relies on callers high up in the call
chain initializing a local implementation detail. As a real example, I added:
...a Mali compiler unit test
...that called bi_imm_f16() to construct an FP16 immediate
...that calls _mesa_float_to_half internally
...that calls util_get_cpu_caps internally, but only on x86_64!
...that relies on util_cpu_detect having been called before.
As a consequence, this unit test:
...crashes on x86_64 with USE_X86_64_ASM set
...passes on every other architecture
...works on my local arm64 workstation and on my test board
...failed CI which runs on x86_64
...needed to have a random util_cpu_detect() call sprinkled in.
This is a bad design decision. It pollutes the tree with magic, it causes
mysterious CI failures especially for non-x86_64 developers, and it is not
justified by a micro-optimization.
Instead, let's call util_cpu_detect directly from util_get_cpu_caps, avoiding
the footgun where it fails to be called. This cleans up Mesa's design,
simplifies the tree, and avoids a class of a (possibly platform-specific)
failures. To mitigate the added overhead, wrap it all in a (fast) atomic
load check and declare the whole thing as ATTRIBUTE_CONST so the
compiler will CSE calls to util_cpu_detect.
Co-authored-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15580>
NIR doesn't have that knob currently, so we end up throwing errors about
it being ignored.
This should fix cases of "tgsi_to_nir: unhandled TGSI property 23 = 1",
and presumably do better at DX9 muls on nv50 and r600.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14883>
Otherwise, we get an ishl that the HW can't support, and a ushr if the NIR
ends up being lowered to ubo_vec4, which may not get constant-folded if
the offset was non-constant.
This matches what mesa/st uses for this arg to uniform lowering.
Fixes: #5971
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14883>
On virglrenderer it is of interest to not re-use temporaries when we
want to handle precise, invariant, and highp/mediump with better
possibility for optimization.
v2: Force optimized RA if the number of registers is too large
(Emma: only 16 bit signed int are reserved for register indices)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16051>
Found in virgl, where a glslparsertest accidentally gets its inputs
lowered to undefs, and 64-bit undefs don't get split by the normal
alu/intrinsic splitter (and would be hard to split because other passes
would see reconstruction of the vec4 from undefs and turn it back into
vec3/vec4 undef).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16043>
Getting opengl32*.def consistence with Windows SDK.
Getting osmesa.mingw.def's gl* functions consistence with Windows SDK.
stw_* functions are cdecl, not stdcall, so there is no need mangling the symbol.
Fixes egl.def for x86
d3d10sw: Move the place of d3d10_sw.def to d3d10_sw.def.in
Fixes vulkan_lvp.def for x86
Fixes#5552
Remove stdcall-fixup
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14041>
gfxbench sets these between the gbuffer subpass and the following ones.
They should be no-ops as subpass dependencies. gfxbench vk-5-debug perf
12.8 -> 14.6 fps thanks to getting gmem on the gbuffer rendering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15982>
gfxbench vk-5-normal has a shader that sampels into a texels[] array at
the top, then in a loop calls a GLSL function passing texels[] in by
value. This resulted in a copy to a temp inside the loop, which got
lowered to scratch stores since it was pretty big.
By doing find_array_copies(), we notice that it's equivalent to
copy_deref, then get to copy-propagate from the array at the top. Then we
only have to set up the scratch array outside of the loop and load_scratch
from it in the called function inside the loop. This also causes there to
be less spilling, stps 1144 -> 354 and ldps 826->36.
However, it doesn't seem to change performance on the test. So, while
this seems to be an improvement for the shader, and we could maybe even do
better by rematerializing the txl samples inside the loop instead of
storing the texture fetches to scratch in the first place, it doesn't
currently seem worth pursuing more optimization of this shader.
No change on freedreno shader-db.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15982>
This was useful for comparing image allocations between gfxbench
gl_5_normal and vk_5_normal to see if rendering was generally equivalent
(formats, MSAA, UBWC choices, and notably gfxbench vk was choosing DXT5
instead of ASTC on non-android builds!)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15982>
Refactored tu6_calculate_lrz_state and added comments.
1) If there is no depth write we could keep LRZ valid with any
compare op, we just have to temporary disable LRZ for incompatible
ops in such case.
2) Found that VK_COMPARE_OP_EQUAL is not compatible with LRZ,
and since it doesn't change LRZ buffer - LRZ could be just
temporary disabled. This fixes rendering of grass/trees in
PUBG mobile on angle.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6127
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16014>
The tgsi path already marked all aliasing loads of atomic counters with
CACHE_CG, so we don't need to emit a cctl. This patch uses the cache
flag on the atomic to model whether the L1 cache needs the stale
values to be flushed or not.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14386>
Using the vulkan-helpers from C++ code has turned out to have a lot of
friction, because no other driver uses C++ for this.
So let's bite the bullet and call the D3D12 C-API instead. The C-API
wasn't really around when we started out, but it's there now.
This is still far from ideal; we should really create some wrapping
macros to generate the extremely verbose COM calls.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15816>
When per-sample shading is forced and all input variables have a flat
interpolation, DXIL validation detects a mismatch between the
SampleFrequency property and the fact that no variables are per-sample
and SV_SampleIndex is never read. When that happens, add a dummy
SV_SampleIndex.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15916>
On swizzled copies canonical formats are used to reduce the formats to a
simpler subset.
Nevertheless, it is possible that some of the canonical formats defined
in Gallium are actually not supported by the drivers themselves.
This provides a driver-defined hook that can be used to provide an
alternative canonical format in case the canonical one defined by
Gallium is not supported by the driver.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15693>
When trying to perform a TLB-based blit, we need to create a surface out
of the src and dst resources. But instead of using the same formats as
the resources, we need to use the format that is passed through
pipe_blit_info.
This was making some cases to use the render-based blit instead of the
TLB-based blit, which is more performant.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15693>
The BITFIELD_MASK() macro is intended for using with actual bitfields,
not with nir_component_mask_t. This means we do some extra work to
handle values that are invalid for nir_component_mask_t in the first
place.
This eliminates some warnings on Clang, where the compiler complains
about casting UINT32_MAX to UINT16_MAX.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15547>
VkObjectType and VkDebugReportObjectTypeEXT has the same enum-values.
Why the Vulkan WG thought this was a good idea, beats me. But it's what
we have to live with now.
Anyway, instead of having a statement that implicitly casts two
different values from the former to the latter, let's fully relsove the
type as the former, and cast the value when using it instead.
Fixes: 41318a5819 ("vulkan: Use vk_object_base::type for debug_report")
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15547>
the pipe cap is used for gating wideline support, so this will always
be 1.0 when not supported
furthermore, the previous code wasn't accurately checking line width
for tess shaders, breaking tests
cc: mesa-stable
fixes (nv):
KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15960>
if a rendertarget-specified image can't be a rendertarget or a blit dst
then it can't be used for the designated functionality and must be rejected
cc: mesa-stable
fixes hangs on various nv driver versions:
dEQP-GLES2.functional.texture.mipmap.2d.generate.rgba5551_fastest
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15960>
This is used to determine the geometry shader info on GFX9, and it
looks like it was broken for topologies that use adjacency.
This is also used to remove PSIZ from shaders that don't need it.
Found by inspection.
fossils-db (Polaris10):
Totals from 140 (0.10% of 135960) affected shaders:
SGPRs: 10448 -> 9696 (-7.20%)
VGPRs: 4376 -> 4264 (-2.56%)
CodeSize: 164316 -> 161028 (-2.00%)
Instrs: 26449 -> 25767 (-2.58%)
Latency: 184448 -> 180468 (-2.16%)
InvThroughput: 80772 -> 79092 (-2.08%)
VClause: 337 -> 328 (-2.67%); split: -2.97%, +0.30%
SClause: 859 -> 813 (-5.36%); split: -5.70%, +0.35%
Copies: 1027 -> 790 (-23.08%)
PreSGPRs: 2751 -> 2331 (-15.27%)
PreVGPRs: 3887 -> 3836 (-1.31%)
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15948>
It's quite likely that the source of the f2i32 was already an integer, in
which case we can skip the ftrunc (particularly useful on the int-to-float
class of hardware that's unlikely to just have a native trunc opcode!).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15870>
Just like the other make-the-float-an-integer opcodes. Noticed in a
gallium nine shader run through TGSI-to-NIR, where the array index had
been floored by the user, but got implicitly rounded by DX9 array
indexing.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15870>
When we put NIR in the compiler stack for r300, indirect addressing broke
for gallium nine. DX's array indirects round the float value, so the DX
shader gets mapped to a TGSI "ARR ADDR[0] src.x" instruction. Translating
that to NIR maps to r0[f2i32(fround(src.x))]. While we might hope that in
translation back using nir-to-tgsi after optimization we would recognize
the construct and emit ARR again, that's going to be error prone (think
"what if src.x is in a NIR register?") so we need a fallback plan. r300
will be able to handle this lowering, so get it in place first to fix the
regression.
Fixes: #6297
Fixes: 7d2ea9b0ed ("r300: Request NIR shaders from mesa/st and use NIR-to-TGSI.")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15870>
Commit af1ee8e010 dropped support to
wl_drm, as we thought that most compositors from active projects were
already supporting zwp_linux_dmabuf_v1.
But that's not true, so revert this commit in order to give these
projects a longer transition period.
Note that we didn't add back the support to GEM name API, and that was
on purpose.
Signed-off-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15822>
Document how to build and use Panfrost's drm-shim implementation. I hope by
documenting this process, other Mesa developers are better able to test
Panfrost. In particular, this allows developers without Mali hardware to run
shader-db for any Mali target, which may be useful for debugging regressions
from common NIR changes.
drm-shim is not a substitute for testing against real hardware.
Special thanks to Emma Anholt and Icecream95 for building this infrastructure.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15930>
Originally, PAN_GPU_ID was checked in the driver itself. I added the mechanism
to run Bifrost shader-db on my Midgard laptop. There was no drm-shim support at
this point, and this was a reasonable stop gap at the time.
Nowadays, we have a competent drm-shim implementation, which wholly replaces
this use case. So PAN_GPU_ID is only useful for drm-shim. Let's pull the code
into drm-shim and get it out of the driver. This allows NDEBUG drm-shim builds
to work properly.
While we're at it, the default emulated GPU is changed from Mali-T860 to
Mali-G52. This reflects our shifting development priorities.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15930>
Since 5d187e9cad ("panfrost: Add helpers to set batch masks"), we have common
helpers to set the colour and depth/stencil batch masks. Rather than set the
masks in various Midgard/Bifrost specific paths, set them generically based on
the finer dirty tracking. This lets us share the logic with Valhall.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15956>
Midgard aggregates a large amount of state into its renderer state descriptor.
Our current dirty tracking reflects this, with a single RENDERER dirty flag.
That won't work well on Valhall, which splits out orthogonal state into
independent descriptors (a blend descriptor, a depth/stencil descriptor, and so
on). To prepare for Valhall support, this patch moves the driver to finer dirty
tracking.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15956>
The optimized routine documented the tiling format together with the software
algorithm. The reference implementation wants the tiling format alone
documented. Let's break out the high level documentation into somewhere
centrally accessible, and refocus the comments in the optimized file on the
optimization.
This documentation is linked bidirectionally with both implementations, so it
should be easy to find.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15803>
The exact semantics of these routines are subtle, although they match what
Gallium wants. We're about to add unit tests. Add some comments that make it
obvious what it is we expect these routines to do. (In particular, it's not a
general region-of-interest copy, it's a region-of-interest of the tiled image
and the entire linear staging image.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15803>
Now that tiled access to 3D textures works, we can enable tiling on all texture
targets. In particular, this adds tiling support for cube maps, arrays, and 3D
textures. Previously, these would usually fall back to linear, which is hard on
the caches.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15803>
Thanks to our macros and some type trickery, our generic tiling routines are
type-generic. So we just need to add 48-bit and 96-bit texel types to tile. Note
we only support power-of-two bit sizes in the specialized tile routines for the
sake of replacing a multiplication with a shift.
With this change, all pixel formats supported in Panfrost are tileable.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15803>
This fixes a number of tests like :
dEQP-VK.renderpass*.suballocation.multisample.s8_uint.*
dEQP-VK.renderpass*.suballocation.multisample.separate_stencil_usage.d24_unorm_s8_uint.*.test_stencil
dEQP-VK.renderpass*.suballocation.multisample.d24_unorm_s8_uint.*
dEQP-VK.renderpass*.suballocation.multisample.d32_sfloat_s8_uint.*
Because the driver asserts when generating RENDER_SURFACE_STATE with a
8 Valign value for stencil buffer (only 2 & 4 are supported).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12670>
Fossilize always enables all supported extensions, that means that
adjust_frag_coord_z would always be enabled on RDNA2, even if the
application doesn't enable it. The pipeline key would then be different
and precompilation wouldn't work. Move this per-pipeline since we can
know if VRS will be used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15444>
The DXIL validator doesn't like dynamic indexing into resources if the
resource was not declared as an array type. This commit makes it so that
we always generate array resource types if the original type was
declared as an array instead, not just when the number of elements is
greater than 1.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14988>
This prevents more use-after-free errors. Passing them around using
std::unique_ptr ensures that the LLVMContext gets destroyed but doesn't
ensure destruction order. Declaring it on the stack ensures that the
context doesn't get destroyed until right before the the function
returns which is after any other LLVM stuff is destroyed.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15937>
First, separate out the LLVM context logging to make it take a
clc_logger instead of passing in a string stream. Currently, the LLVM
context may outlive the string stream which we assign which may lead to
use-after-free errors. Second, use a separate string stream for clang
diagnosticl logging which we intentionally declare before the compiler
so the compiler can't outlive it.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15937>
The TGSI backend avoids TGSI_OPCODE_FMA (and thus OP_FMA) pre-nvc0,
replacing it with TGSI_OPCODE_MAD in that case.
Noticed when looking at native-NIR stats and finding that load
optimization wasn't taking place on the unsupported opcode.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15543>
The host virglrenderer can only handle moves to integer outputs, all
ALU opt that create integer outputs are created with extra code to convert
to float for the temporaries, and this breaks the output write
handling.
Fixes:
spec@arb_sample_shading@builtin-gl-sample-mask *
spec@arb_sample_shading@builtin-gl-sample-mask-simple *
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15921>
If we made a copy deref, then we need to do dead-write elimination for the
pervious writes or we'll just emit the same copy deref again next time
around. And, at the end of the opt loop, we need to lower copy derefs
because later passes (locals_to_regs, notably) depend on it.
Fixes infinite opt loop on fs-function-inout-array with virgl on NTT.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15899>
rusticl (and clover) would like to get a graceful fail here so they can
fall back to a shadow copy instead of us asserting. We also start
rejecting arrayed surface because isl doesn't allow selecting a QPitch
yet. Even if it did, QPitch is horribly restrictive, even for linear
surfaces, that it likely wouldn't be that useful.
Fixes: e81f3edf76 ("iris: Allow userptr on 1D and 2D images")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15903>
Even if we're the first job on some queue, there may be no wait
semaphores but we still need to ensure things happen in-order. (See
the "Implicit Synchronization Guarantees" section of the Vulkan spec.)
The client can submit back-to-back command buffers with no semaphores
between them and it needs to adt the same as if there were a semaphore.
If job->serialize is set because of a barrier or something, we still
need to synchronize across HW queues by waiting on last_job_syncs.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
In order to properly wait for a query to be complete, we need to first
wait for the end query job to flush through on the queue. Since query
end is always handled on the CPU, we can do this with a condition
variable. The 2s timeout is taken from ANV.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
Instead of having the CPU job execute the CSD job, put both jobs on the
list with the CPU job first which modifies the GPU job which gets kicked
off next. This gives the queue code more visibility into what types of
jobs are actually in the list. In particular, if an indirect compute
job is the last job in a batch buffer, it currently appears as if the
batch ends with CPU work which isn't true because it kicks off GPU work.
In that case, the last job on the list is now a GPU job, which better
matches reality.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
The v3dv kernel driver doesn't support timelines yet but we want
threaded submit and that requires WAIT_PENDING. Fortunately, it should
never sit in this loop for long in practice. The primary use-case is
sorting out dependencies and these checks will always trivially succeed
for non-shared semaphores because v3dv only has a single queue.
Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
c78be5da30 ("intel/fs: lower ray query intrinsics") introduced a
helper function using nir_(push|pop)_if which invalidated dominance &
block_index for the replacement of nir_intrinsic_rt_trace_ray.
We can still keep dominance/block_index metadata for the lowering of
nir_intrinsic_rt_execute_callable though.
This change uses 2 different lowering function with correct metadata
preservation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15910>
FLUSH_HDC is sufficient to flush things out to L3, so we'd rather
use that where possible. It's also emulated via DATA_CACHE_FLUSH
on platforms where it isn't supported, so we can use it unconditionally.
We still use DATA_CACHE_FLUSH for invalidating the data cache, and to
flush the DC-tagged cachelines in L3 to be globally-observable.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
Push constant loading is not coherent with L3 according to the document
that describes the hardware change for the vertex buffer L3 Bypass
Disable field.
If we've updated a push constant buffer with say, a blorp_buffer_copy,
we may need to flush both the render cache and the tile cache.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
When stream output is active, we need to let the cache tracker know
about any SO buffers, which we access via IRIS_DOMAIN_OTHER_WRITE.
In particular, we may have written to those buffers via another
mechanism, such as BLORP buffer copies. In that case, previous writes
happened via IRIS_DOMAIN_RENDER_WRITE, in which case we'd need to flush
both the render cache and the tile cache to make that data globally-
observable before we begin writing via streamout, which is incoherent
with the earlier mechanism.
Fixes misrendering in Ryujinx.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6085
Fixes: d8cb76211c ("iris: Fix MOCS for buffer copies")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
Most clients are L3-coherent these days. However, there are some
notable exceptions, such as push constants, stream output, and command
streamer memory reads and writes.
With the advent of the tile cache, flushing the render or depth caches
alone are no longer sufficient for memory to become globally-observable.
For those, we need to flush the tile cache as well. However, we'd like
to avoid that for L3-coherent clients, as it shouldn't be necessary,
and is expensive.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
The render, depth, sampler, and data (HDC) caches are all coherent
with L3. We consider OTHER_READ and OTHER_WRITE to be non-coherent,
as they're kitchen-sink domains which include non-L3-clients.
Starting with Tigerlake, the VF cache is coherent with L3 (because we
set the L3BypassDisable bit in the vertex/index buffer packets).
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
On Tigerlake, we use the data cache for reading indirect UBOs instead
of the sampler. But we still use the constant cache for direct UBO
access, so unfortunately we may access it through two different domains.
To work around this, we add a new domain for pull constants (UBOs),
which will be either constant+texture or constant+data.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
The bulk of IRIS_DOMAIN_OTHER_READ domain usage was the 3D sampler, but
there were also a few oddball cases like command streamer reads, blitter
access, and so on. The sampler is definitely L3 coherent, but some off
the more esoteric reads may not be, so I'd like to separate them, so
that OTHER_READ can become a non-L3-coherent kitchen-sink domain.
The sampler cases only need TEXTURE_CACHE_INVALIDATE, and can skip the
CONSTANT_CACHE_INVALIDATE we had on IRIS_DOMAIN_OTHER_READ.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
We were using IRIS_DOMAIN_OTHER_READ for read-only depth/stencil access
in an attempt to avoid unnecessary flushing; IRIS_DOMAIN_DEPTH_WRITE
could indicate read-write access.
However, IRIS_DOMAIN_OTHER_READ is clearly the wrong domain. Depth and
stencil data is read via the depth cache, while IRIS_DOMAIN_OTHER_READ
currently corresponds to the sampler cache and constant cache together
(although this will change in future patches).
It's unclear whether this hack was useful. For now, just drop it and
use the correct depth cache domain, even if it's marked as read-write.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
For texture fetches and buffer load the fix is not needed,
and the override creates faulty TGSI.
In addition remove all modifiers from the src in the additional mov
instruction.
Fixes: d1c7a7b131
virgl: Add an extra mov for int outputs from constant and immediate inputs
v2: Move workaround after the use of
virgl_tgsi_rewrite_src_for_input_temp (Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15896>
This patch adds support for transcoding ASTC to BC7 (BPTC) and prefers
it over BC3 (DXT5) when hardware supports that format.
BC7 is a much newer format (~2009 vs. ~1999) and offers higher quality
than the older BC3 format. Furthermore, our encoder seems to be faster.
Tapani put together a small benchmark for transcoding a 1024x1024 ASTC
texture, and switching from BC3 to BC7 improves performance of that
microbenchmark by 25% on my Tigerlake NUC (with hardware ASTC disabled
so we can test this path). Presumably, this isn't fundamental to the
formats, but rather reflects the speed of our in-tree compressors.
So, we should use BC7 where possible.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15875>
now zink will init using a priority system if multiple devices are available
multiple devices will ONLY be available if:
* the user does not specify VK_ICD_FILENAMES as they should
* the user does not specify LIBGL_ALWAYS_SOFTWARE
* multiple drivers exist
I've prioritized the virtualized gpu here with the assumption that if
such a thing is detected, the environment is most likely virtualized
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15857>
It can happen that a single vector constant is referenced multiple times
by the same node, with different swizzles.
This needs to be taken into account by checking and updating the
swizzles for all the srcs of a target node when inserting the const
node to the same instruction.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15726>
When emitting the AR forces splitting an ALU group, and at the same time
a new CF instruction is started, then the last instrcution in the finished
CF block might not have the "last" bit set, which results in an invalid
shader that might hang, or crash SB.
So when a new CF is started, force the last bit in the last ALU instruction.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
ALU_SRC_PARAM_BASE is an inline constant that defines the
address for pulling data from LDS memory for interpolation
and not a value from the kcache, so there is no need to
take these values into account when allocating kcache
load slots.
v2: Fix the constant range check to not exclude the translated
ranges for kcache banks 2 and 3.
v3: limit range check to only include kcache values and and
rename relevant function (Emma).
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
We don't need to disable this path if there are indirect or 8/16/64-bit
push constant loads. We can just use the default path for them.
fossil-db (Sienna Cichlid):
Totals from 21 (0.02% of 134621) affected shaders:
CodeSize: 2028 -> 1884 (-7.10%)
Instrs: 366 -> 363 (-0.82%); split: -2.46%, +1.64%
Latency: 6630 -> 6579 (-0.77%)
InvThroughput: 26520 -> 26316 (-0.77%)
Copies: 84 -> 102 (+21.43%)
PreSGPRs: 141 -> 222 (+57.45%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145>
NIR doesn't propagate precise through moves, and with NTT the
last output is usually preceded by a move, so that we no longer
see that the evaluation of some value is supposed to be exact,
and, hence we can't decorate the outputs accordingly.
Fixes with NTT:
dEQP-GLES31.functional.tessellation.common_edge.
triangles_equal_spacing_precise
triangles_fractional_odd_spacing_precise
triangles_fractional_even_spacing_precise
quads_equal_spacing_precise
quads_fractional_odd_spacing_precise
quads_fractional_even_spacing_precise
v2: Don't clear the precise flag when we hit a mov, because we may
hit a if/else construct like below and we don't track branches
IF X
TEMP[0] = OP_PRECICE ...
ELSE
TEMP[0] = MOV CONST[]
ENDIF
Thanks Emma for pointing out the problem.
v2: allocate precise handling flags to transform_prolog (Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15836>
drm_shim.c undefines the _FILE_OFFSET_BITS macro, so plain off_t might
be 32 bits, while it's 64 bits in device.c. To avoid this mismatch,
use off64_t which will always be 64 bits in both source files.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
loader_open_render_node returns the first device in /dev/dri that it
can use. To make sure the drm-shim device always gets chosen, return
the fake entries in readdir before returning the real ones.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
Will allow us to drop more GLSL IR code in future once we switch
all drivers to NIR. Also stops the need for all drivers to call
this pass to remove indirect temps that may have been added during
the NIR varying linking lowering/optimisations.
This patch fixes some tests on i915, d3d12, lima and vc4.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15871>
We've wanted to remove destructors from ralloc's API for a long time (it's
an extra storage cost per ralloc for a rarely-used feature), and for the
suballoc change we'd need to spend more storage on storing the tu_device
pointer per result since destructors don't get anything else but the
pointer passed into them.
Fixes use-after-frees:
=================================================================
==2383==ERROR: AddressSanitizer: heap-use-after-free on address 0xffff88fe1940 at pc 0xffff934f427c bp 0xfffff5481e90 sp 0xfffff5481ea8
WRITE of size 8 at 0xffff88fe1940 thread T0
#0 0xffff934f4278 in list_del ../src/util/list.h:108
#1 0xffff934f4278 in result_destructor ../src/freedreno/vulkan/tu_autotune.c:237
#2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#4 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
#5 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#6 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#7 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
[...]
0xffff88fe1940 is located 80 bytes inside of 112-byte region [0xffff88fe18f0,0xffff88fe1960)
freed by thread T0 here:
#0 0xffff9c1c90d8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
#1 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
#2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#4 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
#5 0xffff935cf2ac in tu_queue_submit_locked ../src/freedreno/vulkan/tu_drm.c:997
[...]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
We don't return unused space to the suballocator, so it's a little useful
to limit how much we overallocate to reduce memory footprint. I took a
look through the tu_cs_emit_array() calls and accounted for a couple of
them in the variant-specific space calculation, then dropped the base
allocation by factors of 2 until we started throwing asserts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
Allocating a BO for each pipeline meant that for apps with many pipelines
(such as Asphalt9 under ANGLE), we would end up spending too much time in
the kernel tracking the BO references.
Looking at CS:Source on zink, before we had 85 BOs for the pipelines for a
total of 1036 kb, and now we have 7 BOs for a total of 896 kb.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
The intel_perf_query path used for performance queries on GL was
passing a bogus "end" pointer to intel_perf_query_result_accumulate(),
causing it to accumulate garbage values. This was causing the values
of many performance counters to be corrupted.
The "end" pointer was incorrect because the current code was assuming
that different OA reports were located TOTAL_QUERY_DATA_SIZE bytes
apart, which is a hard-coded preprocessor define. However recent
(Gfx12+) hardware generations use a variable query size determined by
the query layout. Use the size derived from it instead, and remove
the stale define.
Fixes: 3c51325025 ("intel/perf: switch query code to use query layout")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15783>
gallium queries have to be mapped onto multiple vulkan queries,
this can happen for two reasons.
1. primitives generated and overflow any don't map directly, and
multiple vulkan queries are needs per iteration. These are stored
inside the "starts" as zink_vk_query ptrs.
2. suspending/resuming queries uses multiple queries, these are
the "starts". Every suspend/resume cycle adds a new start.
Vulkan also requires that multiple queries of the same time don't
execute at once, which affects the overflow any vs xfb normal
queries, so vk_query structs are refcounted and can be shared
between starts. Due to this when the draw state changes, it's
simple to just suspend/resume all queries so the shared vulkan
queries get handled properly.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15785>
This enables a lower power mode in the sampler hardware in certain
common scenarios. On Tigerlake, SAMPLER_MODE is not programmable by
userspace but the kernel already sets this bit for us.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
While this register still exists, it's no longer a per-context register.
Instead, on Gfx12+, SAMPLER_MODE exists per dual-subslice and is
accessed as a "multicast" register, where you write control which
version is accessed by the "steering control register".
At any rate, userspace cannot write it any longer, and so there's not
much point to it existing in our genxml (which was missing most of the
fields anyway).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
This allows the sampler to perform faster filtering of 8-bit UNORM
textures by filtering them at a different precision. The filtering
is intended to still be OpenGL and DirectX spec compliant.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
according to spec, a barrier is required any time the pixels of the
framebuffer are changed. since zink defers clears and runs them at
a later time, it must also be responsible for handling the required
synchronization for such operations
fixes (radv):
KHR-GL46.blend_equation_advanced.blend_all*
fixes#5572
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15831>
according to spec, for fbfetch this should match the subpass self-dependency of
* stage FRAGMENT_STAGE -> FRAGMENT_STAGE
* access 0 -> INPUT_ATTACHMENT_READ
zs fbfetch doesn't seem to be a thing, so that code is left for historical
and/or future purposes
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15831>
this is super annoying since it means that a build of zink cannot
be mix-and-matched with an existing build of lavapipe, e.g., for faster
bisecting
the env var should be sufficient to handle this, and if someone sets it
and doesn't have swrast enabled then they can deal with it
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15844>
Previously, the caller allocated storage and tgsi_transform_shader() would
emit into that, returning how many tokens it emitted. All the callers had
to guess at how much storage was necessary, trying not to over-allocate
but also getting enough that you wouldn't (effectively) silently run out
of space.
Instead, make tgsi_transform_shader() do the allocation for you, taking
just a hint of how much space you think you need, and internally double
size when necessary. Fixes failures on virgl with fp64 since we've added
more fp64 virglrenderer workarounds and its old "XXX: is this enough?"
allocation wasn't any more.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15782>
virglrenderer maps atomic accesses to atomic counter declarations using
the .Index field. We were previously emitting a .Index of 0 for array
accesses, so virglrenderer would emit
atomicIncrement(first_counter[counter_offset+array_index]). This would
mostly work because hardware doesn't care about the bounds of counter
declarations, but if the first counter was a non-array, then the [] GLSL
emit gets dropped (can't array access a scalar!) and you'd access the
non-array first_counter instead.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15824>
this is an illegal alignment, so clamp the range to the nearest
texel offset since the shader should be hitting the robustness
case for the partial texel
affects:
dEQP-GLES31.functional.texture.texture_buffer.modify.mapbuffer_readwrite.range_size_65537
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15827>
This uses WRITE_DATA on the ME engine to reset the register, to match what
PAL does on GFX9+.
This fixes
KHR-GL45.transform_feedback_overflow_query_ARB.multiple-streams-one-buffer-per-stream
on zink/radv.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15812>
The array is allocated for VkDrmFormatModifierPropertiesEXT, so
writring entried with type VkDrmFormatModifierProperties2EXT is
bogus.
It seems this was a mistake added with a change intended to get
rid of VK_OUTARRAY_MAKE, that changed the type of the write by
mistake.
Fixes: 56a2ccf058 ('v3dv: Stop using VK_OUTARRAY_MAKE()')
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15819>
There are reliability problems with the RTL8153 ethernet driver under
certain network loads, related to incompatibility of the device with
Link Power Management.
Add usbcore.quirks=0bda:8153:k to the kernel command line to enable the
USB_QUIRK_NO_LPM option.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15791>
This happens when we have a sequence of multiple beginQuery / endQuery
with the same target and query identifier.
As a BO is created on beginQuery but not free on endQuery (we need to
wait until either explicitly deleting the query or querying the
results), in the above sequence we are basically leaking the BO.
This ensures that any BO created before is unreference before creating
the new one.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15796>
We don't know if an application developer wants error-dialog by default
or not, even when using a Mesa OpenGL driver.
So let's not assume this is a good thing, and instead make an option for
this behavior that we can enable for CI testing.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15485>
These dialogs only exist on Windows, so let's not even expose a util
function for this on other platforms.
The code is only ever called from Windows specific code anyway.
While we're at it, clean up the name a bit as well.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15485>
Move one of the is_supported() check before we start filling the
structure so we don't end up with a partially filled object when
we return VK_ERROR_FORMAT_NOT_SUPPORTED (which deqp doesn't seem to like,
so it's probably coming from a spec requirement).
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15698>
VK_REMAINING_ARRAY_LAYERS maps to -1 in the D3D12 world. Let's make sure
we set WSize to -1 in that case, because the layer_count calculated by
dzn_get_layer_count() won't work for 3D images which never have more
than one layer (in case of 3D images, we treat slices as layers).
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15698>
We have been reconstructing/rematerializing uniforms for a while, but we
can do this in more scenarios, namely instructions which result is
immutable along the execution of a shader across all channels.
By doing this we gain the capacity to eliminate TMU spills which not
only are slower, but can also make us drop to a fallback compilation
strategy.
Shader-db results show a small increase in instruction counts caused
by us now being able to choose preferential compiler strategies that
are intended to reduce TMU latency. In some cases, we are now also
able to avoid dropping thread counts:
total instructions in shared programs: 12658092 -> 12659245 (<.01%)
instructions in affected programs: 75812 -> 76965 (1.52%)
helped: 55
HURT: 107
total threads in shared programs: 416286 -> 416412 (0.03%)
threads in affected programs: 126 -> 252 (100.00%)
helped: 63
HURT: 0
total uniforms in shared programs: 3716916 -> 3716396 (-0.01%)
uniforms in affected programs: 19327 -> 18807 (-2.69%)
helped: 94
HURT: 50
total max-temps in shared programs: 2161796 -> 2161578 (-0.01%)
max-temps in affected programs: 3961 -> 3743 (-5.50%)
helped: 80
HURT: 24
total spills in shared programs: 3274 -> 3266 (-0.24%)
spills in affected programs: 98 -> 90 (-8.16%)
helped: 6
HURT: 0
total fills in shared programs: 4657 -> 4642 (-0.32%)
fills in affected programs: 130 -> 115 (-11.54%)
helped: 6
HURT: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15710>
Bit 20 isn't actually MERGEDREGS, the mode for the entire geometry
pipeline is controlled by SP_VS_CTRL_REG0::MERGEDREGS and it appears to
be something preamble-related instead since writing any register in the
preamble hangs if it's set. This fixes those hangs on freedreno and
turnip since we no longer set it.
Fixes: fccc35c2de ("ir3: Add preamble optimization pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15801>
since 1.0 is used in nearly every case, drivers requiring this exporting
can avoid potential shader variants by adding a 1.0 export to the base
shader variant and the only using the ubo upload when pointsize is explicitly
set for wide point functionality
drivers can then be responsible for removing unused pointsize exports
as needed (or desired)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15699>
We don't need to restrict our timeout to 5 seconds, because the kernel's
hangcheck will ensure that the wait completes in finite time if the GPU
gets wedged. If the GPU is making progress, we don't want to time out
early and have pipe_transfer_map() return an error, causing glReadPixels()
to throw a confusing GL_OOM even though we're not out memory.
The INFINITE arg to this function isn't actually infinite, it's limited to
an hour. But an hour of GPU processing to wait on is probably plenty.
This 5s timeout has caused problems with the CTS on freedreno at high
parallelism, and I suspect is the cause of recent issues in the closed
traces replay jobs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15805>
Much less noisy, and provides a path to further improvements. There is a slight
behaviour change: int-to-float conversions now use RTE instead of RTZ. For
32-bit opcodes, this affects conversions of integers with magnitude greater than
2^23 by at most 1 ulp. As this behaviour is unspecified in GLSL, this change is
believed to be acceptable.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15187>
To prepare for defeaturing round modes, replace uses of round-to-positive with
round-to-even in our unit tests. This doesn't meaningfully impact test coverage;
there is no way to generate that round mode.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15187>
There are new data structures that we need to (dirty) track. Add the
corresponding fields so we can proceed as with Bifrost dirty tracking.
Trivial increase in memory usage, but that should not matter as batches are
few.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15797>
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. The
maximum number of gradient components and coordinate components should
be 2. In spite of this limitation, the Bspec lists a mysterious R
component before the min_lod, so the maximum coordinate components is 3.
Fixes the following Vulkan CTS failures on DG2:
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2d_fragment
The Fixes: tag below is a bit misleading. This commit fixes some test
cases similar to ones fixed by the Fixes: commit. I just want to make
sure this commit gets applied everywhere that commit was also applied.
Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15781>
Valhall removed the ability to set an explicit primitive restart index as
required by desktop OpenGL, in favour of fixed primitive restart indices only as
required by OpenGL ES.
Set the CAPs accordingly so that mesa/st lowers unusual primitive restart
indices at draw call time.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15795>
Surface descriptors have been replaced by plane descriptors, which facilitate
the intermediate layout of textures. This allows for more sophisticated handling
of texture compressions, of particular to interest to copy_image. However, it
requires a considerable amount of new logic to handle.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15795>
Issue: When a video is decoded where the max_references is updated the
decoder keeps using same old value. This results into green patches and
decoding is not proper.
Root Cause: The max_references is updated only once when the instance is
created for the first time.
Fix: Added a check along with the context->decoder to check if the
context->templat.max_references has changed. If yes, then we go ahead
and create the decoder again.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Krunal Patel <krunalkumarmukeshkumar.patel@amd.corp-partner.google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15750>
Based on hardware behaviour, it appears vertex_id is zero-based with the legacy
geometry flow but not with the new malloc IDVS flow. Since the geometry flow is
per-shader (not per-machine), there's not a good way to communicate this to NIR.
Rather than trying to shoehorn this obscure detail into NIR, just do the
lowering ourselves instead of in NIR. It's not much more code anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>
Our swizzle lowering optimizations depend on replication of scalar fp16. This
holds on Bifrost (at least for now), but not on Valhall which has proper support
for write masks. For now, enforce Bifrost-compatible behaviour as we do not make
use of the write masks on Valhall yet.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>
Varying stores was changed in Valhall. Rather than using attribute descriptors
like on Bifrost and Midgard, on Valhall we store to memory directly with
hardware-allocated buffers. This requires a new implementation of store_output,
with special provisions for writing gl_PointSize from a position shader.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>
Something an instruction has two logic flow controls, namely wait + reconverge.
These are orthogonal -- we need to insert a NOP to handle this. Add a lowering
pass that works out flow control to replace the ad hoc previous va_pack_flow.
Fixes dEQP-GLES31.functional.ssbo.layout.single_basic_type.shared.lowp_vec3.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15756>
Similar to pipeline statistics but done for a single counter.
We use REG_A6XX_RBBM_PRIMCTR_7 to get generated primitives
and not PRIMCTR_8 because PRIMCTR_7 counts pre-clipped prims
while PRIMCTR_8 counts them after clipping.
OpenGL spec for GL_PRIMITIVES_GENERATED says:
"Subsequent rendering will increment the counter once for every
vertex that is emitted from the geometry shader, or from the
vertex shader if no geometry shader is present."
Passes tests:
dEQP-VK.transform_feedback.primitives_generated_query.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15746>
This has one negative side-effect; we're no longer able to print
validation errors without dxcompiler.dll. I doubt that's a real problem,
but if it is, we should add this ability to dxil_validator instead of
having a second implementation here.
The reasons I didn't try adding this in the first place is:
1. This code seems a bit janky; it doesn't consult the "known"-variable
to figure out if the encoding is OK, and it's lacking a fallback path
in that case.
2. It seems unlikely that the compiler varies the encoding of the output
in the first place; one of the two code-paths in here is probably
untested.
3. Since dxil_validator leaves reporting to the call-site, we'd need to
either add and output-encoding to the API (yuck), or re-encode the
string to UTF-8 using WinAPI.
Right now, it seems questionable if fixing all of the above is worth it.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15751>
by utilizing a separate slot map for patch variables, the entire i/o
assignment mechanism can be simplified to accurately manage i/o for all
types of variables and avoid location conflicts
affects:
KHR-Single-GL46.enhanced_layouts*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15770>
the previous methodology triggered warnings any time a rasterizer state
was created with unsupported features without determining whether those
features would actually be used
a more optimal process is to check for missing features at pipeline creation,
as all the necessary info is now available, and spurious warnings can be avoided
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15778>
deqp-runner uprevved to reduce memory usage on HW runners, let us
experiment with shader cache on tmpfs, and hopefully provide a tool for
virgl to be able to plausibly run piglit under crosvm instead of vtest.
piglit uprevved to avoid a flake in softpipe in glx-multithread-texture,
and improve performance of the test, too. This also brings in the
fbo-blending-format-quirks fix to properly initialize the buffers, fixing
some fails/flakes.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15419>
When porting lavapipe to the common sync framework, I stole the dummy
sync signal_for_memory idea from RADV but didn't actually do it
correctly. Unless you set wsi_device::signal_semaphore_with_memory and
wsi_device::signal_fence_with_memory, it doesn't actually signal
anything. If you do set those, it works but also results in dummy
syncs being created for present fences which we see as signals. We
could choose to just skip those like RADV does but that's too magic.
Instead, have our own AcquireNextImage2() again which sets dummy syncs.
Fixes: 3b547a9b58 ("lavapipe: Switch to the common sync framework")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15774>
Cube descriptors require a different sampling instruction in shader,
however we don't know whether image is a cube or not until the start
of a renderpass. We have to patch the descriptor to make it compatible
with how it is sampled in shader.
For the reference subpassLoad is currently translated into isaml.a
Blob v615 also doesn't handle this case correctly.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15734>
When we don't have a good dxcompiler.dll that we can load IDxcLibrary
from to help with diagnostics, we currently return true for validation
even if the validation actually failed.
Let's fix that, and also add a debug-message explaining what went wrong
for those who are debugging and wondering what's up.
Fixes: 2ea15cd661 ("d3d12: introduce d3d12 gallium driver")
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15744>
In order to ensure consistent results when running performance tests,
lock the frequency for Intel GPUs to ~70% of the maximum allowed by
hardware.
This seems to offer a good balance between execution speed and results
consistency.
An increase of the frequency will also increase the rate of throttling
events, with a negative impact on consistency. Such events are logged,
as in the following example:
GPU throttling detected: act=200 min=850 cur=850 RPn=100
This shows the actual GPU frequency (200 MHz) dropped below the minimum
requested (850 MHz).
For more details about the various frequency information sources, please
see the script header comments in ".gitlab-ci/common/intel-gpu-freq.sh".
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15662>
Add script to manage Intel GPU frequencies.
It can be used for debugging performance problems or to lock a stable
frequency while executing benchmark tests.
Typical use cases:
- Get all available GPU frequency information
$ ./intel-gpu-freq.sh -g all
* Hardware capabilities
RP0: 1350 MHz
RPn: 100 MHz
RP1: 400 MHz
* Enforcements
max: 1350 MHz
min: 100 MHz
boost: 1350 MHz
* Actual
act: 100 MHz
cur: 400 MHz
- Lock frequency to 80% of the maximum allowed by hardware and enable
throttling detection
$ ./intel-gpu-freq.sh -s 80% -d
GPU throttling detected: act=1050 min=1350 cur=1350 RPn=100
GPU throttling detected: act=1100 min=1350 cur=1350 RPn=100
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15662>
Vulkan spec says:
"When an image view of a depth/stencil image is used as a depth/stencil
framebuffer attachment, the aspectMask is ignored and both depth and
stencil image subresources are used."
Since we use two planes for D32S8 format we have to add a special
case for depth in addition to already existing case for stencil.
Fixes hang in CTS:
dEQP-VK.renderpass.depth_stencil_write_conditions.stencil_kill_write_d32sf_s8ui
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15532>
- When resolving d32s8 to s8 we stored stencil with a wrong format.
- For unaligned multi-sample store we used wrong gmem offset for stencil.
If unaligined store is forced this change fixes a hang in:
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_32_32.samples_2.d32_sfloat_s8_uint_separate_layouts.compatibility_depth_zero_stencil_zero_testing_stencil
Fixes: b157a5d0d6
("tu: Implement non-aligned multisample GMEM STORE_OP_STORE")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15532>
this interaction requires read-only sampler access from
depth component with writes to the stencil component, which can only
be done in the GENERAL layout
affects:
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_color_and_stencil_blit
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15716>
ssbo binds are always at least readable, so without deeper shader
inspection, never assume that write binds mean no read access
this is different than image descriptors which can explicitly get
write-only access set
cc: mesa-stable
fixes#6231
THANKS TO @daniel-schuermann FOR SOLVING THIS WITH HIS INCREDIBLE TALENT AND WIT
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15697>
The common Vulkan sync framework will do most of the queueing for us.
It will even sort out timeline semaphore dependencies and ensure
everything executes in-order. All we have to do is make sure our
vk_sync type implements VK_SYNC_FEATURE_WAIT_PENDING. This lets us get
rid of a big pile of code.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15651>
dEQP-VK.binding_model.buffer_device_address.set3.depth3.basessbo.convertuvec2.nostore.multi.scalar.vert
runtime -24.4002% +/- 1.94375% (n=7). The win (I think) is in LLVM not
having to chew through handling the extra loops on every constant-offset
SSBO load, not in actual rendering time.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14999>
If we know the offset is constant, we don't have ask LLVM to loop over the
elements pulling the same value out over and over.
This doesn't seem to have produced a win in the testcase I was looking at,
but it was an easier entrypoint to figuring out how to do scalar memory
access than load_memory, and will probably affect some workload.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14999>
In a fragment shader or inside of control flow, invocation 0 might be
inactive, and so our use-first-invocation-and-broadcast optimizations
would be invalid, and the loop logic of an emit_read_invocation would
defeat the point of these optimized paths.
For load_kernel_input, I didn't guard the uniform path with the check
because some CL tests that are currently passing are doing the
load_kernel_input under (presumably) uniform control flow. Instead, I
dropped in a once warning for the next person to be debugging CL.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14999>
You're not supposed to include a '\n' in mesa_log*() messages because
android logging will log what you provide on its own line anyway, so each
mesa_log() should be the body of a log line. But also, getting everyone
to consistently not do that is hopeless because we're all so trained by
printf(). So, just detect an existing \n and don't add a new one.
Cleans up deqp-vk debug output a bunch from turnip.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15332>
We do require C++20 still, because designated initializers is part of
that standard. This is almost a revert, but conditionally selecting
between c++latest or c++20 when available, as that's what we really want.
Fixes: 55ca1c8db3 ("vulkan/microsoft: Remove `override_options: ['cpp_std=c++latest']` option for visual studio")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15706>
Designated initializers are a C++20 feature, but we don't use C++20. In
fact, enabling C++20 for ACO triggers new compiler errors due to some
equality semantics details.
So let's instead stop using designated initializers here.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15706>
We can now use the same cache tracking mechanism for synchronizing QBO
writes instead of the unconditional PIPE_CONTROL performed currently,
which is unable to invalidate any incoherent caches which may contain
stale data for the buffer object.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15738>
The unconditional flushing performed by
iris_flush_and_dirty_for_history() is now redundant with the memory
barriers introduced previously in this series, which should be in a
better position to determine from which domain the buffer will
actually be used in the future, and whether an additional flush or
invalidation is required or redundant with other PIPE_CONTROL commands
emitted elsewhere.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15738>
We don't support 'Update After Bind', however, the limits for this
model also include the ones without it. See the with or without remark
in the spec below:
"maxPerStageDescriptorUpdateAfterBindInlineUniformBlocks is similar to
maxPerStageDescriptorInlineUniformBlocks but counts descriptor bindings
from descriptor sets created with or without the
VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT bit set."
Fixes:
dEQP-VK.api.info.vulkan1p2_limits_validation.ext_inline_uniform_block
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15732>
Since the packing functions generated by csbgen use a void pointer
for the buffer in which to pack, it's possible to easily write out
of bounds. This commits attempts to reduce the chances by
having the pack macro check that the pointer passed points to an
element sized equally to the state word being packed. Catching
these errors earlier.
As can be seen in this commit, there already was a case of this:
"pds_ctrl". The word size is meant to be 64 bits but the pointer
was pointing to a 32 bit field.
Although it's fine for the word size to be smaller than the
storage pointed to by the pointer, this is not allowed just to
be extra careful.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15687>
If we know the height is 1, then it would be a waste to align each
miplevel to tile height. For non-mipmapped textures, it doesn't save us
memory (since you still align to 4 on the last miplevel), but it should be
better cache locality by not loading those unused lines.
Incidentally, this gets us some more coverage of swap != WZYX cases in CTS
tests, which often use optimal tiling without also testing linear.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15293>
This field does appear to work as expected: with 1D/1DArray turnip storage
images switched to be always linear, it fixes the dEQP-VK.image.*store*
tests using a color swapped format (once we allow color swap).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15293>
This reports all of our storage formats as supporting read/write without
format, since we don't have any in-shader format conversions. Similarly,
shadow comparisons were already supported on all the depth formats.
This extension is required for VK 1.3.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15293>
This is a potential performance optimization, from the docs:
The PVS Instruction which updates the clip coordinatea position for
the last time. This value is used to lower the processing priority
while trivial clip and back-face culling decisions are made. This
field must be set to valid instruction.
Right now it is set at the last instruction. However, there is no
measurable performance effect in either Unigine Sanctuary, Lightsmark
or GLmark.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6045
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15677>
This ensures that any channels used for helper invocations are
also spilled/filled correctly.
Alternatively, we could recursively track all temps that get
involved in computing values that are then used in explicit
(dfdx,dfdy) or implicit (texture coordinates for mipmap or
anisotropic filtering, etc) derivatives, and only enable
per-quad on these (or disable spilling of any of these
values).
Fixes:
dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15705>
This extension has been promoted to Vulkan 1.2 which means it has been
silently enabled when we implemented Vulkan 1.2.
Enable it explicitely to make mesamatrix happy and also for consistency.
This extension was designed for potential performance improvements of
MSAA depth/stencil images but it's currently a no-op in RADV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15665>
The current code was allowing multiple query pools of the same type to be
active on the same commmand buffer which creates validation warnings.
Restructure the query handling, so the query pools are allocated on the
context on first use, then have queries allocate query_id from those pools.
This approach creates a new start dynamic array that contains each time a
vulkan query is started for a gallium query, and holds the flags for it rather
than storing them in a massive array.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15702>
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. Cube
maps and 3D surfaces were already handled, but 1D array and 2D array
surfaces were not.
Fixes the following Vulkan CTS failures on DG2:
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2darray_fragment
The Fixes: tag below is a bit misleading. This commit adds another
lowering, similar to the one in the Fixes: commit, that probably should
have been added at the same time. I just want to make sure this commit
gets applied everywhere that commit was also applied.
Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15681>
Moving duplicate vk_format helper functions to common
vulkan/util/vk_format.h and also renaming
vk_format_get_component_size_in_bits to match how amd and
freedreno name the same function. Not moving this function
to common code as freedreno's implementation is a bit different.
Signed-off-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15696>
In cases when no immutable samplers were present but sampler
descriptor set layout bindings were, a seg fault was being caused
by an attempt to get the immutable sampler from within the
immutable sampler array while the array was not allocated.
This commit also remove the binding type check since only those
specific types can have immutable samplers. The check is covered
by the descriptor set layout creation and assignment of
has_immutable_samplers.
This commit also makes the immutable samplers const, since they're
meant to be immutable.
This commit also adds has_immutable_samplers field to descriptor
set layout. Previously to check whether immutable
samplers were present or not you'd check for the layout binding's
descriptor count to be 0 and the immutable sampler offset to be 0.
This doesn't tell you everything. If you have a descriptor of the
appropriate type (VK_DESCRIPTOR_TYPE_{COMBINED_,}IMAGE_SAMPLER).
So descriptor count of >1. The offset can be 0 if you don't have
immutable sampler, or have them at the beginning of the immutable
samplers array. So you can't determine if you really have immutable
samplers or not. One could attempt to perform a NULL check on the
array but this would not work in cases where you have following set
layout bindings with immutable samplers as the array stores the
whole layout's immutable samplers.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15688>
radv_image_queue_family_mask was still using queue family index values
for the special cases, while being passes a radv_queue_family enum.
This adds the bits to the enum and vk_queue_to_radv so we can implement
radv_image_queue_family_mask completely in terms of the enum.
Fixes: 1ec4e568 ("radv: abstract queue family away from queue family index.")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15423>
This is for compatibility with loaders that don't know about
__DRI_DRIVER_GET_EXTENSIONS. xserver has supported that since 1.15.0,
which is almost nine years old now.
While we're about it, fix a comment in meson.build that used to be about
megadrivers to reflect reality.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15649>
In the case where u_index_generator returns zero new vertices, we never
filled tmp_indices before trying to duplicate the last veretx. This
causes us to read unitialized memory.
This fixes a Valgrind issue triggering in glxgears on Zink:
---8<---
==296461== Invalid read of size 2
==296461== at 0x570F335: compile_vertex_list (vbo_save_api.c:733)
==296461== by 0x570FEFB: wrap_buffers (vbo_save_api.c:1021)
==296461== by 0x571050A: upgrade_vertex (vbo_save_api.c:1134)
==296461== by 0x571050A: fixup_vertex (vbo_save_api.c:1251)
==296461== by 0x57114D1: _save_Normal3f (vbo_attrib_tmp.h:315)
==296461== by 0x10B750: ??? (in /usr/bin/glxgears)
==296461== by 0x10A2CC: ??? (in /usr/bin/glxgears)
==296461== by 0x4B3F30F: (below main) (in /usr/lib/libc.so.6)
==296461== Address 0x11ca23de is 2 bytes before a block of size 1,968 alloc'd
==296461== at 0x4845899: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==296461== by 0x570E647: compile_vertex_list (vbo_save_api.c:604)
==296461== by 0x570FEFB: wrap_buffers (vbo_save_api.c:1021)
==296461== by 0x571050A: upgrade_vertex (vbo_save_api.c:1134)
==296461== by 0x571050A: fixup_vertex (vbo_save_api.c:1251)
==296461== by 0x57114D1: _save_Normal3f (vbo_attrib_tmp.h:315)
==296461== by 0x10B750: ??? (in /usr/bin/glxgears)
==296461== by 0x10A2CC: ??? (in /usr/bin/glxgears)
==296461== by 0x4B3F30F: (below main) (in /usr/lib/libc.so.6)
---8<---
Fixes: dcbf2423d2 ("vbo/dlist: add vertices to incomplete primitives")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15633>
Now that we have a threading mode in the device, we can set that based
on the environment variable instead of delaying it to submit time. This
allows us to avoid the static variable trickery we use to avoid reading
environment variables over and over again. We also move the enabling of
the submit thread up a level or two and give it a bit more obvious
condition.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15566>
Fossilize used the git:// protocol for fetching submodules
but as of January 11, 2022, GitHub has tempirarily disabled
acceptance of the Git protocol until March 15, 2022 whereby
it will be permanantly disabled.
This patch uprevs to the Fossilize commit that switches
submodule URLs to https:// instead. Otherwise, we would get
an error stating that "The unauthenticated git protocol on
port 9418 is no longer supported." when trying to clone
submodules.
See https://github.blog/2021-09-01-improving-git-protocol-security-github
for more info.
Signed-off-by: Omar Akkila <omar.akkila@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15626>
if the driver requires pointsize uploads, only flag the last vertex
stage for updates, not all vertex stages
this should be functionally equivalent but without the unnecessary overhead
of also scanning the other stages
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15590>
the vulkan spec doesn't explicitly state whether this function progresses
a given semaphore's timeline, and apparently there are some cases where
it's assumed that progress occurs if this function is called in a loop
instead of WaitSemaphores, so check the current fence for completion
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15453>
according to KHR_gl_texture_3D_image:
If <target> is EGL_GL_TEXTURE_3D_KHR, <buffer> must be the name of a
complete, nonzero, GL_TEXTURE_3D (or equivalent in GL extensions) target
texture object, cast
into the type EGLClientBuffer. <attr_list> should specify the mipmap
level (EGL_GL_TEXTURE_LEVEL_KHR) and z-offset (EGL_GL_TEXTURE_ZOFFSET_KHR)
which will be used as the EGLImage source; the specified mipmap level must
be part of <buffer>, and the specified z-offset must be smaller than the
depth of the specified mipmap level.
thus a 2d view of a 3d surface is not only legal, it's part of the spec and
must be supported when available
cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15584>
Right now we always copy pos to wpos regardless of if the fragment
shader needs it or not. Instead just do it only for the shaders
that really need it.
Shader-db is not able to measure preciselly the effect as we
build only the non-wpos variants, but given that majority of fragment
shaders don't needs the wpos, the real effect should be close:
total instructions in shared programs: 105427 -> 104313 (-1.06%)
instructions in affected programs: 53098 -> 51984 (-2.10%)
total temps in shared programs: 13991 -> 13958 (-0.24%)
temps in affected programs: 114 -> 81 (-28.95%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15321>
Make the r300_vertex_shader structure resemble the r300_framgment_shader
in order to allow support for shader variants. There is no functional
change, but it will be used in the next commit to only output wpos
when needed from vertex shaders.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15321>
Right now we always write to a temp and than emit movs to both
POS and WPOS.
When the result is trivial, like:
MOV temp[0], in[0]
MOV output[0], temp[0]
MOV output[1], temp[0]
the dataflow analysis passes can take care of it, this is however
one of the last optimizations we need the pass for. Additionally
it fails even for some still trivial cases like:
MAD temp[2], const[3], temp[0].wwww, temp[1];
MOV output[0], temp[2];
MOV output[2], temp[2];
This patch will just duplicate the output instuction when there
is only a single output write.
The long-term plan would be to just trust NIR, do as little in
the backend as possible and optimize the remaining backend passes
to not need any additional cleanups.
total instructions in shared programs: 105862 -> 105427 (-0.41%)
instructions in affected programs: 20527 -> 20092 (-2.12%)
total temps in shared programs: 14029 -> 13991 (-0.27%)
temps in affected programs: 152 -> 114 (-25.00%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15321>
Instead just emit both outputs as soon as possible.
If the last write is inside a loop or a branch, emit it after
the ENDLOOP or ENDIF. This saves some temps and also allows us
to potentially benefit from R300_PVS_XYZW_VALID_INST as right
now the position output write is always penultimate with the
WPOS output being the last.
total temps in shared programs: 14101 -> 14029 (-0.51%)
temps in affected programs: 435 -> 363 (-16.55%)
helped: 72
HURT: 0
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15321>
Now that we're using 3DSTATE_BINDING_TABLE_POOL_ALLOC to set the base
address for the binding table pool separately from surface states, we
don't actually need to update surface state base address anymore.
Instead, we can just set STATE_BASE_ADDRESS once at context creation,
and never bother updating it again, saving some heavyweight flushes
and freeing us from the need for address offsetting trickery.
This patch was originally written by Jason Ekstrand, with fixes from
Lionel Landwerlin, but was targeting Icelake. Doing it there requires
additional changes (15:5 -> 18:8 binding table pointer formats) which
also involve some trade-offs, whereas the XeHP change is purely a win,
so we'll do it here first.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15616>
You should probably resize the sources array before accessing entries
that might be out of bounds. inst->resize_sources() always allocates
enough space for at least 3 sources, so this is really only an issue
when there are 4+ sources.
Fixes: a920979d4f ("intel/fs: Use split sends for surface writes on gen9+")
Fixes: 4f86a70599 ("intel/fs: Lower DW untyped r/w messages to LSC when available")
Fixes: d372abe397 ("intel/fs: Add surface OWORD BLOCK opcodes")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15632>
Thix fixes two bugs. First, we stop leaking in/out fences with
multisync. Because the in_syncs and out_syncs parameters to
set_multisync were arrays and not pointers to arrays, the caller's
in_syncs and out_syncs pointers never got set and remained NULL so
multisync_free() always sees to NULL pointers and does nothing, leaking
both arrays. Not sure how this isn't showing up in the dEQP leak check
tests.
Second, the struct drm_v3d_multi_sync was in the scope of the then
clause of the `if (device->pdevice->caps.multisync)` so it goes out of
scope before the ioctl. This is, effectively, a use-after-free and,
depending on stack allocation details, may result in the multisync
extension struct getting stompped before the ioctl.
Fixes: ff8586c345 ("v3dv: enable multiple semaphores on cl submission")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15512>
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that
operate on descriptor sets allocated using that layout, and those
descriptor sets must not be updated with vkUpdateDescriptorSets
after the descriptor set layout has been destroyed. Otherwise,
descriptor set layouts can be destroyed any time they are not in
use by an API command."
Based on a similar fix for RADV.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5893
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15634>
iris may rely on 3DSTATE_BINDING_TABLE_POOL_ALLOC's address being
inherited from a previous batch. So, we need to tell the decoder
about the address in case it sees binding tables to decode before
it encounters such a command in the batch.
We used to update surface_base, now we update bt_pool_base instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15625>
Use a unified si_create_copy_image_cs() to generate both the 1D and 2D
copy shaders; the shaders themselves remain separate.
Also:
- add a helper deref_ssa() for nir deref
- add a helper set_work_size() for setting workgroup and grid sizes
- pass si_context (instead of pipe_context) to the copy_image generator
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15268>
Fix defect reported by Coverity Scan.
Evaluation order violation (EVALUATION_ORDER)
write_write_typo: In queue_create = queue_create = &pCreateInfo->pQueueCreateInfos[0], queue_create is written twice with the same value.
Fixes: 8991e64641 ("pvr: Add a Vulkan driver for Imagination Technologies PowerVR Rogue GPUs")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15604>
It seems that a660 has the same bug. Without the workaround there
are a lot of flakes with depth-stencil tests, e.g. in:
dEQP-VK.pipeline.extended_dynamic_state.*
dEQP-VK.renderpass.depth_stencil_write_conditions.*
dEQP-VK.pipeline.stencil.format.d24_unorm_s8_uint.states.*
Or guaranteed failures like of:
dEQP-VK.pipeline.render_to_image.core.2d.huge.width.r8g8b8a8_unorm_d32_sfloat_s8_uint
Enabling the workaround fixes all of them.
cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15548>
in scenarios like:
vec1 32 ssa_151 = deref_var &shadow_map (uniform sampler2D)
vec1 32 ssa_152 = deref_var &shadow_map (uniform sampler2D)
vec2 32 ssa_153 = vec2 ssa_151, ssa_152
vec1 32 ssa_154 = deref_var ¶m@4 (function_temp uvec2)
intrinsic store_deref (ssa_154, ssa_153) (wrmask=xy /*3*/, access=0)
vec1 32 ssa_160 = deref_var ¶m@4 (function_temp uvec2)
vec2 32 ssa_164 = intrinsic load_deref (ssa_160) (access=0)
vec1 32 ssa_167 = mov ssa_164.x
vec1 32 ssa_168 = deref_cast (texture2D *)ssa_167 (uniform texture2D) /* ptr_stride=0, align_mul=0, align_offset=0 */
vec1 32 ssa_169 = mov ssa_164.y
vec1 32 ssa_170 = deref_cast (sampler *)ssa_169 (uniform sampler) /* ptr_stride=0, align_mul=0, align_offset=0 */
vec1 32 ssa_172 = (float32)tex ssa_168 (texture_deref), ssa_170 (sampler_deref), ssa_171 (coord), ssa_166 (comparator)
the real variable is stored to a function_temp and then loaded back again,
which means it isn't a direct deref and lower_vri_instr_tex_deref() will
crash because the variable can't be found
BUT running only the passes needed to eliminate derefs will break other tests,
so just run the whole optimize loop again here to avoid such issues
for #5945
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15511>
Previously we were just doing a group barrier for both membar and barrier.
This sometimes worked out, because atomics and reads waited for ack
already, but writes were not waiting for ack. Use the need_wait_ack
pattern that scratch writes used, with a little refactoring for
reusability.
The refactor also incidentally fixes the atomics waiting for outstanding
acks to be > 1 instead of > 0.
Cc: mesa-stable
Fixes: #6028
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14429>
Prevents several regressions when NIR-to-TGSI is enabled where it was
allocating arrays on top of each other.
Fixes vec3 fails on RV770,
dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_1 and 2
in general, and fixes another piglit but breaks two others. Still, this
seems to be a win.
Cc: mesa-stable
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14429>
The two types of instructions get added to the same CF list, but not the
same instr list within the CF list. So, if you SSBO fetched your
texcoord, the emission of the SSBO fetch would come *after* the texcoord
fetch.
Avoids regressions when NIR-to-TGSI starts optimizing more.
Cc: mesa-stable
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14429>
Avoids a regression when enabling shader precompilation, where the
precompile would happen with MSAA disabled (so no sample mask export) but
we'd never catch up to the shader being rendered with MSAA.
Doesn't fix any current testcases, though.
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14427>
This test was never supposed to be skipped, and the referenced commit
just exposed a bug in turnip fixed by the previous commit. It was
hanging due to a CTS bug making the submit take way too long, which will
be fixed once the CTS change lands.
Also, add it to the a630 skips.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15563>
In this case we should relax checks based on the format, since the user
will be responsible for them when creating an image view.
This gets dEQP-VK.image.sample_texture.*_bit_compressed_format_* not
skipping again after VK-GL-CTS 736eec57dc0c ("Fix checkSupport in
compressed texture sampling tests").
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15563>
We have to add WFM to pending bits when we are flushing into CP
for indirect draw to know when they should apply WFM workaround.
Fixes CTS tests:
dEQP-VK.draw.renderpass.indirect_draw.*_data_from_compute.indirect_draw_count*
Fixes: abf0ae014a
("tu: Properly handle waiting on an earlier pipeline stage")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15577>
We handle uniforms by copying them into the uniform stream to be
consumed with ldunif when they have a constant offset. Otherwise
we fallback to general TMU access, which has more latency.
However, just like we did for UBOs and read-only SSBOs, we can
also try to use the unifa mechanism to handle indirect accesses
in certain cases instead of the TMU fallback.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>
Inline uniform blocks store their contents in pool memory rather
than a separate buffer, and are intended to provide a way in which
some platforms may provide more efficient access to the uniform
data, similar to push constants but with more flexible size
constraints.
We implement these in a similar way as push constants: for constant
access we copy the data in the uniform stream (using the new
QUNIFORM_UNIFORM_UBO_*) enums to identify the inline buffer from
which we need to copy and for indirect access we fallback to
regular UBO access.
Because at NIR level there is no distinction between inline and
regular UBOs and the compiler isn't aware of Vulkan descriptor
sets, we use the UBO index on UBO load intrinsics to identify
inline UBOs, just like we do for push constants. Particularly,
we reserve indices 1..MAX_INLINE_UNIFORM_BUFFERS for this,
however, unlike push constants, inline buffers are accessed
through descriptor sets, and therefore we need to make sure
they are located in the first slots of the UBO descriptor map.
This means we store them in the first MAX_INLINE_UNIFORM_BUFFERS
slots of the map, with regular UBOs always coming after these
slots.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>
shader_storage_blocks_write_access was computed using the buffer indices
in the program but ShaderStorageBlocksWriteAccess is used with the shader
buffers.
So if a VS had 3 SSBOs and a FS had 4, the mask for VS was 0x3 (correct) but
the mask for the FS was 0x78 instead of 0x15.
Fix this by substracting the index of the first shader buffer in the program's
buffers.
Fixes: 79127f8d5b ("glsl: set ShaderStorageBlocksWriteAccess in the nir linker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6184
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15552>
In general, an atomic intrinsic may perform separate atomics for every
enabled SIMD channel, as each channel may operate on different memory.
However, an extremely common case is for all channels to access the same
memory location. In this case, we can simply perform a reduction/scan
across the subgroup, and perform one atomic for the whole subgroup,
rather than one per channel. For example, if an intrinsic says to take
the minimum value of the existing memory and the value in each channel,
we can do a thread-local minimum of all enabled channels, then do a
single atomic to take the minimum of that and the existing memory.
Our hardware doesn't optimize the case where multiple channels ask for
atomics on the same memory location; it assumes the compiler will do so.
nir_opt_uniform_atomics() uses divergence analysis to detect this case,
adds the necessary subgroup operations, and moves the atomic inside a
conditional that disables all but a single invocation. It even detects
cases where the shader code already performs this kind of optimization,
and avoids doing it a second time.
This may not be the optimal solution for us. In the backend, we could
detect this case and emit send(1) instructions with NoMask, rather than
generating if...send(16)...endif, and a lot of unnecessary ALU ops. But
it's simple to do, reuses the same path as ACO, and still provides most
of the benefit by cutting up to 16x atomics down to a single atomic,
which is more merciful to the memory bus.
Improves performance of Shadow of the Tomb Raider by 5.5% on XeHP.
Improves performance of a customer-internal benchmark on XeHP at
3840x2160 and low settings by approximately 30%.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
We haven't exposed this intrinsic as it doesn't directly correspond to
anything in SPIR-V. However, it's used internally by some NIR passes,
namely nir_opt_uniform_atomics().
We reuse most of the infrastructure in brw_find_live_channel, but with
LZD/ADD instead of FBL. A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is
like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
- load_reloc_const is just an immediate constant load, it's convergent.
- nir_intrinsic_load_global_const_block_intel should be convergent,
it says the address must be uniform, and we uniformize the predicate
- Lowered image intrinsics: image_deref_load_param_intel just reads
information about an image, as long as the image variable is
convergent it should be too. load_raw_intel...if the address we
come up with is convergent, it ought to be as well.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
Instead of reusing the in/out slot mechanism, use a separated NIR
variable mode. This will make easier later to implement staging the
output in shared memory (and storing all at the end to the URB).
Note to get 64-bit type support we currently rely on the
brw_nir_lower_mem_access_bit_sizes() pass.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15022>
The only ways that function can return NULL are:
- the xcb connection was closed
- the window for the swapchain was destroyed
- the special event listener was unregistered from another thread
- malloc failure
All of these are permanent errors, the swapchain is no longer in a
usable state, so we should treat this as VK_ERROR_SURFACE_LOST_KHR.
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15558>
Logic is lifted from bi_layout.c, adapted to work on instructions (not
clauses) and for Valhall's off-by-one semantic which is annoyingly
different than Bifrost. (But the same as Midgard -- Bifrost was
annoyingly different than Midgard!)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15223>
We already collect enums in the ISA description XML. Export them for use in the
compiler backend, particularly the packing code.
Usually we'd use Mako for templating. In this case, the script is so trivial a
template engine didn't seem worth it. (The obvious version with Mako was about
10 lines longer than just prints and f-strings used here.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15223>
If there are modifiers only used by pseudo instructions, not the real
instructions, bi_packer can get out-of-sync with bi_opcodes, causing
hard-to-debug issues. Do the stupid-simple thing to ensure this doesn't happen.
This may be a temporary issue, depending whether ISA.xml and the IR get split
out for better Valhall support.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15223>
NIR load_consts/inputs tend to happen together at the top of the program.
In the TGSI backend the loads got emitted at use time, while the NIR
backend was emitting the loads at load intrinsic time. By sinking the
intrinsics, we can greatly reduce register pressure.
nv92 NIR results:
total local in shared programs: 2024 -> 2020 (-0.20%)
local in affected programs: 4 -> 0
total gpr in shared programs: 790424 -> 735455 (-6.95%)
gpr in affected programs: 215968 -> 160999 (-25.45%)
total instructions in shared programs: 6058339 -> 6051208 (-0.12%)
instructions in affected programs: 410795 -> 403664 (-1.74%)
total bytes in shared programs: 41820104 -> 41660304 (-0.38%)
bytes in affected programs: 7147296 -> 6987496 (-2.24%)
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15542>
Fix 'error C4576: a parenthesized type followed by an initializer
list is a non-standard explicit type conversion syntax' errors by
declaring an actual variable and returning it in
vk_image_view_subresource_range().
All those MSVC/c++ related-constraints are quite annoying to be honest,
but it looks like the D3D12 headers have been updated to plain C
recently, which will allow us to write the driver in C, and hopefully
get all this sort of issues behind us.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14766>
Implements vkCreateSampler and vkDestroySampler APIs.
Also fixes maxSamplerLodBias value from 15.0f to 16.0f
as it's the max supported by our hardware.
Also changing maxSamplerAnisotropy from 16.0f to 1.0f to
temporarily disable anisotropy as we are missing software
support for it.
Signed-off-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15557>
the previous method of using affected_states to trigger constant updates
was ineffectual in the scenario where a ubo pointsize was needed on
the first time a non-precompiled shader was used after being the not-last
vertex stage:
* have vs+gs -> gs precompiles with pointsize lowering -> gs constants get updated
* remove gs -> vs was precompiled without pointsize lowering -> vs constants broken
now just do a quick check as in st_atom_shader.c and set the flag manually to
ensure the update is done correctly every time
cc: mesa-stable
fixes#6207
fixes (radv):
KHR-GL46.texture_cube_map_array.image_op_fragment_sh
KHR-GL46.texture_cube_map_array.sampling
KHR-GL46.texture_cube_map_array.texture_size_fragment_sh
KHR-GL46.constant_expressions*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15570>
Freedreno will check if the virtgpu supports the pass-thru context, and
if not will bail, falling back to virgl.
TODO this requires that virgl is also enabled in the mesa build, even if
it is not needed.. maybe there is a better way to handle this?
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
Add a new backend to enable using native driver in a VM guest, via a new
virtgpu context type which (indirectly) makes host kernel interface
available in guest and handles the details of mapping buffers to guest,
etc.
Note that fence-fd's are currently a bit awkward, in that they get
signaled by the guest kernel driver (drm/virtio) once virglrenderer in
the host has processed the execbuf, not when host kernel has signaled
the submit fence. For passing buffers to the host (virtio-wl) the egl
context in virglrenderer is used to create a fence on the host side.
But use of out-fence-fd's in guest could have slightly unexpected
results. For this reason we limit all submitqueues to default priority
(so they cannot be preepmted by host egl context). AFAICT virgl and
venus have a similar problem, which will eventually be solveable once we
have RESOURCE_CREATE_SYNC.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
We are going to want basically the identical thing, other than
flush_submit_list, for virtio backend. Now that we've moved various
other dependencies into the base classes, extract out an abstract base
class for submit/ringbuffer.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
Add a hint for buffers that we won't need to mmap. With the virtio
backend, virglrenderer needs to create a dmabuf fd for mapping into
the host, which we want to avoid when possible.
Low hanging fruit is to use this hint for anything tiled/ubwc. There
are probably more bo's that can be flagged as such.
TODO add fd_bo_upload() for memcpy to bo.. this would be useful for
uploads, for example, shaders which we just write once and never touch
again.. for virtio this could be implemented with a TRANSFER_TO_HOST
ioctl.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
Decoupling handle and fd_bo creation simplifies things for "normal" drm
drivers, avoiding duplication for the create vs import paths. But this
is awkward for the virtio backend when wants to do multiple things in
the same guest<->host round trip.
So instead, split the paths in the interface backend and move the code
sharing for the two different paths into the msm backend itself.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
Based on profiling, these 10us sleeps are behaving closer to ~60us
and causing higher-than-necessary CPU overhead. 120us seems like a
good balance between latency management and overhead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15565>
This transitions the resource to TRANSFER_SRC_OPTIMAL, but
never updates the res->layout field, so subsequent transitions
are wrong and throw validation errors.
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x6660f60, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x6660f60[] expects VkImage 0x2f000000002f[] (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15545>
This patch adds the PowerVR driver to the following shared builds:
* debian-arm64
* debian-clang
* debian-release
* debian-vulkan
* fedora-release
It also adds the associated "tools" to these builds:
* debian-arm64-build-test
* debian-release (already uses -Dtools=all)
* fedora-release
And removes them from these builds by expanding -Dtools=all:
* debian-gallium
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15284>
unreachable() doesn't lead to executing the code that follows it,
neither in debug nor release builds. So falling through doesn't make any
sense.
This fixes a compile-error on clang.
Let's move the default-block to the end to make it clearer that there's
no intended fallthrough.
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15531>
unreachable() doesn't lead to executing the code that follows it,
neither in debug nor release builds. So falling through doesn't make any
sense.
This fixes a compile-error on clang.
Let's move the default-block to the end to make it clearer that there's
no intended fallthrough.
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15531>
First, there is a problem if you do the following
vkCmdSetColorWriteEnableEXT(attachmentCount = 8)
vkCmdBindPipeline(GFX, with attachmentCount = 4)
vkCmdDraw()
vkCmdBindPipeline(GFX, with attachmentCount = 8)
vkCmdDraw()
Because in the dynamic state emission code we rely on the first
pipeline to figure the number of BLEND_STATE entries to prepare. This
is wrong, we should fill all entries so that the dynamic state works
regardless of the number of attachments in the pipeline. With regard
to the dynamic values, we should retain enable/disable values that do
not concern the current pipeline.
Second, 3DSTATE_WM was not always reemitted when the pipeline changed.
But since it is not emitted as part of the pipeline, this results in
inconsistent state being programmed.
Third, we end up disabling the fragment stage completely in some
cases. And that is programming the pipeline inconsistently and
triggering a hang on TGL.
v2: Fix comment (Tapani)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b15bfe92f7 ("anv: implement VK_EXT_color_write_enable")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
The problem is that we missed looking at pipeline changes. Pipelines
hold bits of dynamic states and when it changes we might need to
reemit a packet.
v2: fix comment (Tapani)
Add missing anv_cmd_buffer_needs_dynamic_state() use (Tapani)
Cc: mesa-stable
Fixes: 505d176a8e ("anv: disable baked in pipeline bits from dynamic emission path")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
error log:
```
[2/43] Compiling C object src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj
FAILED: src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj
"cc" "-Isrc/gallium/drivers/llvmpipe/libllvmpipe.a.p" "-Isrc/gallium/drivers/llvmpipe" "-I../../src/gallium/drivers/llvmpipe" "-I../../src/gallium/include" "-Isrc/gallium/auxiliary" "-I../../src/gallium/auxiliary" "-Iinclude" "-I../../include" "-Isrc" "-I../../src" "-Isrc/compiler/nir" "-I../../src/compiler/nir" "-Isrc/util" "-I../../src/util" "-IC:/CI-Tools/msys64/clang64/include" "-fvisibility=hidden" "-fcolor-diagnostics" "-Wall" "-Winvalid-pch" "-std=c11" "-O0" "-g" "-DPACKAGE_VERSION=\"22.0.0-devel\"" "-DPACKAGE_BUGREPORT=\"https://gitlab.freedesktop.org/mesa/mesa/-/issues\"" "-DHAVE_WINDOWS_PLATFORM" "-DHAVE_SURFACELESS_PLATFORM" "-DUSE_ELF_TLS" "-DUSE_TLS_BEHIND_FUNCTIONS" "-DENABLE_ST_OMX_BELLAGIO=0" "-DENABLE_ST_OMX_TIZONIA=0" "-DEGL_NO_X11" "-DDEBUG" "-DHAVE___BUILTIN_BSWAP32" "-DHAVE___BUILTIN_BSWAP64" "-DHAVE___BUILTIN_CLZ" "-DHAVE___BUILTIN_CLZLL" "-DHAVE___BUILTIN_CTZ" "-DHAVE___BUILTIN_EXPECT" "-DHAVE___BUILTIN_FFS" "-DHAVE___BUILTIN_FFSLL" "-DHAVE___BUILTIN_POPCOUNT" "-DHAVE___BUILTIN_POPCOUNTLL" "-DHAVE___BUILTIN_UNREACHABLE" "-DHAVE___BUILTIN_TYPES_COMPATIBLE_P" "-DHAVE_FUNC_ATTRIBUTE_CONST" "-DHAVE_FUNC_ATTRIBUTE_FLATTEN" "-DHAVE_FUNC_ATTRIBUTE_MALLOC" "-DHAVE_FUNC_ATTRIBUTE_PURE" "-DHAVE_FUNC_ATTRIBUTE_UNUSED" "-DHAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT" "-DHAVE_FUNC_ATTRIBUTE_WEAK" "-DHAVE_FUNC_ATTRIBUTE_FORMAT" "-DHAVE_FUNC_ATTRIBUTE_PACKED" "-DHAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL" "-DHAVE_FUNC_ATTRIBUTE_ALIAS" "-DHAVE_FUNC_ATTRIBUTE_NORETURN" "-DHAVE_FUNC_ATTRIBUTE_VISIBILITY" "-DHAVE_UINT128" "-D_WINDOWS" "-D_WIN32_WINNT=0x0A00" "-DWINVER=0x0A00" "-DPIPE_SUBSYSTEM_WINDOWS_USER" "-D_USE_MATH_DEFINES" "-DUSE_SSE41" "-DUSE_GCC_ATOMIC_BUILTINS" "-DHAS_SCHED_H" "-DHAVE_CET_H" "-DHAVE_STRTOF" "-DHAVE_STRTOK_R" "-DHAVE_QSORT_S" "-DHAVE_ZLIB" "-DHAVE_ZSTD" "-DHAVE_COMPRESSION" "-DLLVM_AVAILABLE" "-DMESA_LLVM_VERSION_STRING=\"13.0.0\"" "-DLLVM_IS_SHARED=1" "-DDRAW_LLVM_AVAILABLE" "-DMESA_EXECMEM" "-DVK_USE_PLATFORM_WIN32_KHR" "-Werror=implicit-function-declaration" "-Werror=missing-prototypes" "-Werror=return-type" "-Werror=empty-body" "-Werror=incompatible-pointer-types" "-Werror=int-conversion" "-Wimplicit-fallthrough" "-Wno-missing-field-initializers" "-fno-math-errno" "-fno-trapping-math" "-Qunused-arguments" "-fno-common" "-Wno-microsoft-enum-value" "-Werror=format" "-Wformat-security" "-Werror=thread-safety" "-ffunction-sections" "-fdata-sections" "-pthread" "-D_FILE_OFFSET_BITS=64" "-D__STDC_CONSTANT_MACROS" "-D__STDC_FORMAT_MACROS" "-D__STDC_LIMIT_MACROS" "-Werror=pointer-arith" "-Werror=gnu-empty-initializer" -MD -MQ src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj -MF "src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj.d" -o src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj "-c" ../../src/gallium/drivers/llvmpipe/lp_setup_tri.c
In file included from ../../src/gallium/drivers/llvmpipe/lp_setup_tri.c:37:
In file included from ../../src/gallium/drivers/llvmpipe/lp_setup_context.h:38:
In file included from ../../src/gallium/drivers/llvmpipe/lp_setup.h:31:
In file included from ../../src/gallium/drivers/llvmpipe/lp_jit.h:40:
In file included from ../../src/gallium/auxiliary/gallivm/lp_bld_limits.h:37:
In file included from ../../src/util/u_cpu_detect.h:41:
In file included from ../../src/util/u_thread.h:35:
In file included from ../../include/c11/threads.h:64:
In file included from ../../include/c11/threads_win32.h:58:
In file included from C:/CI-Tools/msys64/clang64/x86_64-w64-mingw32/include/windows.h:69:
In file included from C:/CI-Tools/msys64/clang64/x86_64-w64-mingw32/include/windef.h:9:
In file included from C:/CI-Tools/msys64/clang64/x86_64-w64-mingw32/include/minwindef.h:163:
In file included from C:/CI-Tools/msys64/clang64/x86_64-w64-mingw32/include/winnt.h:1555:
In file included from C:/CI-Tools/msys64/clang64/lib/clang/13.0.0/include/x86intrin.h:15:
In file included from C:/CI-Tools/msys64/clang64/lib/clang/13.0.0/include/immintrin.h:37:
C:/CI-Tools/msys64/clang64/lib/clang/13.0.0/include/tmmintrin.h:582:1: error: redefinition of '_mm_shuffle_epi8'
_mm_shuffle_epi8(__m128i __a, __m128i __b)
^
../../src/gallium/auxiliary/util/u_sse.h:159:1: note: previous definition is here
_mm_shuffle_epi8(__m128i a, __m128i mask)
^
1 error generated.
```
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14077>
Because of incomplete codepaths further down in this function, we end up
trying to use this variable without initialization. Let's initialize it
to zero, just to keep the compiler happy.
This avoids a compile-time warning, which would get treated as an error
on CI.
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15525>
There's no guarantee that we don't have more than 128 PHI values either,
so let's stop asuming so.
We do this by changing dxil_phi_set_incoming to dxil_phi_add_incoming,
which lets us add more incoming phi-values to the current one instead of
setting a new set of them.
This also lets us reduce these stack-arrays a bit, down to something
much more reasonable.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15519>
As this function doesn't check for any control-flow
dependence, it only returns true for statically
(or globally) uniform values.
The same holds true for is_binding_dynamically_uniform()
in nir_opt_gcm().
Rename to better reflect that property.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14994>
the fastpath here can only be taken if there is exactly one stream active,
as this will otherwise break nonzero stream primitives generated queries
in truth, this num_vertex_streams thing should be a bitmask so that the case
of num_streams=1,stream_id!=0 could also be fastpathed, but the complexity
probably isn't worth it given the infrequency of use
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15506>
this can't be determined from pipe_shader_state::stream_output,
as this only contains xfb info, which is not the same as the vertex
stream info, and may break primitives generated queries
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15506>
When using cubemaps and the u/v values are 0, then this point
can be arrived at with rho = nan, and if rho is NaN, then lod
calculations end up at the max lod, whereas the spec suggests
they should end up at the most negative lod.
This fixes
dEQP-VK.glsl.texture_functions.query.texturequerylod.samplercube_float_zero_uv_width_fragment
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15335>
Otherwise, if batch->scissor_culls_everything is set for a single draw,
every draw after it in the batch will be skipped because the new
scissor/viewport state will never be processed. Process scissor state
early in draw_vbo to fix this interaction.
We do need to be careful: setting something on the batch can only happen when
we've decided on a batch. If we have to select a fresh batch due to too many
draws, that must happen first. This is pretty clear in the code but worth noting
for the diff.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Icecream95 <ixn@disroot.org>
Reviewed-by: Icecream95 <ixn@disroot.org>
Fixes: 79356b2e ("panfrost: Skip rasterizer discard draws without side effects")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5839
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6136
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15365>
There is an out-of-sync approach regarding the location of the results
folder: some scripts refer to it via $CI_PROJECT_DIR/results, while
others just assume it is located in the current working directory.
Usually $PWD points to $CI_PROJECT_DIR, but in some cases this is not
the case, hence let's ensure the 'results' folder can always be found
in the current working directory.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
In order to run a VM (e.g. crosvm) through HWCI_TEST_SCRIPT on a LAVA
target, it's necessary to download a kernel image on the target device.
When HWCI_KVM is set to 'true', we can safely assume HWCI_TEST_SCRIPT
contains a command or the path to a script which expects the kernel
image to be available under /lava-files/${KERNEL_IMAGE_NAME}.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
Having CI_JOB_JWT_FILE pointing to path on /tmp makes it difficult to be
managed in VM contexts (e.g. crosvm) because the /tmp mountpoint usually
refers to a local filesystem rather than the host one.
Additionally, there is another restriction for 'piglit-traces-test' job
to have the file available on the root filesystem.
To avoid amending all the jobs that might be affected, let's just set
the variable to a fixed path '/minio_jwt'. Note we also need to do this
in the 'variables:' section instead of 'before_script:' in order to be
able to reference it in CI job variables, e.g. PIGLIT_REPLAY_EXTRA_ARGS
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
Use a dedicated DEQP_RUNNER_CARGO_ARGS variable instead of
EXTRA_CARGO_ARGS in build-deqp-runner.sh to pass custom arguments when
invoking 'cargo install'.
This is to avoid modifications of EXTRA_CARGO_ARGS which might have
negative side-effects in the scripts which rely on this variable and
import build-deqp-runner.sh instead of executing it in a subshell.
Fixes: 8729c6e981 ("ci: Support building and installing deqp-runner from source")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
R8G8 have a different block width/height and height alignment from other
formats that would normally be compatible (like R16), and so if we are
trying to, for example, sample R16 as R8G8 we need to demote to linear.
Follows the fix in Freedreno: b97e3bb2e1
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15465>
Replaced the existing internal TGSI compute shader, which clears
a read-modify-write buffer, with its NIR equivalent. The disassembly
shader generated by the new NIR variant is identical to the previous
implementation. These changes remove the additional conversion step
from TGSI to NIR for the shader at runtime. Tested on a Navi 23 card.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15356>
When CCS compression first came out on Skylake, we referred to it as
"renderbuffer compression", or RBC for short. However, that name has
long since fallen out of favor, and we refer to it as CCS nearly
everywhere.
This patch renames DEBUG_NO_RBC to DEBUG_NO_CCS inside the codebase
for clarity, and adds INTEL_DEBUG=noccs. The legacy INTEL_DEBUG=norbc
name continues to work, because it's one line of code and having both
names makes our lives easier in the interim.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15447>
by disabling color and depth write, the side effects of force-disabling discard can
be mitigated
fixes:
KHR-GL46.tessellation_shader.single.isolines_tessellation
KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.data_pass_through
KHR-GL46.tessellation_shader.tessellation_invariance.invariance_rule3
KHR-GL46.tessellation_shader.tessellation_shader_point_mode.points_verification
KHR-GL46.tessellation_shader.tessellation_shader_quads_tessellation.degenerate_case
KHR-GL46.tessellation_shader.tessellation_shader_quads_tessellation.inner_tessellation_level_rounding
KHR-GL46.tessellation_shader.tessellation_shader_tessellation.gl_InvocationID_PatchVerticesIn_PrimitiveID
KHR-GL46.tessellation_shader.vertex.vertex_spacing
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15392>
From docs:
The PVS Instruction which uses the Input Vertex Memory for the last
time. This value is used to free up the Input Vertex Slots ASAP.
This field must be set to a valid instruction.
Right now it is set to the last instruction. When the last read is
inside a loop, set it on the outhermost ENDLOOP. This could in theory
help performance, but none of my usual benchmarks including GLmark,
Unigine Sanctuary or Lightsmark show any measurable performance difference.
Suggested in: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6045
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15252>
this is a lot of machinery to propagate the block sizes down from the
descriptor layout to the pipeline layout to the rendering_state
block data is appended to ubo0 immediately following push constant
data (if it exists), which requires that a new buffer be created and
filled any time either type of data changes
shader handling is done by propagating the offset of each block relative
to the start of its descriptor set, then accumulating the sizes of
every uniform block in each preceding descriptor set into the offset,
then adding on the push constant size, and finally adding that on to
the existing load_ubo deref offset
update-after-bind is no longer an issue since each instance of pc+block
data is its own immutable buffer that can never be modified
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15457>
now instead of having static per-stage buffer regions and letting llvmpipe
do the upload, lavapipe creates a new pipe_resource and chucks it away
with take_ownership=true to allow it to be destroyed once it's no longer
in use
this also alters ubo0 mechanics such that the buffer is now sized exactly to
the size of the push constants in the pipeline and push constants are only
updated when the appropriate shader stage is flagged
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15457>
the converted number of vertices is 2x that of the "real" number of vertices,
so divide the results by 2
this works nicely since zink performs cpu readback for all queries, and shader
aggregation should be simple in the future as well
fixes:
KHR-GL46.pipeline_statistics_query_tests_ARB.functional_primitives_vertices_submitted_and_clipping_input_output_primitives
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15401>
Without this, 128-bit or 64-bit register spills can generate unaligned
loads and stores if tlsBase is unaligned.
Fixes glsl-1.50/execution/variable-indexing/gs-input-array-vec3-index-rd
with NV50_PROG_USE_NIR=1 on kepler
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13693>
ir3_compile_shader_nir() should set local_size[] and local_size_variable
fields not only for compute shaders, but for the OpenCL kernels too.
v2: use gl_shader_stage_is_compute() instead of explicit comparison with
MESA_SHADER_[COMPUTE,KERNEL].
Signed-off-by: Andrey Konovalov <andrey.konovalov@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14863>
bare-metal can reboot boards into an existing rootfs on intermittent
device failure, but traces-db doesn't do any sanity-checking of the local
downloads of traces and would proceed to just trying to replay them.
Nuke any existing trace db so that it re-downloads every time, same as
LAVA or docker container tests do.
Fixes: #5585
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15440>
We get a lot of useful coverage from running graphicsfuzz with spilling
enabled, but it's also pretty slow and can cause intermittent hangcheck
failures. I thought I'd categorized them when merging !14839 (device loss
on reset), but it looks like not all of them and we're now more likely to
have flakes take out the whole test run when a single flake makes the rest
of the caselist a flake.
This is a little unfortunate in that it means our test environment is not
the same as a stock system you would want to run deqp on to submit
conformance, but I think it's an improvement in the test maintenance work
vs needing to fix things up later.
We have some other tests besides turnip that can trigger hangchecks which
we might also like this increase for (some disabled traces, for example).
However, freedreno GL has a 5-second timeout waiting for idle when
mapping, and a couple of 2-second timeouts in a row can result in spurious
failures in other tests!
Fixes: #6163
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15435>
We use bi_dontcare() to specify any encoding where we don't care about
the value, with a preference for power-efficient encodings. On Bifrost,
a (possibly nonexistant) FAU read is the best encoding. On Valhall, that
encoding doesn't exist so just use a zero. That should be good enough in
practice.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15461>
When emitting code during or after register allocation, we need to be able to
emit constants without running the constant->{LUT, move, uniform} pass running
after. In particular, we need to access the constant 0 to implement spill code.
Add a Valhall-specific zero for this purpose.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15461>
Instead of one MANUAL_COMMANDS, we now have two deny-lists:
MANUAL_COMMANDS and NO_ENQUEUE_COMMANDS. The former is for things which
have a manually typed implementation in vk_cmd_enqueue.c and the later
is for things we want to ignore entirely. This lets us auto-generate
vk_cmd_enqueue_unless_primary_Cmd* entrypoints for the manually typed
vk_cmd_enqueue_Cmd* entrypoints.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14406>
This fixes bugs with lavapipe's hand-rolled pCount handling. The driver
is supposed to set *pCount to the number of queues actually written in
the case where it's initialized to a value that's too large. It's also
supposed to handle *pCount being too small.
Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15459>
The Vulkan spec was recently clerified to say that transitions only
happen to the bound layers:
"Automatic layout transitions apply to the entire image subresource
attached to the framebuffer. If multiview is not enabled and the
attachment is a view of a 1D or 2D image, the automatic layout
transitions apply to the number of layers specified by
VkFramebufferCreateInfo::layers. If multiview is enabled and the
attachment is a view of a 1D or 2D image, the automatic layout
transitions apply to the layers corresponding to views which are
used by some subpass in the render pass, even if that subpass does
not reference the given attachment."
This is in the context of render passes but it applies to dynamic
rendering because the implicit layout transition stuff is a Mesa pseudo-
extension and inherits those rules.
For clears, the Vulkan spec says:
"renderArea is the render area that is affected by the render pass
instance. The effects of attachment load, store and multisample
resolve operations are restricted to the pixels whose x and y
coordinates fall within the render area on all attachments. The
render area extends to all layers of framebuffer."
Again, this is in the context of render passes but the same principals
apply to dynamic rendering where the layerCount and renderArea are
specified as part of the vkCmdBeginRendering() call.
Fixes: 3501a3f9ed ("anv: Convert to 100% dynamic rendering")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15441>
Any thread we create may end up creating/submitting at least a
noop job, which is a shared object. Before multisync, this was
an issue only for the creation of the job itself, but with
multisync we can also modify parameters of the noop job
every time it is used (for signaling and serialization
configuration).
This change adds a noop mutex that all threads (main, wait and
master) take before submitting a noop job to ensure concurrent
access is not an issue.
Fixes flakyness observed with multisync with the following test:
dEQP-VK.api.command_buffers.secondary_execute_twice
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
If a CPU job comes first in a command buffer with a semaphore wait operation
we need to wait on the CPU for the semaphore to be signaled before we process
the job.
We have been doing this with a WaitForIdle operation, but that only works
if the semaphore has been submitted for signaling from the same instance
of the driver. If we have an imported payload from another instance in our
semaphore however, waitForIdle may return too early since the submission
to signal the semaphore may have been submitted by a different instance
of the driver as well, and our wait for idle checks only know about this
instance submissions.
To fix this, we always submit a noop job from our instance that waits on
the semaphores on the GPU and follow up with WaitForIdle to wait for that
to complete.
Fixes test failures and/or assert crashes in:
dEQP-VK.synchronization.cross_instance.*
(when enabling support for semaphore imports)
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
When we have a wait thread we can't ensure that the last job in the last
command buffer will be the one to signal semaphores because in this case
there is no gurantee that jobs from command buffers in the batch will be
submitted to the GPU in order, as those put in a wait thread will be
submitted later when the event wait operation is completed.
Instead, we need to wait for all outstanding wait threads to complete
and only then we should signal any semaphores or fences.
This also fixes a bug where the wait for events was the last job in
the command buffer. In this case, once the event wait is completed
we have no additional jobs to submit and thus would never try to
signal semaphores or fences.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
This is preparatory work to expose support for importing semaphores, which
was waiting on kernel multisync support.
When we implemented user-space multisync support we didn't handle
temporary fence/semaphore payload imports at all, so we fix that here.
Also, we add a has_temp boolean flag to identify the case where we have
a temporary payload in a fence/sempahore instead of just checking if
temp_sync is not 0. This is necessary to support semaphore imports
(for which we are not exposing support yet) because these need to drop
the temporary payload when they are used as wait semaphores in a submit,
but we can't destroy the underlying temp_sync at that point because it
needs to survive at least until the submit is finished, so instead
we use a flag to tell if we have an active temporary payload or not,
and we simply destroy any temp_sync on a semaphore destroy or any new
import on the same semaphore. We only strictly need this flag for
semaphores because fences drop the temporary payload when they are
reset, which happens in the CPU and can only be done if the GPU is not
using the fence, but we add the same flag for the fence for consistency.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
This path uses a shader blit to implement the copy which is only
supported for tiled images (except 1D). While blit_shader() already
checks for this, this path does a lot of heavy lifting to prepare for
the blit_shader call so we rather avoid that if possible when we know
blit_shader won't be able to implement the blit.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
We had some code that considered the possibility that the destination
might be linear when configuring TFU jobs, but we never actually allow
for this to happen since we avoid hitting these paths in that case, as
the TFU always produces UIF results. Instead, add an assert when
producing the TFU packet to ensure we are expecting a UIF result.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
this fixes desync+crash when:
1. usage is added for bs A
2. tracking is added for bs B
3. tracking is removed for bs B
4. context is destroyed
5. usage A is now dangling and will crash if accessed
as seen in glmark2
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15429>
In addition to use the helper, we also remove some of the lowering we
had at preprocess_nir, as they are called now by the helper.
As we are here we also move the call to nir_lower_sysvals_to_varyings,
that for some reason we were calling it before preprocess_nir.
It is worth to note that with this change we lose the ability to debug
the NIR just after spirv_to_nir using V3D_DEBUG, as now this is done
on vk_spirv_to_nir, and as mentioned that includes several lowerings
now. The workaround to that is to use NIR_DEBUG.
We also needed to change how to check the entrypoint on the broadcom
compiler, checking just if it is an entrypoint, instead of assuming
that the name will be "main".
v2: tweak comment, squash v3dv and compiler change (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15449>
This fixes the test_resolve_non_issued_query_data vkd3d-proton test.
This change is required on TGL+ (maybe ICL?) because on all platforms
3D pipeline writes are not coherent with CS. On previous platform we
fixed this by flushing the render cache to make sure data is visble to
CS before it writes to memory. But on more recently platforms,
flushing the render cache leaves the data in the tile cache which is
still not coherent with CS, so we need to flush that one too.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14552>
It stores the compiled shaders on disk so further executions will load
them from disk instead of compiling, improving the overall performance.
This is noticeable in games where they suddenly get stuck for a while
because they start to compile one or more shaders.
v2:
- Remove comment (Alejandro)
- Use malloc() + helper to simplify code (Alejandro)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15380>
This has been implemented for a while but we could not expose it on
Vulkan 1.0 because the extension declares a dependency on
VK_KHR_sampler_ycbcr_conversion, which we don't implement, and
CTS would complain.
On Vulkan 1.1 however, VK_KHR_sampler_ycbcr_conversion was promoted
to core as an optional feature, and this is enough for the the
dependency to be satisfied, even if the feature is not supported,
meaning that we can now expose the extension.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15426>
Most of the time, this doesn't matter. On the versions with _sat, if
the destination type is incorrect, the clamping will not happen
correctly.
Fixes the following CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_uu_v4i8_out32
v2: Update anv-tgl-fails.txt.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Fixes: 0f809dbf40 ("intel/compiler: Basic support for DP4A instruction")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15417>
The previous alignment of 64 bytes, which we got from the blob,
indicates that single-texel alignment isn't supported. So just do a
trivial no-op implementation that returns the same alignment as before.
This matches what newer blobs that expose this extension do.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15427>
Now all special FAU handling is unified, which makes both assembler and
disassembler considerably nicer. This adds some more special FAU indices from
page 0 that were previously missing, allowing them to be assembled and
disasembled.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15364>
FAU pages do not need to be specified explicitly in the assembly. Rather, they
should be inferred by the assembler by the instructions used. Rewrite the code
handling this in alignment with new information about the hardware.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15364>
This reuses the same UBO analysis to do the pushing in the shader
preamble via the ldc.k instruction instead of in the driver via
CP_LOAD_STATE6. The const_data UBO is exempted as it uses a different
codepath that isn't as critical.
Don't do this on gallium because there are some regressions. Aztec Ruins
in particular regresses a bit, and nothing I've benchmarked benefits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
Since the NIR pass runs very late, it needs to be aware of preambles,
and when creating the instruction we need to move it to the start block
so that RA doesn't overwrite it in the preamble.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
Add in the type, even though it turns out to not be that useful. Add
in support for assembling it. Add some notes based on computerator
experiments. And add support for the indirect a1.x mode that's needed
for storing c64.x and later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
These will be used to implement the ir3-specific shader preamble
lowering in NIR. shps is conceptually similar to getone (although it
technically can't be duplicated) and shpe is similar to other barriers,
since it has to happen after any stores to the constant file in the
preamble. Add NIR intrinsics and plumbs them through ir3.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
Previously we included the reserved user consts (for Vulkan push
constants) as part of the pushed UBO contents, but that led to a problem
because when calculating the worst-case space for UBOs we didn't factor
in the reserved user consts. We'll have the same problem when doing the
same thing in the preamble optimization pass. Stop including the
reserved size in ubo_state::size, and have ir3_setup_consts() add it in
instead, so we won't forget to add it anywhere.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
This pass tries to move computations that are uniform for the entire
draw to the preamble. There's also an API for backends to insert their
own instructions into the preamble, for porting existing UBO pushing
passes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
These are functions that run before the entrypoint at least once per
draw and write their results via store_preamble, and then are loaded in
the rest of the shader via load_preamble.
We will add users in the following commits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
If you hadn't already called wsi_GetPhysicalDeviceDisplayProperties2KHR or
wsi_GetDrmDisplayEXT before calling
GetPhysicalDeviceDisplayPlaneProperties2KHR, then the connectors list
wouldn't be populated and you'd get no plane properties. Fixes failure of
dEQP-VK.wsi.display.get_display_plane_capabilities when run on its own.
Fixes: #4575
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15353>
ANGLE-on-venus-on-turnip and zink-on-turnip want real data here for EGL's
reset tests.
This required moving the remaining GPU-reset-causing tests from flakes or
xfails to skips. Otherwise, the rest of the caselist associated with them
ends up being marked as fails as well. The alternative would be to put
these tests in their own test groups with tests_per_group = 1, but that
didn't seem worth the effort. Or, we could finally do something with
https://gitlab.freedesktop.org/anholt/deqp-runner/-/issues/14.
Fixes: #5955
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14839>
It's tricky to always get the render area to the viewport code. In
particular, it's not provided to secondary command buffers as part of
the inheritance info so we have to bend over backwards and look for a
framebuffer. With VK_KHR_dynamic_rendering, there is no framebuffer and
this approach won't work and we'll need something better if we want
competent guardbands in secondary command buffers.
The good news is that any client that's sloppily rendering and trusting
the clipper to keep things inside the render area will set a scissor and
that's something they have to set inside the secondary. We can dig
through the scissor state and also include the corresponding scissor (if
any) and use that for our render area. This should give us the same
secondary command buffer performance with VK_KHR_dynamic_rendering.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
This is about the only "small" change we can make in the process of
converting from render-pass-based to dynamic-rendering-based. Make
everything in pipeline creation work in terms of dynamic rendering and
create the dynamic rendering structs from the render pass as-needed.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
This was ill-conceived at best. Yes, it checks for a few error
conditions but it doesn't check much and what checks it has are very far
away from the code that relies on those invariants. If we care about
these invariants, we should add asserts near the code that makes those
assumptions rather than pretending to be the validation layers.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
These helpers are used by vkCreateGraphicsPipelines to get the
VkPipelineRenderingCreateInfo and in vkCmdBeginCommandBuffer to get the
VkCommandBufferInheritanceRenderingInfo. This is required because the
Vulkan runtime code can't yet hook and modify calls made to driver-
provided functions. Instead, we just provide a helper to be used in leu
of vk_find_struct_const(). The structs themselves are stored in the
render pass so we can pass back a pointer and there's no need to
construct one on the stack or stuff it in the pipeline.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
This implements vkCmdBeginRenderPass, vkCmdEndRenderPass, and
vkCmdNextSubpass in terms of the new vkCmdBegin/EndRendering included in
VK_KHR_dynamic_rendering and Vulkan 1.3. All subpass dependencies and
implicit layout transitions are turned into actual barriers. It does
require VK_KHR_synchronization2 because it always uses the 64-bit
version of the pipeline stage and access bitfields.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
this draw mode in particular requires driver-specific conversions
for queries (e.g., number of vertices), so pass that info through
the only limitation is that it doesn't work for dlists,
but I have yet to see a real use case of a statistics query being used with dlists
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15326>
If the base register is killed, it may be reused as the destination of a
ldp. In that case we should just skip resetting it afterwards.
Fixes regressions in dEQP-VK.ssbo.layout.random.scalar.38 later.
Fixes: 9912c61362 ("ir3/spill: Support larger spill slot offset")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15288>
We need to prepare for storage buffers having different sizes from
uniform buffers. This switches dynamic_offset_offset to have units of
bytes, the same as offset, and as a nice bonus we can more easily
combine the dynamic and non-dynamic paths in various different places.
This also entails rewriting the code that patches dynamic descriptors,
since we can no longer assume a linear mapping between indices in
dynamicOffsets and descriptor locations which the previous approach
heavily relied on.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15288>
From the Vulkan spec:
"drmFormatModifierTilingFeatures is a bitmask of
VkFormatFeatureFlagBits that are supported by any image created
with format and drmFormatModifier. The returned
drmFormatModifierTilingFeatures must contain at least one bit."
This fixes recent CTS dEQP-VK.drm_format_modifiers.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15383>
We're nowhere close to even having Vulkan 1.0 working yet, there's no
reason to get too excited about 1.1. It just means piles more test
crashes for features we're claiming to support but don't. If we want to
enable more tests, we can turn on the extensions for those features once
we actually have them working.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15334>
cmd_buffer->pool should never be NULL. Even if AllocateCommandBuffers()
fails, the successfully created cmdbuffers would have it set correctly.
From the Vulkan spec:
"VUID-vkFreeCommandBuffers-pCommandBuffers-parent
Each element of pCommandBuffers that is a valid handle must have
been created, allocated, or retrieved from commandPool."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15164>
For weird reasons, the COPY_DATA packet doesn't seem to copy anything
while on the compute queue. Instead, use PKT3_LOAD_SH_REG_INDEX which
seems to work as expected.
Note that LOAD_SH_REG_INDEX on the compute queue is only supported by
the CP on GFX10.3, so we need to implement a different solution (load
from the indirect BO in the shader) for older generations.
This should fix the Control RT GPU hang.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15053>
Handle arrays generically by using the last component of the coordinate source
as the array index. That works for both 2D arrays and cube arrays, fixing cube
arrays. Cube arrays were already handled correctly in core Panfrost code.
This code path is not tested in dEQP-GLES31 without exposing OES_cube_map_array,
which depends on OES_geometry_shader, which we don't have. Yet we do expose
PIPE_CAP_CUBE_ARRAY, so ARB_cube_map_array is exposed.
Disabling PIPE_CAP_CUBE_ARRAY would be an easy band-aid fix, but it's easy
enough to handle correctly.
dEQP-GLES31 passes with a hack enabling OES_cube_map_array [without geometry
shaders].
Also fixes 1D arrays on Bifrost for the same reasons.
Fixes: 70d6c5675d ("pan/bi: Emit TEXC with builder")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15254>
Hardware support was removed with Midgard. Use mesa/st to emulate GL_CLAMP with
nir_lower_tex automatically (the Zink lowering), and disable GL_MIRROR_CLAMP
which isn't lowered correctly.
Fixes *texwrap* Piglit tests on G52.
Fixes: f9ceab7b23 ("panfrost: Fix CLAMP wrap mode")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15253>
Instead of two magic ternary operations, define a new `axis` temporary
which is 1 for X and 2 for Y. Then define everything else in terms of
this variable. In particular, the mask operation we do on LANE_ID is a
mask so it makes more sense to use ~axis than 1/2 but in the other
order.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15352>
The Vulkan spec for VK_KHR_depth_stencil_resolve allows a format
mismatch between the primary attachment and the resolve attachment
within certain limits. In particular,
VUID-VkSubpassDescriptionDepthStencilResolve-pDepthStencilResolveAttachment-03181
If pDepthStencilResolveAttachment is not NULL and does not have the
value VK_ATTACHMENT_UNUSED and VkFormat of
pDepthStencilResolveAttachment has a depth component, then the
VkFormat of pDepthStencilAttachment must have a depth component with
the same number of bits and numerical type
VUID-VkSubpassDescriptionDepthStencilResolve-pDepthStencilResolveAttachment-03182
If pDepthStencilResolveAttachment is not NULL and does not have the
value VK_ATTACHMENT_UNUSED, and VkFormat of
pDepthStencilResolveAttachment has a stencil component, then the
VkFormat of pDepthStencilAttachment must have a stencil component
with the same number of bits and numerical type
So you can resolve from a depth/stencil format to a depth-only or
stencil-only format so long as the number of bits matches.
Unfortunately, this has never been tested because the CTS tests which
purport to test this are broken and actually test with a destination
combined depth/stencil format.
Fixes: 5e4f9ea363 ("anv: Implement VK_KHR_depth_stencil_resolve")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15333>
This changes our error handling to be compatible with upcoming
specification change that unifies glTex[Sub]Image error handling
between OpenGL ES 2.0 vs ES 3.0+ specifications, see:
https://gitlab.khronos.org/opengl/API/-/issues/147
OpenGL ES 2.0.25 spec states:
"Specifying a value for internalformat that is not one of
the above values generates the error INVALID_VALUE. If
internalformat does not match format, the error
INVALID_OPERATION is generated."
This fixes following new tests:
KHR-GLES31.core.compressed_format.*
KHR-GLES32.core.compressed_format.*
v2: GL_INVALID_OPERATION -> GL_INVALID_VALUE in extension
checks, remove (now overlapping) extension checks from
_mesa_gles_error_check_format_and_type (Eric Anholt)
v3: take GLES version in to account in internalformat checks
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12936>
In 4001d9ce1a we started lazily allocating surface states in the
descriptor sets rather than upfront in the descriptor pool. This was
to workaround vkd3d-proton allocating more than we could handle at the
HW level.
The issue introduced in that change is that we didn't protect the
descriptor pool free list as well as the anv_state_stream which are
now potentially used from different threads through the descriptor set
write functions.
This reverts the lazy allocation part of that change. Host only
descriptor sets changes remain.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4001d9ce1a ("anv: Handle VK_DESCRIPTOR_POOL_CREATE_HOST_ONLY_BIT_VALVE for descriptor sets")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15241>
When I switched iris over to use 3DSTATE_BINDING_TABLE_POOL_ALLOC, I
stopped flagging things dirty when allocating a new binder, because
the contents of the binding table were still valid, thanks to us not
having to subtract Surface State Base Address anymore.
This unfortunately missed the point that the old binding table is in the
old buffer, which is no longer what the binder pool base address points
to. So we'd either need to copy it over, or just flag it dirty and
re-emit it on the next draw.
Fixes misrendering in Ryujinx.
Fixes: 8b9045e7a4 ("intel: Use 3DSTATE_BINDING_TABLE_POOL_ALLOC exclusively on Gfx11+")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15314>
If we introduce another queue type (video decode) we can have a
disconnect between the RADV_QUEUE_ enum and the API queue_family_index.
currently the driver has
GENERAL, COMPUTE, TRANSFER which would end up at QFI 0, 1, <nothing>
since we don't create transfer.
Now if I add VDEC we get
GENERAL, COMPUTE, TRANSFER, VDEC at QFI 0, 1, <nothing>, 2
or if you do nocompute
GENERAL, COMPUTE, TRANSFER, VDEC at QFI 0, <nothing>, <nothing>, 1
This means we have to add a remapping table between the API qfi
and the internal qf.
This patches tries to do that, in theory right now it just adds
overhead, but I'd like to exercise these paths.
v2: add radv_queue_ring abstraction, and pass physical device in,
as it makes adding uvd later easier.
v3: rename, and drop one direction as unneeded now, drop queue_family_index
from cmd_buffers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13687>
in the initial implementation, a stream like:
* CmdBeginTransformFeedbackEXT
* CmdSetRasterizerDiscardEnableEXT
* CmdDraw
* CmdEndTransformFeedbackEXT
* CmdBeginTransformFeedbackEXT
* CmdDraw
* CmdEndTransformFeedbackEXT
would never enable transform feedback, as it only checked for the change
in rasterizer_discard state
Fixes: 4d531c67df ("anv: support rasterizer discard dynamic state")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15269>
for genuine early depth tests, the samplecount must be updated after depth
test but before samplemask is applied
for inferred-early or regular depth tests, the samplemask can be applied
before the depth test
Fixes: d9276ae965 ("llvmpipe: handle gl_SampleMask writing.")
fixes:
dEQP-VK.fragment_operations.early_fragment.sample_count_early_fragment_tests_depth_samples_4
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15319>
If a driver sets driver_data but not driver_free_cb, driver_data will
get freed along with the command. If a driver sets driver_free_cb,
driver_data will not get automatically freed but the callback will get
called before the rest of the data structure is freed.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15329>
we can effectively skip any kind of checks here and just assume that one
of two scenarios is in effect:
* the user is about to attempt some incredibly illegal behavior that VVL will catch
* the user is about to attempt a pro gamer move and we'll be fine
in either case, it's EXTENDED_USAGE, so hopefully we're about to make a texture
view from a compatible and supported format
cc: mesa-stable
fixes:
dEQP-VK.image.extended_usage_bit_compatibility.image_format_properties*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15320>
subpass->color_count is (obviously) not set yet, so this would just clobber
the color attachments any time resolves were used
Fixes: 8a6160a354 ("lavapipe: VK_KHR_dynamic_rendering")
fixes:
dEQP-VK.draw.dynamic_rendering.multiple_interpolation.structured.with_sample_decoration.4_samples
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15330>
Normally this wouldn't matter, but it will matter for the upcoming scan
macro because the running tally is communicated through a shared
register across a physical edge. It may also matter if a live-range
split occurs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
We weren't considering the other destinations when allocating a
destination, so we could allocate overlapping destinations. This wasn't
done before because we never had a need for it, but the subgroup
reduction macros will need it.
The trickiest part of this is that we have to rewrite the
compress_regs_left fallback, because we may have to move around the
other already-allocated destinations. We now have a list of destinations
to (re)allocate in addition to the popped live intervals. For the rest
of the destination handling, we can just bail out if the proposed spot
for something overlaps another destination, but for the fallback we have
to handle all the cases gracefully. I also added support for odd
combinations of multiple destinations where some of them are tied, which
we'll use in the next commit to handle early-clobber destinations and
which will actually be used because one of the destinations of the
subgroup reduction macro will be early-clobber. The result is that the
order of intervals to allocate is now a lot more complicated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
For pcopies we only care about the register's type, i.e. whether its a
half-register and whether it's an array (plus its size). Copying over
other flags like IR3_REG_RELATIV just leads to sadness and validator
assertions.
Fixes: 0ffcb19b9d ("ir3: Rewrite register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
Before, we were careful to
1. Get the source physreg.
2. Allocate the destination.
3. Insert a copy with the source being the physreg from step 1.
and this guaranteed that if the tied source were moved in step 2 we'd
still insert a copy from the correct place. However this won't work with
multiple destinations because an earlier destination could've already
moved the tied source around. Instead flip steps 2 and 3 (we'll insert
the copy before we allocate the interval, but that's ok) and run the
first two steps in a separate loop before any destinations are
allocated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
Note: this is a behavior change for arrays, because it will count the
entire array instead of just the components written in the register
pressure calculation. However this is more accurate since this matches
how RA works.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
Re-use auto-generated vk_cmd_enqueue entrypoints instead of generating
our own version doing the same thing. In order to effectively do this,
we also add an allow-list of which entrypoints lavapipe actually handles
to avoid issues where the autogenerated one stomps a vkCmdFoo2 wrapper.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15311>
The test suite is full of flakes around transform feedback, atomics, and
tess. But, I hope it can be useful for regression testing core Mesa
reworks.
This required updating the kernel to 5.16.12 to get a more stable boot
process. That kernel rebuild caused an update of the container with
piglit which that was missed in a previous MR, so we got new xfails in x86
swrast.
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> (nouveau)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15201>
This required updating the kernel to 5.16.12 to get a more stable boot
process. That kernel rebuild caused an update of the container with
piglit which that was missed in a previous MR, so we got new xfails in x86
swrast. Also, including modules on arm64 exposed a bug in v3d's
poe-powered.sh rsyncing of modules.
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15201>
this hardware won't return the correct value from dmod instructions,
so lower it to ensure that cts passes
nobody else will ever hit this, so perf isn't an issue and regular fmod
can be left alone
fixes (amd):
KHR-GL46.gpu_shader_fp64.builtin.mod_d*
Fixes: 5fae35fb17 ('zink: fix 64bit float shader ops ')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15306>
From the VK_KHR_shader_float_controls extension:
"5) Do any of the “Pack” GLSL.std.450 instructions count as
conversion instructions and have the rounding mode applied?"
"RESOLVED: No, only instructions listed in “section 3.32.11.
Conversion Instructions” of the SPIR-V specification count as
conversion instructions."
This is also the same logic as the LLVM backend.
No fossils-db changes on Sienna Cichlid.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15301>
The multiViewport feature isn't required for GL 4.3, it's required for
GL 4.1. Technically speaking, we could have just dropped it because we
already list the maxViewports requirement. But it seems better to be
very clear here to me.
Fixes: 29f8f21bff ("docs: document zink GL 4.3 requirements")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15300>
We have twice the registers in this case so it makes sense to double
this as well. While this causes slight regressions in shader-db
stats (due to additional register pressure), it helps us hide latency
of memory reads better on 2-thread compiles, where the thread switch
mechanism will be less effective. This shows a ~3% performance
improvement on the UE4 SunTemple demo.
total instructions in shared programs: 12642413 -> 12656164 (0.11%)
instructions in affected programs: 2272652 -> 2286403 (0.61%)
helped: 2924
HURT: 3389
total uniforms in shared programs: 3703861 -> 3704776 (0.02%)
uniforms in affected programs: 213729 -> 214644 (0.43%)
helped: 823
HURT: 1272
total max-temps in shared programs: 2150686 -> 2153505 (0.13%)
max-temps in affected programs: 191332 -> 194151 (1.47%)
helped: 1900
HURT: 1891
total spills in shared programs: 3255 -> 3274 (0.58%)
spills in affected programs: 166 -> 185 (11.45%)
helped: 3
HURT: 6
total fills in shared programs: 4630 -> 4656 (0.56%)
fills in affected programs: 367 -> 393 (7.08%)
helped: 7
HURT: 15
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
This can add quite a bit of register pressure so it makes sense to disable it
to prevent us from dropping to 2 threads or increase spills:
total instructions in shared programs: 12672813 -> 12642413 (-0.24%)
instructions in affected programs: 256721 -> 226321 (-11.84%)
helped: 719
HURT: 77
total threads in shared programs: 415534 -> 416322 (0.19%)
threads in affected programs: 788 -> 1576 (100.00%)
helped: 394
HURT: 0
total uniforms in shared programs: 3711370 -> 3703861 (-0.20%)
uniforms in affected programs: 28859 -> 21350 (-26.02%)
helped: 204
HURT: 455
total max-temps in shared programs: 2159439 -> 2150686 (-0.41%)
max-temps in affected programs: 32945 -> 24192 (-26.57%)
helped: 585
HURT: 47
total spills in shared programs: 5966 -> 3255 (-45.44%)
spills in affected programs: 2933 -> 222 (-92.43%)
helped: 192
HURT: 4
total fills in shared programs: 9328 -> 4630 (-50.36%)
fills in affected programs: 5184 -> 486 (-90.62%)
helped: 196
HURT: 0
Compared to the stats before adding scheduling of non-filtered
memory reads we see we that we have now gotten back all that was
lost and then some:
total instructions in shared programs: 12663186 -> 12642413 (-0.16%)
instructions in affected programs: 2051803 -> 2031030 (-1.01%)
helped: 4885
HURT: 3338
total threads in shared programs: 415870 -> 416322 (0.11%)
threads in affected programs: 896 -> 1348 (50.45%)
helped: 300
HURT: 74
total uniforms in shared programs: 3711629 -> 3703861 (-0.21%)
uniforms in affected programs: 158766 -> 150998 (-4.89%)
helped: 1973
HURT: 499
total max-temps in shared programs: 2138857 -> 2150686 (0.55%)
max-temps in affected programs: 177920 -> 189749 (6.65%)
helped: 2666
HURT: 2035
total spills in shared programs: 3860 -> 3255 (-15.67%)
spills in affected programs: 2653 -> 2048 (-22.80%)
helped: 77
HURT: 21
total fills in shared programs: 5573 -> 4630 (-16.92%)
fills in affected programs: 3839 -> 2896 (-24.56%)
helped: 81
HURT: 15
total sfu-stalls in shared programs: 39583 -> 38154 (-3.61%)
sfu-stalls in affected programs: 8993 -> 7564 (-15.89%)
helped: 1808
HURT: 1038
total nops in shared programs: 324894 -> 323685 (-0.37%)
nops in affected programs: 30362 -> 29153 (-3.98%)
helped: 2513
HURT: 2077
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
We do a few changes over NIR's defaults:
1. Lower delay for texture reads. Empirically, we don't observe any
benefits with delays over 50 and since this delay value is still
used by the scheduler in the "favor register pressure" case it is
benefitial to avoid overestimating it too much.
2. Adjust delay for non-filtered TMU reads to the delay selected for
texture reads.
3. In our case, UBO reads from dynamically uniform addresses don't
use the TMU and have a latency of 1 instruction in the best case
scenario or 4 at worse, so we go with 1 so we don't try to move
this early.
This helps us get back some of what we lost when updating the
default scheduler configuration to add a delay for non-filtered
memory reads:
total instructions in shared programs: 13126587 -> 12671765 (-3.46%)
instructions in affected programs: 3764097 -> 3309275 (-12.08%)
helped: 14664
HURT: 4244
total threads in shared programs: 407208 -> 415522 (2.04%)
threads in affected programs: 8716 -> 17030 (95.39%)
helped: 4224
HURT: 67
total uniforms in shared programs: 3812698 -> 3711224 (-2.66%)
uniforms in affected programs: 335170 -> 233696 (-30.28%)
helped: 2816
HURT: 3551
total max-temps in shared programs: 2318430 -> 2159345 (-6.86%)
max-temps in affected programs: 539991 -> 380906 (-29.46%)
helped: 13173
HURT: 1440
total spills in shared programs: 49086 -> 5966 (-87.85%)
spills in affected programs: 48306 -> 5186 (-89.26%)
helped: 1655
HURT: 28
total fills in shared programs: 55810 -> 9328 (-83.29%)
fills in affected programs: 54821 -> 8339 (-84.79%)
helped: 1659
HURT: 22
LOST: 0
GAINED: 3
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
This has been pending for a long time. It is not very consistent to
add a significant delay for textures and not do it for UBOs, etc
The reason we have not been doing this so far is the accumulated effect
on register pressure for V3D as shown by shader-db results below, but
from the point of view of a generic scheduler it makes sense to do this.
Later patches will address V3D specific issues with register pressure
derived from this by letting the driver control its instruction delay
settings.
total instructions in shared programs: 12662138 -> 13126587 (3.67%)
instructions in affected programs: 1813091 -> 2277540 (25.62%)
helped: 2410
HURT: 10499
total threads in shared programs: 415858 -> 407208 (-2.08%)
threads in affected programs: 17348 -> 8698 (-49.86%)
helped: 8
HURT: 4333
total uniforms in shared programs: 3711483 -> 3812698 (2.73%)
uniforms in affected programs: 128012 -> 229227 (79.07%)
helped: 3474
HURT: 2143
total max-temps in shared programs: 2138763 -> 2318430 (8.40%)
max-temps in affected programs: 318780 -> 498447 (56.36%)
helped: 588
HURT: 11997
total spills in shared programs: 3860 -> 49086 (1171.66%)
spills in affected programs: 709 -> 45935 (6378.84%)
helped: 23
HURT: 1595
total fills in shared programs: 5573 -> 55810 (901.44%)
fills in affected programs: 1067 -> 51304 (4708.25%)
helped: 23
HURT: 1595
LOST: 3
GAINED: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
We can get a generic nir_intrinsic_memory_barrier to represent a
barrier involving multiple semantics (instead of getting individual
specific barriers for each semantic). This means that we need to
consider these as potentially affecting shared memory access as well.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
This doesn't have any significant impact shader-db stats and would
reduce our capacity to hide latency from the loads, so it is probably
undesirable:
total instructions in shared programs: 12663189 -> 12663186 (<.01%)
instructions in affected programs: 4222 -> 4219 (-0.07%)
helped: 9
HURT: 4
total uniforms in shared programs: 3711624 -> 3711629 (<.01%)
uniforms in affected programs: 186 -> 191 (2.69%)
helped: 0
HURT: 2
total max-temps in shared programs: 2138822 -> 2138857 (<.01%)
max-temps in affected programs: 569 -> 604 (6.15%)
helped: 1
HURT: 9
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
On Icelake and later, we can use a new 3DSTATE_BINDING_TABLE_POOL_ALLOC
command to update the location of the binder (buffer containing binding
table entries), rather than having to move Surface State Base Address
via a STATE_BASE_ADDRESS command. This has less stalling and also means
our surface addresses can remain relative to a fixed 4GB address range,
meaning we don't have to re-stream them any time the binder changes.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
This workaround is needed on all Gfx12.0 parts, but doesn't appear to be
necessary on XeHP. The other drivers do not appear to be applying this
workaround on those parts. As further evidence, we accidentally added
the 3DSTATE_BINDING_TABLE_POOL_ALLOC commands after switching back to
GPGPU mode, which would be an incorrect way to implement the workaround,
and things seem to be working.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
On Gfx11+, we're going to stop changing Surface State Base Address
and instead start changing the Binding Table Pool address instead.
So, rename a few things to track the last binder address, which is
what we're actually changing, regardless of how we program it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
Skylake and older use a 15:5 binding table pointer format, which means
our binder can be at most 64kB in size. Each binding table within the
binder must be aligned to 32B.
XeHP uses a new 20:5 binding table format, which allows us to increase
the binder size to 1MB while retaining the nice 32B alignment. Larger
binders mean fewer stalls as we update the base address for the binder.
Icelake and Tigerlake can either use the 15:5 format or an 18:8 format.
18:8 mode requires the base of each binding table to be aligned to 256B
instead of 32B, but it gives us a maximum binder size of 512kB.
We can store 64 binding table entries in a 256B chunk (256B / 4B = 64),
but only 8 entries in a 32B chunk (32B / 4B = 8). Assuming that most
binding tables have fewer than 64 entries, this means that with the 18:8
format, we're likely to be able to fit 2048 (512KB / 256B) tables into a
a buffer before needing to allocate a new one and stall.
Technically, the old format could also store 2048 binding tables per
buffer as well (64KB / 32B = 2048). However, tables that needed more
than 8 entries would need multiple 32B chunks. A single table would
take multiple aligned chunks, while with the larger 256B format, it
could fit in a single one.
This cuts binder resets by 6.3% on a Shadow of Mordor benchmark trace.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
this comes in two flavors:
* streamout of array<struct/block>
* partial streamout of array/struct/block
for the former:
* arrays of structs can just be blasted out in the initial var declaration (easy)
* arrays of blocks must be output to separate xfb buffers for each array block,
which requires skipping initial xfb blast-off and instead propagating the values
using tmp variables at a later point
for the latter:
* the optimal way to do this is to unwrap the struct first to figure out what's being
emitted, at which point the value can be extracted and exported
fixes the rest of spec@arb_gl_spirv@execution@xfb
...except spec@arb_gl_spirv@execution@xfb@vs_block_array, which I'm suspecting is broken
due to vtn bugs
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>
this now handles inlining of stupid types (dvec3, dmatX) and complex
types (goku) as seen in cts
fixes:
KHR-Single-GL46.enhanced_layouts.xfb_explicit_location
KHR-Single-GL46.enhanced_layouts.xfb_struct_explicit_location
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>
This commit fixes the following flaws in the implementation:
* when a resource was re-allocated, the guest side storage
was also allocated
* when a source needs a readback before being written to, then
the call would go through vws->transfer_get, thereby bypassing the
staging resource, and this would fail on the host, because no
the allocated IOV was too small (just one byte)
* if the texture write would need neither flush nor readback, the
old code path would be used expecting that guest side backing stogage
for the texture.
v2: - actually do a readback to the stageing resource when it is required
- fix typo (Lepton)
v3: Don't use stageing transfers if the host can't read back the data
by rendering to an FBO or calling getTexImage, because in this case
we rely on the IOV to hold the date.
v4: Also don't use staging transfers if the format is no readback
format. Otherwise we have to deal with the resolve blit, and
this is currently not working correctly.
v5: add a new flag that indicates whether non-renderable textures can
be read back (either via glGetTexImage or GBM)
v6: Restrict the use of staging texture transfers to textures that can
be read back, and on GLES also if the they are bound to scanout and
the host uses minigbm to allocate such textures.
For that replace the flag indicating the capability to read back
non-renderable textures with a cap that indicates whether scanout
textures can be read back.
v7: update virglrenderer version in the CI
v8: update use of stageing (Chia-I)
v9: remove superflous check and assignment (Chia-I)
v10: disable stageing textures for arrays with stencil format. This is a
workaround for failures of the CI.
Fixes: cdc480585c
virgl/drm: New optimization for uploading textures
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14495>
this was being set from back before zink actually supported 64bit
natively and only 32bit was functional, but it breaks 64bit support
cc: mesa-stable
fixes (lavapipe):
KHR-GL46.gpu_shader_fp64.builtin.mod_dvec2
KHR-GL46.gpu_shader_fp64.builtin.mod_dvec3
KHR-GL46.gpu_shader_fp64.builtin.mod_dvec4
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15274>
Run crosvm as a background process in order to allow intercepting
interrupt signals (INT, TERM) and properly release/cleanup any allocated
resources.
This is particularly helpful when one or more crosvm tasks hang, which
will eventually prevent subsequent instances to be started - currently
we can handle up to 128 concurrent crosvm instances per runner.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15238>
When using indirect textures, some lanes may not be active,
particularly in a loop, so as with some other areas, extracting
the correct lane is needed here. This extracts the last valid one.
KHR-GL45.texture_barrier.* on zink.
Fixes: e168d148d7 ("gallivm/nir: handle non-uniform texture offsets")
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15259>
Experiments suggest that e.g.
add.u r0.y, hr0.x, hr0.y
will result in the summed value in both the high and low words of r0.y.
This only happens with odd registers, not even ones (r0.x works fine).
Seen in the bit_count lowering (which turns out to be unnecessary, but
this is still a larger problem).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15251>
Along with fixing the grammar, this allows it to be deduplicated since
the properly worded description exists in later generations' XMLs.
Cuts 96 B from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
917613 0 0 917613 e006d meson-generated_.._intel_perf_metrics.c.o (before)
917511 0 0 917511 e0007 meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14131044 365708 210004 14706756 e06844 iris_dri.so (before)
14130948 365708 210004 14706660 e067e4 iris_dri.so (after)
text data bss dec hex filename
8124321 214264 22820 8361405 7f95bd libvulkan_intel.so (before)
8124225 214264 22820 8361309 7f955d libvulkan_intel.so (after)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
The compiler does a good job of deduplicating strings already, but we
can eliminate the pointers to each string by combining the strings into
a single char array and storing only an index into that array.
The longest of the char arrays is the descriptions array, which is a
little over 45 KiB, so still under MSVC's 64 KiB string literal limit
[0]. Because the string length is under 64 KiB we can use uint16_t as
the index type, which roughly doubles our savings as compared to an int.
This cuts 77 KiB from iris_dri.so (0.5%) and libvulkan_intel.so (0.9%).
text data bss dec hex filename
926811 25920 0 952731 e899b meson-generated_.._intel_perf_metrics.c.o (before)
924401 0 0 924401 e1af1 meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14190852 391628 210004 14792484 e1b724 iris_dri.so (before)
14137732 365708 210004 14713444 e08264 iris_dri.so (after)
text data bss dec hex filename
8184097 240184 22820 8447101 80e47d libvulkan_intel.so (before)
8131009 214264 22820 8368093 7fafdd libvulkan_intel.so (after)
relinfo:
iris_dri.so (before): 17765 relocations, 17545 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
iris_dri.so (after) : 15605 relocations, 15385 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (before): 10720 relocations, 6989 relative (65%), 355 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (after) : 8560 relocations, 4829 relative (56%), 355 PLT entries, 1 for local syms (0%), 0 users
[0] https://docs.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-170&viewFallbackFrom=vs-2019
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
intel_perf_query_counter contains fields for things we can't or don't
want to store in our static data (like runtime-determined max values) or
oa_read_counter function pointers which are dependent on the GPU gen and
would make deduplication very ineffective.
Cuts 16 KiB from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
926811 43200 0 970011 ecd1b meson-generated_.._intel_perf_metrics.c.o (before)
926811 25920 0 952731 e899b meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14190852 408908 210004 14809764 e1faa4 iris_dri.so (before)
14190852 391628 210004 14792484 e1b724 iris_dri.so (after)
text data bss dec hex filename
8184097 257464 22820 8464381 8127fd libvulkan_intel.so (before)
8184097 240184 22820 8447101 80e47d libvulkan_intel.so (after)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
And specifically mark it with ATTRIBUTE_NOINLINE. Otherwise it will be
inlined and actually slightly increase code size.
Cuts 505 KiB from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
1538720 0 0 1538720 177aa0 meson-generated_.._intel_perf_metrics.c.o (before)
926811 43200 0 970011 ecd1b meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14751700 365708 210004 15327412 e9e0b4 iris_dri.so (before)
14190852 408908 210004 14809764 e1faa4 iris_dri.so (after)
text data bss dec hex filename
8744913 214264 22820 8981997 890ded libvulkan_intel.so (before)
8184097 257464 22820 8464381 8127fd libvulkan_intel.so (after)
Relocations increase because the counter initializations are moved from
code (in .text) to pointers (in .text) to .rodata, which require
relocations.
relinfo:
iris_dri.so (before): 15605 relocations, 15385 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
iris_dri.so (after) : 17765 relocations, 17545 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (before): 8560 relocations, 4829 relative (56%), 355 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (after) : 10720 relocations, 6989 relative (65%), 355 PLT entries, 1 for local syms (0%), 0 users
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
No changes in resulting code (yes, seriously!). GCC constant propagates
the static const arrays into the code, yielding bit for bit identical
results. This will however enable further cleanups.
Before this patch, we emit 11916 different initializations of
intel_perf_query_counter. With this patch we emit an array of 539 and
initialize the intel_perf_query_counters in terms of those.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
Instead of allocating cpu_storage in threaded_resource_init, defer the
allocation to first use (in tc_buffer_map).
This avoids needless memory allocation if tc_buffer_disable_cpu_storage is
called before tc_buffer_map.
map_buffer_alignment is stored and serves as a "can cpu_storage be used" flag.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15074>
When "dri2_wl_formats_init" fails in "dri2_initialize_wayland_swrast",
the "dri2_display_destroy" function is called for clean up. However, the
"dri2_egl_display" was not associated with the display in its
"DriverData" field yet.
The following cast in "dri2_display_destroy":
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
Expands to:
_EGL_DRIVER_TYPECAST(drvname ## _display, _EGLDisplay, obj->DriverData)
Crashing.
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13972>
When "dri2_wl_formats_init" fails in "dri2_initialize_wayland_drm", the
"dri2_display_destroy" function is called for clean up. However, the
"dri2_egl_display" was not associated with the display in its
"DriverData" field yet.
The following cast in "dri2_display_destroy":
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
Expands to:
_EGL_DRIVER_TYPECAST(drvname ## _display, _EGLDisplay, obj->DriverData)
Crashing.
Addresses-Coverity-ID: 1494541 ("Resource leak")
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13972>
when migrating a recycled set here, the set was previously invalid and
in the recycled table, meaning it can be reused directly so long as
it's first invalidated
the previous code would instead pop a different set off the allocation array,
leaking this one
cc: mesa-stable
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15226>
this got mixed up during some refactor and started indexing based on the
number of bindings instead of the number of descriptors, which means
that array descriptor bindings would have overlapping array memory
cc: mesa-stable
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15226>
the variant data may have changed in the meanwhile, so do a pass to
ensure that the correct variant is used
cc: mesa-stable
fixes caselist:
dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_two_samples.multisample_texture_4
dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_two_samples.singlesample_rbo
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15226>
Another instruction might write to the second source, and then an
INSTR_INVALID_ENC fault will be raised because the tuple will write to
and read from the register at the same time.
Fixes: 795638767d ("pan/bi: Use fused dual source blending")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
BITSET_MASK returns ~0 when given an input of zero, when we need it to
return 0 instead.
Fixes shaders with only sysvals but no UBOs when push constants are
disabled.
This breaks when 31 or 32 UBOs are used, but PAN_MAX_CONST_BUFFERS is
currently set to 16.
Fixes: c246af0dd8 ("panfrost: Only upload UBOs when needed")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
This could be made slightly more efficient by only setting the dirty
state that is needed, but eventually you reach a point where it's
cheaper to re-emit everything than work out what can or can't be kept.
Fixes rendering issues in Duckstation.
Fixes: cd2c1ef9da ("panfrost: Dirty track textures/samplers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
TEXC can have two destinations; the value for neither of them can be
used in the same bundle, so extend the code to check for this to
iterate over both destinations.
Fixes artefacts in the game "LIMBO".
Fixes: a303076c1a ("pan/bi: Add bi_instr_schedulable predicate")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
Trying to write to overlapping register ranges from a single
instruction is undefined behaviour, so add interference between the
nodes to avoid this.
Hit in a dual-texture instruction in LIMBO.
Fixes: 9146bafbb4 ("pan/bi: Add dual texture fusing pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
When discarding the whole resource to create a new one, if this resource
is used by a sampler view, a rebind must be done to use the new
resource.
But this must be done when setting the sampler views, because we don't
have access to those samplers before.
v2:
- Pack shader state on setting sampler views (Iago)
- Use a serial ID to know when to rebind sampler views (Juan)
v3:
- Move check to caller (Iago)
- Keep rebind sampler view on BO change (Iago)
v4:
- Rename "serial_bo" to "serial_id" (Iago)
- Add comments (Iago)
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6027
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15171>
The Gallium pipe video "frame_num" variable is internally used as a
counter of elapsed reference frames since the last IDR. The incoming
frame_num field from VA picture parameters is not equivalent; the VA
value may wrap to zero prematurely, as it is a 16-bit struct field with
a documented max value of 2^(log2_max_frame_num_minus4 + 4)-1.
This change improves "infinite GOP" single-client live streaming, where
it is reasonable for the server to desire an endless series of P-frames
without IDR. Without this change, it is difficult/impossible for an
application to encode a P- or B-frame after the VA frame_num field wraps
around to zero, depending on the backend encoder implementation.
This change has no effect on existing applications that always signal an
IDR frame and reset the VA frame_num to zero before it wraps around. For
example, the FFmpeg vaapi encoder ignores the VA documentation and sends
an un-wrapped VA frame_num, which results in identical computation of
the internal frame_num (as long as each GOP is less than 65536 frames).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5768
Reviewed-by: Thong Thai <thong.thai@amd.com>
patch revision 3: correctly avoid incrementing frame_num when the encoded
frame is not a reference, per h264 spec and ffmpeg behavior
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14332>
Only allow this in situations where we know it's safe. In particular, this
stops removal of unconditional branches like with
block_kind_continue_or_break.
Fixes dEQP-VK.graphicsfuzz.fragcoord-control-flow hang.
fossil-db (Sienna Cichlid):
Totals from 34 (0.02% of 162293) affected shaders:
Instrs: 84115 -> 84178 (+0.07%); split: -0.00%, +0.08%
CodeSize: 463372 -> 463624 (+0.05%); split: -0.00%, +0.06%
Latency: 3467316 -> 3467652 (+0.01%)
InvThroughput: 3085493 -> 3085578 (+0.00%)
Branches: 3221 -> 3284 (+1.96%); split: -0.03%, +1.99%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: f030b75b7d ("aco: relax condition to remove branches in case of few instructions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15214>
vecN instructions which are only used by other ALU
will now get duplicate channels removed.
i915g:
total instructions in shared programs: 396309 -> 396294 (<.01%)
instructions in affected programs: 186 -> 171 (-8.06%)
r300:
total instructions in shared programs: 1165059 -> 1164354 (-0.06%)
instructions in affected programs: 35884 -> 35179 (-1.96%)
total temps in shared programs: 165497 -> 165326 (-0.10%)
temps in affected programs: 2990 -> 2819 (-5.72%)
softpipe:
total instructions in shared programs: 2860028 -> 2859084 (-0.03%)
instructions in affected programs: 55539 -> 54595 (-1.70%)
total temps in shared programs: 516939 -> 516546 (-0.08%)
temps in affected programs: 6623 -> 6230 (-5.93%)
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12468>
This gives
MESA-VIRTIO: debug: stuck in ring seqno wait with iter at 4096
MESA-VIRTIO: debug: stuck in ring seqno wait with iter at 8192
MESA-VIRTIO: debug: stuck in ring seqno wait with iter at 12288
MESA-VIRTIO: debug: stuck in ring seqno wait with iter at 16384
MESA-VIRTIO: debug: aborting
Aborted
which should be more friendly than printing the messages forever.
On my i7-7820HQ, this aborts after roughly 4+8+16+32=60 seconds
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15200>
NIR-to-TGSI produces partial output writes contrary to the old paths
that always wrote the full outputs. Therefore if there is now a partial
output write ready to be scheduled and nothing else besides a tex
is ready, we would schedule the output write first. This was not a
problem before as usually at last some component of the full output write
depended on the tex result.
This is not optimal from the performance point of view and resulted in
~20% slowdown in the Unigine demos. The docs say:
The first OUTPUT instruction will reserve space in the output register
fifo. This space is limited, therefore issuing an OUTPUT earlier than
necessary may cause threads to stall earlier than necessary. You
should not set an ALU instruction as type OUTPUT unless it is actually
writing to an output register, or it is the last instruction of
the program.
Fix it by explicitly prefering a TEX before OUT and restore the
performance: 9.66 -> 12.12 fps (as compared to 11.83 with the old
glsl-to-TGSI path) in Unigine Sanctuary. No change in Lightsmark or
GLmark.
This is also a win from the intructions point of view as we are usually
able to schedule the partial output writes in a single pair at the end.
total instructions in shared programs: 106009 -> 105891 (-0.11%)
instructions in affected programs: 10153 -> 10035 (-1.16%)
helped: 118
HURT: 0
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5840
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15165>
The max_score == -1 condition is already before so this
will never trigger. Its unclear what was the intention anyway. Now we
emit either:
- if we have accumulated enough tex intructions for a full block
- if we have nothing else to emit
- or if we can emit all remaining tex instructions already.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15165>
New extensions properties/feature are being put in the `vn_physical_device`
which is not ideal from an organization point of view.
Here the `vn_physical_device_{features,properties}` are two new struct to
help the `vn_physical_device` organzation.
Signed-off-by: Igor Torrente <igor.torrente@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15170>
Something is slightly off in the integer values returned. It passes many
tests without the fixup, but the dEQP-GLES31 tests complain. The blob
ends up doing 3x gathers, and selects between them based on getinfo
results. Since we already have a per-sampler key with some spare bits,
just stick the bit-size info in there. And we can derive signedness from
the associated type info.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14670>
It can't be done. This just provides bad results. The blob had a
comparable approach where they fixed up coordinates, but that also can't
work with a separate texture definition with nearest filtering. By then,
might as well provide a unswizzled variant instead, and using native
functionality.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14670>
From Section 4.4.1 (Input Layout Qualifiers) of the GLSL 4.50 spec:
"For some blocks declared as arrays, the location can only be applied
at the block level: When a block is declared as an array where
additional locations are needed for each member for each block array
element, it is a compile-time error to specify locations on the block
members. That is, when locations would be under specified by applying
them on block members, they are not allowed on block members. For
arrayed interfaces (those generally having an extra level of
arrayness due to interface expansion), the outer array is stripped
before applying this rule"
From Section 1.2.1 (Changes from Revision 6 of GLSL Version) of the GLSL 4.50 spec:
"Private Bug 15678: Don’t allow location = on block members where
the block needs an array of locations"
From Section 4.4.1 (Input Layout Qualifiers) of the GLSL ES 3.20 spec
"If an input is declared as an array of blocks, excluding per-vertex-arrays
as required for tessellation, it is an error to declare a member of
the block with a location qualifier"
From Section 1.1.3 (Changes from GLSL ES 3.2 revision 3) of the GLSL ES 3.20 spec:
"Arrayed blocks cannot have layout location qualifiers on members"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11522>
if one of these states change then it affects which result needs to be
used for that query, so split it up over multiple query ids to make sure
the correct result is obtained
fixes (lavapipe):
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_pause_resume
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_states
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15227>
When the AOS/linear code was added it only worked with TGSI which
meant nothing in mesa upstream was really using it.
This adds support to analyse NIR shaders, and adds aos support
to the backend.
AOS support is limited to mov,vec,fmul,tex sampling in order to
accelerate mostly compositing operations. I've tested weston uses
the fast path. gnome-shell can't use it yet as we can't optimise
the depth test paths.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15140>
For Bifrost, we model load/store segments, for example for thread local storage.
We need something similar on Valhall -- access modifiers. There are four access
modifiers on Valhall, controlling memory subsystem optimizations for the access:
none: Nothing may be assumed. Corresponds to "global".
istream: Internally streaming within the GPU. Corresponds to "pos", as it's
used for position stores.
estream: Externally streaming outside the GPU. Corresponds to "vary", as it's
used for varying stores.
force: Force access in discarded threads. Corresponds to "tl", as it's required
for correct behaviour of helper invocations that use the stack.
If these access modifiers end up being useful outside these fixed purposes, we
may need to rework this part of the IR. For now, this should suffice.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15216>
This can avoid some cases where a constant has to be loaded into a
temporary register.
v2: Update i915-g33-fails.txt.
total instructions in shared programs: 788625 -> 782376 (-0.79%)
instructions in affected programs: 166269 -> 160020 (-3.76%)
helped: 1578
HURT: 0
helped stats (abs) min: 3 max: 21 x̄: 3.96 x̃: 3
helped stats (rel) min: 1.56% max: 33.33% x̄: 4.82% x̃: 3.45%
95% mean confidence interval for instructions value: -4.06 -3.86
95% mean confidence interval for instructions %-change: -5.00% -4.64%
Instructions are helped.
LOST: 0
GAINED: 35
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15210>
I noticed the SGE case while looking at the output of
shaders/closed/steam/trine-2/fp-3.shader_test on i915g. These are
especially bad on i915 that needs two instructions to implement SNE.
An alternative would be to duplicate the sne(sXX(a, b), 0.0) rules in an
algebraic pass that occurs after bool_to_float. Doing the work earlier
seems preferable.
i915
total instructions in shared programs: 788274 -> 788223 (<.01%)
instructions in affected programs: 666 -> 615 (-7.66%)
helped: 5
HURT: 0
helped stats (abs) min: 9 max: 12 x̄: 10.20 x̃: 9
helped stats (rel) min: 5.00% max: 11.11% x̄: 8.12% x̃: 8.16%
95% mean confidence interval for instructions value: -12.24 -8.16
95% mean confidence interval for instructions %-change: -10.81% -5.43%
Instructions are helped.
LOST: 0
GAINED: 2
The two gained shaders are assembly fragment programs in Euro Truck
Simulator 2.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15210>
Many users of nir_vec() do so by nir_channel()-ing a new ssa defs as movs
from other vectors to put the new vector together, which then just have to
get copy-propagated into the ALU srcs and DCEed away the temporary movs.
If they instead take nir_ssa_scalar, we can avoid that extra work.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14865>
"queueFamilyIndexCount is the number of queue families having access to the image(s) of the
swapchain when imageSharingMode is VK_SHARING_MODE_CONCURRENT.
pQueueFamilyIndices is a pointer to an array of queue family indices having access to the
images(s) of the swapchain when imageSharingMode is VK_SHARING_MODE_CONCURRENT."
If the type isn't concurrent, don't attempt to access the arrays.
dEQP-VK.wsi.xlib.swapchain.create.exclusive_nonzero_queues on lavapipe.
Fixes: 5b13d74583 ("vulkan/wsi/drm: Break create_native_image in pieces")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15101>
We were just emitting the bad reloc (either an assert fail on a debug
build or for a release build likely a GPU hang from the resulting fault).
Given that the GLES 3.2 spec's robust context requirement says we should
return undefined data but not terminate for element indices outside of the
VB, ignoring the offset in this case seems like a better behavior to have
in all cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15198>
We only have a single prog_data::total_scratch for all shader variants
(SIMD 8, 16, 32). Therefore we should always max the total_scratch on
top of existing variant.
We probably haven't run into that issue before because we compile by
increasing SIMD size and higher SIMD size is more likely to spill. But
for bindless shaders with return shaders, if the last return part
doesn't spill, we completely ignore the previous parts' scratch
computation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15193>
Now that we don't sort our nodes we can arrange them so we can
easily translate between nodes and temps without a mapping table,
just applying an offset.
To do this we have a single array of nodes where twe put first the nodes
for accumulators and then the nodes for temps. With this setup we can
ensure that for any given temp T, its node is always T + ACC_COUNT.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
Nodes are allocated in order to registers so initially sorting
was used to ensure that nodes with smaller life ranges would
be assigned first and therefore be more likely to get
accumulators.
However, since d81a6e5f1d now we don't rely on order to make
decisions about accumulators and instead we make policy decisions
based on actual liveness, so sorting is no longer strictly
relevant to this decision.
Furthermore, we are not re-sorting nodes after each spill either,
since that would probably require that we rebuild the interference
graph after each spill (the graph identifies nodes by their index).
Shader-db results show a significant improvement in instruction
counts, due to more optimal accumulator assignments. The reason for
this is that we use a round-robin policy for choosing the next
accumulator to assign. The idea behind this is preventing nearby
temps to be assigned to the same accumulator so that QPU scheduling
is more flexible, but if we sort our nodes, we are basically not
assigning temps in program order any more and the round-robin policy
becomes less effective:
total instructions in shared programs: 13000420 -> 12663189 (-2.59%)
instructions in affected programs: 11791267 -> 11454036 (-2.86%)
helped: 62890
HURT: 19987
total threads in shared programs: 415874 -> 415870 (<.01%)
threads in affected programs: 20 -> 16 (-20.00%)
helped: 2
HURT: 4
total uniforms in shared programs: 3711652 -> 3711624 (<.01%)
uniforms in affected programs: 43430 -> 43402 (-0.06%)
helped: 134
HURT: 173
total max-temps in shared programs: 2144876 -> 2138822 (-0.28%)
max-temps in affected programs: 123334 -> 117280 (-4.91%)
helped: 4112
HURT: 1195
total spills in shared programs: 3870 -> 3860 (-0.26%)
spills in affected programs: 1013 -> 1003 (-0.99%)
helped: 14
HURT: 12
total fills in shared programs: 5560 -> 5573 (0.23%)
fills in affected programs: 1765 -> 1778 (0.74%)
helped: 14
HURT: 17
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
For us they are basically uniforms too so we want to make their
lifespans short to facilitate allocating them to accumulators.
total instructions in shared programs: 13043585 -> 13015385 (-0.22%)
instructions in affected programs: 8326040 -> 8297840 (-0.34%)
helped: 24939
HURT: 19894
total threads in shared programs: 415860 -> 415858 (<.01%)
threads in affected programs: 4 -> 2 (-50.00%)
helped: 0
HURT: 1
total uniforms in shared programs: 3721953 -> 3720451 (-0.04%)
uniforms in affected programs: 96134 -> 94632 (-1.56%)
helped: 744
HURT: 435
total max-temps in shared programs: 2173431 -> 2154260 (-0.88%)
max-temps in affected programs: 264598 -> 245427 (-7.25%)
helped: 10858
HURT: 841
total spills in shared programs: 4005 -> 4010 (0.12%)
spills in affected programs: 700 -> 705 (0.71%)
helped: 5
HURT: 10
total fills in shared programs: 5801 -> 5817 (0.28%)
fills in affected programs: 1346 -> 1362 (1.19%)
helped: 6
HURT: 11
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
If we are compiling with a strategy that does not allow TMU spills
we should not allow spilling anything that is not a uniform.
Otherwise the RA cost/benefit algorithm may choose to spill a
temp that is not uniform and that will cause us to immediately
fail the strategy and fallback to the next one, even if we
could've instead chosen to spill more uniforms to compile the
program successfully with that strategy.
Some relevant shader-db stats:
total instructions in shared programs: 13040711 -> 13043585 (0.02%)
instructions in affected programs: 234238 -> 237112 (1.23%)
helped: 73
HURT: 172
total threads in shared programs: 415664 -> 415860 (0.05%)
threads in affected programs: 196 -> 392 (100.00%)
helped: 98
HURT: 0
total uniforms in shared programs: 3717266 -> 3721953 (0.13%)
uniforms in affected programs: 12831 -> 17518 (36.53%)
helped: 6
HURT: 100
total max-temps in shared programs: 2174177 -> 2173431 (-0.03%)
max-temps in affected programs: 4597 -> 3851 (-16.23%)
helped: 79
HURT: 21
total spills in shared programs: 4010 -> 4005 (-0.12%)
spills in affected programs: 55 -> 50 (-9.09%)
helped: 5
HURT: 0
total fills in shared programs: 5820 -> 5801 (-0.33%)
fills in affected programs: 186 -> 167 (-10.22%)
helped: 5
HURT: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
Our cost was 5 which matches the number of instructions we have to
add for a TMU spill (a fill is 4 instructions).
Uniform spills on the other hand add an extra instruction for each
fill and remove one instruction for the spill itself. These have
a cost of 1.
Therefore, if we have a single spill+fill, we end up with +9
instructions if it is a TMU spill and +0 instructions with a uniform
spill, so making the former only 5 times more costly is probably
not a good idea, and this is without even considering the added
latency of the TMU accesses.
Relevant shader-db changes show this causes as a marginal instruction
count increase in a few shaders but better thread counts and lower
TMU spilling overall:
total instructions in shared programs: 13037315 -> 13040711 (0.03%)
instructions in affected programs: 370106 -> 373502 (0.92%)
helped: 187
HURT: 321
total threads in shared programs: 415090 -> 415664 (0.14%)
threads in affected programs: 574 -> 1148 (100.00%)
helped: 287
HURT: 0
total uniforms in shared programs: 3706674 -> 3717266 (0.29%)
uniforms in affected programs: 63075 -> 73667 (16.79%)
helped: 40
HURT: 395
total max-temps in shared programs: 2176080 -> 2174177 (-0.09%)
max-temps in affected programs: 15838 -> 13935 (-12.02%)
helped: 316
HURT: 34
total spills in shared programs: 4247 -> 4010 (-5.58%)
spills in affected programs: 2599 -> 2362 (-9.12%)
helped: 107
HURT: 14
total fills in shared programs: 6121 -> 5820 (-4.92%)
fills in affected programs: 3622 -> 3321 (-8.31%)
helped: 108
HURT: 13
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
This is for drivers that have separate store instructions for varyings,
system value outputs (such as clip distances), and transform feedback.
The flags tell the driver not to store the output to those locations.
This will be used by radeonsi initially, and then maybe by a new linker.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14388>
NIR now fully contains pipe_stream_output_info in shader_info and IO
intrinsics if lower_io_variables is true. radeonsi will not use
pipe_stream_output_info after this, and other drivers are free to follow
that.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14388>
This will allow compaction of transform feedback varyings because they
are no longer tied to varying slots with this information.
It will also make transform feedback info available to all NIR passes
after IO is lowered. It's meant to replace pipe_stream_output_info.
Other intrinsics are not used with transform feedback.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14388>
These are unified in the hardware, so let's unify them in pan_shader_info.
Hoisting this logic to pan_shader.c avoids the need to duplicate this logic for
Midgard/Bifrost (RSD packing) and Valhall (SPD packing).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15204>
Instead of specifying the tiling on the texture descriptor, Valhall specifies it
on the plane descriptor. There is a new flag on the texture descriptor
specifying only whether the planes are interleaved or not.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15204>
Add support of GUEST_VRAM type of blob. These are dedicated heap memory
allocations required for vk support on hypervisors that don't support
runtime injections of host memory into guest physical address space.
The flow of usage:
1) Host VM reserves dedicated heap memory
2) Device get info about memory reservations and report it to guest
using mmio registers
3) Guest virtio-gpu driver on starts checks mmio registers for
physical address and length of reserved region. Then it reserves it
in guest.
4) On each call of vkAllocateMemory() guest driver gets chunk of
required memory and send it to host using sg list. It uses one sg
entry for 1 blob call. Heap is managed on guest using drm memory
manager (drm_mm).
Signed-off-by: Oleksandr.Gabrylchuk <Oleksandr.Gabrylchuk@opensynergy.com>
Signed-off-by: Andrii Pauk <Andrii.Pauk@opensynergy.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14536>
v2.
- Build the runner image as part of the CI for the boot2container
project, rather than as a manually step using the build instructions
in valve-trigger.dockerfile.
- Depend on a non-default kernel build hosted in the valve-infra
package repository. This does reduce the current caching feature of
local artifacts, but makes it easier to chop and change kernels on a
per-project or even per-test basis.
v3.
- Depend on a kernel built and stored in the valve-infra generic
package repo.
- Build the runner container using ci-templates as part of the CI in
valve-infra.
- Now that the runner container is built in the valve-infra CI, I
dropped the source import of client.py and message.py. They are
built in the runner container.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14660>
- Add the scripts to the prepared Mesa artifacts for use in later
runner stages.
- Add a template generator (generate_b2c.py) which reads and
validates (very lightly for now) the Gitlab job environment and then
spits out a YAML file describing the necessary test workload to be
sent to a Valve CI gateway.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14660>
nir_var_shader_out writes are only used for later TES invocations, so I
don't think there's any need for the TCS workgroup to wait for them.
fossil-db (Sienna Cichlid):
Totals from 1691 (1.04% of 162293) affected shaders:
Instrs: 710699 -> 709008 (-0.24%)
CodeSize: 3830168 -> 3823404 (-0.18%)
Latency: 3396997 -> 3007934 (-11.45%)
InvThroughput: 1212094 -> 1082823 (-10.67%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15195>
This reverts commit 382f6ccda8.
The spec requires that all color images created with the same tiling
(and a few other properties) support the same memoryTypeBits. So this
wasn't a valid change. It also wasn't necessary - we already have a
mechanism in anv_BindImageMemory2 for disabling compression if the BO
doesn't support it.
With this, XeHP passes the tests in
dEQP-VK.memory.requirements.*tiling_optimal
Fixes: 382f6ccd ("anv: Require the local heap for CCS on XeHP")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15068>
When an image configured for HIZ_CCS/HIZ_CCS_WT is bound to a BO lacking
implicit CCS, we disable any compression it may have had. Such images
are still compatible with ISL_AUX_USAGE_HIZ however. Fall back to that
aux usage to retain the performance benefit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15068>
When an image is bound to a BO lacking implicit CCS, we disable any
compression it may have had. This is unnecessary in the cases where the
compression type doesn't depend on the BO having implicit CCS support.
Avoid this disabling for ISL_AUX_USAGE_MCS and ISL_AUX_USAGE_HIZ.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15068>
In the bindless case, we don't have to keep any shadow descriptors and
can just reuse the IBO descriptor as a texture descriptor. Now that
we're emitting the swizzle we can just flip this on.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
ncoords includes the array index, and the NIR source has the array index
as its last component, so we have to insert the extra y coordinate in
the middle in this case.
Fixes: 0bb0cac ("freedreno/ir3: handle image buffer")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
Since these were reverse-engineered, it's become clear that IBO
descriptors are just a subset of texture descriptors, and bindless reads
of readonly images actually use isam on the IBO descriptor, further
confirming that the two are always compatible, even if not all of the
texture fields exist for IBOs. It's pointless to have a separate type
for IBOs, and just leads to things getting out-of-sync unnecessarily
which has already happened. Just remove it and use TEX_CONST insted.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
CAN_REORDER takes volatile into account, and is closer to what we
actually require to use texture instructions, which is that we can
arbitrarily reorder loads.
Fixes: aa93896 ("freedreno/ir3: adjust condition for when to use ldib")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
From Vulkan spec for VkDrmFormatModifierProperties2EXT:
"drmFormatModifierTilingFeatures is a bitmask of VkFormatFeatureFlagBits
that are supported by any image created with format and drmFormatModifier."
"The returned drmFormatModifierTilingFeatures must contain at least one bit."
"Therefore, if the returned drmFormatModifier is DRM_FORMAT_MOD_LINEAR,
then drmFormatModifierPlaneCount must equal the format planecount, and
drmFormatModifierTilingFeatures must be identical to the
VkFormatProperties2::linearTilingFeatures returned in the same pNext chain."
Relevant tests: dEQP-VK.drm_format_modifiers.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15032>
now when gl_PointSize and gl_PointSizeMESA are both present, the former
will be used for xfb with a new location and the latter will be
exported by the shader
fixes (zink):
GTF-GL46.gtf30.GL3Tests.transform_feedback.transform_feedback_builtins
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15184>
creating this at the start of the shader means it will get optimized out
when the pass is used to overwrite existing psiz values, and creating it
at the end means it will get optimized out in geometry shaders, so instead
just walk the instructions and create another store right after the existing one
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15184>
On mips64, the compiler does not allow use of non-zero argument with
__builtin_frame_address(). However, the returned frame address is only
used when PIPE_ARCH_X86 is defined. The compile error can be avoided
by making #ifdef PIPE_ARCH_X86 cover the getting of frame address too.
The argument checking of __builtin_frame_address() has been present
as a debug assert in clang 8. In clang 10, there is a proper runtime
check for the argument. This is why the build has not failed before.
Fixes: dc94a0506f ("gallium: Do not add -Wframe-address option for gcc <= 4.4.")
from Visa Hankala
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6511>
Make this build on clang architectures that don't have 64-bit atomic
instructions. Clang doesn't allow redeclaration (and therefore
redefinition) of the __sync_* builtins. Use #pragma redefine_extname
to work around that restriction. Clang also turns __sync_add_and_fetch
into __sync_fetch_and_add (and __sync_sub_and_fetch into
__sync_fetch_and_sub) in certain cases, so provide these functions as
well.
Fixes: a6a38a038b ("util/u_atomic: provide 64bit atomics where they're missing")
patch from Mark Kettenis
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6511>
The structure wrapped around the rb tree node was being freed, but not the node
itself, which caused a segmentation fault when accessing its parent node.
Add rb tree node remove call to fix it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15188>
Since this text was written, VirGL has become a shipping, production
quality solution. It's no longer a research project. Let's update the
text to reflect that.
While we're at it, let's drop the project from the page title, as this
is no longer the docs for the entire project.
Acked-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13836>
need to iterate over the descriptors in the binding to invalidate the whole
thing here
=================================================================
==546534==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a0000ae6c0 at pc 0x7fe20e26fd9d bp 0x7ffd92be6bc0 sp 0x7ffd92be6bb8
READ of size 8 at 0x61a0000ae6c0 thread T0
#0 0x7fe20e26fd9c in zink_descriptor_set_refs_clear ../src/gallium/drivers/zink/zink_descriptors.c:950
#1 0x7fe20e401304 in zink_destroy_surface ../src/gallium/drivers/zink/zink_surface.c:340
#2 0x7fe20e21311b in zink_surface_reference ../src/gallium/drivers/zink/zink_surface.h:106
#3 0x7fe20e21a5b9 in zink_sampler_view_destroy ../src/gallium/drivers/zink/zink_context.c:835
#4 0x7fe20c41d35f in tc_sampler_view_destroy ../src/gallium/auxiliary/util/u_threaded_context.c:1848
#5 0x7fe20e210ff7 in pipe_sampler_view_reference ../src/gallium/auxiliary/util/u_inlines.h:216
#6 0x7fe20e22d592 in zink_set_sampler_views ../src/gallium/drivers/zink/zink_context.c:1532
#7 0x7fe20c41a3d8 in tc_call_set_sampler_views ../src/gallium/auxiliary/util/u_threaded_context.c:1393
#8 0x7fe20c411706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
#9 0x7fe20c4124ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
#10 0x7fe20c42b728 in tc_destroy ../src/gallium/auxiliary/util/u_threaded_context.c:4250
#11 0x7fe20b65176a in st_destroy_context_priv ../src/mesa/state_tracker/st_context.c:387
#12 0x7fe20b65669f in st_destroy_context ../src/mesa/state_tracker/st_context.c:1009
#13 0x7fe20b7055ab in st_context_destroy ../src/mesa/state_tracker/st_manager.c:944
#14 0x7fe20a9c75bd in dri_destroy_context ../src/gallium/frontends/dri/dri_context.c:256
#15 0x7fe20a9d4bef in driDestroyContext ../src/gallium/frontends/dri/dri_util.c:534
#16 0x7fe22361f25c in drisw_destroy_context ../src/glx/drisw_glx.c:429
#17 0x7fe223625d95 in glXDestroyContext ../src/glx/glxcmds.c:523
#18 0x7fe22636aaeb in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GLX/libglx.c:332
#19 0x7fe2269d9e7d in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GL/g_libglglxwrapper.c:384
#20 0x41b88a in tcu::lnx::x11::glx::GlxRenderContext::~GlxRenderContext() /home/zmike/src/VK-GL-CTS/framework/platform/lnx/X11/tcuLnxX11GlxPlatform.cpp:734
#21 0x41b8e9 in tcu::lnx::x11::glx::GlxRenderContext::~GlxRenderContext() /home/zmike/src/VK-GL-CTS/framework/platform/lnx/X11/tcuLnxX11GlxPlatform.cpp:735
#22 0x2323aa7 in deqp::gles31::Context::destroyRenderContext() /home/zmike/src/VK-GL-CTS/modules/gles31/tes31Context.cpp:77
#23 0x2323969 in deqp::gles31::Context::~Context() /home/zmike/src/VK-GL-CTS/modules/gles31/tes31Context.cpp:55
#24 0x232278e in deqp::gles31::TestPackage::deinit() /home/zmike/src/VK-GL-CTS/modules/gles31/tes31TestPackage.cpp:102
#25 0x2c866c2 in tcu::DefaultHierarchyInflater::leaveTestPackage(tcu::TestPackage*) /home/zmike/src/VK-GL-CTS/framework/common/tcuTestHierarchyIterator.cpp:75
#26 0x2c87058 in tcu::TestHierarchyIterator::next() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestHierarchyIterator.cpp:252
#27 0x2c365da in tcu::TestSessionExecutor::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:122
#28 0x2c00b0c in tcu::App::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuApp.cpp:221
#29 0x4141b7 in main /home/zmike/src/VK-GL-CTS/framework/platform/tcuMain.cpp:58
#30 0x7fe2263e155f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
#31 0x7fe2263e160b in __libc_start_main_impl (/lib64/libc.so.6+0x2d60b)
#32 0x413fa4 in _start (/home/zmike/src/VK-GL-CTS/build/external/openglcts/modules/glcts+0x413fa4)
0x61a0000ae6c0 is located 64 bytes inside of 1328-byte region [0x61a0000ae680,0x61a0000aebb0)
freed by thread T0 here:
#0 0x7fe226cb6627 in free (/usr/lib64/libasan.so.6+0xae627)
#1 0x7fe20aab1751 in unsafe_free ../src/util/ralloc.c:302
#2 0x7fe20aab16c8 in unsafe_free ../src/util/ralloc.c:295
#3 0x7fe20aab13c3 in ralloc_free ../src/util/ralloc.c:265
#4 0x7fe20e269234 in descriptor_pool_free ../src/gallium/drivers/zink/zink_descriptors.c:286
#5 0x7fe20e26937d in descriptor_pool_delete ../src/gallium/drivers/zink/zink_descriptors.c:296
#6 0x7fe20e26ff53 in zink_descriptor_pool_reference ../src/gallium/drivers/zink/zink_descriptors.c:967
#7 0x7fe20e270db2 in zink_descriptor_program_deinit ../src/gallium/drivers/zink/zink_descriptors.c:1071
#8 0x7fe20e3b6536 in zink_destroy_gfx_program ../src/gallium/drivers/zink/zink_program.c:695
#9 0x7fe20e1eaaf9 in zink_gfx_program_reference ../src/gallium/drivers/zink/zink_program.h:242
#10 0x7fe20e20d386 in zink_shader_free ../src/gallium/drivers/zink/zink_compiler.c:2099
#11 0x7fe20e3b9f0b in zink_delete_shader_state ../src/gallium/drivers/zink/zink_program.c:1074
#12 0x7fe20c3e29ad in util_shader_reference ../src/gallium/auxiliary/util/u_live_shader_cache.c:188
#13 0x7fe20e3ba11e in zink_delete_cached_shader_state ../src/gallium/drivers/zink/zink_program.c:1093
#14 0x7fe20c41709e in tc_call_delete_fs_state ../src/gallium/auxiliary/util/u_threaded_context.c:998
#15 0x7fe20c411706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
#16 0x7fe20c4124ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
#17 0x7fe20c423683 in tc_flush ../src/gallium/auxiliary/util/u_threaded_context.c:3003
#18 0x7fe20b62d996 in st_flush ../src/mesa/state_tracker/st_cb_flush.c:60
#19 0x7fe20b62dbe3 in st_glFlush ../src/mesa/state_tracker/st_cb_flush.c:94
#20 0x7fe20ae4bded in _mesa_make_current ../src/mesa/main/context.c:1493
#21 0x7fe20ae49702 in _mesa_free_context_data ../src/mesa/main/context.c:1187
#22 0x7fe20b65668b in st_destroy_context ../src/mesa/state_tracker/st_context.c:1005
#23 0x7fe20b7055ab in st_context_destroy ../src/mesa/state_tracker/st_manager.c:944
#24 0x7fe20a9c75bd in dri_destroy_context ../src/gallium/frontends/dri/dri_context.c:256
#25 0x7fe20a9d4bef in driDestroyContext ../src/gallium/frontends/dri/dri_util.c:534
#26 0x7fe22361f25c in drisw_destroy_context ../src/glx/drisw_glx.c:429
#27 0x7fe223625d95 in glXDestroyContext ../src/glx/glxcmds.c:523
#28 0x7fe22636aaeb in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GLX/libglx.c:332
#29 0x7fe2269d9e7d in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GL/g_libglglxwrapper.c:384
previously allocated by thread T0 here:
#0 0x7fe226cb691f in __interceptor_malloc (/usr/lib64/libasan.so.6+0xae91f)
#1 0x7fe20aab0c81 in ralloc_size ../src/util/ralloc.c:120
#2 0x7fe20aab0e33 in rzalloc_size ../src/util/ralloc.c:153
#3 0x7fe20aab12c8 in rzalloc_array_size ../src/util/ralloc.c:233
#4 0x7fe20e26c76d in allocate_desc_set ../src/gallium/drivers/zink/zink_descriptors.c:657
#5 0x7fe20e26e9cb in zink_descriptor_set_get ../src/gallium/drivers/zink/zink_descriptors.c:840
#6 0x7fe20e2747aa in zink_descriptors_update ../src/gallium/drivers/zink/zink_descriptors.c:1424
#7 0x7fe20e36fc48 in void zink_draw<(zink_multidraw)1, (zink_dynamic_state)2, true, false>(pipe_context*, pipe_draw_info const*, unsigned int, pipe_draw_indirect_info const*, pipe_draw_start_count_bias const*, unsigned int, pipe_vertex_state*, unsigned int) ../src/gallium/drivers/zink/zink_draw.cpp:788
#8 0x7fe20e29166d in zink_draw_vbo<(zink_multidraw)1, (zink_dynamic_state)2, true> ../src/gallium/drivers/zink/zink_draw.cpp:907
#9 0x7fe20c424982 in tc_call_draw_single ../src/gallium/auxiliary/util/u_threaded_context.c:3155
#10 0x7fe20c411706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
#11 0x7fe20c4124ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
#12 0x7fe20c41f7a9 in tc_texture_map ../src/gallium/auxiliary/util/u_threaded_context.c:2279
#13 0x7fe20b630757 in pipe_texture_map_3d ../src/gallium/auxiliary/util/u_inlines.h:572
#14 0x7fe20b6341f6 in st_ReadPixels ../src/mesa/state_tracker/st_cb_readpixels.c:546
#15 0x7fe20b42fea7 in read_pixels ../src/mesa/main/readpix.c:1178
#16 0x7fe20b42fea7 in _mesa_ReadnPixelsARB ../src/mesa/main/readpix.c:1195
#17 0x7fe20b42ffc0 in _mesa_ReadPixels ../src/mesa/main/readpix.c:1210
#18 0x2a6d094 in glu::readPixels(glu::RenderContext const&, int, int, tcu::PixelBufferAccess const&) /home/zmike/src/VK-GL-CTS/framework/opengl/gluPixelTransfer.cpp:61
#19 0x29eaa06 in deqp::gls::ShaderExecUtil::FragmentOutExecutor::execute(int, void const* const*, void* const*) /home/zmike/src/VK-GL-CTS/modules/glshared/glsShaderExecUtil.cpp:677
#20 0x25a600b in iterate /home/zmike/src/VK-GL-CTS/modules/gles31/functional/es31fOpaqueTypeIndexingTests.cpp:585
#21 0x2322b53 in deqp::gles31::TestCaseWrapper<deqp::gles31::TestPackage>::iterate(tcu::TestCase*) /home/zmike/src/VK-GL-CTS/modules/gles31/tes31TestCaseWrapper.hpp:86
#22 0x2c376fd in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:302
#23 0x2c366e3 in tcu::TestSessionExecutor::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:139
#24 0x2c00b0c in tcu::App::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuApp.cpp:221
#25 0x4141b7 in main /home/zmike/src/VK-GL-CTS/framework/platform/tcuMain.cpp:58
#26 0x7fe2263e155f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
cc: mesa-stable
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15173>
this ensures that swapchain images will have been acquired before potentially
accessing swapchain images bound as descriptors
fixes caselist like:
dEQP-GLES31.functional.fbo.color.texcubearray.r8ui
dEQP-GLES31.functional.primitive_bounding_box.blit_fbo.blit_default_to_fbo
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15173>
Correct type for sysctl argument to fix the build.
../src/util/u_cpu_detect.c:631:29: error: incompatible pointer types passing 'int *' to parameter of type 'size_t *' (aka 'unsigned long *') [-Werror,-Wincompatible-pointer-types]
sysctl(mib, 2, &ncpu, &len, NULL, 0);
^~~~
Fixes: 5623c75e40 ("util: Fix setting nr_cpus on some BSD variants")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13448>
move include so va_list will be picked up via stdarg.h
In file included from ../src/util/u_printf.cpp:24:
../src/util/u_printf.h:43:41: error: unknown type name 'va_list'; did you mean '__va_list'?
size_t u_printf_length(const char *fmt, va_list untouched_args);
^~~~~~~
__va_list
/usr/include/machine/_types.h:126:27: note: '__va_list' declared here
typedef __builtin_va_list __va_list;
^
and add includes to u_printf.h as suggested by Ilia Mirkin
stdarg.h for va_list and stddef.h for size_t
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13448>
This required relaxing a core NIR assertion which I don't think is doing
any important validation.
The shader-db effects here are small, but they're important for avoiding a
regression when we start doing per-component DCE in opt_shrink_vectors
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12468)
softpipe shader-db:
total instructions in shared programs: 2859777 -> 2859454 (-0.01%)
instructions in affected programs: 18881 -> 18558 (-1.71%)
total temps in shared programs: 293994 -> 293914 (-0.03%)
temps in affected programs: 418 -> 338 (-19.14%)
i915g:
total instructions in shared programs: 407562 -> 407544 (<.01%)
instructions in affected programs: 570 -> 552 (-3.16%)
r300:
total instructions in shared programs: 1414450 -> 1414459 (<.01%)
instructions in affected programs: 44494 -> 44503 (0.02%)
total vinst in shared programs: 473782 -> 473727 (-0.01%)
vinst in affected programs: 1102 -> 1047 (-4.99%)
total sinst in shared programs: 231224 -> 231216 (<.01%)
sinst in affected programs: 432 -> 424 (-1.85%)
total temps in shared programs: 197605 -> 197607 (<.01%)
temps in affected programs: 103 -> 105 (1.94%)
crocus hsw:
total instructions in shared programs: 8158185 -> 8158134 (<.01%)
instructions in affected programs: 10927 -> 10876 (-0.47%)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15178>
Because we have to maintain two different packages of Mesa, one
specific to RADV and another one for RadeonSI and such, it's a bit
annoying to have to synchronize the drirc entries. Currently, only our
Mesa package installs 00-mesa-defaults.conf which means we have to
backport the drirc RADV changes.
This splits 00-mesa-defaults.conf in two to move the drirc RADV entries
to src/amd/vulkan/00-radv-defaults.conf. Meson will install the file
only if RADV is built.
There is still a caveat for common drirc workarounds like for WSI but
they are rare enough and we could still duplicate them if needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15152>
Task shader outputs work differently than other shaders, so they
need special consideration. Essentially, they have two kinds of
outputs:
1. Number of mesh shader workgroups to launch.
Will be still represented by a shader output.
2. Optional payload of up to (at least) 16K bytes.
These payload variables behave similarly to shared memory, but
the spec doesn't actually define them as shared memory (also, they
may be implemented differently by each backend), so we need to add
a new NIR variable mode for them.
These payload variables can't be represented by shader outputs
because the 16K bytes don't fit the 32x vec4 model that NIR uses
for its output variables.
This patch adds a new NIR variable mode: nir_var_mem_task_payload
and corresponding explicit I/O intrinsics, as well as support for
this new mode in nir_lower_io.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14930>
The problem is that the real workgroup launched on NGG HW
can be larger than the size specified by the API, and the
extra waves need to keep up with barriers in the API waves.
There are 2 different cases:
1. The whole API workgroup fits in a single wave.
We can shrink the barriers to subgroup scope and
don't need to insert any extra ones.
2. The API workgroup occupies multiple waves, but not
all. In this case, we emit code that consumes every
barrier on the extra waves.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15034>
This commit adds support for Vulkan backend on a630_skqp job.
= Needed changes
- Needed to install libvulkan-dev package on system
- Refactored the way the available skqp reports are printed
tested in development builds with skia tools
Piglit expectations had to be updated in various drivers due to !14750 not
having bumped the tags when it tried to uprev.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14686>
The Android CTS 10 version is relative old when compared with skia main
branch, which was being used before. Some modifications in the skqp
build/runner scripts were needed to make it run on CI.
- skqp versions from android-cts have already all assets inside
platform_tools folder.
- along with the assets, are the render and unit files which are
expected to pass in the Android CTS execution.
- removed custom test files from the a630 folder, to make it comply
with the CTS expectations.
- include new patches to remove Python2 dependencies and avoid the
installation of it in rootfs.
- strip binariesthe built binaries `skqp` and `list_gpu_unit_tests`, as
`is_debug = false` gn argument did not work, maybe it is not well
tested in development builds with skia tools
- use Clang instead of GCC. The GCC support is not so graceful as it is
in the skia main branch, some NEON instructions needs to be turned off
in the GCC compilation, causing different tests result. This change
does not imply a bigger rootfs, since the built skqp binary uses GCC
libc++ and other library runtimes. So clang is just a build
dependency.
= Changes in skqp results =
Some errors were found for GL backend and unit tests. GLES and VK tests are green.
All the failed tests were classified as expected to fail in the render and unit tests list.
```
gl_blur2rectsnonninepatch
gl_bug339297_as_clip
gl_bug6083
gl_dashtextcaps
```
```
SRGBReadWritePixels (../../tests/SRGBReadWritePixelsTest.cpp:214 Could not create sRGB surface context. [OpenGL])
```
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14686>
This extension isn't wired up in Gallium, so there's just no way a
Gallium driver like Panfrost exposes it.
While there were support in i965 for this for the cancelled Broxton GPU,
thre's no such support in the Iris driver. And since Broxton has been
cancelled, it's unlikely to be wired up any time soon.
Fixes: da23a31726 ("docs/features: Update ASTC entries for Panfrost")
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15145>
When we shadow a resource, the backing BO is changed; as such,
existing references to the resource become invalid. So batches accessing the
resource need to be flushed (or otherwise have their references invalidated).
The wrong behaviour change (not flushing) was introduced when we started
tracking resources instead of BOs. The issue manifested as a severe performance
regression in glmark2's -bbuffer test, particular the subdata subtest. The issue
is magnified on slow CPUs; without the fix, the test becomes completely CPU
bound
Relevant glmark2 -bbuffer test from 43fps to 84fps.
Apparently, this causes functional issues too -- this performance-minded change
also fixes a few piglits.
Fixes: cecb889481 ("panfrost: Do tracking of resources, not BOs")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Chris Healy <cphealy@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13502>
The MI_FLUSH_DW post-sync operation uses the same encoding as the
PIPE_CONTROL one so we can use the same helper. Write PS Depth Count
is not supported, of course, as the blitter has no depth pipeline.
This means that we can write the timestamp register from the blitter.
Fixes: 604d97671b ("iris: Add support for flushing the blitter (hackily)")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15157>
To make sure it is applied in the cases we expect it to be, to avoid code
generation regressions. Functional regressions are expected to be caught by
integration-testing, so that is not focused on here.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9438>
If a message-passing instruction like LD_VAR is preloaded, it will no longer be
counted in the shader cycle counts. Add a special message preload counter that
approximates the cost of preloading, so this information doesn't get a lost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9438>
The compiler will soon produce preloaded messages, but it should not pack them
itself, as this would require depending on GenXML or handcoding bitfields / bit
packs in the compiler. Instead, add a struct encoding the unpacked form of the
message, used as ABI between the compiler and the common driver.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9438>
On V3D the quality of the code we generate is significantly affected by
how we decide to assign accumulators during register allocation, which
is determined by liveness, favoring short-lived temps.
There are many shaders that end up doing a whole lot of uniform loads
first, and using them later, which is very inconvenient for our register
allocation process because this increases uniform liveness and causes
us to use accumulators less efficientely, leading to significant churn.
To fix this, we move uniforms right before their first use in the same
block, but we need to do this after NIR scheduling, which means we are
doing it in non-SSA form, since the scheduler has a tendency to undo
this optimization and it is not easy to modify it to avoid it, since it
works in more abstract terms, using instruction dependencies, estimated
register pressure and instruction delay information to do its work,
which are very different concepts.
total instructions in shared programs: 13316738 -> 13033613 (-2.13%)
instructions in affected programs: 10389172 -> 10106047 (-2.73%)
helped: 55442
HURT: 16144
total threads in shared programs: 413722 -> 415048 (0.32%)
threads in affected programs: 1428 -> 2754 (92.86%)
helped: 680
HURT: 17
total loops in shared programs: 1716 -> 1690 (-1.52%)
loops in affected programs: 26 -> 0
helped: 26
HURT: 0
total uniforms in shared programs: 3704313 -> 3705181 (0.02%)
uniforms in affected programs: 687730 -> 688598 (0.13%)
helped: 2920
HURT: 7384
total max-temps in shared programs: 2364785 -> 2175190 (-8.02%)
max-temps in affected programs: 1215387 -> 1025792 (-15.60%)
helped: 49667
HURT: 1556
total spills in shared programs: 4241 -> 4248 (0.17%)
spills in affected programs: 642 -> 649 (1.09%)
helped: 11
HURT: 19
total fills in shared programs: 6115 -> 6125 (0.16%)
fills in affected programs: 1276 -> 1286 (0.78%)
helped: 11
HURT: 21
total sfu-stalls in shared programs: 34381 -> 36578 (6.39%)
sfu-stalls in affected programs: 16055 -> 18252 (13.68%)
helped: 3647
HURT: 5206
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15056>
On Bifrost and Valhall, push uniforms are loaded into Fast Access Uniform
Random Access Memory (FAU-RAM). FAU-RAM is organized as an array of 64-bit
slots. A given tuple (Bifrost) or instruction (Valhall) may access at most a
single 64-bit slot. If an instruction requires uniforms from multiple 64-bit
slots, a uniform-to-register move must be inserted to avoid the hazard. However,
if an instruction requires a pair of 32-bit uniforms from the same 64-bit slot,
no move is required.
To reduce the number of moves we emit, this commit adds an optimization pass
that reorders pushed uniforms, trying to group uniforms used by the same
instruction. The pass works by creating a graph of pushed uniforms, where edges
denote the "both 32-bit uniforms required by the same instruction" relationship.
We perform depth-first search on this graph to find the connected components,
where each connected component is a cluster of uniforms that are used together.
We then select pairs of uniforms from each connected component. The remaining
unpaired uniforms (from components of odd sizes) are paired together
arbitrarily.
In principle, we should weight the graph by number of occurences and choose
pairs that maximize the total selected edge weight. This is left for
future work, as it is nontrivial -- selecting these edges optimally appears to
be NP-hard at first blush.
Implementation note: As position and varying shaders share FAU on Bifrost, extra
care is taken with a `push_offset` shader stage info parameter that ensures
varying shaders do not reorder uniforms selected by the previous position
shader.
total instructions in shared programs: 2503343 -> 2451758 (-2.06%)
instructions in affected programs: 1553309 -> 1501724 (-3.32%)
helped: 14256
HURT: 8
helped stats (abs) min: 1.0 max: 80.0 x̄: 3.62 x̃: 3
helped stats (rel) min: 0.06% max: 36.36% x̄: 7.31% x̃: 6.67%
HURT stats (abs) min: 1.0 max: 2.0 x̄: 1.38 x̃: 1
HURT stats (rel) min: 1.30% max: 12.50% x̄: 4.99% x̃: 3.85%
95% mean confidence interval for instructions value: -3.66 -3.58
95% mean confidence interval for instructions %-change: -7.41% -7.20%
Instructions are helped.
total tuples in shared programs: 2008399 -> 1969627 (-1.93%)
tuples in affected programs: 1146344 -> 1107572 (-3.38%)
helped: 12867
HURT: 147
helped stats (abs) min: 1.0 max: 61.0 x̄: 3.03 x̃: 2
helped stats (rel) min: 0.17% max: 42.86% x̄: 6.79% x̃: 4.65%
HURT stats (abs) min: 1.0 max: 3.0 x̄: 1.20 x̃: 1
HURT stats (rel) min: 0.29% max: 20.00% x̄: 2.12% x̃: 1.19%
95% mean confidence interval for tuples value: -3.03 -2.93
95% mean confidence interval for tuples %-change: -6.82% -6.57%
Tuples are helped.
total clauses in shared programs: 408005 -> 401708 (-1.54%)
clauses in affected programs: 90760 -> 84463 (-6.94%)
helped: 6006
HURT: 164
helped stats (abs) min: 1.0 max: 9.0 x̄: 1.08 x̃: 1
helped stats (rel) min: 0.45% max: 33.33% x̄: 12.44% x̃: 14.29%
HURT stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.64% max: 25.00% x̄: 9.81% x̃: 5.26%
95% mean confidence interval for clauses value: -1.03 -1.01
95% mean confidence interval for clauses %-change: -12.03% -11.66%
Clauses are helped.
total cycles in shared programs: 203308.37 -> 202737.83 (-0.28%)
cycles in affected programs: 19264.71 -> 18694.17 (-2.96%)
helped: 3024
HURT: 41
helped stats (abs) min: 0.041665999999999315 max: 2.5416680000000014 x̄: 0.19 x̃: 0
helped stats (rel) min: 0.17% max: 33.33% x̄: 3.83% x̃: 2.83%
HURT stats (abs) min: 0.041665999999999315 max: 0.125 x̄: 0.06 x̃: 0
HURT stats (rel) min: 0.30% max: 5.88% x̄: 1.41% x̃: 0.93%
95% mean confidence interval for cycles value: -0.19 -0.18
95% mean confidence interval for cycles %-change: -3.89% -3.64%
Cycles are helped.
total arith in shared programs: 76265.67 -> 74669.25 (-2.09%)
arith in affected programs: 45001.50 -> 43405.08 (-3.55%)
helped: 12945
HURT: 97
helped stats (abs) min: 0.041665999999999315 max: 2.5416680000000014 x̄: 0.12 x̃: 0
helped stats (rel) min: 0.17% max: 50.00% x̄: 8.06% x̃: 4.88%
HURT stats (abs) min: 0.041665999999999315 max: 0.125 x̄: 0.05 x̃: 0
HURT stats (rel) min: 0.21% max: 33.33% x̄: 2.16% x̃: 0.96%
95% mean confidence interval for arith value: -0.12 -0.12
95% mean confidence interval for arith %-change: -8.16% -7.81%
Arith are helped.
total quadwords in shared programs: 1796563 -> 1766803 (-1.66%)
quadwords in affected programs: 948830 -> 919070 (-3.14%)
helped: 12078
HURT: 219
helped stats (abs) min: 1.0 max: 42.0 x̄: 2.49 x̃: 2
helped stats (rel) min: 0.10% max: 33.33% x̄: 5.57% x̃: 5.26%
HURT stats (abs) min: 1.0 max: 4.0 x̄: 1.21 x̃: 1
HURT stats (rel) min: 0.33% max: 6.67% x̄: 2.00% x̃: 1.14%
95% mean confidence interval for quadwords value: -2.46 -2.38
95% mean confidence interval for quadwords %-change: -5.52% -5.36%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14163>
Gives us memory back faster which is useful for pathalogical CTS
tests.
The GLSL IR was previously used after converting to NIR for things
like building the GL resource list but we have had a NIR version
for this for some time and I don't believe there are any other
use cases left for keeping the old IR hanging around this long.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15127>
These runners are configured to have a single job take up the whole
runner, which means we get to use threads to our hearts content. The pile
of cores means we don't need to spawn separate jobs to try to load-balance
across fdo's shared runner capacity. Having dedicated runners means we
won't get our MRs blocked as much waiting on non-Mesa testing happening on
fd.o.
We manage to complete all of this llvmpipe testing in about 6:15.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14962>
When handle->type is WINSYS_HANDLE_TYPE_FD, the caller wants a file descriptor
for the BO backing the resource. We previously had two paths for this:
1. If rsrc->scanout is available, we prime the GEM handle from the KMS device
(rsrc->scanout->handle) to a file descriptor via the KMS device.
2. If rsrc->scanout is not available, we prime the GEM handle from the GPU
(bo->gem_handle) to a file descriptor via the GPU device.
In both cases, the caller passes in a resource (with BO) and expects out a file
descriptor. There are no direct GEM handles in the function signature; the
caller doesn't care which GEM handle we prime to get the file descriptor. In
principle, both paths produce the same file descriptor for the same BO, since
both GEM handles represent the same underlying resource (viewed from different
devices).
On grounds of redundancy alone, it makes sense to remove the rsrc->scanout path.
Why have a path that only works sometimes, when we have another path that works
always?
In fact, the issues with the rsrc->scanout path are deeper. rsrc->scanout is
populated by renderonly_create_gpu_import_for_resource, which does the
following:
1. Get a file descriptor for the resource by resource_get_handle with
WINSYS_HANDLE_TYPE_FD
2. Prime the file descriptor to a GEM handle via the KMS device.
Here comes strike number 2: in order to get a file descriptor via the KMS
device, we had to /already/ get a file descriptor via the GPU device. If we go
down the KMS device path, we effectively round trip:
GPU handle -> fd -> KMS handle -> fd
There is no good reason to do this; if everything works, the fd is the same in
each case. If everything works. If.
The lifetimes of the GPU handle and the KMS handle are not necessarily bound. In
principle, a resource can be created with scanout (constructing a KMS handle).
Then the KMS view can be destroyed (invalidating the GEM handle for the KMS
device), even though the underlying resource is still valid. Notice the GPU
handle is still valid; its lifetime is tied to the resource itself. Then a
caller can ask for the FD for the resource; as the resource is still valid, this
is sensible. Under the scanout path, we try to get the FD by priming the GEM
handle on the KMS device... but that GEM handle is no longer valid, causing the
PRIME ioctl to fail with ENOENT. On the other hand, if we primed the GPU GEM
handle, everything works as expected.
These edge cases are not theoretical; recent versions of Xwayland trigger this
ENOENT, causing issue #5758 on all Panfrost devices. As far as I can tell, no
other kmsro driver has this 'special' kmsro path; the only part of
resource_get_handle that needs special handling for kmsro is getting a KMS
handle.
Let's remove the broken, useless path, fix Xwayland, bring us in line with other
drivers, and delete some code.
Thank you for coming to my ted talk.
Closes: #5758
Fixes: 7da251fc72 ("panfrost: Check in sources for command stream")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-and-tested-by: Jan Palus <jpalus@fastmail.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: James Jones <jajones@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Tested-by: Dan Johansen <strit@manjaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15120>
We always blit from a particular level, so it's a waste to compute the LOD.
This corresponds to a simple texture instruction with implement 0 LOD, which is
the optimal texturing path on Bifrost -- it maps to TEXS_2D but does not require
helper invocations.
Functional change on Bifrost: Blit shaders no longer set .computed_lod or
shader_contains_barrier.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15123>
There are always set to true. Don't pollute the driver code with them, make
their existence a local detail to pre-Valhall XML and that's it.
Functional change: "four components per vertex" is now set on vertex job DCDs.
This should be a no-op.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15123>
This reduces code duplication and will ease Valhall porting. Functional changes
on v7:
* Shader contains barrier is now set (perf loss, fixed later in series)
* Shader register allocation is now set (perf win)
* Point sprite inverted, no-op for blit shaders
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15123>
Trivially implemented by using A6XX_GRAS_SC_CNTL_SINGLE_PRIM_MODE.
This extension is useful for emulators e.g. AetherSX2 PS2 emulator and
could drastically improve performance when blending is emulated.
Relevant tests:
dEQP-VK.rasterization.rasterization_order_attachment_access.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15106>
Otherwise a shader invocation would read the value which should have
been set AFTER this shader invocation.
Fixes tests:
dEQP-VK.rasterization.rasterization_order_attachment_access.depth.samples_1.multi_draw_barriers
dEQP-VK.rasterization.rasterization_order_attachment_access.stencil.samples_1.multi_draw_barriers
Fixes: 71595a189a
("tu: Fix feedback loops in sysmem mode")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15106>
XeHP scratch space is handled differently. Commit ae18e1e707
implemented support for it, but handled it differently between render
and compute shaders: it calculates scratch_addr differently and
doesn't pin the buffer on compute. Make it work on compute shaders by
calling pin_scratch_space() from iris_compute_walker(), which fixes
both the address and the pinning.
This commit can be verified by the two-year-old-but-still-unreviewed
Piglit MR 234. You can also verify this by running a very simple
compute shader with INTEL_DEBUG=spill_fs.
References: https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/234
Fixes: ae18e1e707 ("iris: Add support for scratch on XeHP")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15070>
This raises our vertex input bindings limit from 28 to 31, and our
vertex input attribute limit from 28 to 29. We could theoretically
go higher, but it will take additional work.
The 3DSTATE_VERTEX_BUFFERS and 3DSTATE_VERTEX_ELEMENTS limits are 33
vertex buffers, and 34 vertex elements. But we need up to two vertex
elements for system values (FirstVertex, BaseVertex, BaseInstance,
DrawID), and we currently use two vertex bindings for those.
There is another hidden limit: our compiler backend only supports the
push model for VS inputs currently. 3DSTATE_VS only allows URB Read
Lengths between [0, 15], which is measured in pairs of inputs, which
means we can theoretically push no more than 32 vertex elements. This
is no artifical limit either, as a vec4 element takes up 4 registers
in the payload, and 32 * 4 = 128, the entire size of our register file.
Plus, the VS Thread payload needs at least g0 and g1 for other things,
so we can really only push 31.
We can theoretically support one additional binding, by combining our
two SGV bindings into a single upload. In order to support additional
vertex elements, we would need to add support to the backend compiler
for the pull model for VS inputs.
References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5917
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14991>
I got confused while writing this somehow because of the null descriptor
feature, which enables drivers to consume a null descriptor, which has no
relation to a descriptor layout containing no descriptors
failing to accurately use zero descriptors can put layouts over the maximum
per-stage limits, which causes tests to crash
fixes (lavapipe):
KHR-GL46.shading_language_420pack.binding_uniform_block_array
KHR-GL46.multi_bind.dispatch_bind_buffers_base
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15067>
I chose to implement this as a global flag in the device, because
otherwise we would end up with extra draw overhead trying to avoid it in
the implicit-sync WSI case, and you're probably going to end up needing
implicit sync anyway because you used one of the BOs in any of the
submitted cmdbufs. To do better than this, we would probably want a
skip-implicit-sync flag on the BOs in the BO list, rather than global on
the submit.
Reports about venus on turnip say that this flag reduces worst-case
QueueSubmit time in a game workload from ~10ms to ~4ms.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14838>
Bifrost post-RA dead code elimination can cull the destinations of
regular ALU instructions, by weakening from a register write to a
temporary write. However, there is no way to suppress staging writes, so
culling the destinations will result in invalid code generation.
Fixes a regression in
dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_static_vertex
with scoreboarding. The root cause there is the backend dead code
elimination not being sufficiently aggressive in the presence of control
flow. Usually this does not matter, since the backend optimizations are
intended to be local with global optimizations happening in NIR.
Unfortunately, our implementation of IDVS hits this hard. That will need
to be optimized (probably by specializing IDVS shaders in NIR instead of
the backend). In the mean time, let's fix the actual bug affecting
scoreboarding.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14298>
"raw" (IDXEN=0) and "structured" (IDXEN=1) do bounds checking differently.
From `si_make_buffer_descriptor`:
* - For VMEM and inst.IDXEN == 0 or STRIDE == 0, it's in byte units.
* - For VMEM and inst.IDXEN == 1 and STRIDE != 0, it's in units of STRIDE.
so there is a difference between setting vindex = i32_0 and vindex = NULL.
Instead of having the `structured` flag, we can just check if vindex is NULL.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15098>
"raw" (IDXEN=0) and "structured" (IDXEN=1) do bounds checking differently.
From `si_make_buffer_descriptor`:
* - For VMEM and inst.IDXEN == 0 or STRIDE == 0, it's in byte units.
* - For VMEM and inst.IDXEN == 1 and STRIDE != 0, it's in units of STRIDE.
so there is a difference between setting vindex = i32_0 and vindex = NULL.
Instead of having the `structured` flag, we can just check if vindex is NULL.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15098>
If we have a postponed spill, the temp we create at ip is no longer
the spilled temp and therefore is affected by the thrsw injection.
Fixes corruption in the additive blending animation demo from
Three.js.
Fixes: f3c3228522 ('broadcom/compiler: do not rebuild the interference graph after each spill')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15112>
Not all Vulkan implementations allows rendering to linear images, so in
order to support scanning out from these on Windows we might have to copy
through a buffer like we do in the PRIME path.
To avoid reimplementing the same, let's instead generalize the code a
bit so it doesn't have to specfy any PRIME-specific details.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12210>
Application may exit without freeing created display list, this
may leave the cache not empty.
This is triggered by Abaqus which just close X11 display without
calling any of GLX cleanup functions like glXDestroyContext. But
GLX hook to X11 display close function to free GLX screen resource.
So display list as a context resource has not been freed, but
cache as a screen resource is freed.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14926>
DRI3 window back buffer is a client resource, so it's destroyed
when context switch drawable for native window.
But some application like Abaqus may leave a dirty back buffer
and reuse it when switch back. So add a driconf option for these
kind of app to keep the entire GLX drawable for native window.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14926>
When we spill we add new temps. We should be careful not to access
liveness for these until we have re-computed it after all spills and
fill for that the spilled temp have been processed so as to avoid
out-of-bounds accesses to the c->temp_start and c->temp_end arrays.
This fixes a crash in a Three.js demo when we try to patch register
classes after a TMU spill that was caused because we would incorrectly
try to patch the same temps we had just added for the spill itself,
which is not only unnecessary but also incorrect since we these temps
would not have liveness information available yet and thus would
cause out of bounds accesses.
Fixes: f3c3228522 ('broadcom/compiler: do not rebuild the interference graph after each spill')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15107>
We need to take ownership of shader keys before we can insert them into
the hash table. Caught by ASan.
==6343==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x00016bc51410 at pc 0x00010498d6cc bp 0x00016bc50240 sp 0x00016bc4f9d0
READ of size 592 at 0x00016bc51410 thread T0
#0 0x10498d6c8 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long)+0x208 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x196c8)
#1 0x10498da08 in wrap_memcmp+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x19a08)
#2 0x10b7f3f18 in asahi_shader_key_equal agx_state.c:867
#3 0x10a482e7c in hash_table_search hash_table.c:325
#4 0x10b7f4e94 in agx_update_shader agx_state.c:899
#5 0x10b7f0dc4 in agx_draw_vbo agx_state.c:1590
#6 0x10a7c28c4 in u_vbuf_draw_vbo u_vbuf.c:1498
#7 0x10a5db03c in cso_multi_draw cso_context.c:1639
#8 0x10aed03d0 in _mesa_validated_drawrangeelements draw.c:1812
#9 0x10aed08d4 in _mesa_DrawElements draw.c:1945
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15109>
This removes the block at the end of fragment processing and allow
multiple scenes to be queued to the rasterizer up to MAX_SCENES.
Running heaven with this shows up to 39 scenes been allocated in the
first few renders, but it also removes all stalls in rendering until
the present stalls for completion. It reduces frame rendering times
(not enough to make it > 1 fps) but looks to be about 10% increase.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
When llvmpipe is allowed execute fragment shaders overlapping
with other stuff, we have to start using the pipe fences.
With presentation, the acquire path needs to signal a semaphore
that can be waited on by the user, so add support for passing
signal/wait semaphores for non-timeline in, and just put a
fence pointer in the semaphore for that case.
This fixes rendering once we allow overlapping rasterization.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
Currently neither llvmpipe or softpipe ever leave any drawing in
the pipeline, but I'd like to change that for llvmpipe.
This makes drisw block for completed rendering before sending
data to the X server.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
If the luminance formats aren't renderable, they back out to R*
formats, but those will end up with a 1 in alpha rather than 0 when
textured. So instead make them explicitly renderable, which will cause
the correct texture format swizzle to be applied.
Fixes query-rgba-signed-components and probably others.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15097>
Until now we have lived without a refcount mechanism in the driver
because in Vulkan the user is responsible for handling the life
span of memory allocations for all Vulkan objects, however,
imported BOs are tricky because the kernel doesn't refcount
so user-space needs to make sure that:
1. When importing a BO into the same device used to create it
(self-importing) it does not double free the same BO.
2. Frees imported BOs that were not allocated through the same
device.
Our initial implementation always freed BOs when requested,
so we handled 2) correctly but not 1) on drm and we would
double-free self-imported BOs because kernel doesn't return
a unique gem_handle on each import.
Beside this the submit ioctl checks for duplicates in the
BO list and returns an error if there is one.
This fixes the problem for good by adding refcounts to BOs
so that self-imported BOs have a refcnt > 1 and are only freed
when all references are freed.
KGSL on the other hand does not have the same problems,
at least not with ION buffers which are used for exportable
BOs on pre 5.10 android kernels.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5936
Fixes CTS tests: dEQP-VK.drm_format_modifiers.export_import.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15031>
It is good to know that we run out of ring space and have to wait. This
happens easily with fossilize-replay because encoding a
vkCreateGraphicsPipeline takes microseconds while executing it can take
milliseconds, >100ms sometimes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14966>
This reverts commit 29d319c767.
Now that we use nir_lower_bool_to_bitsize, we don't see 1-bit booleans
anymore, so the issue this fixed doesn't apply. Actually, that issue was
(in part) why I started looking into boolean handling in the first
place.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14576>
This lets us avoid generating SWZ instructions. Those instructions could
be constant folded but that complicates the replication analysis
introduced in the next commit.
Almost no shader-db changes.
quadwords HURT: shaders/glmark/1-22.shader_test MESA_SHADER_FRAGMENT: 718 -> 722 (0.56%)
total quadwords in shared programs: 68169 -> 68173 (<.01%)
quadwords in affected programs: 718 -> 722 (0.56%)
helped: 0
HURT: 1
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14576>
We've got lots of VK coverage on 618, so take some of the load off (but
leave a little bit of testing just to make sure we don't totally break
630). This should help with our Marge times since we've added some other
coverage to 630 that's started overloading the runners.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15085>
The old calculation for mat3 was clever, but it turns out that a
straightforward application of subdeterminants similar to how mat4 is
handled is more efficient: on a scalar architecture with some sort of
combined multiply+add instruction with a negate modifier (both fairly
common), the new determinant is 9 instructions vs. 15 for the old one,
and without the multiply-add it's 14 instructions vs. 18 for the old
one. When used as a routine for inverse() the savings are compounded,
because we now use the same method as used to compute the adjucate
matrix and so CSE can combine most of the calculations with the adjucate
matrix ones.
Once mat3 and mat4 use the same method for computing determinants, we
can combine them into a single recursive function. I also pulled up the
mat_subdet() function because it was doing basically what we need, so
it's now shared between determinant and inverse. This shrinks the
implementation significantly, as can be seen from the diffstat.
The real reason I want to change this, though, is that it fixes
dEQP-VK.glsl.builtin.precision_fp16_storage16b.inverse.compute.mat3 with
turnip. Qualcomm uses round-to-zero for 16-bit frcp, which combined with
some inaccuracy in the old method of calculating the determinant led us
to fail. Qualcomm's driver uses something like the new method to
calculate the determinant in the inverse. We could argue that Mesa's
method should be allowed, because round-to-zero for floating-point
division is within spec and there are no precision guarantees given for
determinant() or inverse(). However we might as well use the more
efficient method.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14652>
It won't work if the blob is fixed-size and we overrun the size, which
will be the case with the Vulkan pipeline cache.
This gets a bit tricky for the repeated-header optimization, because we
can't read the header from the blob. Instead we have to store the header
itself.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15028>
Label IDVS variants as being MESA_SHADER_{POSITION, VARYING} stages;
reserve the MESA_SHADER_VERTEX label for non-IDVS shaders. This reduces
confusion where a single shader compiles to two MESA_SHADER_VERTEX
shaders with different stats.
While we're at it, de-vendor the blend shader stage name; these stats
are internal anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15086>
C++ requires explicit casts from integers to enums. Fixes errors like
the following when trying to use Asahi GenXML from a GTest unit test.
src/asahi/lib/agx_pack.h:554:23: error: assigning to 'enum agx_channels' from incompatible type 'uint64_t' (aka 'unsigned long long')
values->channels = __gen_unpack_uint(cl, 0, 6);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
The texture descriptor we construct for reloading needs to respect the
surface's texture/layer selection. Fix exactly the same bug as
b8c31ac06d ("lima: fix glCopyTexSubImage2D").
Fixes:
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.2d_rgb
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.2d_rgba
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.cube_rgb
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.cube_rgba
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
We can only use linear for 2D images, not even 2D arrays. Even for 2D
images, we only want to use linear if:
* We are required to use linear due to window system requirements.
* The texture is streaming.
Otherwise, we want to use tiled textures. (Or better, compressed, but we
don't support that yet.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
nir_alu_instr_is_comparison needs to consider all comparison opcodes regardless
of size. Otherwise, they will be missed by nir_opt_move/sink.
Without this change, lowering booleans to integers regresses register
pressure (and spills/fills) significantly in certain shaders on Panfrost,
like android/com.miHoYo.GenshinImpact/1420.shader_test.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15073>
The Vulkan 1.3 spec says:
"The implementation-dependent limit bufferImageGranularity specifies
a page-like granularity at which linear and non-linear resources
must be placed in adjacent memory locations to avoid aliasing. Two
resources which do not satisfy this granularity requirement are said
to alias. bufferImageGranularity is specified in bytes, and must be
a power of two. Implementations which do not impose a granularity
restriction may report a bufferImageGranularity value of one.
Note: Despite its name, bufferImageGranularity is really a
granularity between "linear" and "non-linear" resources."
We set this limit to 64 bytes (a cacheline) at the dawn of time, without
any real rationale attached. There shouldn't be any restrictions here.
Our tile sizes are typically 4K, and tiled resource addresses are
aligned to the tile size, and the extent is also a multiple of the tile
sized. So if a linear resource occurs before a tiled one, there will
naturally be some space due to the alignment of the tiled resource's
starting address. If a linear resource occurs after a tiled one, the
tiled resource's ending address is already 4K aligned, which is already
guaranteeing that they won't share a cacheline.
So I think it should be fine to reduce this to 1. The other Vulkan
driver for our hardware seems to advertise 1 here as well.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15066>
Instead, we only recompute liveness and we add new nodes and
interferences to the graph manually (we also need to patch
register classes in some cases).
To assist in this process, we also add an ip counter to our
instructions that we also recompute after each spill, which we use
to identify registers that cross thrsw boundries introduced with
TMU spills and fills and adjust their register classes accordingly
(removing their capacity to use accumulators).
This significantly reduces the CPU cost of spills. Using
shaders/closed/gputest/piano/7.shader_test as reference:
Compile time up to the first successful compile strategy in main is
~24s and with this change it is ~11s. With this speed up, we can now
try all 2-thread compile strategies (including the fallback scheduler)
in only ~15s.
A full shader-db run results in:
Total CPU time (seconds): 9904.67 -> 9087.98 (-8.25%)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
We may be pipelining TMU writes and reads, in which case we can
see both TMUWT and LDTMU at the end of a TMU sequence, so we should
not assume that a TMUWT always terminates a sequence.
Also, we had a bug where we were using inst instead of scan_inst
to check if we find another TMUWT after the curent instruction.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Instead of whether they are allowed to spill or not. This is more flexible.
Also, while we are not currently enabling spilling on any 4-thread strategies,
should we do that in the future, always prefer a 4-thread compile.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Until now we would only allow spilling as a last resort in the
last 2 strategies, however, it is possible that in some cases
earlier strategies may produce less spills if we allowed spilling
on them.
Likewise, the fallback scheduler can sometimes produce less spills
than 2 threads with optimizations disabled.
With this change, we start allowing all our 2-thread strategies to
spill, and instead of choosing the first strategy that is successful,
we choose the one that doesn't spill or the one with the least amount
of spilling.
It should be noted that this may incur in a significant increase
of compile times. We will address this in a follow-up patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Commit 38800b38 changed nir_opcodes.py, but that doesn't seem to have
triggered nir_opt_algebraic.py. The change in 75ef5991 depends on
opt_algebraic lowering 16-bit versions of slt, but if opt_algebraic is
not rebuilt, this may not happen. This resulted in some people seeing
assertion failures in, for example,
dEQP-VK.spirv_assembly.instruction.compute.float16.arithmetic_3.step,
due to the backend seeing nir_op_slt that it didn't know how to handle.
v2: Add nir_opcodes.py to nir_algebraic_py so that all the per-driver
algebraic passes pick up the dependency too. Rename it to
nir_algebraic_depends. Suggested by Emma.
Closes: #6047
Fixes: d1992255bb ("meson: Add build Intel "anv" vulkan driver")
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15050>
memcpy is divided into chunks that are vec4 sized max. The problem
here happens with a structure of 24 bytes :
struct {
float3 a;
float3 b;
}
If you memcpy that struct, the lowering will emit 2 load/store, one of
sized 8, next one sized 16. But both end up located at offset 0, so we
effectively drop 2 floats.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a3177cca99 ("nir: Add a lowering pass to lower memcpy")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15049>
The mechanism currently used to pass data from the dEQP child process
executed in a crosvm guest environment towards the deqp-runner wrapper
script that starts the crosvm instance is based on creating, writing
and reading regular files.
In addition to the main drawback of using the storage, this approach
is potentially unreliable because the data cannot be transferred in
real-time and there is no control on ending the transmission. It also
requires a forced sleep for syncing the content, while the minimum
amount of time necessary to wait cannot be easily and safely
determined.
Replace this with an IPC based on the virtio transport for virtual
sockets (virtio-vsock).
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14995>
The img->deferred_info will out-live vn_CreateImage, so we need a deep
copy of the VkImageFormatListCreateInfo struct.
This change also avoids tracking VkImageFormatListCreateInfo struct with
a zero viewFormatCount.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15017>
lava_job_submitter.py unit tests are written in pytest and uses
freezegun in order to simulate timeouts in some tests scenarios. So,
this commit adds the packages `python3-pytest` and `python3-freezegun`
to fulfill this dependencies.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14876>
When the lava_job_submitter.py retry loop finishes normally (without
falling through break-loop) it means that the submitter has exceeded the
retry count limit. However, when it happens the script
finishes normally. This patch adds a treatment to this case, warning the
user what happened and forcing the job to fail.
Moreover, this commit will make retry configurations configurable by
CI job, as it can take the default value from the following variables:
- LAVA_DEVICE_HANGING_TIMEOUT_SEC
- LAVA_WAIT_FOR_DEVICE_POLLING_TIME_SEC
- LAVA_LOG_POLLING_TIME_SEC
- LAVA_NUMBER_OF_RETRIES_TIMEOUT_DETECTION
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14876>
If a secondary command buffer is used and the client provides a
framebuffer and that framebuffer has a stencil-only attchment, we would
try to get the aux usage for the depth component of that attachment and
crash. Check the aspects of the image before looking at aux usage.
This fixes at least the following SkQP tests on my Tigerlake:
- vk_circular-clips
- vk_filterfastbounds
- vk_innershapes_bw
- vk_lineclosepath
- vk_multipicturedraw_rrectclip_simple
- vk_pathinvfill
- vk_quadclosepath
- vk_rrect_clip_bw
- vk_windowrectangles
Fixes: 0d8b9c529c ("anv: Allow PMA optimization to be enabled in secondary command buffers")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15048>
Valhall has a new twist on Mali's task splitting voodoo, plus compute offset
support.
On Bifrost + Vulkan, compute offsets needed lowering on Bifrost (gl_GlobalID).
Valhall saves a few instructions here.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15047>
I was doing some RE on freedreno and we had some questions about when the
hardware might need non-uniform or non-constant array access for various
descriptor types, so let's leave some notes for the next person.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13621>
shader-db results (note that we expect many more loops unrolled, so instr
count is probably understating the win):
nv30:
total instructions in shared programs: 16535069 -> 14299105 (-13.52%)
instructions in affected programs: 16377286 -> 14141322 (-13.65%)
total gpr in shared programs: 81255 -> 67268 (-17.21%)
gpr in affected programs: 56714 -> 42727 (-24.66%)
LOST: 0
GAINED: 824
nv40:
total instructions in shared programs: 20907673 -> 18428749 (-11.86%)
instructions in affected programs: 20755510 -> 18276586 (-11.94%)
total gpr in shared programs: 104200 -> 82831 (-20.51%)
gpr in affected programs: 80278 -> 58909 (-26.62%)
14130
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14130>
With the recent addition of the shortcuts aiming to avoid atomic
operations, the reference count on resources can become unbalanced
in the Tegra driver since they are wrapped and then proxied to the
Nouveau driver.
Fix this by keeping a private reference count.
Fixes: 7688b8ae98 ("st/mesa: eliminate all atomic ops when setting vertex buffers")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
With the recent addition of the shortcuts aiming to avoid atomic
operations, the reference count on sampler views can become unbalanced
in the Tegra driver since they are wrapped and then proxied to the
Nouveau driver.
Fix this by keeping a private reference count.
Fixes: ef5d427413 ("st/mesa: add a mechanism to bypass atomics when binding sampler views")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
I had missed a int -> enum conversion in one recently added function and
it's probably nice to also dump the target value also in
trace_dump_resource_template() so let's do just that.
Signed-off-by: Matti Hamalainen <ccr@tnsp.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14980>
Once we simplified a phi node, we never updated the definition it points
to, which meant that it could become out of date if that definition were
also simplified, and we didn't check that when rewriting sources. That
could happen when there are multiple nested loops with phi nodes at the
header.
Fix it by updating the phi's pointer. Since we always update sources
after visiting the definition it points to, when we go to rewrite a
source, if that source points to a simplified phi, the phi's pointer
can't be pointing to a simplified phi because we already visited the phi
earlier in the pass and updated it, or else it's been simplified in the
meantime and this isn't the last pass. This way we don't need to
keep recursing when rewriting sources.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15035>
This fixes artifacts seen in games when using ASTC transcoding,
we need to use DXT5 for proper alpha channel support.
Number of components is a block specific property, there is no easy
way to see if we will require >1bit alpha support or not, so simply
use DXT5 to have support in place.
Fixes: 91cbe8d855 ("gallium: Add a transcode_astc driconf option")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15029>
This introduces inotify support in RADV to handle changes from the
RADV_FORCE_VRS_CONFIG_FILE. This is similar to MangoHUD. I'm personally
not sure it's the best solution but let's try this way and change it
later if we have issues (or if we have a lightweight solution).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14713>
When I originally added vk_image_view, I was overly clever when it came
to the format field. I decided to make it only contain the bits of the
format contained in the selected aspects. However, this is confusing
(not generally a good thing) and it's also not always what you want.
The Vulkan 1.3.204 spec says:
"When using an image view of a depth/stencil image to populate a
descriptor set (e.g. for sampling in the shader, or for use as an
input attachment), the aspectMask must only include one bit, which
selects whether the image view is used for depth reads (i.e. using a
floating-point sampler or input attachment in the shader) or stencil
reads (i.e. using an unsigned integer sampler or input attachment in
the shader). When an image view of a depth/stencil image is used as
a depth/stencil framebuffer attachment, the aspectMask is ignored
and both depth and stencil image subresources are used."
So, while the restricted format makes sense for texturing, it doesn't
for when the image is being used as an attachment. What we probably
actually want is both versions of the format. We'll call the one given
by the VkImageViewCreateInfo vk_image_view::format and the restricted
one vk_image_view::view_format.
This is just the first commit which switches format to view_format so
the compiler will make sure we get them all. The next commit will
re-add vk_image_view::format but this time unmodified.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15007>
I suspect this double-check and comment was due to originally using ir.nir
as the condition, which might be uninitialized if !IR_NIR. You could only
take the branch if IR_NIR was set, and you should always not take if it
!IR_NIR, so it worked out in the end, but it would cause spurious valgrind
warnings if you hadn't zeroed out your TGSI shader's struct.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14896>
There were two paths in this code: One was transform_loops, which would
try to detect loops with an iteration count it understood and unroll that
many times. The other was emulate_loops, which would just figure out how
many instructions the program could have and still compile (hopefully),
and unroll this loop however many times would fit in that.
The transform_loops had no analysis as good as GLSL or NIR loop unrolling
have, so it shouldn't be missed -- any opportunity it found would only be
due to bugs in the unrolling code.
The emulate_loops path had an issue with computing the number of times it
should try to unroll -- if you had more instrs than ALUs available
already, you'd overflow and unroll approximately infinitely many times,
OOMing the system. But, also, it's better to throw a compiler error about
unsupported loops than to run the loop an incorrect number of times and
call it a success.
Fixes: #5883, #6018
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15004>
this otherwise treates begin/end/begin the same as begin/pause/resume
cc: mesa-stable
fixes:
KHR-GL46.texture_view.view_classes
KHR-GL46.transform_feedback.capture_geometry_separate_test
KHR-GL46.transform_feedback.capture_vertex_separate_test
KHR-GL46.transform_feedback.query_geometry_separate_test
KHR-GL46.transform_feedback.query_vertex_separate_test
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15020>
virglrenderer makes invalid shaders when faced with vector 64-bit
instructions, which GLSL-to-TGSI never produced. While this doesn't fix
everything, it does get more tests running, and virgl probably the primary
consumer of 64-bit TGSI. virgl may be deprecating its host 64-bit
support, at which point we can drop this workaround.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15014>
Prior to the noted MR, virglrenderer encoded the tex operands in a
limited-size temp buffer which we could easily overflow with TGSI
immediates. GLSL-to-TGSI always emitted an extra MOV, so keep that
behavior when nir-to-tgsi feeds us more optimized TGSI.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15014>
GLSL-to-TGSI would emit some MOVs that made things OK, but NTT
successfully copy-propagates the inputs and sysvals more, triggering bugs
in virglrenderer when the signedness of the input is different from that
of the ALU op.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15014>
Various workaround paths in virglrenderer assume that outputs are written
with a full writemask, which is not required by TGSI. To work around
that, store affected outputs in a temp and do a full write after each
writemasked write.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15014>
We really need offsets to be in dwords, not in vec4s.
The bug manifests as random failure of func.mesh.clipdistance.5 crucible
test, where stores to gl_MeshVerticesNV[x].gl_ClipDistance[4+n] actually write to
gl_MeshVerticesNV[x].gl_ClipDistance[1+n].
Fixes: 1f438eb033 ("intel/compiler: Implement Mesh Output")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14997>
Contains stuff needed for layered rendering. Unfortunately, there's no more
provoking vertex per draw -- ugh! That's fine for Vulkan (just don't set
provokingVertexModePerPipeline), but requires inserting extra flushes on desktop
OpenGL.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15003>
For some games that take like 400 MiB of shader binaries, the
number of shader arenas ends up going >1500. Cut that down a bit
by using larger arenas.
8 MiB should still be decent with small BAR and should still cut
things down from ~1500 to ~50 buffers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14591>
Some functions in ppir iterate the ppir_op_info slots arrays looking
for the PPIR_INSTR_SLOT_END token. The dummy/undef internal ops may
appear in the scheduling code and their slots arrays did not contain
that token, which could result in invalid array reads.
Reported by gcc -fsanitize=address.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Cc: 22.0 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14894>
Reported by gcc -fsanitize=address, sometimes gpir regalloc attempts to
handle an uninitialized node->value_reg (containing the value -1), which
results in an invalid array access.
Avoid it for now to prevent crashes, but more investigation may be
required later on.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Cc: 22.0 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14894>
Static analysis complains that spill_costs might be accessed in
non-initialized positions.
It does not seem to be an issue with the current code which initializes
it for every relevant register index, but we can also just initialize it
to not have to worry about that.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14894>
This was needed, so that in case of active helper lanes,
these contain the correct value. It is now handled implicitly.
Totals from 1004 (0.74% of 134913) affected shaders: (GFX10.3)
CodeSize: 7581020 -> 7580892 (-0.00%); split: -0.00%, +0.00%
Instrs: 1454940 -> 1454908 (-0.00%); split: -0.00%, +0.00%
Latency: 12984953 -> 12984894 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 3173037 -> 3173049 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 47498 -> 47273 (-0.47%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
Platforms that don't have flow control also don't have anything that
could be written that has a side effect. It should be safe to implement
these condition writes as
foo = csel(condition, bar, foo);
This should eliminate the last thing in the GLSL compiler that can
create new conditions on assignments. Everything else that can store
something in ir_assignment::condition derives it from a pre-existing
condition.
v2: Fix bad rebase.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14573>
When there are four or fewer elements left in the array partition, the
strategy changes from a binary search of nested flow control to sequence
of conditional assignments like
(assign, dest, src[constant_i+0], index == constant_i+0)
(assign, dest, src[constant_i+1], index == constant_i+1)
(assign, dest, src[constant_i+2], index == constant_i+2)
(assign, dest, src[constant_i+3], index == constant_i+3)
or
(assign, dest[constant_i+0], src, index == constant_i+0)
(assign, dest[constant_i+1], src, index == constant_i+1)
(assign, dest[constant_i+2], src, index == constant_i+2)
(assign, dest[constant_i+3], src, index == constant_i+3)
Realistically, the first case should use ir_triop_csel instead.
The second case will either get turned back into flow control like
if (index == constant_i+0)
(assign, dest[constant_i+0], src)
if (index == constant_i+1)
(assign, dest[constant_i+1], src)
if (index == constant_i+2)
(assign, dest[constant_i+2], src)
if (index == constant_i+3)
(assign, dest[constant_i+3], src)
or a sequence of conditional selects like
(assign, dest[constant_i+0], (csel, index == constant_i+0, src, dest[constant_i+0]))
(assign, dest[constant_i+1], (csel, index == constant_i+1, src, dest[constant_i+1]))
(assign, dest[constant_i+2], (csel, index == constant_i+2, src, dest[constant_i+2]))
(assign, dest[constant_i+3], (csel, index == constant_i+3, src, dest[constant_i+3]))
The former case should continue to use the binary search. The later
case could be generated from the binary search by other lowering passes.
At the end of the day, conditional assignments don't really help
anything here, so stop using them.
Radeon R430
total instructions in shared programs: 2398683 -> 2398419 (-0.01%)
instructions in affected programs: 5143 -> 4879 (-5.13%)
helped: 9
HURT: 8
total vinst in shared programs: 616292 -> 616010 (-0.05%)
vinst in affected programs: 4467 -> 4185 (-6.31%)
helped: 9
HURT: 8
total sinst in shared programs: 315417 -> 315667 (0.08%)
sinst in affected programs: 2568 -> 2818 (9.74%)
helped: 2
HURT: 15
total flowcontrol in shared programs: 1049 -> 1048 (-0.10%)
flowcontrol in affected programs: 7 -> 6 (-14.29%)
helped: 1
HURT: 0
total presub in shared programs: 47027 -> 47027 (0.00%)
presub in affected programs: 127 -> 127 (0.00%)
helped: 1
HURT: 1
total omod in shared programs: 3618 -> 3615 (-0.08%)
omod in affected programs: 8 -> 5 (-37.50%)
helped: 3
HURT: 0
total temps in shared programs: 450757 -> 451312 (0.12%)
temps in affected programs: 837 -> 1392 (66.31%)
helped: 8
HURT: 6
total consts in shared programs: 1031928 -> 1031920 (<.01%)
consts in affected programs: 1211 -> 1203 (-0.66%)
helped: 6
HURT: 7
The shaders that were hurt for temps... are all lies. None of those
shaders should have compiled as all 6 had more than 32 temps to begin
with.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14573>
Use if-statements instead. Any hardware that supports this sort of
tessellation has flow control, so it will probably emit the conditional
assignment using an if-statement anyway. This is definitely what
st_glsl_to_nir does.
v2: Fix copy-and-paste bug in the ir_type_swizzle handling. This bug
caused segfaults in tests/spec/arb_tessellation_shader/execution/variable-indexing/tcs-patch-vec4-swiz-index-wr.shader_test.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14573>
The nir approximation is a bit more precise so there is one more
instruction for the scalar version but if the shader actually uses
vector one or some other stuff like sin(x) followed by cos(x) we
save more.
This nir approximation importantly seems to have better precision
so this should also fix some piglits/dEQPs.
With my shader-db and faked R300:
total instructions in shared programs: 67751 -> 65978 (-2.62%)
instructions in affected programs: 8637 -> 6864 (-20.53%)
total temps in shared programs: 9191 -> 9137 (-0.59%)
temps in affected programs: 486 -> 432 (-11.11%)
total consts in shared programs: 45427 -> 45412 (-0.03%)
consts in affected programs: 856 -> 841 (-1.75%)
total lits in shared programs: 2317 -> 2346 (1.25%)
lits in affected programs: 69 -> 98 (42.03%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14957>
This patch moves the shrinking of store sources into
a separate pass.
The reasoning behind this is that this pass usually only
needs to be called once while nir_shrink_vectors might
better be called several times. This allows to move
the pass(es) out of the optimization loops.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14480>
This wasn't much of a problem before because vk_command_buffer_finish()
doesn't do much on an empty command buffer. However, it's about to be
responsible for managing the pool's list of command buffers so it will
be critical to get this right.
Fixes: c9189f4813 ("anv: Use a common vk_command_buffer structure")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14917>
Now that we have no overlapping memory addresses for different
contexts we can go ahead and also share the same VM for every context.
This should make VM binding much easier to implement once we have
Kernel support for it.
v2: We now have engines_ctx, set it there too.
v3: Fix indenting (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12028>
Have a single border color pool per bufmgr instead of per context.
We want to have a single VM shared among every context and the border
color pool is the last feature preventing us from having that.
Previously we had 1024 colors per context but once the buffer was full
we just waited for the buffer to be unused and restarted it. After
this patch we have 4096 colors for every single context and we can't
just flush buffers if they are full, so we simply return black.
There are many strategies we could try to implement to help alleviate
this new 4096 limit, none of which are implemented by this patch:
- We could just expand the buffer to the full 16MB we can use,
allowing 262144 colors.
- We could use multiple buffers and make the contexts refcount them,
so eventually older buffers would reach zero references and be
recycled, moving us to a working set maximum from a lifetime
maximum.
- We could also make the border color pool be a standard memzone and
then give smaller buffers to each context when they need, so the
limit would be in the number of contexts that can use border color
pools. This was my first implementation but Ken suggested I switch
to the one provided by this patch, which is simpler.
Keep it like this since border colors don't seem to be used very much
and other Mesa drivers such as radeonsi also seem to employ the
"return black once we reach the limit" strategy.
As a last note, we could also move the contents of iris_border_color.c
to iris_bufmgr.c in order to avoid breaking some abstractions we have
in Iris, like we do with iris_bufmgr_get_border_color_pool(). I can do
this in case we want it.
v2: Switch from standard memzone to a per-screen thing (see above).
v3: Actually make it per bufmgr. Just making it per screen is not
enough, since screens can share the same VM, an example being the
gputest benchmark suite.
v4: Rebase.
v5: Remove dead code, lock around hash table lookup (Ken).
v6: Simple rebase.
v7: Another rebase (for_each_batch).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12028>
We're moving towards a path where all contexts share the same virtual
memory - because this will make implementing vm_bind much easier - ,
and to achieve that we need to rework the binder memzone. As it is,
different contexts will choose overlapping addresses. So in this patch
we adjust the Binder to be 1GB - per Ken's suggestion - and use a real
vma_heap for it. As a bonus the code gets simpler since it just reuses
the same pattern we already have for the other memzones.
Credits to Kenneth Granunke for helping me with this change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12028>
this codepath needs to update literally one spirv word to get the right shader
(why can't this be a spec constant?)
so just update that word instead of running all the nir passes and doing a
full recompile
fixes:
KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_MaxPatchVertices_Position_PointSize
KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14976>
by initializing this on context creation, we can ensure that the correct
value is always here
cc: mesa-stable
fixes:
dEQP-GLES31.functional.texture.multisample.samples_1.sample_mask_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_1.sample_mask_and_sample_coverage
dEQP-GLES31.functional.texture.multisample.samples_1.sample_mask_and_sample_coverage_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_1.sample_mask_only
dEQP-GLES31.functional.texture.multisample.samples_2.sample_mask_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_2.sample_mask_and_sample_coverage
dEQP-GLES31.functional.texture.multisample.samples_2.sample_mask_and_sample_coverage_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_2.sample_mask_only
dEQP-GLES31.functional.texture.multisample.samples_3.sample_mask_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_3.sample_mask_and_sample_coverage
dEQP-GLES31.functional.texture.multisample.samples_3.sample_mask_and_sample_coverage_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_3.sample_mask_only
dEQP-GLES31.functional.texture.multisample.samples_4.sample_mask_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_4.sample_mask_and_sample_coverage
dEQP-GLES31.functional.texture.multisample.samples_4.sample_mask_and_sample_coverage_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_4.sample_mask_only
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14974>
There's already a single command queue for the screen, meaning that
all commands are being serialized implicitly into that queue. There's
no need to have separate fences for parallel contexts when those
fences would all share the same underlying timeline.
This adds an explicit lock to expand the scope of the implicit screen
command queue ordering to include fence signals.
Each context still gets its own submit sequence, which is used for 1
purpose right now: A uniqueness check in the state manager to see
if states are coming from separate command lists, to apply promotion
and decay logic.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14959>
The no-wait condition was wrong before. If the query was used in the
current batch (query->fence_value == context->fence_value), we'd
continue on with the operation instead of returning false. Then the
buffer map would see that the bo is referenced in the current batch,
and would flush and wait, even though we were asked not to wait.
This fixes the condition by simply using the (correct) buffer map
logic.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14959>
Properly handling NaN adversely affects several hundred shaders in
shader-db (lots of Skia and a few others from various synthetic
benchmarks) and fossil-db (mostly Talos and some Doom 2016). Only apply
the NaN handling work-around when the shader demands it.
v2: Add comment explaining the 1.0*y_over_x. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 2098ae16c8 ("nir/builder: Move nir_atan and nir_atan2 from SPIR-V translator")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
frexp_sig of ±0, ±Inf, or NaN should just return the input unmodified.
frexp_exp of ±Inf or NaN is undefined, and frexp_exp of ±0 should return
the input unmodified. This seems to already work.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 23d30f4099 ("spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
NOTE: This commit needs "nir: All set-on-comparison opcodes can take all
float types" or regressions will occur in other Vulkan SPIR-V tests.
No shader-db changes on any Intel platform.
NOTE: This commit depends on "nir: All set-on-comparison opcodes can
take all float types".
v2: Fix handling 16-bit (and presumably 64-bit) values.
About 280 shaders in Talos are hurt by a few instructions, and a couple
shaders in Doom 2016 are hurt by a few instructions.
Tiger Lake
Instructions in all programs: 159893290 -> 159895026 (+0.0%)
SENDs in all programs: 6936431 -> 6936431 (+0.0%)
Loops in all programs: 38385 -> 38385 (+0.0%)
Cycles in all programs: 7019260087 -> 7019254134 (-0.0%)
Spills in all programs: 101389 -> 101389 (+0.0%)
Fills in all programs: 131532 -> 131532 (+0.0%)
Ice Lake
Instructions in all programs: 143624235 -> 143625691 (+0.0%)
SENDs in all programs: 6980289 -> 6980289 (+0.0%)
Loops in all programs: 38383 -> 38383 (+0.0%)
Cycles in all programs: 8440083238 -> 8440090702 (+0.0%)
Spills in all programs: 102246 -> 102246 (+0.0%)
Fills in all programs: 131908 -> 131908 (+0.0%)
Skylake
Instructions in all programs: 134185495 -> 134186618 (+0.0%)
SENDs in all programs: 6938790 -> 6938790 (+0.0%)
Loops in all programs: 38356 -> 38356 (+0.0%)
Cycles in all programs: 8222366923 -> 8222365826 (-0.0%)
Spills in all programs: 98821 -> 98821 (+0.0%)
Fills in all programs: 125218 -> 125218 (+0.0%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 1feeee9cf4 ("nir/spirv: Add initial support for GLSL 4.50 builtins")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
This (sort of) matches the behavior of nir_opt_algebraic. This ensures
that subnormal values are properly flushed to zero.
With the aid of "nir/search: Float sources of texture instructions are
float users" and "nir/search: Transitively apply is_only_used_as_float",
there would have been no shader-db regressions on Intel platforms.
However, those caused a significant increase in compile time. Since the
instruction regressions were so small, I just dropped those commits
rather than improve them.
All Haswell and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20125042 -> 20125094 (<.01%)
instructions in affected programs: 7184 -> 7236 (0.72%)
helped: 0
HURT: 32
HURT stats (abs) min: 1 max: 4 x̄: 1.62 x̃: 2
HURT stats (rel) min: 0.11% max: 1.49% x̄: 0.85% x̃: 0.78%
95% mean confidence interval for instructions value: 1.39 1.86
95% mean confidence interval for instructions %-change: 0.74% 0.96%
Instructions are HURT.
total cycles in shared programs: 862745586 -> 862746551 (<.01%)
cycles in affected programs: 109872 -> 110837 (0.88%)
helped: 12
HURT: 23
helped stats (abs) min: 2 max: 774 x̄: 90.83 x̃: 19
helped stats (rel) min: 0.07% max: 25.23% x̄: 3.06% x̃: 0.40%
HURT stats (abs) min: 2 max: 1106 x̄: 89.35 x̃: 12
HURT stats (rel) min: 0.08% max: 45.40% x̄: 3.01% x̃: 0.47%
95% mean confidence interval for cycles value: -60.09 115.23
95% mean confidence interval for cycles %-change: -2.21% 4.07%
Inconclusive result (value mean confidence interval includes 0).
All of the shaders hurt are in either UE4 shooter-game or shooter_demo.
Tiger Lake
Instructions in all programs: 159893213 -> 159893290 (+0.0%)
SENDs in all programs: 6936431 -> 6936431 (+0.0%)
Loops in all programs: 38385 -> 38385 (+0.0%)
Cycles in all programs: 7019259514 -> 7019260087 (+0.0%)
Spills in all programs: 101389 -> 101389 (+0.0%)
Fills in all programs: 131532 -> 131532 (+0.0%)
Ice Lake
Instructions in all programs: 143624164 -> 143624235 (+0.0%)
SENDs in all programs: 6980289 -> 6980289 (+0.0%)
Loops in all programs: 38383 -> 38383 (+0.0%)
Cycles in all programs: 8440082767 -> 8440083238 (+0.0%)
Spills in all programs: 102246 -> 102246 (+0.0%)
Fills in all programs: 131908 -> 131908 (+0.0%)
Skylake
Instructions in all programs: 134185424 -> 134185495 (+0.0%)
SENDs in all programs: 6938790 -> 6938790 (+0.0%)
Loops in all programs: 38356 -> 38356 (+0.0%)
Cycles in all programs: 8222366529 -> 8222366923 (+0.0%)
Spills in all programs: 98821 -> 98821 (+0.0%)
Fills in all programs: 125218 -> 125218 (+0.0%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: f5dd6dfe01 ("anv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controls")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
Extend 4195a9450b so that the next poor fool doesn't come along and
say, "sge does the right thing for 16-bit sources, but slt gives a NIR
validation failure. What the deuce?"
NOTE: This commit is necessary to prevent regressions in GLSLstd450Step
tests of 16-bit sources at "spriv: Produce correct result for
GLSLstd450Step with NaN".
Fixes: 4195a9450b ("nir: sge operation is defined for floating-point types")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
When slots is 64 only the first bit was being set, instead of setting
all 64 bits of the variable, so for that case the function
get_variable_io_mask() always returned 0.
This behaviour caused variables that are being used both on producer and
consumer to be considered unused and thus being removed on
nir_remove_unused_io_vars().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14955>
Expose most of the counters exposed by blob. By faking the value of
counters returned from kgsl I found the exact underlying counters and
constant coefficients being used.
Note, coefficients for counters that depend on time are NOT verified.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14323>
Although a resource may support CCS with its original format, a texture
view of that resource may have a format that doesn't support
compression. Don't create CCS surface states for such texture views.
This change affects iris' behavior when running piglit's
arb_texture_view-rendering-formats_gles3 test on SKL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14806>
Fast clear with the resource format instead. This is safe to do because
can_fast_clear_color ensures that the clear color generates the same
pixel with either the view format or the resource format.
On SKL, this prevents us from using an invalid surface state. This platform
doesn't support CCS_E with sRGB formats, but prior to this patch we allowed
fast-clearing with this combination. Piglit's fcc-write-after-clear test
can trigger this.
Fixes: 230952c210 ("iris: Don't support sRGB + Y_TILED_CCS on gen9")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14806>
the idx param for LLVMBuildInsertElement is zero-indexed based on the
value of 'vector_length' (always 4), and the vector length is (obviously)
sized to 'vector_length', so this should be the member of the vec that is being
inserted, not the invocation index
cc: mesa-stable
fixes (zink, but only on my one machine):
KHR-GL46.tessellation_shader.single.max_patch_vertices
KHR-GL46.tessellation_shader.tessellation_shader_tc_barriers.barrier_guarded_read_write_calls
dEQP-GLES31.functional.tessellation.shader_input_output.barrier
dEQP-GLES31.functional.tessellation.shader_input_output.patch_vertices_5_in_10_out
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_isolines_geometry_output_points
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_isolines_point_mode_geometry_output_triangles
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_quads_geometry_output_points
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_quads_point_mode_geometry_output_lines
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_triangles_geometry_output_points
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_triangles_point_mode_geometry_output_lines
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14949>
All of the opcodes in nir_opt_algebraic_late are the unsized (1-bit)
versions. If the lowering to int32 happens first, many of the
optimizations and lowerings won't happen.
Of particular importance is the lowering of fisfinite. If a shader
happens to contain fisfinite of an fp16 value, it will assert later
during compliation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 78b4e417d4 ("gallivm: handle fisfinite/fisnormal")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14942>
Allocating NIR registers ends up being required for drivers like r600 and
nv30, which don't do their own allocation (except in some cases on r600
where sb is used).
Rather than add a NIR register liveness impl
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14158), switch
from NIR-based liveness to just doing the same channel-based liveness
logic that the NIR registers needed at the TGSI level. The actual
liveness code here basically comes straight out of
brw_vec4_live_variables.cpp.
Since we do the liveness in TGSI now, it also means we don't need to be
careful about not reading SSA values from later TGSI instructions (which
may be useful for doing some greedy instruction selection in generating
TGSI instructions).
i915g:
total instructions in shared programs: 400719 -> 380730 (-4.99%)
instructions in affected programs: 284760 -> 264771 (-7.02%)
total tex_indirect in shared programs: 12289 -> 12290 (<.01%)
tex_indirect in affected programs: 4 -> 5 (25.00%)
total temps in shared programs: 32172 -> 22086 (-31.35%)
temps in affected programs: 30647 -> 20561 (-32.91%)
LOST: 0
GAINED: 148
r300:
total instructions in shared programs: 1472463 -> 1459286 (-0.89%)
instructions in affected programs: 507009 -> 493832 (-2.60%)
total temps in shared programs: 212143 -> 201678 (-4.93%)
temps in affected programs: 78007 -> 67542 (-13.42%)
softpipe:
total temps in shared programs: 517071 -> 294387 (-43.07%)
temps in affected programs: 509324 -> 286640 (-43.72%)
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14404>
To do register allocation well, we want to have a point before
ureg_insn_emit() to look at the liveness of the values and allocate them
to TGSI temporaries. In order to do that, we have to switch from
ureg_OPCODE() emitting TGSI tokens directly to a new ntt_OPCODE() that
stores the ureg args in a block structure.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14404>
PIPE_CAP_FORCE_PERSAMPLE_INTERP is broken for the no-attachment case, so
this is the only option
fixes (lavapipe):
KHR-GL46.sample_shading.render*
dEQP-GLES31.functional.sample_shading.min_sample_shading*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14852>
This cap bit only affects DRI_PRIME setups. Since iris now uses the
blitter to perform dGPU -> iGPU copies asynchronously, it's better to
always use at least two backbuffers so the 3D engine can start rendering
the next frame during the copy.
See commit d17e752857 where this change
was made for radeonsi.
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13877>
In a hybrid graphics setup, Mesa allocates two buffers for the window
surface. The first is what the discrete card renders to; it lives in
VRAM and is usually tiled and possibly compressed. The second is a
shadow copy that lives in system memory (readable by the integrated
card with the displays); it's usually linear and uncompressed.
Mesa's window system code schedules blits to update the shadow copy
when needed, typically at the end of a frame. These can be fairly
costly when running a full-screen application at high resolutions.
We'd like to use the blitter for these copies, as it lets us perform
the copy asynchronously, letting the 3D engine race ahead and start
rendering the next frame. If we used the 3D engine, the next frame
could not start rendering until the PRIME blit finishes, giving us
less time to draw the frame. Fortunately, Tigerlake introduced new
blitter commands which can operate at full memory bandwidth.
DRI PRIME blits happen via the Gallium blit() hook. We can detect that
case by looking for the PIPE_BIND_PRIME_BLIT_DST flag on the destination
resource. This patch detects that case and calls iris_copy_region() on
IRIS_BATCH_BLITTER to handle it. We know a priori that the blitter can
handle this operation (it's not a scaled blit, the formats match and
should not be 96bpp, there's no combined depth stencil, or other weird
edge cases). blorp_copy() will also assert that edge cases don't occur.
Together with the next patch, this improves performance on DG1 Hybrid
scenarios by about 5-6%.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13877>
Use drmSyncobjSignal to signal out_syncobjs when a GPU job submission
ends in the simulator. With this, we can enable multisync support in the
simulator and keep the multisync approach to process fence by submitting
a serialized no-op job that adds the fence to the array of out syncobjs,
i.e. syncobjs to be signaled in the kernel when a job completes (job
post deps).
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14768>
The existing logic would drop the low bit. Instead, let's drop the high
bit, do the conversion, and then add the fixed constant back in if the
value had the high bit set originally.
Fixes KHR-GL45.direct_state_access.vertex_arrays_attribute_format on
drivers that use this module to handle the format conversion.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Emma Anholt <emma@anholt.net>
Tested-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14922>
Saves instructions if the same fabs value is used multiple times.
i915g:
total instructions in shared programs: 397005 -> 396525 (-0.12%)
instructions in affected programs: 11061 -> 10581 (-4.34%)
LOST: 0
GAINED: 22
r300 (not r500):
total instructions in shared programs: 180286 -> 179767 (-0.29%)
instructions in affected programs: 27102 -> 26583 (-1.91%)
total temps in shared programs: 29692 -> 29638 (-0.18%)
temps in affected programs: 356 -> 302 (-15.17%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14938>
Given that our fcsels are on float-bools, we can emit the LRP directly and
save the backend having to emit a SLT to turn the CMP src[0] into a bool.
This required passing a codegen flags struct for nir-to-tgsi. I think
this is a good way forward for it, as the alternative I think has mostly
been adding flags to nir_shader_compiler_options (since adding
PIPE_SHADER_CAPs is an unreasonable amount of pain).
r300 shader-db:
total instructions in shared programs: 1484320 -> 1472463 (-0.80%)
instructions in affected programs: 243588 -> 231731 (-4.87%)
total temps in shared programs: 212485 -> 212143 (-0.16%)
temps in affected programs: 3845 -> 3503 (-8.89%)
Acked-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14886>
The main thing is VK 1.3 testing, but also includes test bugfixes. The
1.3 CTS required an uprev of deqp-runner to handle a new style of test
output, and that deqp-runner brings in some neat new features, too (piglit
in your deqp-runner suite, and extension list checking).
A bunch of VK tests got renamed, so I replaced panvk's custom test list
with simple include filters on the main test list.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (panvk)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14920>
Block compressed formats like ETC2 are now indicated in the plane descriptor,
rather than the pixel format descriptor. Various other minor formats were
removed in Valhall; remove them from the XML so we don't accidentally try to use
them.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14935>
Remove a few that no longer exist, and rename IDVS helper to Malloc Vertex. The
distinction between Malloc Vertex jobs and regular Indexed Vertex jobs is that
the hardware allocates varying buffers dynamically for Malloc Vertex jobs.
Regular IDVS and even legacy tiler jobs are also supported where desired.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14935>
Merged with the Buffer descriptor, hence why it shares a type nibble. However,
Bifrost uses a dedicated tiler heap descriptor, and I see no benefit to merging.
So pretending it's a dedicated descriptor on Valhall too allows us to reuse the
Bifrost code with no modifications.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14935>
The Entrypoint class already has utilities for gettingt he parameter
list as either declarations or as comma-separated argument names for a
call. Use that instead of hand-rolling it. The only modification we
need to make is to add the ability to start the list somewhere other
than at the beginning.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14919>
This fixed a problem for Zink where uniform buffer alignment varies by
GPU, e.g. 64 bytes for an RTX 2070 SUPER but 256 bytes for a GTX 1070
Ti.
Tested running Superposition on Windows 10 with Nvidia 1070 Ti with
496.13 driver. Without the fix Superposition soft locks on its splash
screen. With the fix Superposition runs through its benchmark.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14674>
The bit fields in zink_framebuffer_state cause a false positive with
MSVC's run-time checks enabled. setting state.num_attachments in
zink_get_framebuffer_imageless(). Writing some bits of num_attachments
involves reading bits from layers and samples that haven't been
initialized.
Fixed by assigning to num_attachments earlier in the function. Not
quite sure why that makes a difference but at a guess there's a
heuristic that considers assignment close to declaration as
initialization.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14566>
In the future we'll want to reuse this intrinsic to deal with ray
queries. Ray queries will use a different global pointer and
programmatically change the control/level arguments of the trace send
instruction.
v2: Comment on barrier after sync trace instruction (Caio)
Generalize lsc helper (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
We'll want different types of IDs based on topology. Let's make this
more flexible and also move the bit shifting code a layer above where
it's easier to do bitshifting operations, especially if you need to
stash things into temporary registers.
v2: Keep previous comment.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
spec requires that the number of timeline waits/signals matches the
base number of waits/signals if there are any timeline semaphores
being processed by the submit, so asserting here is in line with what
validation will yield
failure to match these will also hang every driver I've tested, so asserting
here potentially saves some people their desktop session
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14741>
these regions might not have the coords in the correct order, which will
cause them to fail intersection tests, resulting in clears that are never
applied
cc: mesa-stable
fixes:
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_all_buffer_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_color_and_depth_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_color_and_stencil_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_linear_filter_color_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_magnifying_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_minifying_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_missing_buffers_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_nearest_filter_color_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_negative_dimensions_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_negative_height_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_negative_width_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_scissor_blit
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14867>
e9c3dbd046 added PIPE_BIND_DRI_PRIME but it was only set when
importing a prime buffer.
This commit adds handling of this flag in the other codepath = the
one where the prime buffer is allocated by the render GPU.
With this change PIPE_BIND_DRI_PRIME is still only set for the
render GPU - the display GPU will never see this flag; a future
commit will rename it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14615>
To flush the blitter, we need to use MI_FLUSH_DW rather than the usual
PIPE_CONTROL we use on the 3D engine. Most of our code is set up to
suggest flushes via PIPE_CONTROL commands, however, so we hackily just
emit MI_FLUSH_DW when they ask for any kind of PIPE_CONTROL flush.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14912>
Graphics Flight Recorder is:
"The Graphics Flight Recorder (GFR) is a Vulkan layer to help
trackdown and identify the cause of GPU hangs and crashes.
It works by instrumenting command buffers with completion tags."
This is a nice little tool which could help quickly identify the call
which hanged. Or if command buffer is executed for too long.
The tiling nature of our GPU shouldn't be a big issue aside from
lower performance.
For non-segfault case, if:
- Hang happens at the same place in cmdbuf and draw/dispatch is not
finished at that point - it is likely that there is an infinite
loop in some of the shaders in this draw.
- Hang happens always in different place - likely there is nothing
wrong and command buffer just takes too long to execute and you
should try increasing hangcheck_period_ms. If it doesn't help
it is likely a synchronization issue.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13553>
If the surface already has a current backbuffer - say through a
buffer_age query - we do not want to replace it in get_buffers, because
it means the result we'd previously returned them is stale.
If we already have a backbuffer set on the surface, keep it locked in no
matter what until we hit SwapBuffers.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14873>
A buffer age of 0 means that the buffer is uninitialised or has unknown
content. We rely on the buffer age initially being 0 through zalloc when
the surface is first created; when they are first used for a swap, we
set their age to 1, and then we increment the age of every buffer in the
chain with a non-zero age when we swap.
Now that we can release buffers, both through dmabuf-feedback as well as
detecting when we're using a deeper swapchain than the compositor needs,
make sure to reset their age as they are released. Without doing this,
the age will stay as it was before it was released and be incremented,
returning the wrong age to the user the first time a previously-released
buffer slot has been reused.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5977
Fixes: 22d796feb8 ("egl/wayland: break double/tripple buffering feedback loops")
Fixes: b5848b2dac ("egl/wayland: use surface dma-buf feedback to allocate surface buffers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14873>
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that
operate on descriptor sets allocated using that layout, and those
descriptor sets must not be updated with vkUpdateDescriptorSets
after the descriptor set layout has been destroyed. Otherwise,
descriptor set layouts can be destroyed any time they are not in
use by an API command."
Based on ANV.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5893
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14621>
If we have batch a + b, and writing to batch b, causes batch a
to flush, all the bo->index get reset, and we try to submit a -1
to the kernel.
Look the bo index up when creating relocations.
Fixes crash seen in KHR-GL46.compute_shader.pipeline-post-fs
and a trace from Wasteland 3
Fixes: f3630548f1 ("crocus: initial gallium driver for Intel gfx 4-7")
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14905>
Like explicit LODs, biases must be 16-bit, so add a lowering rule for
this. With the LOD mode selection updated for txb, we can then ingest
biases like explicit LODs and allowlist txb. Passes:
dEQP-GLES2.functional.shaders.texture_functions.fragment.texture2d_bias
dEQP-GLES2.functional.texture.mipmap.2d.bias.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14899>
In cases where the to-be-allocated node size with padding exceeds BLOCK_SIZE
but without padding doesn't, a new block is not created and no padding is done
to the previous instruction, causing a misaligned pointer to be returned.
v2: Per Ilia Mirkin's suggestion, remove the extra condition in the first
if statement, let it unconditionally pad the last instruction if needed.
The updated currentPos will then be taken into account in the
block size checking.
This fixes crash seen with lightsmark and Optuma apitraces
Fixes: 05605d7f53 (' mesa: remove display list OPCODE_NOP')
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Tested-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14871>
When new context was created, shared_mem_size was getting overwritten.
This fixes glretrace failure seen with manhattan, aztec and BASS2_intro
apitraces
Fixes: 247c61f2d0 ('svga: Add support for compute shader, shader buffers and image views')
Tested with glretrace, piglit
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
(cherry picked from commit dd6793ec9218782b1b716a87582d7219bae4e75f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14870>
The limit here is from the RENDER_SURFACE_STATE height/width/depth
fields - it's 2^30 for ISL_FORMAT_RAW buffers, and 2^27 otherwise.
anv_isl_format_for_descriptor_type() uses ISL_FORMAT_R32G32B32A32_FLOAT
for uniform buffers when compiler->indirect_ubos_use_sampler is set
(Icelake and earlier), but ISL_FORMAT_RAW when it isn't (Tigerlake+).
So we can increase the limit on Tigerlake and later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14892>
We are updating the deadcode state while walking the program backwards.
When encountering ENDLOOP, we scan the loop, mark everything in the loop
as used and than continue as usuall. We were previously trying to be
smart with the breaks. This was however not working as expected.
Instead, save the most pesimistic deadcode state from the ENDLOOP and
just restore it anytime we see a break.
This keeps the code simple and more importantly does not touch the flat
and IF(-ELSE)-ENDIF paths at all so reduces the chances of regression.
No changes with my shader-db.
Fixes piglits on RV530:
shaders/ssa/fs-if-def-else-break.shader_test
spec/glsl-1.10/execution/vs-loop-array-index-unroll.shader_test
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5832
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14661>
Compute shaders might do vec3(Xbits) loads which are translated
to LD.<next_pow2(3 * X)> by the midgard compiler. This might cause
out-of-bound accesses potentially leading to pagefaults if the
access is at the end of a BO. One solution to avoid that (maybe not
the best) is to split non-log2 loads to make sure we only read what's
requested.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14885>
Swizzling happens in 2 steps on Midgard:
1. Vector expansion/shuffling
2. Swizzling at the instruction-size granularity, but defined using
the source size. Those size are different if the source is expanded.
So, when we print 64 bit swizzles on an expanded source, we first need
to apply an offset if the high part of the 32bit vector was selected,
and then divide the result by 2 to account for vector expansion.
To sum-up, swizzling on midgard is complicated, and I'm not sure I got
it right, but it seems to print what I expect on the few compute
shaders using 64bit arithmetic I debugged.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14885>
Even though 8-bit ALUs are not supported, we can have [un]pack_32_4x8
instructions which translate to IMOVs, and those operate on 8-bit
vectors. The problem is, the swizzling granularity is 16 bit, which
means we don't support
MOV.i8 R0.xyzw, TMP0.xxxx, R1.zyxw
and the compiler doesn't even complain, it just applies 8 bit
swizzling directly, which obviously doesn't work.
This is probably not the right way to fix that, but I thought I'd
raised the issue with a hack to fix, so we can get the discussion
started.
(Found while debugging FB store lowering on Midgard).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14885>
The register allocator assumes instructions are executed sequentially
and allows one instruction to overwrite a portion of a register written
by a previous instruction if this portion is never used. But scalar and
vector ALUs might be executed in parallel if they are part of the same
bundle, and when such instructions write to the same portion of the
register file, the result is undefined.
Let's add intra-bundle interferences to avoid this situation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14885>
We didn't remove desc set from the pool's list if pool was
host_memory_base. On the other hand in there is no point in removing
desc set from the list in DestroyDescriptorPool/ResetDescriptorPool.
Fixes: da7a4751
("turnip: Drop references to layout of all sets on pool reset/destruction")
Fixes cts tests:
dEQP-VK.api.buffer_marker.graphics.default_mem.bottom_of_pipe.memory_dep.draw
dEQP-VK.api.buffer_marker.graphics.default_mem.bottom_of_pipe.memory_dep.dispatch
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14855>
Turn the context state and shader key into a bitmask. When lowering
the depth invert into the shader, scan for writes to viewport index.
If found, move position to the end of the function (or current vertex)
and check if the current viewport needs the depth invert before applying.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14881>
This pass (lowering discard_if to control flow and unconditional
discard) was originally written for Zink, but is useful for hardware
that lacks conditional discard instructions like AGX. In theory AGX
could implement a conditional discard with CSEL, but the vendor
compiler uses a lowering like this one. Since I like not writing code,
I'd like to use the pass that's already in tree.
v2: Don't preserve dominance (Jason). Assert we don't see demotes or
terminates (Jason). Add Mike's ack.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14217>
The bug here is that the DRI context "flags" are intended to alias the
GLX context flag values, and they don't, DRI's no-error flag is GLX's
reset-isolation flag. GLX (and EGL!) treat no-error as a context
attribute, and reset isolation predates Mesa's no-error implementation
by several years. The GL_KHR_no_error spec does describe it as a
"context flag", though, so maybe that's why we do it as a (DRI) context
flag.
In order to unalias these we need a new contract with the loader. We
remove the old __DRI_NO_ERROR extension, and add a new
__DRI_RENDERER_HAS_CONTEXT_NO_ERROR value to query. Loaders can key on
that to know to pass no-error-ness through as a context attribute,
matching the GLX/EGL calling convention. We go ahead and define
__DRI_CTX_FLAG_RESET_ISOLATION as well, and update the drivers to refuse
it since we don't support it yet.
This means mismatched drivers/loaders will not be able to create
no-error contexts. Too bad. If you want performance that badly you can
build both things at once.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12474>
Meson devenv is a feature added in meson 0.58 (thus the features is
version guarded) that allows creating a shell environment with
environment variables automatically setup for running the project inside
the build dir. Some variables (such as LD_LIBRARY_PATH and PATH) are set
automatically, others must be added by the project.
For vulkan is is relativley simple, we create a new, uninstalled, icd
file for each driver and set the VK_ICD_FILENAMES variable
appropriately. This can be used with:
```sh
meson devenv -C $builddir
```
then, vulkan applications will automatically use the uninstall vulkan
driver, no need to install.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14826>
The syntax we're using doesn't work when included into C++ sources. So
let's make it C++ compabible.
It turns out, nobody needs this extra definition which is what's causing
issues. Let's just make the initializer trivial without casting the
struct.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14850>
We're about to need including this header from a C++ source, so let's
add some explicit casts for C++ compatibility.
In one case we can make things a bit cleaner by moving the
char-pointer-ism to the place that needs it, so let's clean that up
while we're at it.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14850>
Moved nir_lower_compute_system_values to lower
load_local_invocation_index which could be emitted by
nir_zero_initialize_shared_memory.
Relevant CTS tests:
dEQP-VK.compute.zero_initialize_workgroup_memory.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14829>
IRIS_BATCH_BLITTER isn't supported prior to Tigerlake; in general,
batches may not be supported on all hardware. In most cases, querying
them is harmless (if useless): they reference nothing, have no commands
to flush, and so on. However, the fence code does need to know that
certain batches don't exist, so it can avoid adding inter-batch fences
involving them.
This patch introduces a new iris_foreach_batch() iterator macro that
walks over all batches that are actually supported on the platform,
while skipping the others. It provides a central place to update should
we add or reorder more batches in the future.
Fixes various tests in the piglit.spec.ext_external_objects.* category.
Thanks to Tapani Pälli for catching this.
Fixes: a90a1f15 ("iris: Create an IRIS_BATCH_BLITTER for using the BLT command streamer")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14834>
viewport_index won't be >= 16
layer should be < 2048
view_index should be < 2048
this leaves the last 64-bits as padding, which something
expects, but not having to write to it means we have to write
less memory every triangle.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14846>
the bboxpos = bbox copy was visible overhead on perf traces,
but we don't need to really do it.
Extract all the things from bbox needed early, then don't copy
it just overwrite it.
use boolean to fix power build.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14846>
The simple-varyings piglit test attempts to use GL_MAX_VARYING_FLOATS
varyings, PLUS one additional vector for position (which is not used
as input to the PS). "Reserve" that additional position vector by
removing it from the max varyings and max PS inputs.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Reviewed-By: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14837>
When dealing with a vertex input that takes multiple rows, the value of
nir_intrinsic_base points to a driver-location-based index, but we need
to emit a location-based index (or more specifically, an index that
increments once per input, not once per register). Add a mapping to
the module of base -> ID.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Reviewed-By: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14837>
This brings in some interesting new vulkan tests and fixes for the
spurious KHR-GL TF failures. Also, reduces the runtime of
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36 so that it
should stop timing out.
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
It triggered on uprevving VK-GL-CTS, and @airlied says it's tripped
apparently spuriously before. There seems to be some interesting logic
behind it, so leave the big comment for whoever can revisit the issue some
day.
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
This ran into OOM-kills with the CTS uprev. Looking at caselists at the
time of fail, some had 500MB of system memory used by the CTS (mostly
spirv string codegen), plus whatever BOs were allocated, and the lazors
are only 4GB it looks like.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
Between adding features and increased test coverage, we're hitting the
1-hour job limit. !13441 tried to increase the full run timeout for LAVA,
but by having not bumped the gitlab-ci timeout value it ended up just
letting the job keep running in LAVA after gitlab had given up on it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
In the autotune merge, we added another 1/15th run of a configuration
knob, thinking that was small enough to be in the noise. But actually the
main run is only 1/9th, so another 1/15th took us from nearly hitting the
job runtime target, to totally missing it. Crank things back down to keep
MRs flowing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
Interleave mode specified per-plane on Valhall. The texture descriptor proper
merely has a flag specifying whether planes are somehow interleaved
(u-interleaved, AFBC, or block compressed formats) or whether they are all
linear (and uncompressed).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14851>
Flesh out the ZS/CRC XML, adding fields required for AFBC. Valhall allows AFBC
compressing stencil buffers independent of depth buffers, which is a new feature
since Bifrost. That results in a shuffling of the descriptor.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14851>
This looks superficially like the Bifrost "Surface" descriptor, but it
additionally specifies the in-memory representation of blocks (clumps). If I
understand correctly, decompression is controlled by the plane descriptor,
rather than the texture descriptor level. This is a bit more flexible than
Bifrost.
Once the new fields here are wired up to Mesa, my
dEQP-GLES2.functional.texture.* failures should go away... I hope!
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14851>
After starting to use a new version of the simulator, it got
outdated.
We made some initial effort to update it, but it was not
working. Taking into account that no one is using it, it is better to
just remove it.
We keep the noop drm drivers, as they could have some value for
developers that doesn't have access to the v3dv3 simulator.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14682>
Without this change LLVM 12 hits this error:
"""
LLVM ERROR: Error while trying to spill SGPR0_SGPR1 from class SReg_64:
Cannot scavenge register without an emergency spill slot!
"""
when running glcts KHR-GL46.arrays_of_arrays_gl.AtomicUsage test.
Fixes: 9ff086052a ("radeonsi: unroll loops of up to 128 iterations")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14848>
B2C_JOB_SUCCESS_REGEX: '\[.*\]: Execution is over, pipeline status:0\r$'
B2C_JOB_WARN_REGEX:'\*ERROR\* ring .* timeout, but soft recovered'
B2C_LOG_LEVEL:6
B2C_POWEROFF_DELAY:15
B2C_SESSION_END_REGEX:'^.*It''s now safe to turn off your computer\r$'
B2C_SESSION_REBOOT_REGEX:'(GPU hang detected!|\*ERROR\* ring [^\s]+ timeout(?!, but soft recovered)|The CS has been cancelled because the context is lost)'
B2C_TIMEOUT_BOOT_MINUTES:45
B2C_TIMEOUT_BOOT_RETRIES:1
B2C_TIMEOUT_FIRST_MINUTES:5
B2C_TIMEOUT_FIRST_RETRIES:3
B2C_TIMEOUT_MINUTES:2
B2C_TIMEOUT_OVERALL_MINUTES:90
B2C_TIMEOUT_RETRIES:0
# As noted in the top description, we make a distinction between the
# container used by gitlab-runner to queue the work, and the container
# used by the DUTs/test machines. To make this distinction quite clear,
# we rename the MESA_IMAGE variable into IMAGE_UNDER_TEST.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.