If p_atomic_cmpxchg doesn't set the ray_query_shadow_bos[bucket] to new_bo
allocated by this thread, it returns the bucket BO allocated by the other
thread and we use it. But due to a mistake, we also release that BO, not
the candidate just allocated by this thread and never used again.
Fixes: 5d3e4193 ("anv: enable ray queries")
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30581>
(cherry picked from commit 1904fe1186)
We can't exceed c_int::MAX, because the CTS casts to ints in a few places.
We also need to take into account max pixel size when restricting to
max_mem_alloc as this cap is pixel based, not byte based.
Cc: mesa-stable
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30739>
(cherry picked from commit eef1af8128)
Make sure to preserve the depth or stencil components of D24S8 using the
fixed codepath just added. While we're here, fix the detection of
whether an attachment is bound.
Fixes: cb0f414b ("tu: Add support for suspending and resuming renderpasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26154>
(cherry picked from commit 812c8f6abe)
We need to make sure that we don't trash a passthrough depth/stencil
aspect if we need to store the whole attachment by loading it
beforehand.
Fixes: cb0f414b ("tu: Add support for suspending and resuming renderpasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26154>
(cherry picked from commit 5377219ca0)
This should be done after reads are checked and
sgpr_read_by_valu_as_lanemask_then_wr_by_salu is reset. The old version
also skipped checking the reads if the write check passed.
fossil-db (navi31):
Totals from 193 (0.24% of 79395) affected shaders:
Instrs: 3212435 -> 3212735 (+0.01%)
CodeSize: 16462868 -> 16463848 (+0.01%); split: -0.00%, +0.01%
Latency: 19492377 -> 19492462 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 4419705 -> 4419718 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Backport-to: 24.1
Backport-to: 24.2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30818>
(cherry picked from commit 61e73c2323)
At least one ray tracing shader in cp2077 is over 4MB on Xe2. There
isn't a memory pool large enough for the allocation, so the driver
crashes instead. This commit adds 8MB and 16MB pools.
I intend this as a stop gap fix. I would prefer to figure out why this
shader is so much larger than on previous platforms. The shader in
question has 3824 spills and 8625 fills. That is not good. I suspect
dealing with that will also solve the problem, but that will require a
bit more time.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11739
Suggested-by: Lionel Landwerlin
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30751>
(cherry picked from commit 09cf9fe8ab)
this pbo shader works by iterating over the framebuffer size
and storing a value to an offset for each source pixel. if the
number of pixels being written out does not correspond to fragcoord
to the extent that certain source pixels are not written at all, however,
then this method should not be used in order to avoid giving broken results
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30689>
(cherry picked from commit c2dcecffc5)
This commit solves the shortage-problem at the blit-functions by
checking the number of fence-registers after updating the batch.
If too many registers are used,
the batch-entries and relocs for the current blit function are
removed by setting batch->ptr and reloc_count to value before
the blit call and calling drm_intel_gem_bo_clear_relocs.
This truncated batch is flushed,
and the batch is updated again for the current blit function.
Cc: mesa-stable
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26769>
(cherry picked from commit ed2123158d)
out_args->scratch_offset and in_wg_id_x will alias on <gfx9.
To avoid the conversion code reading a garbage WG ID, move the
scratch/ring offset writing to the very end.
Fixes: 1e354172 ("radv,aco: Convert 1D ray launches to 2D")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30707>
(cherry picked from commit bd525f4282)
Similar to commit 6cef804067 ("nir/opt_if: fix opt_if_merge when
destination branch has a jump"), we shouldn't combine if statements when
the second if-then-else has a block that ends in a jump.
This fixes a case where opt_if_merge combines
if (cond) {
[then-block-1]
} else {
[else-block-1]
}
if (cond) {
[then-block-2]
} else {
[else-block-2]
}
where `then-block-2` or `else-block-2` ends in a jump. The phi nodes
following the control flow will be incorrectly updated to have an input
from a block that is not a predecessor.
Fixes: 4d3f6cb973 ("nir: merge some basic consecutive ifs")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30629>
(cherry picked from commit d2e6be94ae)
Before we were using direct CP_LOAD_STATE, which is broken with multiple
back-to-back draws. This caused regressions in some DX11 traces when
enabling early preamble. We still need to use indirect CP_LOAD_STATE for
VS params, which are sometimes written by the CP, however for everything
else we should use the new UBO path instead.
Fixes: 76e417ca59 ("turnip,ir3/a750: Implement consts loading via preamble")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30675>
(cherry picked from commit 850f2aab03)
It's one header dword and 5 payload dwords. This was papered over by us
not actually using the UBO path for one of the loads, but that's changed
in the next commit.
Fixes: 76e417ca59 ("turnip,ir3/a750: Implement consts loading via preamble")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30675>
(cherry picked from commit 4f2b5442a6)
After spilling during regular RA, merge sets need to be fixed up. To
find all merge sets, fixup_merge_sets used ra_foreach_dst. However,
after shared RA has run, shared dsts wouldn't have the IR3_REG_SSA flag
set anymore leaving their merge sets lingering. This patch fixes this by
using foreach_dst instead.
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
(cherry picked from commit 28d2a27030)
The preferred register for merge sets was not updated after allocating
one. This caused a new merge set to be allocated for every register it
contains. This patch fixes this by reusing the update function from the
standard RA.
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
(cherry picked from commit 9013e11d8c)
Previously only `-mtls-dialect=gnu2` was probed, which was appropriate
for arm, x86 and x86_64, but not for newer architectures such as
aarch64, loongarch64 and riscv64 which all use `-mtls-dialect=desc`
instead. Because the driver option is not consistent across
architectures (and probably will not), try both variants and choose the
first one working.
While at it, rename "gnu2_*" variables to "tlsdesc_*" respectively, for
clarity.
Cc: mesa-stable
Reviewed-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Yukari Chiba <i@0x7f.cc>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30599>
(cherry picked from commit cc2dbb8ea5)
The `capture_not_overwritten` unit test captures and compares two
backtraces -- one from inside a call to `func_c` and one outside -- and
confirms that they are not identical. That is, that `func_c` is in the
backtrace.
On 32-bit x86, without `-fno-omit-frame-pointer`, the function will not
emit a stack frame. As a result, the unit test fails.
The fix is to compile `func_c` with the flag `-fno-omit-frame-pointer`
to prevent the compiler from optimizing out the stack frame which is
otherwise unneeded.
Bug: https://bugs.gentoo.org/823774
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4091
Fixes: d0d14f3f64 ("util: Add unit test for stack backtrace caputure")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30622>
(cherry picked from commit 05dc4eb536)
the CL CTS added a new test being printf("\n", "foo"), but we ended up
printing the new line twice. If we can't find a specifier anymore, ignore
the argument as after the loop processing all arguments we'll print the
remaining format string anyway.
Cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30574>
(cherry picked from commit 4080269845)
Instead of aligning offsets, we just align the size every time we query
it. This simplifies our offset and size calculations later since we can
always just add up descriptor buffer sizes and know that we'll be okay.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: 7ab5c5d36d ("zink: use EXT_descriptor_buffer with ZINK_DESCRIPTORS=db")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30580>
(cherry picked from commit 0f8f407e57)
Using ior here is equivalent to using uadd_sat, but works for every driver
and shouldn't hurt anywhere.
I forgot to fix this up when fixing up some vvl errors with zink.
Fixes crashes with the integer_ctz CL CTS tests in zink.
Fixes: 39ec184db6 ("zink: lower 64 bit find_lsb, ufind_msb and bit_count")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30535>
(cherry picked from commit 48acf9d358)
* close(fd) requires also resetting the fd=-1 or else boom
* checking just driver_name is broken because loader_get_driver_for_fd()
uses MESA_LOADER_DRIVER_OVERRIDE, so there's no way to differentiate
an inferred load
Fixes: b907eb4750 ("egl: don't bind zink under dri2/3")
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30556>
(cherry picked from commit 1a579552af)
The CL CTS started to call this API, luckily we don't have to actually
implement it, because we don't intent to support CL 1.0 only devices in
the first place (probably).
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30575>
(cherry picked from commit cd2dc4f70c)
For some purposes (e.g. advanced blending) we need a non-zero alpha
value returned from reads. This is only guaranteed on Bifrost if
we explicitly request RGB1 component ordering. The default is to use
RGBA component ordering, which for R5G6B5 causes 0 to be read for
alpha.
A complication is that the Mali fixed function hardware requires
four components (which implies RGBA rather than RGB1). If fixed
function blending is in use, we modify the pixel format back to
RGBA when building the blend descriptor.
Cc: mesa-stable
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29606>
(cherry picked from commit 004e0eb3ab)
This reverts commit d6bb4ddc63.
Fixes: d6bb4ddc63 ("d3d12: Video Encode - Remove PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE as not supported")
PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE is necessary for some scenarios like the example below
described in https://github.com/microsoft/WSL/issues/11838
gst-launch-1.0 -v videotestsrc num-buffers=250 !
video/x-raw,width=1920,height=1200 !
vaapipostproc !
vaapih264enc !
filesink location=~/wsl_test.h264
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30548>
(cherry picked from commit a0f1a708c4)
This is more correct than the previous code and the CL CTS relies on edge
case behavior here, e.g. for 1Dbuffer images.
I think part of that is not actually required by the spec, but whatever.
Fixes: 7b22bc617b ("rusticl/memory: complete rework on how mapping is implemented")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30528>
(cherry picked from commit 2484331e82)
An application could map and unmap a host ptr allocation multiple times,
but because how the refcounting works, we might never ended up syncing the
written data to the mapped region.
This moves the refcounting out of the event processing.
Fixes: 7b22bc617b ("rusticl/memory: complete rework on how mapping is implemented")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30528>
(cherry picked from commit 1fa288b224)
If TraceRay() is called with the TerminateOnFirstHit flag, we need to
terminate the ray on the first confirmed intersection. This is handled
by the lowering of accept_ray_intersection and it's working fine for the
case of multiple instances of the intersection shader being called.
But if the shader calls reportIntersection() more than once, we were
handling them all and accepting the closest one regardless of the flag.
Check for the flag on every confirmed intersection and, if set, accept
it right there. The subsequent lowering will take care of terminating
handling the ray termination if necessary.
Fixes new test dEQP-VK.ray_tracing_pipeline.amber.flags-accept-first
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30418>
(cherry picked from commit f8553f56ac)
this had a number of issues:
* pipe_loader_get_driinfo_xml() frees driver_driconf immediately,
except the driOptionCache struct string pointers are all just copied
in merge_driconf instead of having the strings copied, which means any
subsequent access of driver_driconf strings is invalid access
* pipe_loader_drm_get_driconf_by_name() is a disaster that only happened
to work because the dlopen here is the same lib that gets opened elsewhere
by mesa and not closed. if the lib here is actually closed, then all
the statically allocated strings become invalid, which means they need to
be manually copied
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30494>
(cherry picked from commit 0c220741e6)
previously the size was checked at the top of the function, but this
ignored cases where the loader might end up resizing the drawable,
resulting in an attempted 0x0 swapchain creation based on stale
geometry
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30487>
(cherry picked from commit a6d97b0afe)
The iova allocations need to be CPU page aligned. (The GPU itself
always supports 4k mappings regardless of the smallest CPU page size,
but GEM buffer allocations must be an integer number of CPU pages.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Fixes: 63904240f2 ("tu: Re-enable bufferDeviceAddressCaptureReplay")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30431>
(cherry picked from commit 7fe3529715)
It doesn't do anything substantial yet, but it ignores enough so internal
shaders won't generate warnings.
I've also added ByVal parsing, because I need this one to actually fix a
correctness issue in a later patch.
Cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29896>
(cherry picked from commit 9b55dcca54)
Contrary to what the original commit said, this is actually still used
(see .gitlab-ci/bare-metal/poe-powered.sh:205), and the boot retry logic
has been broken ever since, exacerbating the rpi farm boot problems.
Fixes: 97b2afa16a ("ci/bare-metal: Drop the 2 vs 1 exit code from poe_run.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30340>
(cherry picked from commit 2bc82b7147)
At update_map at i915_state_sampler.c max_lod is no longer set to 1
for npots. This almost totally disabled mipmapping.
Max_lod should still be set to 1, but only if it is still 0,
because no mipmap-levels are present.
According to existing comment at update_map this is needed, to
avoid problems at sampling,
if MIN_FILTER and MAX_FILTER differ.
Cc: mesa-stable
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28638>
(cherry picked from commit ad02bfe41d)
Remove at i945_texture_layout_2d() call of util_next_power_of_two(),
which oversized the npot-blocks for every level to get power of 2
for width and height. Hardware doesnot expect these oversized
npot-blocks, causing mangled mipmapping.
This also is done at i915_texture_layout_2d(), which is
used by older gen3-gpus.
Cc: mesa-stable
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28638>
(cherry picked from commit bb95d744ca)
After using sparse array to manager virtgpu bo, we set gem_handle to 0
to indicate that the bo is invalid. However, the gem handle gets closed
before that and can be reused by another newly created bo, leading to
the tracked gem handle being unexpectedly zero'ed out.
Fixes: 88f481dd74 ("venus: make sure gem_handle and vn_renderer_bo are 1:1")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30362>
(cherry picked from commit f788c87d02)
`st_context_invalidate_state` call is required when changing buffer attachments.
Including header with BBitmap class definition is required to properly
call C++ destructor.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30372>
(cherry picked from commit 828c3cf002)
We cannot always use /dev/dri/card0.
As a matter of fact, on systems with SimpleDRM enabled /dev/dri/card0
will be created by it and removed once a GPU driver has loaded.
In any case we shouldn't hard-code the device number and instead walk
the device list to find the first suitable device.
This issue is trivially reproducible with `eglinfo -B -p gbm` on
Ubuntu 24.04 or Fedora 40
Fixes: 32f4cf3808 ("egl/gbm: Fix EGL_DEFAULT_DISPLAY")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30325>
(cherry picked from commit 7949471716)
Thanks to LCSSA, we can absolutely have phis with only one source and we
need to handle those in spilling. Fortunately, there's nothing really
special about that case. I was just prematurely optimizing.
Fixes: bcad2add47 ("nak: Add a spilling pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28084>
(cherry picked from commit 8bf3213a54)
We want to use actual sparse-capable queues as the default
trtt->queue, not copy queues that may have a companion_rcs_batch.
Before this patch, if we expose more than one queue *and* the
application creates a copy queue first, we'll end up setting
trtt->queue as the copy queue, which will GPU hang when we submit the
TR-TT batches as they don't support the pipe_control commands we
issue.
The trtt->queue queue is used for binding/unbinding buffers in code
paths where there's no specific queue coming from user space, such as
when we're creating or destroying a sparse resource.
This is not a problem yet on i915.ko since we are exposing
only a single queue, and it is not a problem for xe.ko since TR-TT is
not the default there. This is also not a problem in applications
that create the render or compute queue first. We plan to expose more
queues when using TR-TT, so this would become a problem without this
patch.
None of VK-GL-CTS seems to exercise that, and none of the Steam games
I tested exercise that as well. I was able to reproduce this issue
using our internal tracing tool.
v2: New implementation that doesn't break when we only have a compute
queue (Lionel).
Fixes: 04bfe828db ("anv/sparse: allow sparse resouces to use TR-TT as its backend")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30252>
(cherry picked from commit 3ab8ff99fa)
Even though we don't advertise the sparseResidency feature, a bunch of
CTS tests just call GetPhysicalDeviceImageFormatProperties2() with
SPARSE_RESIDENCY_BIT and see if that fails.
Fixes: d2177f4764 ("nvk: Don't advertise sparse residency on Maxwell A")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30303>
(cherry picked from commit 68d6cdfbc5)
This fixes sporadic rendering corruption reported on MTL with ChromeOS
in cases where multiple processes including Chrome were utilizing the
GPU concurrently, and one of the processes happened to submit a
BLORP-only batch buffer right after a switch from a different context.
In such a scenario we would fail to add the BO that holds the pixel
hashing tables to the execbuf IOCTL for the BLORP batch, because it
was being pinned from iris_restore_render_saved_bos() which isn't
called for BLORP operations, potentially causing it to use garbage as
pixel pipe hashing tables, which led to corruption of the BLORP
rendering.
Technically this could have affected DG2 as well, but it has only been
reported on MTL so far.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30274>
(cherry picked from commit 49b433d5e7)
On MTL ChromeOS boards, during AI based video conference, we were
observing a lot of overhead from invalidations. Upon debug, it was found
that we were using clflush in this function and that isn't efficient.
With this change, while executing compute workloads like zoo models, we
are getting ~25% performance improvements in a best case scenario.
Rework:
* Jordan: Call intel_clflushopt_range() rather than
__builtin_ia32_clflushopt() because intel_mem.c is not compiled
with -mclflushopt.
Backport-to: 24.1 24.2
Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30238>
(cherry picked from commit 2f6919e6c2)
Despite its name, egl_dri2 works under plain DRI without DRI2, and the
old autotools build system built it when $enable_dri = yes, with no
check for DRI2. This fixes the build for GNU/Hurd, which supports DRI,
but doesn't have DRM and thus no DRI2 support.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/587>
(cherry picked from commit 149e8bff52)
This reverts commit ad862c36e5.
This change does not work, because libdrm is required if with_dri2 is
true. Moreover, we don't want all of DRI2 on Hurd, we just want the
egl_dri2 driver, as done by autotools. So first revert this to stop
trying to build all of DRI2.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/587>
(cherry picked from commit ec55a6c329)
This reverts commit 2fd85105c6.
Despite its name, egl_dri2 works under plain DRI without DRI2, and the
old autotools build system built it when $enable_dri = yes, with no
check for DRI2. A future commit will adapt meson.build to follow that
approach rather than this hackier one.
Note that the case removed in the second hunk is already dead code,
since system_has_kms_drm is false on GNU/Hurd, and could have been
dropped as part of 66d2ae0386 ("meson: forcefully disable libdrm when
host doesn't have it").
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/587>
(cherry picked from commit 8461776a09)
This implements an undocumented workaround for a hardware bug that
affects draw calls with a pixel shader that has 0 push constant cycles
when TBIMR is enabled, which has been seen to lead to a hang with
Fallout 3 and Metal Gear Rising Revengeance. This hardware bug has
been reported as HSDES#22020184996 which is still pending a resolution
by the hardware team. However since this workaround found empirically
has been confirmed to fix the issue reliably and it's relatively
harmless it seems worth checking in already even though no final W/A
number is available nor has the W/A json file been updated.
To avoid the issue we simply pad the push constant payload to be at
least 1 register. This is enabled via a brw_wm_prog_key since the
driver needs to be in agreement with the compiler on whether the dummy
push constant cycle is present, and it can be avoided in cases where
the driver knows that TBIMR will be disabled (e.g. for BLORP).
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10728
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11399
Fixes: 57decad976 ("intel/xehp: Enable TBIMR by default.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30031>
(cherry picked from commit b98eebbcb2)
Sporadically a6xx gpu will fail to recover causing the lava job
a660_vk_full to loop on error messages for three hours before timing
out.
A few sporadic error messages may still be recoverable, but when multiple
errors occur over a short period, successful recovery is unlikely. Parse
the logs to look for repeated error messages within a short time period.
If found, cancel the lava job and rerun it.
Also add unit tests for this behaviour.
cc: mesa-stable
Reported-by: Valentine Burley <valentine.burley@gmail.com>
Acked-by: Daniel Stone <daniel.stone@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30032>
(cherry picked from commit 72c182f873)
If you have something like this :
con 32 %66 = @load_reg (%62) (base=0, legacy_fabs=0, legacy_fneg=0)
con 32 %27 = @resource_intel (%22 (0xdeaddead), %66, %67, %17 (0x0)) (desc_set=2, binding=96, resource_intel=0, resource_block_intel=-1)
Just copying the brw_reg in ssa_values[] is not enough for the
load_reg intrinsic. We need to call get_nir_src() to force some logic
to create the register correct.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b8209d69ff ("intel/fs: Add support for new-style registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30050>
(cherry picked from commit 67b778445a)
This commit is fixing an issue where all AV1 slice data offsets always point
into the first slice data buffer, even when multiple slice data buffers
are submitted. However, after b0d6e58d88 ("Reapply "radeonsi/vcn: AV1 skip the redundant bs resize"")
only the first slice data buffer will be sent to decoder and the incorrect
behavior is required, so this commit also needs to be reverted.
This reverts commit 6746d4df6e.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30255>
Event::set_status only called the cbs for the passed in status, but we
need it to call all cbs up to the passed in status. This simplifies error
handling.
Cc: mesa-stable
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30215>
(cherry picked from commit d2d3f8e446)
nouveau uses the OS page size which is almost always 4096. The next
patch will make this properly queried but this version is back-portable.
Fixes: 58181b7bbc ("nvk: Bump the sparse alignment requirement on buffers to 64K")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30138>
(cherry picked from commit 68c06558be)
We need to call iris_wait_syncobj() on both release and debug builds,
so take it out of the assert() call. Only assert the result.
With this patch, gnome-session finally works for me. Also steam.
Fixes: 665d30b544 ("iris: Wait for drm_xe_exec_queue to be idle before destroying it")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30195>
(cherry picked from commit 22fe73a86a)
* swrast allocates images aligned to 64x64 tiles, which results in images
that are larger than the window. PutImage requests must be clamped on
the y-axis to avoid uploading/damaging out-of-bounds regions
* winsys coords are y-inverted
cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29910>
(cherry picked from commit 6088a0bf51)
Commit 94989b45a5 ("anv,driconf: Add fake non device local memory WA
for Total War: Warhammer 3") implemented a workaround to make
Warhammer 3 work on ADL, but the game still doesn't work on LNL, which
uses xe.ko, and MTL, which uses i915.ko: it still fails at launch
claiming it couldn't allocate memory.
So in this implementation, instead of clearing DEVICE_LOCAL_BIT we
just duplicate our memory types, one having the bit and one not
having.
v2:
- Check for VK_MAX_MEMORY_TYPES (José)
- Invert the order of the memory types (José)
- Fix white space issue (José)
v3:
- Comment our non-spec-compliance (José)
- Remove useless lines (José)
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8721
Fixes: 94989b45a5 ("anv,driconf: Add fake non device local memory WA for Total War: Warhammer 3")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30162>
(cherry picked from commit 241585667f)
if the src for a replace_buffer call is mapped after replacement:
* avoid clearing access flags
* update valid range
the pointer access here is always safe because the only case in which
this scenario can occur is if tc is forced to sync immediately after
creating a replaceent buffer, and the replacement buffer's lifetime
will always be exceeded by the lifetime of the real buffer
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30107>
(cherry picked from commit 70b40fd2a0)
when tc replaces a buffer in subdata, it may subsequently perform subdata calls
on the replacement if it is forced to sync during map, e.g.,
* bind_vbo(dst)
* draw
* subdata(src)
* buffer replacement
* map
* tc sync
* replace_buffer(dst, src)
* memcpy <- broken
* draw
in this scenario, src may not have data at the time of replacement,
but it will get data soon after, and this buffer range is the real one
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30107>
(cherry picked from commit 76da22bfc2)
Commit f695a9fed2 moved the 64-bit float <-> 16-bit float conversion
splitting into a core NIR pass, so the code remaining here is only
needed for 64-bit integer types.
Presumably in an attempt to remove the float handling, it replaced
simple bit_size == 64 checks with this expression:
(full_type & (nir_type_int64 | nir_type_uint64))
I believe that the intended expression was:
(full_type == nir_type_int64 || full_type == nir_type_uint64)
Unfortunately, the former is incorrect. Any integer or unsigned
NIR type would trigger the former expression. For example:
nir_type_uint32 & (nir_type_int64 | nir_type_uint64) => nir_type_uint
This meant that we were splitting e.g. u2f16 on 32-bit unsigned types
into u2f32 and f2f16, when we can easily natively handle that case.
To fix this, we go back to simple bit_size == 64 checks. This pass is
already run after nir_lower_fp16_casts which will split the float case,
so we will never see it here.
fossil-db on Alchemist shows a -1.14% reduction in affected shaders for
google-meet-clvk shaders. In another ChromeOS workload, it improves
performance by around 8% on Meteorlake.
Thanks to Sushma Venkatesh Reddy for finding this performance issue!
Fixes: f695a9fed2 ("intel/compiler: use nir_lower_fp16_casts")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30091>
(cherry picked from commit 837c441acb)
`with_platform_windows` depends on the `platforms` option,
and is an inaccurate proxy check for `host_machine.system() == 'windows'`.
The ICD path and shared library name are dependent on the host system,
not whether or not Lavapipe is built with `WIN32` platform support.
In fact Lavapipe can be built with no platforms (`-Dplatforms=`)
if you only need headless (then this check would be incorrect)
fixes: e030ab5163
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29900>
(cherry picked from commit 94379377c4)
Previously we tried to map GPU resources directly wherever we could,
however this was always causing random issues and was overall not very
robust.
Now we just allocate a staging buffer on the host to copy into, with some
short-cut for host_ptr allocations.
Fixes the following tests across various drivers:
1Dbuffer tests (radeonsi, zink)
buffers map_read_* (zink)
multiple_device_context context_* (zink)
thread_dimensions quick_* (zink)
math_brute_force (zink)
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30082>
(cherry picked from commit 7b22bc617b)
[Eric: also drop `can_map_directly` and `has_user_shadow_buffer` as
they're now dead code, which was breaking the build]
Don't multiply by 4 twice. While we're here, fix the in-bounds check on
a6xx, since we need to account for the multiplying by 4. This was done
correctly on a7xx but the commit below didn't correctly port it to a6xx
when adding the multiply on a6xx.
Fixes: 01bac643f6 ("freedreno/ir3: Fix ldg/stg offset")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30046>
(cherry picked from commit 2c462fe9cc)
si_set_patch_vertices was only called if tcs.current was non-NULL but
this condition is not enough for GFX9+ since vs is used as ls.
Add a check in si_update_tess_io_layout_state instead, and set
sctx->do_update_shaders for case where the ls_current is not yet
available.
This fix crashes on GFX6.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29876>
(cherry picked from commit a7a1e3d329)
buffer_size is not the full buffer size, but the part of the
buffer that is accessed by the compute shader.
This fixes the assert hit in si_set_shader_buffer.
Fixes: 1a99f50c7f ("radeonsi: use a compute shader to convert unsupported indices format")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29876>
(cherry picked from commit 84a563cf6f)
We select the feedback tranche according to the image format but not the
actual format that we will use for presentation. This breaks direct
scanout in cases where the application selected a visual with an alpha
channel but using EGL_EXT_present_opaque which previously worked as
expected.
Fix this by selecting the feedback tranche according to the actual
presentation format.
Fixes: 9ea9a963aa ("egl/wayland: Fix EGL_EXT_present_opaque")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28153>
(cherry picked from commit e5e108706c)
Instead of trusting in nil::Image::align_B, force it to the sparse
binding size because we know we're going to try and sparse bind it.
Otherwise, small sparse images could fail to bind at the bind step.
Fixes: 7321d151a9 ("nvk: Add support for sparse images")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30033>
(cherry picked from commit 04bdbb71de)
This is an issue happening on radeonsi after the commit
6f8e6fb99c "mesa/st: use compute pbo download for readpixels".
This commit enables a new code path which has this memory leak.
For instance, this issue is triggered on radeonsi with
"piglit/bin/glsl-fs-raytrace-bug27060 -auto -fbo":
Too many leaks! Only the first 5000 leaks encountered will be reported.
Indirect leak of 327424 byte(s) in 5582 object(s) allocated from:
#0 0x7fe27fa4d7ef in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb17ef)
#1 0x7fe2717fd4df in ralloc_size ../src/util/ralloc.c:118
#2 0x7fe272dbbae4 in calc_dom_children ../src/compiler/nir/nir_dominance.c:138
#3 0x7fe272dbbae4 in nir_calc_dominance_impl ../src/compiler/nir/nir_dominance.c:192
#4 0x7fe272b7f4a2 in nir_metadata_require ../src/compiler/nir/nir_metadata.c:40
#5 0x7fe272ba0d50 in nir_opt_cse_impl ../src/compiler/nir/nir_opt_cse.c:43
#6 0x7fe272ba0d50 in nir_opt_cse ../src/compiler/nir/nir_opt_cse.c:67
#7 0x7fe272686f83 in gl_nir_opts ../src/compiler/glsl/gl_nir_linker.c:92
#8 0x7fe271a31e69 in create_conversion_shader ../src/mesa/state_tracker/st_pbo_compute.c:701
#9 0x7fe271a353fa in create_conversion_shader_async ../src/mesa/state_tracker/st_pbo_compute.c:810
#10 0x7fe271806ea8 in util_queue_thread_func ../src/util/u_queue.c:309
#11 0x7fe27186379a in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
#12 0x7fe27ea9a7c3 (/lib64/libc.so.6+0x867c3)
...
SUMMARY: AddressSanitizer: 1384704 byte(s) leaked in 17291 allocation(s).
Fixes: 5dab7673e1 ("mesa/st: add specialized pbo download shaders")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30068>
(cherry picked from commit f3c9ea9b8d)
These flags are set at the end of the job based on its buffer usage and
then checked by following jobs.
If an application toggles stencil and depth tests alternatigly in
a sequence of jobs while also relying on previous contents to render,
with the current assignment the reload flags for depth or stencil may
be cleared incorrectly and a depth/stencil buffer may not be properly
reloaded.
Cc: mesa-stable
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30065>
(cherry picked from commit aa4d0836fe)
The NVIDIA proprietary driver exposes a DRM device these days and this
can trip up NVK as it advertises an NVIDIA device id. We fail to
enumerate but the check for nouveau happens too late and we throw a
warning. This means tha if NVK is even installed side-by-side with the
proprietary driver, we spam warnings on every device enumeration. It's
better to fail silently.
Fixes: 83786bf1c9 ("nvk: add vulkan skeleton")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11441
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30035>
(cherry picked from commit 73ec9f0183)
This adds a magic number and the device name to the binary in order to
verify we indeed have a binary we can parse and matches the device.
Also save the binary header explicitly in little-endian order, so that we
at least make sure that's always the same.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29946>
(cherry picked from commit f08f770f16)
Coverity has spotted a place where we could in theory overflow. In
reality it wont happen as the potential overflow is a bitfield with a
maximum of two values. Add an `assume()` statement to help out the
compiler and document our assumption.
fixes: dc1aedef2b
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29825>
(cherry picked from commit dc604f340a)
812b3415 added rules for upcasts with comparisons with a variety of
types. The float & unsigned rules should be ok, but the signed integer rules are
unsound as currently implemented. This can cause end-to-end miscompiles.
I originally hit this issue while debugging a large real world OpenCL kernel. I
found the bug symptoms changed when disabling loop unrolling, which tipped me
off to a compiler bug. I've reduced it to a minimal test case. Imagine my
surprise when I find out the NIR my backend ingested was already constant folded
to be wrong.
In the minimal test case, during optimization we have NIR:
32 %6 = ....
64 %9 = i2i64 %6
64 %44 = load_const (0x0000000000000001)
1 %45 = ilt %9, %44 (0x1)
This is a simple check (int64_t)%6 < 1.
nir_opt_algebraic turns this into:
32 %6 = ...
64 %9 = i2i64 %6
64 %44 = load_const (0x0000000000000001)
64 %55 = load_const (0x0000000080000000 = 2147483648)
1 %56 = ilt %55 (0x80000000), %44 (0x1)
64 %57 = load_const (0x000000007fffffff = 2147483647)
1 %58 = ilt %57 (0x7fffffff), %44 (0x1)
32 %59 = i2i32 %44 (0x1)
1 %60 = ilt %6, %59
1 %61 = ior %58, %60
1 %62 = iand %56, %61
This pile of math constant-folds to an unconditional "false"! The problem is
%56. At first glance, INT32_MIN < 1 is true so %56 should be true. Indeed, it
should. But here's the kicker: both constants are 64-bit here, so the ilt
operation is a 64-bit comparison -- that left-hand side is INT32_MIN
zero-extended to 64-bit for the signed comparison at 64-bit. So in fact, it
evaluates to false, causing the whole expression to go false. If we're going to
do a 64-bit comparison for %56, then we need to sign-extend the bound. So we'll
just adjust the Python and be on our way, right?
Unfortunately the issue is deeper. According to the comment in the generated
nir_opt_algebraic.c file, the guilty algebraic rule is:
('ilt', ('i2i64', 'a@32'), '#b') =>
('iand', ('ilt', -2147483648, 'b'), ('ior', ('ilt', 2147483647, 'b'), ('ilt', 'a', ('i2i32', 'b'))))
From a Python perspective? That rule is correct. -2147483648 < 1 is a true
statement. Adjusting the Python rule is not the appropriate solution here, since
the issue is more fundamental and might affect other rules. The real problem is
the translation of that Python replacement tree into C, incorrectly
zero-extending -2147483648 into 0x0000000080000000 instead of sign-extending to
0xffffffff80000000.
Crawling down the rabbit hole of the generated algebraic file, we see the
constant encoded as:
{ .constant = {
{ nir_search_value_constant, 64 },
nir_type_int, { -0x80000000 /* -2147483648 */ },
} },
NIR correctly translates the negative constant to a C level negate operation of
its absolute value. This maps to the correct sign-extension...
...for all constants except for INT_MIN. Because that constant lacks a ULL
suffix, it is a 32-bit integer. And for this integer (only), negating it hits
signed integer overflow (UB!) and then we end up with an effective
zero-extension when going to 64-bit.
This patch fixes the end-to-end miscompile.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Closes: #11402
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29952>
(cherry picked from commit 270446ee21)
This is not safe when we have conditional spills since we could be
spilling disabled lanes with undefined values that could overwrite
valid data for those lanes from a previous spill of the same temp
that was unconditional (or that condionally enabled those same
lanes).
Fixes some Piglit OpenCL tests as well as the following OpenCL tests:
integer_divideAssign
integer_moduloAssign
integer_mad_sat
integer_ops integer_divideAssign
integer_ops integer_mad_sat
integer_ops integer_moduloAssign
integer_ops quick_char_math
integer_ops quick_short_math
math_brute_force half_powr
math_brute_force pow
math_brute_force pown
math_brute_force powr
math_brute_force rootn
Fixes: 597560e27c ('broadcom/compiler: always enable per-quad on spill operations')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29909>
(cherry picked from commit d1f8351f3c)
The multop instruction implicitly writes rtop which is not preserved
acrosss thread switches. We can spill the sources of the multop
(since these would happen before multop) and the destination of
umul24 (since that would happen after umul24).
Fixes some OpenCL tests when V3D_DEBUG=opt_compile_time is used to
choose a different compile configuration.
cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29909>
(cherry picked from commit 38b7f411a1)
In a scenario where two non-concurrent cmdbufs are submitted to the
compute queue and with the second one using DGCC, the driver would have
chained the CS of the first cmdbuf to the new IB created right after
the DGC IB is executed.
Found while working on DGC task shader with vkd3d-proton.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29913>
(cherry picked from commit fec9b56f17)
nir_opt_preamble sometimes adds useless expressions, in which case we
may have stc instructions and no corresponding use of the constant.
Things can go sideways when these aren't included in the constlen, so
far only observed when earlypreamble is enabled.
Fixes: ccc64b7e00 ("ir3: Plumb through store_uniform_ir3 intrinsic")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29903>
(cherry picked from commit c42f6597f9)
radv has a comment in radv_meta_dcc_retile.c:
* BPE is always 4 at the moment and the rest is derived from the tilemode.
radeonsi has in si_retile_dcc:
/* We have only 1 variant per bpp for now, so expect 32 bpp. */
assert(tex->surface.bpe == 4);
This fixes ext_image_dma_buf_import-modifiers for radeonsi.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29612>
(cherry picked from commit abd048124a)
We shouldn't use this extension at all if we're not using the HTML
builder. This should hopefully fix this issue a bit more fundamentally.
This caused issues when using the spelling extension, something I do
locally from time to time.
Fixes: f72033bb70 ("docs: add bootstrap extension")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29888>
(cherry picked from commit f4e7204e73)
This will prevent the dynamic loader to pick the wrong function once we
rename things to the proper API names.
In any case, this should have been done all along anyway.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29855>
(cherry picked from commit be090abf2e)
In this assert we want to enforce that if a cached buffer is created
it is a cached+coherent as Xe KMD don't support cached+incoherent.
Did not caught this issue because it only reproduces in platforms with
GPU outside of LLC.
Fixes: 9d8d5cf8c9 ("anv: Remove block promoting non CPU mapped bos to coherent")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29826>
(cherry picked from commit 73ce3143a8)
The intention of this block was to set one of the flags that is used
to select a PAT index but this was doing more than that.
It was promoting WB+0 way coherency BOs to WC+1 way coherency possibly
causing regression in platforms without LLC.
anv_device_get_pat_entry() return WC/writecombining if no flags is
set so we don't need this block after all.
Reported-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Fixes: a65e982b44 ("anv: Split ANV_BO_ALLOC_HOST_CACHED_COHERENT into two actual flags")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29769>
(cherry picked from commit 9d8d5cf8c9)
CTS tests both layered and separate DPB, but radv wasn't handling
layered properly when used with the tier 2 dpb handling.
This adjusts the addresses to use the layer index for tier2.
Fixes dEQP-VK.video.decode.*layered*
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29758>
(cherry picked from commit b888946f7a)
On newer devices where ZPASS_DONE events have sample count writing
abilities the firmware expects these events to come in begin-end pairs,
essentially corresponding to a typical occlusion query usage. Since this
event is also used in the autotuner we have to avoid event pairs to be
emitted in an interleaved fashion.
Additional renderpass state now tracks whether a given renderpass contains
an occlusion query. If so, autotuner will emit miscellaneous ZPASS_DONE
events in order to form its own begin-end pairs before and after the
renderpass commands.
Occlusion query behavior inside a renderpass doesn't change. But when used
outside of a renderpass, possible autotuner usage requires to again emit
ZPASS_DONE events that end up forming begin-end pairs of these events both
at the start and the end of the query.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 4e6a1f8852 ("tu/autotune: Use `CP_EVENT_WRITE7::ZPASS_DONE` on A7XX")
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29403>
(cherry picked from commit 5653c52151)
When both an interval and some of its children would be live-in, we used
to add phis for all of them. This could lead to cases where the pressure
after spilling was higher than before.
This happens, for example, when both a split and its parent are live-in.
Before spilling, the split wouldn't add to the pressure because its
parent had already been inserted. After spilling, since we created a phi
for the split, the link with its parent would be lost and it would add
to the pressure.
Fix this by only adding phis for top-level intervals and adding splits
after them.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
(cherry picked from commit 6bc7cd6108)
There were a few places that used an instruction pointer to decide where
new instructions should be created. NULL was used to add them at the end
of the block. While fixing a spilling bug, a new option was needed to
add instructions at the beginning of the block. This will be much easier
to implement using cursors.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
(cherry picked from commit 18cd803cef)
Whenever instructions need to be created at specific locations, ir3
often passes around an instruction pointer. When set, new instructions
are added before or after it (depending on the context). When NULL, new
instructions are added at the end of the block. This whole scheme is
confusing.
This patch adds ir3_cursor and ir3_builder structs and the associated
helper functions. The API mirrors the one from nir_cursor/nir_builder.
This patch does not refactor existing code to use the new API. This will
happen in future patches.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
(cherry picked from commit 1972db36c6)
ir3_force_merge (through merge_merge_sets) expects instructions to be
indexed. However, the instructions created during spilling would not be
automatically indexed at this point.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
(cherry picked from commit 7a5b198a44)
It might happen that a collect that cannot be coalesced with one of its
sources while spilling can be coalesced with it afterwards. In this
case, we might be able to remove it in remove_src_early during spilling
but not afterwards (because it may have a child interval). If this
happens, we could end up with a register pressure that is higher after
spilling than before. Prevent this by never removing collects early
while spilling.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
(cherry picked from commit 1bc3b819e6)
for drivers that don't support PIPE_CAP_SHAREABLE_SHADERS,
the zombie shader mechanism is used, storing shaders to delete after
the next flush
the zombie mechanism also calls bind_*_state(pipe, NULL) during deletion,
however, which breaks drivers in the following scenario:
* create_all_shaders(pipe_A)
* bind_vs(pipe_A, vs_A)
* bind_fs(pipe_A, fs_A)
* draw(pipe_A)
* makeCurrent(pipe_B)
* delete_vs(pipe_B, vs_B)
* vs_B must only be deleted on pipe_A
* zombie_shader_add(pipe_A, vs_B)
* makeCurrent(pipe_A)
* free_zombie_shaders(pipe_A)
* bind_vs(pipe_A, NULL)
* delete_vs(pipe_A, vs_B)
* draw(pipe_A)
* boom
the problem being that bind_vs(pipe_A, NULL) was called when deleting
vs_B, but it was actually vs_A which was bound
to solve this, just flag the shader state for updating and let st figure it out
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11122
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29680>
(cherry picked from commit 95828d8901)
The commit 973e6f3b implemented compaction of the stream-number space. The
functions `update_vertex_elements(_sw)` began using the post-compacted
stream-numbers/indices when maintaining the `stream_usage_mask` and
when reading from the arrays `vtxstride` and `stream_freq`.
But, the `stream_instancedata_mask`, with which the `stream_usage_mask`
is compared/bitwise-anded, maintains bits for the pre-compacted indices.
Additionally, the information within the arrays is stored using the
pre-compacted indices.
The functions have a disagreement, regarding the type (pre- vs post-
compacted) of indices, with the rest of the relevant source. This change
removes the disagreement by having them use pre-compacted indices when
maintaining the `stream_usage_mask` and when reading from the arrays.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11283
Fixes: 973e6f3b ("gallium: remove start_slot parameter from pipe_context::set_vertex_buffers")
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29704>
(cherry picked from commit 4bf330471b)
If the application binds graphics shaders and that RADV performs an
internal operation with a compute pipeline, no shader objects would be
restored because binding the internal compute pipeline resets the ESO
state.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29678>
(cherry picked from commit 07eb970d67)
It's possible if nouveau_ws_bo_destroy() races with
nouveau_ws_bo_from_dma_buf() for the BO to be found in the cache and
referenced between dropping the final reference and actually invoking
GEM_CLOSE. This would result in us having a closed BO somewhere in our
cache.
Fixes: c370260a8f ("nouveau/winsys: Add dma-buf import support")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29737>
(cherry picked from commit faaf33556e)
I've been seeing a bunch of read page faults at the end of the
shader allocation, nvk uses a full page at the end to overallocate
so align with that and see if it goes away.
ahulliet and skeggsb both said 2k was used.
Cc: mesa-stable
Reviewed-by: Arthur Huillet <ahuillet@nvidia.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29722>
(cherry picked from commit f7434d7576)
MOV_INDIRECT picks one lane from the src[0] and moves it to all lanes
in the destination. Even if we split the instruction, src[0] should
remain identical.
Noticed this while trying to use this instruction in SIMD32. All
current use cases are limited to SIMD8 shaders (or SIMD16 on Xe2). Or
maybe in SIMD32 but with a uniform src[0]. That's we think we've never
seen the issue so far.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28036>
(cherry picked from commit 13dc2a28ce)
if mesh and task shaders are bound separately, and if they have different
workgroup sizes, the setting of workgroup size will be broken if
set during shader bind
this must be deferred to draw time to pull the correct values
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29733>
(cherry picked from commit 2bb35bf489)
If we have an array of multi-planar descriptors, buffer_list was
incorrectly incremented and this could have overwritten some BO entries.
In practice, this situation should be very rare because most of the
applications enable the global BO list.
Cc: mesa-stable
Closes: #10559
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28816>
(cherry picked from commit 730ba8322f)
Fix compilation failure when image is embedded in struct when
GL_EXT_shader_image_load_formatted is enabled:
struct GpuPointShadow {
image2D RayTracedShadowMapImage;
};
layout(std140, binding = 2) uniform ShadowsUBO {
GpuPointShadow PointShadows[1];
} shadowsUBO;
Compile log:
error: image not qualified with `writeonly' must have a format layout qualifier
Fixes: 082d180a22 ("mesa, glsl: add support for EXT_shader_image_load_formatted")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29693>
(cherry picked from commit b1d0ecd00d)
The Vulkan spec says:
"If pColorAttachmentInputIndices is NULL, it is equivalent to setting
each element to its index within the array."
Fix updated dEQP-VK.dynamic_rendering.primary_cmd_buff.local_read.*.
v2: Fix it correctly (Samuel)
Fixes: 03490ec019 ("vulkan/runtime: rework VK_KHR_dynamic_rendering_local_read state tracking")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29703>
(cherry picked from commit 51f410f621)
According to PRMs:
"All parameters are of type IEEE_Float, except those in the The ld*,
resinfo, and the offu, offv of the gather4_po[_c] instruction message
types, which are of type signed integer."
Currently, we load parameters with the correct types, but use them as send
sources with the default float type, which may confuse passes downstream.
Fix this by actually storing the retyped sources.
Cc: mesa-stable
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29581>
(cherry picked from commit 5ca51156e2)
This AUX-TT is only updated on the CPU since ee6e2bc4a3 ("anv: Place
images into the aux-map when safe to do so"). So the only really
important invalidation that needs to happens is on the beginning of a
primary command buffer.
We are required to idle the pipes prior invalidation the AUX-TT. This
might not be happening when the invalidation is put at the beginning
of the secondary command buffers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29671>
(cherry picked from commit 1851629407)
This change updates the affected calls to the proper function
which is radeon_set_config_reg().
For instance, this issue is triggered with
"piglit/bin/textureSize tes isampler2DMSArray -auto -fbo":
vertex-program-two-side: ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:4981: void si_emit_spi_ge_ring_state(si_context*, unsigned int): Assertion `(0x008988) >= CIK_UCONFIG_REG_OFFSET && (0x008988) < CIK_UCONFIG_REG_END' failed.
Fixes: bd71d62b8f ("radeonsi: program tessellation rings right before draws")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29645>
(cherry picked from commit 301a3bacce)
7278 is the chip on the rpi3, while the rpi4 that made it to market has
the 2711 chip.
When this was introduced (82bf1979), the rpi4 was probably still in
flux, which is why the rpi3 chip was put there (and v3d doesn't care
about that, but v3dv does).
cc: mesa-stable
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29584>
(cherry picked from commit 46247b3827)
It was set to "always run" for amd common files changes when I obviously
meant for it to be manual and messed up my copy/paste when I wrote that.
Fixes: ebaede788e ("amd/ci: limit radv jobs to radv + aco files changes")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29550>
(cherry picked from commit 47bd1cff4b)
This was added in 6e0089307e without a
dependency on libdrm_nouveau. If libdrm is not compiled with nouveau
enabled, the build errors here.
This is currently unused and since
821f4c8d99 removed dependency on
libdrm_nouveau, this should be gone too.
Fixes: 821f4c8d99 ("nouveau: import libdrm_nouveau")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29517>
(cherry picked from commit 4f5503fa2d)
Iris calls iris_resource_get_param with PIPE_RESOURCE_PARAM_STRIDE
internally now when exporting memory objects. OpenCL's gl_sharing allows
to export buffers as well, which do not have strides.
This fixes the assert being hit there for buffers.
Fixes: 831703157e ("iris: Use resource_get_param in resource_get_handle")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29501>
(cherry picked from commit bc149e0303)
In 2c65d90bc8 I forgot to add the new SHADER_OPCODE_READ_MASK_REG
opcode to the list of barrier instruction in the scheduler. Let's just
use a single opcode for all ARF registers that need special
scoreboarding and put the register as source (nicer for the debug
output).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2c65d90bc8 ("intel/brw: ensure find_live_channel don't access arch register without sync")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
(cherry picked from commit d8b78924c5)
In order to turn on/off through SNMP DuT under PoE switch, the SNMP key
in some vendors don't directly use the interface number, but a number
shifted a base number.
Define this base number as BM_POE_BASE environment in the runner.
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29306>
(cherry picked from commit 90f8be9bda)
Macro values that define values for different HW generations should
use the V3DV_X helper instead of being defined under a V3D_VERSION #if
condition.
Without this change, the original V3D_CLE_READAHEAD and
V3D_CLE_BUFFER_MIN_SIZE definitions used were only working for 4.2 HW.
For the 7.1 HW (RPi5) the 4.2 definitions were applied.
The CLE MMU errors were hidden as they were reported at dmesg as
"MMU error from client PTB (1) at 0x1884200, pte invalid" instead of
client CLE. So fixes all v3d dmesg warnings for PTB MMU errors on RPi5.
With this change we really don't need different functions per HW generation,
so we rename back file v3dx_cl.c to v3d_cl.c. As before, we can use
only the packets definitions for 4.2 HW as they use the same opcode as 7.1 HW.
Fixes: 11dce2ac81 ("v3d: fix CLE MMU errors avoiding using last bytes of CL BOs.")
Fixes: e2c624e74e ("v3d: Increase alignment to 16k on CL BO on RPi5")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29496>
(cherry picked from commit f32a258503)
Macro values that define values for different HW generations should
use the V3DV_X helper instead of being defined under a V3D_VERSION #if
condition.
Without this change, the original V3D_CLE_READAHEAD and
V3D_CLE_BUFFER_MIN_SIZE definitions used were only working for 4.2 HW.
For the 7.1 HW (RPi5) the 4.2 definitions were applied.
The CLE MMU errors were hidden as they were reported at dmesg as
"MMU error from client PTB (1) at 0x1884200, pte invalid" instead of
client CLE. So fixes all v3dv dmesg warnings for PTB MMU errors on RPi5.
With this change we really don't need different functions per HW generation,
so we rename back file v3dvx_cl.c to v3dv_cl.c. As before, we can use
only the packets definitions for 4.2 HW as they use the same opcode as 7.1 HW.
It fixes also an indentation error introduced with 26c8a5cd72.
Fixes: bb77ac983e ("v3dv: Increase alignment to 16k on CL BO on RPi5")
Fixes: 26c8a5cd72 ("v3dv: fix CLE MMU errors avoiding using last bytes of CL BOs.")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29496>
(cherry picked from commit 07d3d55783)
The SamplerDescriptor structure has a field which describes how
floating point coordinates should be converted to fixed point.
Setting this to "true" (which causes round to nearest even) fixes
a failing CTS test.
The CTS test in question is:
dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_float_color
The OpenGL spec is somewhat vague about how rounding is to be
performed, so it appears both settings should be legal; this may
indicate a problem with the CTS. Nevertheless "round to nearest even"
is probably a better default and since it fixes the failing test we
may as well use it.
Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29464>
(cherry picked from commit d91d2c275e)
Currently for an "unavailable" query, if VK_QUERY_RESULT_PARTIAL_BIT is
set, anv will return (slot.end - slot.begin). This can cause underflow
because slot.end might still be at the initial value of 0.
This commit fixes the issue by returning 0 in that situation.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29447>
(cherry picked from commit f8ccf70c99)
The queueFlags of the associated queue may have more flags than just the
type of queue it is, based on what that queue supports, like sparse or
protected content. Check that the queue is a blitter engine instead.
Fixes a bunch of dEQP-VK.api.copy_and_blit.core.*_transfer on MTL with
ANV_SPARSE=0
Fixes: 17b8b2cffd ("anv: Add support for a transfer queue on Alchemist")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29336>
(cherry picked from commit 8d098ecfea)
The code for checking flow control did not realize that
`LD_TEX` and `LD_TEX_IMM` were memory accesses, and hence was
not inserting waits where these were necessary. This showed up
as flakes in KHR-GLES31.core.shader_image_load_store.basic-glsl-misc-fs
Cc: mesa-stable
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29363>
(cherry picked from commit 272dcaff01)
LLVMContextr is not thread safe. There are many code paths that use
llvmpipe_context::context and adding locking to all of them is
difficult and adds unnecessary overhead. This approach restricts locking
to lp_sampler_matrix, which makes covering all uses of the LLVMContext
easy and only adds overhead when running lavapipe.
Fixes: 7ebf7f4 ("llvmpipe: Compile sample functioins on demand")
Reviewed-by: Roland Scheidegger <roland.scheidegger@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29397>
(cherry picked from commit c31038ef98)
In the sort functions used to sort varyings in gl_nir_link_varyings,
we were only checking the first input for whether or not it is xfb.
Check both inputs, and also provide a definite order for the xfb vs.
non-xfb varyings (the xfb come last, as the initial sort established).
This fixes a problem encountered on panfrost, where qsort could
mix xfb and non-xfb varyings which started out separate.
Note that the sort is still not stable. We probably should make it
stable, but that is a more extensive change that's handled in a later
commit.
Cc: mesa-stable
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29178>
(cherry picked from commit 5102a922e7)
The GLES spec limits the valid combinations of format and type that
may be returned by queries and/or used by ReadPixel. The list of valid
combinations appears in table 8.2 of the GLES 3.2 spec. Our code for
reporting the type and format of the current framebuffer, however,
does not verify that the combination is legal for GLES. For example,
RGBA and UNSIGNED_SHORT_1_5_5_5_REV is not a valid GLES combination,
but it's what we were returning for a panthor 16 bit frame buffer.
We can fix this either by changing the format or type that we return
(internally we can handle any format/type combination). We advertise the
read_format_bgra extension, so we could return GL_BGRA for the format.
However, very few applications (including notably the Khronos CTS for GLES)
cope well with BGRA. So instead we change the type to a non-_REV one
so that the combination appears in the GLES spec table of legal values.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29144>
(cherry picked from commit 4d298673da)
We can emit spill setup before RA if we use scratch. In that case
we have the same situation as during spilling, with the caveat that
we have already emitted the instructions so we need to find them
(they should be the only instructions ones before the instructions
accessing payload registers) and flag them as such.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
(cherry picked from commit 865e682ad7)
We read our payload registers first in the shader so we generally don't have
to care about temps being allocated to them and stomping their value before
we can read them. Hoewer, spilling setup instructions are an exception since
these will be inserted first when there is any spilling in the program.
To fix this, we flag RA nodes involved with these instructions so we can
then try to avoid assiging these registers to them.
Fixes CTS failures with V3D_DEBUG=opt_compile_time, particularly:
dEQP-VK.binding_model.buffer_device_address.set0.depth2.basessbo.convertcheckuv2.nostore.single.std140.comp_offset_nonzero
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
(cherry picked from commit cb83f25b39)
When compilation is required, we should return
VK_PIPELINE_COMPILE_REQUIRED. The spec prevents the application from
passing a module or SPIR-V code so we have nothing to compile if the
cache lookup fails :
VUID-VkPipelineShaderStageCreateInfo-stage-06844:
If a shader module identifier is specified for this stage, a
VkShaderModuleCreateInfo structure must not be present in the pNext
chain
VUID-VkPipelineShaderStageCreateInfo-stage-06848:
If a shader module identifier is specified for this stage, module
must be VK_NULL_HANDLE
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11208
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29340>
(cherry picked from commit 5f2288095b)
As we are marking the last V3D_CLE_READAHEAD bytes as unusable we don't
need to reserve V3D_CL_MAX_INSTR_SIZE bytes for the CLE packet.
This reverts c2601f0690 ("v3dv: ensure at least V3D_CL_MAX_INSTR_SIZE
bytes in last CL instruction")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29023>
(cherry picked from commit 7afebc15ce)
We increase the alignment to 16k for BOs allocated for the CL on RPi5 HW.
So we have the same ratio of usable space because of HW readahead as
than on RPi4, as readahead has been increased from 256 to 1024 bytes on
RPi5.
We have also concluded that when the kernel is running with 16k pages
that is the default on Raspberry Pi 5 HW, BO allocations are aligned to
16k so this increase has no cost and we would be using memory more
efficiently.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29023>
(cherry picked from commit bb77ac983e)
We increase the alignment to 16k for BOs allocated for the CL on RPi5 HW.
So we have the same ratio of usable space because of HW readahead as
than on RPi4, as readahead has been increased from 256 to 1024 bytes on
RPi5.
We have also concluded that when the kernel is running with 16k pages
that is the default on Raspberry Pi 5 HW, BO allocations are aligned to
16k so this increase has no cost and we would be using memory more
efficiently.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29023>
(cherry picked from commit e2c624e74e)
The last V3D_CLE_READAHEAD bytes of the CLE buffer are unusable because
using them would prefetch the next readahead bytes of the CL that would
be outside the allocated BO. To guarantee that we can chain a BO to the
current CL we always reserve space for the BRANCH or
RETURN_FROM_SUB_LIST packets.
Not taking this into account has been generating kernel dmesg errors like
"MMU error from client CLE".
As V3D_CLE_READAHEAD is different from RPi4 (256 bytes) to RPi5 (1024 bytes).
So we needed to rename v3dv_cl.c to v3dvX_cl.c to have different objects per
V3D_VERSION.
Extra assertions have been included to validate that we don't write
packets over the usable size of the CL silently.
v2: - Do not declare unusable the space needed for the BRANCH packet,
but take it into account for all space reservations.
v3: - Squash here ("v3dv: Secondary CL needs also to handle CLE readahead")
- Remove spureous parenthesis (Iago Toral)
- Refactor to avoid checking for needs_return_from_sub_list inside
cl_alloc_bo adding unusable_space as new parameter.
v4: - Improved logic for chaining BOs moving it to cl_alloc_bo using
a new enum v3dv_cl_chain_type to identify the different kinds
of BO chaining. Now we increase the size of the BO just before
submitting the BRACH/RETURN_FROM_SUB_LIST packages.
v5: - Assert on BO size updates that we are within the BO size.
(Iago Toral)
v6: - Remove changes at cmd_buffer_end_render_pass_secondary as we
assumed that cl->bo was already allocated when ending the
secondary CL, but it can be NULL. And this was already handle
by current code.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29023>
(cherry picked from commit 26c8a5cd72)
The last V3D_CLE_READAHEAD bytes of the CLE buffer are unusable because
using them would prefetch the next readahead bytes of the CL that would
be outside the allocated BO. To guarantee that we can chain a BO to the
current CL we always reserve space for the BRANCH packet.
Not taking this into account has been generating kernel dmesg errors like
"MMU error from client CLE".
As V3D_CLE_READAHEAD is different from RPi4 (256 bytes) to RPi5 (1024 bytes).
So we needed to rename v3d_cl.c to v3dX_cl.c to have different objects per
V3D_VERSION.
Extra assertions have been included to validate that we don't write
packets over the usable size of the CL silently.
v2: - Remove spurious blank line (Iago Toral)
- Do not declare unusable the space needed for the BRANCH packet,
and take it into account for all reservations.
v3: - Handle BRANCH packet reserve only when CLE BO allocation is done.
v4: - Assert on BO size updates that we are within the BO size.
(Iago Toral)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29023>
(cherry picked from commit 11dce2ac81)
When we start writing to an XFB buffer we need to synchronize with
any batches reading from it (because the data they need is about
to be overwritten). Do this by introducing a barrier in csf_launch_xfb.
This patch fixes a valhall failure in
KHR-GLES31.core.vertex_attrib_binding.advanced-iterations
Cc: mesa-stable
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29092>
(cherry picked from commit eefe34127f)
The EGL_WL_create_wayland_buffer_from_image is still used in WPE WebKit.
There is work in progress to continue adoption of DMA-BUF usage inside
WebKit which will eventually render the extension unneeded; but in the
meantime an update to a version of Mesa without the extension would
render applications using WPE WebKit unusable.
This reverts commit a3418105b9.
Signed-off-by: Adrian Perez de Castro <aperez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29266>
(cherry picked from commit 2934e1fad5)
The original fix from
0f3370eede ("raseonsi/vcn: fix a h264 decoding issue")
would in some cases also trigger for I frames with interlaced streams.
Instead of checking used_for_reference_flags, use slice type and
only add one reference for P/B frames if needed.
This change still fixes playback of the sample from the original issue,
avoids the issue with interlaced streams and also fixes the case where
application provides no references at all.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11060
Cc: mesa-stable
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29055>
(cherry picked from commit 5f4a6b5b00)
This change handles the case when "vertex_fetch_shader.cso" is null,
it implements the previous behavior in this specific case. This
situation is happening with clover.
For instance, this issue is triggered with "piglit/bin/cl-custom-buffer-flags":
==6467==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000000c (pc 0x7ff92908fe6e bp 0x7ffe86ae5ad0 sp 0x7ffe86ae5a30 T0)
==6467==The signal is caused by a READ memory access.
==6467==Hint: address points to the zero page.
#0 0x7ff92908fe6e in evergreen_emit_vertex_buffers ../src/gallium/drivers/r600/evergreen_state.c:2123
#1 0x7ff92908444b in r600_emit_atom ../src/gallium/drivers/r600/r600_pipe.h:627
#2 0x7ff92908444b in compute_emit_cs ../src/gallium/drivers/r600/evergreen_compute.c:798
#3 0x7ff92908444b in evergreen_launch_grid ../src/gallium/drivers/r600/evergreen_compute.c:927
#4 0x7ff9349f9350 in clover::kernel::launch(clover::command_queue&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&) ../src/gallium/frontends/clover/core/kernel.cpp:105
#5 0x7ff9349c331d in std::function<void (clover::event&)>::operator()(clover::event&) const /usr/include/c++/11.4.0/bits/std_function.h:590
#6 0x7ff9349c331d in clover::event::trigger() ../src/gallium/frontends/clover/core/event.cpp:54
#7 0x7ff9349c82f1 in clover::hard_event::hard_event(clover::command_queue&, unsigned int, clover::ref_vector<clover::event> const&, std::function<void (clover::event&)>) ../src/gallium/frontends/clover/core/event.cpp:138
#8 0x7ff9348daa47 in create<clover::hard_event, clover::command_queue&, int, clover::ref_vector<clover::event>&, clEnqueueNDRangeKernel(cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*, cl_uint, _cl_event* const*, _cl_event**)::<lambda(clover::event&)> > ../src/gallium/frontends/clover/util/pointer.hpp:241
#9 0x7ff9348daa47 in clEnqueueNDRangeKernel ../src/gallium/frontends/clover/api/kernel.cpp:334
Fixes: 659b7eb2 ("r600: better tracking for vertex buffer emission")
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10079
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29163>
(cherry picked from commit f8a1d9f787)
There's a success path on found dmabuf while the iova won't be cleaned
up. This change defers iova alloc till lookup miss and also to prepare
for later racy dmabuf re-import fix.
Also documented a potential leak on error path due to unable to tell
whether a gem handle should be closed or not without refcounting.
Fixes: f17c5297d7 ("tu: Add virtgpu support")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29093>
(cherry picked from commit 6ca192f586)
EGL 1.5 specification requires to not match on EGL_NATIVE_VISUAL_ID.
EGL_MESA_x11_native_visual_id extension allows us to remove this
restriction for X11, where we need to match EGL_NATIVE_VISUAL_ID to find
visuals which allow blending.
The reasoning is that on X11, compositors use the visual as "magic bit"
to decide whether to alpha-blend surface contents.
Unlike on most (all?) other windowing systems, requesting an alpha channel
for the config alone does not already imply blending on the compositor
level.
Thus, in order to allow clients to explicitly request configs with
"magic bit" and, similar to GLX, to order configs in a way so clients
not requesting alpha-blending do not get it by accident, do match
visual ids.
Note that one consequence of this is that more configs get
reported to clients.
Based on a patch by Freya Gentz <zegentzy@protonmail.com>, see
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2376
Signed-off-by: Robert Mader <robert.mader@posteo.de>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9989>
(cherry picked from commit 9bdab38424)
This is our compromise to make NVK and nouveau GL play nice when it
comes to modifiers. The old GL driver depends heavily on the PTE kind
and tile mode, even for images with modifiers. While it correctly
encodes the PTE kind and tile mode in the modifiers it advertises, it
may ignore the modifier and just trust what's set on the BO when it
imports a dma-buf image. This is partly because it doesn't support
VM_BIND and partly because of preexisting bugs in the modifiers
implementation. In either case, we can't fix it retroactively.
To work around this, NVK also sets the PTE kind and tile mode on the BO
when it's a dedicated allocation created for a DRM format modifiers
image. If DRM format modifiers are used without dedicated allocations,
things may still break but that's getting into vanishingly unlikely
scenarios.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24795>
(cherry picked from commit 4ad79bfef4)
If p_start_linear_vgpr allocates a VGPR that is already blocked, RA
will try moving the blocking VGPR somewhere else. If
p_start_linear_vgpr is inserted right before the branch, that move will
be inserted after exec has been overwritten, which might cause the move
to be skipped for some threads.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28041>
(cherry picked from commit 590ea76104)
For dma-buf, if the import and finish occur back-2-back for the same
dma-buf, zombie vma cleanup will unexpectedly close the re-imported
dma-buf gem handle. This change fixes it by trying to resurrect from
zombie vmas on the dma-buf import path.
Fixes: 63904240f2 ("tu: Re-enable bufferDeviceAddressCaptureReplay")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29173>
(cherry picked from commit a1392394ba)
Indeed, the object returned by LLVMCreatePassBuilderOptions()
was not freed.
For instance, this issue is triggered with "piglit/bin/cl-api-build-program":
Direct leak of 32 byte(s) in 1 object(s) allocated from:
#0 0x7f6b15abdf57 in operator new(unsigned long) (/usr/lib64/libasan.so.6+0xb2f57)
#1 0x7f6afff6529e in LLVMCreatePassBuilderOptions llvm-18.1.5/lib/Passes/PassBuilderBindings.cpp:83
#2 0x7f6b1186ee41 in optimize ../src/gallium/frontends/clover/llvm/invocation.cpp:521
#3 0x7f6b1186ee41 in clover::llvm::link_program(std::vector<clover::binary, std::allocator<clover::binary> > const&, clover::device const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ../src/gallium/frontends/clover/llvm/invocation.cpp:554
#4 0x7f6b1150ce67 in link_program ../src/gallium/frontends/clover/core/compiler.hpp:78
#5 0x7f6b1150ce67 in clover::program::link(clover::ref_vector<clover::device> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, clover::ref_vector<clover::program> const&) ../src/gallium/frontends/clover/core/program.cpp:78
#6 0x7f6b11401a2b in clBuildProgram ../src/gallium/frontends/clover/api/program.cpp:283
Fixes: 2d4fe5f229 ("clover/llvm: move to modern pass manager.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29164>
(cherry picked from commit df39994d51)
Commit 7d9ea77b45 ("glx: add automatic zink fallback loading between hw and sw drivers")
added an automatic zink fallback even when the zink gallium is not
enabled at build time.
It leads to unexpected error log while loading drisw driver and
zink is not installed on the rootfs:
MESA-LOADER: failed to open zink: /usr/lib/dri/zink_dri.so
Fixes: 7d9ea77b45 ("glx: add automatic zink fallback loading between hw and sw drivers")
Signed-off-by: Romain Naour <romain.naour@smile.fr>
Reviewed-by: Antoine Coutant <antoine.coutant@smile.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27478>
(cherry picked from commit 02ab51a61e)
commit 1887368df4 ("glx/sw: check for modifier support in the kopper path")
added dri3_priv.h header and dri3_check_multibuffer() function in drisw that
can be build without dri3.
Commit 4477139ec2 added a guard around dri3_check_multibuffer()
function but not around dri3_priv.h header.
Add HAVE_DRI3 guard around dri3_priv.h header.
Fixes: 1887368df4 ("glx/sw: check for modifier support in the kopper path")
v2: Remove the guard around dri3_check_multibuffer() function.
Signed-off-by: Romain Naour <romain.naour@smile.fr>
Signed-off-by: Antoine Coutant <antoine.coutant@smile.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27478>
(cherry picked from commit 3163b65ba7)
We need them for the case where explicit sample locations are not
enabled. While we're at it, fix the case where rasterization_samples=0.
This can happen when rasterizer discard is enabled. This fixes MSAA
resolves with NVK+Zink. In particular, it fixes MSAA for the Unigine
Heaven and Valley benchmark.
This also fixes all of the spec@arb_texture_float@multisample-formats
piglit tests.
Fixes: 41d094c2cc ("nvk: Support dynamic state for enabling sample locations")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10786
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29147>
(cherry picked from commit a160c2a14e)
this otherwise leads to a mismatch where some types of screen may have
the mutex initialized while others don't, in which case dri_release_screen()
will attempt to destroy an uninitialized mutex
cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29021>
(cherry picked from commit bc15c95c7a)
Running
KHR-GL46.sparse_texture_clamp_tests.SparseTextureClampLookupColor test
with Zink on Anv we run into an assert :
assert(inst->mlen <= MAX_SAMPLER_MESSAGE_SIZE * reg_unit(devinfo));
Turns out we've not covered all the cases in the SIMD lowering.
It's a bit of a shame to have both files reproduce the same logic.
Will try to think of a better way to extract the layout of the a send
message but that'll be a much bigger rework.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29118>
(cherry picked from commit d1c01e256d)
I have no idea why I've added it in the first place, but it's causing dead
code warnings to appear with newer rustc versions, so remove it.
Fixes: 7f77f91929 ("rusticl/icd: split Arc part out of CLObject into new trait")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29120>
(cherry picked from commit 146ac5169d)
Using the same buffer without a barrier actually can lead to data races as
drivers might not properly synchronize the content. Using the stream
uploader is a neat fix which prevents us from having to use a barrier but
still keep high throughput when launching kernels back-to-back.
Fixes: 5ff33f9905 ("rusticl: use real buffer for cb0 for drivers prefering")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27666>
(cherry picked from commit 8da8c6c2d8)
Currently ffmpeg has a bug in VAAPI AV1 decode that in some cases
it submits the same slice data buffer as many times as there is tiles.
However, in other cases it behaves correctly and all slice data buffers
contain different parts of bitstream to decode which this change breaks.
Now that the va frontend is passing correct offsets, this fixes decoding
AV1-TEST-VECTORS/av1-1-b8-22-svc-L1T2 with ffmpeg.
This reverts commit e6701f7231.
Cc: mesa-stable
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28960>
(cherry picked from commit 88dfe04b08)
The slice parameter data offset refers to offset in the submitted data buffer.
When multiple slice data buffers are submitted, the offsets of all slices needs
to be adjusted to be based from the start of the first data buffer.
Cc: mesa-stable
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28960>
(cherry picked from commit 6746d4df6e)
On pre-Valhall HW, the fragment shader metadata was part of the RSD
(renderer state descriptor), which was emitted at draw time, but
Valhall introduces a shader program descriptor containing only the
shader information, and this one is emitted at shader preparation
time.
If we don't add the FS state BO to batch, we might end up with a batch
being executed after the shader object has been destroyed, leading to
page faults when the GPU tries to access the shader program descriptor.
We make the panfrost_batch_add_bo() unconditional since it gracefully
handles the NULL case (which will happen on v7-).
Fixes: 087b63cb07 ("panfrost: Allow uploading fragment SPDs")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28926>
(cherry picked from commit 2cc317763c)
D16 rounds towards zero for fp32 -> fp16, but for fixed point it rounds to
nearest even in fp16. MIMG without D16 also rounds to nearest even, but in fp32.
This means D16 and f2f16_rtz(tex@32) can produce different results.
Sadly this also means we can never use d16 if fp16 rounding isn't undefined.
Cc: mesa-stable
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28730>
(cherry picked from commit 3a35522c8a)
Fixes fs-uint-to-float-of-extract-int8.shader_test and
fs-uint-to-float-of-extract-int16.shader_test added by piglit!883.
v2: Expand the comment explaining the potential problem. Suggested by
Caio.
Fixes: e6022281f2 ("intel/elk: Rename files to use elk prefix")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>
(cherry picked from commit 0fa17962d6)
Fixes fs-uint-to-float-of-extract-int8.shader_test and
fs-uint-to-float-of-extract-int16.shader_test added by piglit!883.
No shader-db or fossil-db changes on any Intel platform.
v2: Expand the comment explaining the potential problem. Suggested by
Caio.
Fixes: 29ce110be6 ("i965/fs: Remove extract virtual opcodes.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>
(cherry picked from commit bf5d82654a)
When loading the base shader serialization there is a discrepancy
between the state parameters that may already have been optimized,
because after storing the serialization the shader went through
st_finalize_nir, and _mesa_optimize_state_parameters was run, so
that original state parameters may have been optimized and replaced
by new parameters.
After get_nir_shader is called, the original state parameters are
re-added - in addition to the optimized parameters. This lead to
an bug with the uniform offsets when lowering uniforms to UBOs.
Therefore, as a hotfix for drivers that don't support packed
uniforms, ignore the base serialization and use the
serialization obtained after st_finalize_nir was run. With that
the problem can be avoided.
Fixes: 5eb0136a3c
mesa/st: when creating draw shader variants,
use the base nir and skip driver opts
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10881
v2: reorder conditional evaluation for better readability (zmike)
v3: revert c72bb8de7 ("r300: mark new fails") (Pavel Ondračka)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28994>
(cherry picked from commit 7de8a01087)
When a job is submitted to the flush_queue the resource dt_idx is reset,
and if a readback is requested then we have to make sure that the
corresponding kopper_preset has finished before we can acquire the image
for readback, so wait for the according fence in this case.
This fixes the validation error UNASSIGNED-Threading-MultipleThreads-Write
triggered by piglit "read-front" lavapipe.
Fixes: 8ade5588e3
zink: add kopper api
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28127>
(cherry picked from commit 811ed62865)
When this was promoted to EXT it expanded its properties struct to add a new
supportsNonZeroFirstInstance field.
Fixes: d38ff02c03 ("v3dv: mark some promoted extensions as supported")
Fixes: dEQP-VK.api.info.vulkan1p2_limits_validation.khr_vertex_attribute_divisor
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28964>
(cherry picked from commit e8f96dd0b0)
if swapchain creation fails (e.g., insane cts swapchain configs), the
swapchain gets demoted to a non-window image that is still accessed by
the frontend. this image should not ever hit corresponding zink entrypoints
for swapchain-only images, which requires a flag to test swapchain-edness
cc: mesa-stable
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28904>
(cherry picked from commit a50c17802a)
This was a leftover. Flags can be different than 0, like for required
subgroup size and it should already be correctly supported.
Fixes recent dEQP-VK.shader_object.performance.dispatch_base.
Fixes: 37d7c2172b ("radv: add support for creating/destroying shader objects")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28946>
(cherry picked from commit 0b51868193)
With explicit sync, only if it wasn't done earlier for FIFO.
Prevents potentially unbounded memory usage for (wl_buffer.release
events in) the queue, since we don't dispatch the queue anywhere else
with explicit sync.
v2:
* Use wl_display_dispatch_queue_pending instead of
wl_display_dispatch_queue_timeout. (Sebastian Wick)
* Call it from wsi_wl_swapchain_queue_present instead of
wsi_wl_swapchain_acquire_next_image_explicit. (Joshua Ashton)
Fixes: 5f7a5a27ef ("wsi: Implement linux-drm-syncobj-v1")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28874>
(cherry picked from commit c3be21f177)
This array has 3 components, because it's meant to hold the X, Y and Z
components of the work-group size sysval. However, mir_pick_ubo assumes
vec4 for the push-uniforms, which ends up promoting this to 4
components.
So let's make sure we don't write that last component. It's not going to
do anything good.
In practice, this leads to the viewport descriptor being smashed, which
doesn't actually do any real-world harm, because this only happens in
compute batches where that descriptor is unused. However, writing
outside of arrays is undefined behavior, so we should fix it regardless.
Fixes: 5006167061 ("panfrost: Hook-up indirect dispatch support")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28856>
(cherry picked from commit 186f7fa915)
Some cards can do higher bitrate, but 1000 Mbit/s should be high enough
for any practical use. It's also the value that AMF reports as max bitrate.
Fixes: 54d499818c ("radv/video: add initial support for encoding with h264.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28736>
(cherry picked from commit 1f07f5a79b)
Pending feedback resources (cmds, buffers, slots) for timeline semaphores are
generally reclaimed for re-use during subsequent semaphore waits/queries or any
queue submission containing at least one "wait" semaphore.
They are never reclaimed in the unexpected case when all submissions only
contain "signal" timeline semaphores, which consume such resources but
are never subsequently queried or waited upon.
This strange behavior is observed in several Valve games (Portal 2,
L4D2, CS2), which all run natively on linux with their own internal
distributions of DXVK v2.0 (at time of this MR submission). A Cursory
analysis of recent DXVK history indicates that it may be gone by v2.1.
The consequence is rapid guest memory leak and host Vk resource leak,
resulting in a crash within 1-2 minutes.
Fix that leak by running the reclaimation procedure for submissions with
_any_ accompanying semaphores.
Fixes: d63432012d ("venus: refactor semaphore feedback")
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28915>
(cherry picked from commit ee7e0168a1)
There're many cases in which the ring submissions must succeed. We don't
worry about real oom since things would fail earlier. For simulated oom
from random intentional allocs, there isn't robust way to fail those
must succeeds. e.g. the commands that don't have return codes or valid
error return struct defaults. So real oom propagation is still at best
effort.
Cc: mesa-stable
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28914>
(cherry picked from commit 3e16d25d1a)
optimized pipeline compile jobs may still be ongoing during ctx
destroy, and these must complete too or else crashes will occur
fixes shutdown crash with dEQP-EGL.functional.sharing.gles2.multithread.simple.images.texture_source.teximage2d_render
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28900>
(cherry picked from commit bd1a3921d1)
it's possible for a shader to be precompiling its separate shader variants
during destruction, which requires that the programs set be iterated
under lock in order to prune every new variant as it is created without
crashing
fixes crashes in spec@arb_separate_shader_objects@400 combinations.*
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28900>
(cherry picked from commit f18a1d3a31)
It doesn't work for all models, with the same happening to the
proprietary driver. There may be some hardware limitation on at least
the HW that is currently supported in Mesa.
So match what the proprietary driver is doing and disable by default.
Fixes: d6473ce28e ("etnaviv: Use NN cores to accelerate convolutions")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28918>
(cherry picked from commit 1277f58d8a)
We were accidentally leaving XY_BLOCK_COPY_BLT's Source and Destination
MOCS fields set to 0 (Error: Reserved for Non-Use) on Gfx12.0 systems.
This was causing assert fails in debug builds, since we try to ensure
that we don't do that. In theory, MOCS 0 is supposed to be equivalent
to MOCS 2 (all the caching), but...we probably ought to use MOCS 3
(uncached). Every Gfx12.5+ platform requires it, so although there
isn't a note about Gfx12.0 needing that, it's possible that it does.
We're currently only using the blitter for DRI PRIME blits on Gfx12.0,
anyway, and I think we're flushing all the caches regardless.
This bug was somewhat obscure to hit:
- You need a hybrid graphics system with Gfx12.0 and some other GPU
- You have to be using "reverse PRIME", i.e. rendering on the integrated
GPU and displaying on the discrete one. This is not the common case.
- You have to be using a debug build.
No observable performance delta in GfxBench5 Car Chase (an arbitrary
program) when rendering on Alderlake GT1 and displaying on an Arc A770.
Fixes: 194afe8416 ("anv/iris/blorp: use the right MOCS values for each engine")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28894>
(cherry picked from commit e6fb3ba037)
This was missing and this caused test failures for formats different
than VK_FORMAT_R8_UINT which is the only one supported for FSR.
Fixes recent
dEQP-VK.api.info.unsupported_image_usage.*.fragment_shading_rate_attachment.*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28893>
(cherry picked from commit e8d94536d2)
File diff suppressed because it is too large
Load Diff
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.