If p_atomic_cmpxchg doesn't set the ray_query_shadow_bos[bucket] to new_bo
allocated by this thread, it returns the bucket BO allocated by the other
thread and we use it. But due to a mistake, we also release that BO, not
the candidate just allocated by this thread and never used again.
Fixes: 5d3e4193 ("anv: enable ray queries")
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30581>
(cherry picked from commit 1904fe1186)
We can't exceed c_int::MAX, because the CTS casts to ints in a few places.
We also need to take into account max pixel size when restricting to
max_mem_alloc as this cap is pixel based, not byte based.
Cc: mesa-stable
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30739>
(cherry picked from commit eef1af8128)
Make sure to preserve the depth or stencil components of D24S8 using the
fixed codepath just added. While we're here, fix the detection of
whether an attachment is bound.
Fixes: cb0f414b ("tu: Add support for suspending and resuming renderpasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26154>
(cherry picked from commit 812c8f6abe)
We need to make sure that we don't trash a passthrough depth/stencil
aspect if we need to store the whole attachment by loading it
beforehand.
Fixes: cb0f414b ("tu: Add support for suspending and resuming renderpasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26154>
(cherry picked from commit 5377219ca0)
This should be done after reads are checked and
sgpr_read_by_valu_as_lanemask_then_wr_by_salu is reset. The old version
also skipped checking the reads if the write check passed.
fossil-db (navi31):
Totals from 193 (0.24% of 79395) affected shaders:
Instrs: 3212435 -> 3212735 (+0.01%)
CodeSize: 16462868 -> 16463848 (+0.01%); split: -0.00%, +0.01%
Latency: 19492377 -> 19492462 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 4419705 -> 4419718 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Backport-to: 24.1
Backport-to: 24.2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30818>
(cherry picked from commit 61e73c2323)
At least one ray tracing shader in cp2077 is over 4MB on Xe2. There
isn't a memory pool large enough for the allocation, so the driver
crashes instead. This commit adds 8MB and 16MB pools.
I intend this as a stop gap fix. I would prefer to figure out why this
shader is so much larger than on previous platforms. The shader in
question has 3824 spills and 8625 fills. That is not good. I suspect
dealing with that will also solve the problem, but that will require a
bit more time.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11739
Suggested-by: Lionel Landwerlin
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30751>
(cherry picked from commit 09cf9fe8ab)
this pbo shader works by iterating over the framebuffer size
and storing a value to an offset for each source pixel. if the
number of pixels being written out does not correspond to fragcoord
to the extent that certain source pixels are not written at all, however,
then this method should not be used in order to avoid giving broken results
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30689>
(cherry picked from commit c2dcecffc5)
This commit solves the shortage-problem at the blit-functions by
checking the number of fence-registers after updating the batch.
If too many registers are used,
the batch-entries and relocs for the current blit function are
removed by setting batch->ptr and reloc_count to value before
the blit call and calling drm_intel_gem_bo_clear_relocs.
This truncated batch is flushed,
and the batch is updated again for the current blit function.
Cc: mesa-stable
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26769>
(cherry picked from commit ed2123158d)
out_args->scratch_offset and in_wg_id_x will alias on <gfx9.
To avoid the conversion code reading a garbage WG ID, move the
scratch/ring offset writing to the very end.
Fixes: 1e354172 ("radv,aco: Convert 1D ray launches to 2D")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30707>
(cherry picked from commit bd525f4282)
Similar to commit 6cef804067 ("nir/opt_if: fix opt_if_merge when
destination branch has a jump"), we shouldn't combine if statements when
the second if-then-else has a block that ends in a jump.
This fixes a case where opt_if_merge combines
if (cond) {
[then-block-1]
} else {
[else-block-1]
}
if (cond) {
[then-block-2]
} else {
[else-block-2]
}
where `then-block-2` or `else-block-2` ends in a jump. The phi nodes
following the control flow will be incorrectly updated to have an input
from a block that is not a predecessor.
Fixes: 4d3f6cb973 ("nir: merge some basic consecutive ifs")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30629>
(cherry picked from commit d2e6be94ae)
Before we were using direct CP_LOAD_STATE, which is broken with multiple
back-to-back draws. This caused regressions in some DX11 traces when
enabling early preamble. We still need to use indirect CP_LOAD_STATE for
VS params, which are sometimes written by the CP, however for everything
else we should use the new UBO path instead.
Fixes: 76e417ca59 ("turnip,ir3/a750: Implement consts loading via preamble")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30675>
(cherry picked from commit 850f2aab03)
It's one header dword and 5 payload dwords. This was papered over by us
not actually using the UBO path for one of the loads, but that's changed
in the next commit.
Fixes: 76e417ca59 ("turnip,ir3/a750: Implement consts loading via preamble")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30675>
(cherry picked from commit 4f2b5442a6)
After spilling during regular RA, merge sets need to be fixed up. To
find all merge sets, fixup_merge_sets used ra_foreach_dst. However,
after shared RA has run, shared dsts wouldn't have the IR3_REG_SSA flag
set anymore leaving their merge sets lingering. This patch fixes this by
using foreach_dst instead.
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
(cherry picked from commit 28d2a27030)
The preferred register for merge sets was not updated after allocating
one. This caused a new merge set to be allocated for every register it
contains. This patch fixes this by reusing the update function from the
standard RA.
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
(cherry picked from commit 9013e11d8c)
Previously only `-mtls-dialect=gnu2` was probed, which was appropriate
for arm, x86 and x86_64, but not for newer architectures such as
aarch64, loongarch64 and riscv64 which all use `-mtls-dialect=desc`
instead. Because the driver option is not consistent across
architectures (and probably will not), try both variants and choose the
first one working.
While at it, rename "gnu2_*" variables to "tlsdesc_*" respectively, for
clarity.
Cc: mesa-stable
Reviewed-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Yukari Chiba <i@0x7f.cc>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30599>
(cherry picked from commit cc2dbb8ea5)
The `capture_not_overwritten` unit test captures and compares two
backtraces -- one from inside a call to `func_c` and one outside -- and
confirms that they are not identical. That is, that `func_c` is in the
backtrace.
On 32-bit x86, without `-fno-omit-frame-pointer`, the function will not
emit a stack frame. As a result, the unit test fails.
The fix is to compile `func_c` with the flag `-fno-omit-frame-pointer`
to prevent the compiler from optimizing out the stack frame which is
otherwise unneeded.
Bug: https://bugs.gentoo.org/823774
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4091
Fixes: d0d14f3f64 ("util: Add unit test for stack backtrace caputure")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30622>
(cherry picked from commit 05dc4eb536)
the CL CTS added a new test being printf("\n", "foo"), but we ended up
printing the new line twice. If we can't find a specifier anymore, ignore
the argument as after the loop processing all arguments we'll print the
remaining format string anyway.
Cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30574>
(cherry picked from commit 4080269845)
Instead of aligning offsets, we just align the size every time we query
it. This simplifies our offset and size calculations later since we can
always just add up descriptor buffer sizes and know that we'll be okay.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: 7ab5c5d36d ("zink: use EXT_descriptor_buffer with ZINK_DESCRIPTORS=db")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30580>
(cherry picked from commit 0f8f407e57)
Using ior here is equivalent to using uadd_sat, but works for every driver
and shouldn't hurt anywhere.
I forgot to fix this up when fixing up some vvl errors with zink.
Fixes crashes with the integer_ctz CL CTS tests in zink.
Fixes: 39ec184db6 ("zink: lower 64 bit find_lsb, ufind_msb and bit_count")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30535>
(cherry picked from commit 48acf9d358)
* close(fd) requires also resetting the fd=-1 or else boom
* checking just driver_name is broken because loader_get_driver_for_fd()
uses MESA_LOADER_DRIVER_OVERRIDE, so there's no way to differentiate
an inferred load
Fixes: b907eb4750 ("egl: don't bind zink under dri2/3")
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30556>
(cherry picked from commit 1a579552af)
The CL CTS started to call this API, luckily we don't have to actually
implement it, because we don't intent to support CL 1.0 only devices in
the first place (probably).
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30575>
(cherry picked from commit cd2dc4f70c)
For some purposes (e.g. advanced blending) we need a non-zero alpha
value returned from reads. This is only guaranteed on Bifrost if
we explicitly request RGB1 component ordering. The default is to use
RGBA component ordering, which for R5G6B5 causes 0 to be read for
alpha.
A complication is that the Mali fixed function hardware requires
four components (which implies RGBA rather than RGB1). If fixed
function blending is in use, we modify the pixel format back to
RGBA when building the blend descriptor.
Cc: mesa-stable
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29606>
(cherry picked from commit 004e0eb3ab)
This reverts commit d6bb4ddc63.
Fixes: d6bb4ddc63 ("d3d12: Video Encode - Remove PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE as not supported")
PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE is necessary for some scenarios like the example below
described in https://github.com/microsoft/WSL/issues/11838
gst-launch-1.0 -v videotestsrc num-buffers=250 !
video/x-raw,width=1920,height=1200 !
vaapipostproc !
vaapih264enc !
filesink location=~/wsl_test.h264
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30548>
(cherry picked from commit a0f1a708c4)
This is more correct than the previous code and the CL CTS relies on edge
case behavior here, e.g. for 1Dbuffer images.
I think part of that is not actually required by the spec, but whatever.
Fixes: 7b22bc617b ("rusticl/memory: complete rework on how mapping is implemented")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30528>
(cherry picked from commit 2484331e82)
An application could map and unmap a host ptr allocation multiple times,
but because how the refcounting works, we might never ended up syncing the
written data to the mapped region.
This moves the refcounting out of the event processing.
Fixes: 7b22bc617b ("rusticl/memory: complete rework on how mapping is implemented")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30528>
(cherry picked from commit 1fa288b224)
If TraceRay() is called with the TerminateOnFirstHit flag, we need to
terminate the ray on the first confirmed intersection. This is handled
by the lowering of accept_ray_intersection and it's working fine for the
case of multiple instances of the intersection shader being called.
But if the shader calls reportIntersection() more than once, we were
handling them all and accepting the closest one regardless of the flag.
Check for the flag on every confirmed intersection and, if set, accept
it right there. The subsequent lowering will take care of terminating
handling the ray termination if necessary.
Fixes new test dEQP-VK.ray_tracing_pipeline.amber.flags-accept-first
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30418>
(cherry picked from commit f8553f56ac)
this had a number of issues:
* pipe_loader_get_driinfo_xml() frees driver_driconf immediately,
except the driOptionCache struct string pointers are all just copied
in merge_driconf instead of having the strings copied, which means any
subsequent access of driver_driconf strings is invalid access
* pipe_loader_drm_get_driconf_by_name() is a disaster that only happened
to work because the dlopen here is the same lib that gets opened elsewhere
by mesa and not closed. if the lib here is actually closed, then all
the statically allocated strings become invalid, which means they need to
be manually copied
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30494>
(cherry picked from commit 0c220741e6)
previously the size was checked at the top of the function, but this
ignored cases where the loader might end up resizing the drawable,
resulting in an attempted 0x0 swapchain creation based on stale
geometry
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30487>
(cherry picked from commit a6d97b0afe)
The iova allocations need to be CPU page aligned. (The GPU itself
always supports 4k mappings regardless of the smallest CPU page size,
but GEM buffer allocations must be an integer number of CPU pages.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Fixes: 63904240f2 ("tu: Re-enable bufferDeviceAddressCaptureReplay")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30431>
(cherry picked from commit 7fe3529715)
It doesn't do anything substantial yet, but it ignores enough so internal
shaders won't generate warnings.
I've also added ByVal parsing, because I need this one to actually fix a
correctness issue in a later patch.
Cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29896>
(cherry picked from commit 9b55dcca54)
Contrary to what the original commit said, this is actually still used
(see .gitlab-ci/bare-metal/poe-powered.sh:205), and the boot retry logic
has been broken ever since, exacerbating the rpi farm boot problems.
Fixes: 97b2afa16a ("ci/bare-metal: Drop the 2 vs 1 exit code from poe_run.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30340>
(cherry picked from commit 2bc82b7147)
At update_map at i915_state_sampler.c max_lod is no longer set to 1
for npots. This almost totally disabled mipmapping.
Max_lod should still be set to 1, but only if it is still 0,
because no mipmap-levels are present.
According to existing comment at update_map this is needed, to
avoid problems at sampling,
if MIN_FILTER and MAX_FILTER differ.
Cc: mesa-stable
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28638>
(cherry picked from commit ad02bfe41d)
Remove at i945_texture_layout_2d() call of util_next_power_of_two(),
which oversized the npot-blocks for every level to get power of 2
for width and height. Hardware doesnot expect these oversized
npot-blocks, causing mangled mipmapping.
This also is done at i915_texture_layout_2d(), which is
used by older gen3-gpus.
Cc: mesa-stable
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28638>
(cherry picked from commit bb95d744ca)
After using sparse array to manager virtgpu bo, we set gem_handle to 0
to indicate that the bo is invalid. However, the gem handle gets closed
before that and can be reused by another newly created bo, leading to
the tracked gem handle being unexpectedly zero'ed out.
Fixes: 88f481dd74 ("venus: make sure gem_handle and vn_renderer_bo are 1:1")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30362>
(cherry picked from commit f788c87d02)
`st_context_invalidate_state` call is required when changing buffer attachments.
Including header with BBitmap class definition is required to properly
call C++ destructor.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30372>
(cherry picked from commit 828c3cf002)
We cannot always use /dev/dri/card0.
As a matter of fact, on systems with SimpleDRM enabled /dev/dri/card0
will be created by it and removed once a GPU driver has loaded.
In any case we shouldn't hard-code the device number and instead walk
the device list to find the first suitable device.
This issue is trivially reproducible with `eglinfo -B -p gbm` on
Ubuntu 24.04 or Fedora 40
Fixes: 32f4cf3808 ("egl/gbm: Fix EGL_DEFAULT_DISPLAY")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30325>
(cherry picked from commit 7949471716)
Thanks to LCSSA, we can absolutely have phis with only one source and we
need to handle those in spilling. Fortunately, there's nothing really
special about that case. I was just prematurely optimizing.
Fixes: bcad2add47 ("nak: Add a spilling pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28084>
(cherry picked from commit 8bf3213a54)
We want to use actual sparse-capable queues as the default
trtt->queue, not copy queues that may have a companion_rcs_batch.
Before this patch, if we expose more than one queue *and* the
application creates a copy queue first, we'll end up setting
trtt->queue as the copy queue, which will GPU hang when we submit the
TR-TT batches as they don't support the pipe_control commands we
issue.
The trtt->queue queue is used for binding/unbinding buffers in code
paths where there's no specific queue coming from user space, such as
when we're creating or destroying a sparse resource.
This is not a problem yet on i915.ko since we are exposing
only a single queue, and it is not a problem for xe.ko since TR-TT is
not the default there. This is also not a problem in applications
that create the render or compute queue first. We plan to expose more
queues when using TR-TT, so this would become a problem without this
patch.
None of VK-GL-CTS seems to exercise that, and none of the Steam games
I tested exercise that as well. I was able to reproduce this issue
using our internal tracing tool.
v2: New implementation that doesn't break when we only have a compute
queue (Lionel).
Fixes: 04bfe828db ("anv/sparse: allow sparse resouces to use TR-TT as its backend")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30252>
(cherry picked from commit 3ab8ff99fa)
Even though we don't advertise the sparseResidency feature, a bunch of
CTS tests just call GetPhysicalDeviceImageFormatProperties2() with
SPARSE_RESIDENCY_BIT and see if that fails.
Fixes: d2177f4764 ("nvk: Don't advertise sparse residency on Maxwell A")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30303>
(cherry picked from commit 68d6cdfbc5)
This fixes sporadic rendering corruption reported on MTL with ChromeOS
in cases where multiple processes including Chrome were utilizing the
GPU concurrently, and one of the processes happened to submit a
BLORP-only batch buffer right after a switch from a different context.
In such a scenario we would fail to add the BO that holds the pixel
hashing tables to the execbuf IOCTL for the BLORP batch, because it
was being pinned from iris_restore_render_saved_bos() which isn't
called for BLORP operations, potentially causing it to use garbage as
pixel pipe hashing tables, which led to corruption of the BLORP
rendering.
Technically this could have affected DG2 as well, but it has only been
reported on MTL so far.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30274>
(cherry picked from commit 49b433d5e7)
On MTL ChromeOS boards, during AI based video conference, we were
observing a lot of overhead from invalidations. Upon debug, it was found
that we were using clflush in this function and that isn't efficient.
With this change, while executing compute workloads like zoo models, we
are getting ~25% performance improvements in a best case scenario.
Rework:
* Jordan: Call intel_clflushopt_range() rather than
__builtin_ia32_clflushopt() because intel_mem.c is not compiled
with -mclflushopt.
Backport-to: 24.1 24.2
Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30238>
(cherry picked from commit 2f6919e6c2)
Despite its name, egl_dri2 works under plain DRI without DRI2, and the
old autotools build system built it when $enable_dri = yes, with no
check for DRI2. This fixes the build for GNU/Hurd, which supports DRI,
but doesn't have DRM and thus no DRI2 support.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/587>
(cherry picked from commit 149e8bff52)
This reverts commit ad862c36e5.
This change does not work, because libdrm is required if with_dri2 is
true. Moreover, we don't want all of DRI2 on Hurd, we just want the
egl_dri2 driver, as done by autotools. So first revert this to stop
trying to build all of DRI2.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/587>
(cherry picked from commit ec55a6c329)
This reverts commit 2fd85105c6.
Despite its name, egl_dri2 works under plain DRI without DRI2, and the
old autotools build system built it when $enable_dri = yes, with no
check for DRI2. A future commit will adapt meson.build to follow that
approach rather than this hackier one.
Note that the case removed in the second hunk is already dead code,
since system_has_kms_drm is false on GNU/Hurd, and could have been
dropped as part of 66d2ae0386 ("meson: forcefully disable libdrm when
host doesn't have it").
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/587>
(cherry picked from commit 8461776a09)
This implements an undocumented workaround for a hardware bug that
affects draw calls with a pixel shader that has 0 push constant cycles
when TBIMR is enabled, which has been seen to lead to a hang with
Fallout 3 and Metal Gear Rising Revengeance. This hardware bug has
been reported as HSDES#22020184996 which is still pending a resolution
by the hardware team. However since this workaround found empirically
has been confirmed to fix the issue reliably and it's relatively
harmless it seems worth checking in already even though no final W/A
number is available nor has the W/A json file been updated.
To avoid the issue we simply pad the push constant payload to be at
least 1 register. This is enabled via a brw_wm_prog_key since the
driver needs to be in agreement with the compiler on whether the dummy
push constant cycle is present, and it can be avoided in cases where
the driver knows that TBIMR will be disabled (e.g. for BLORP).
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10728
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11399
Fixes: 57decad976 ("intel/xehp: Enable TBIMR by default.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30031>
(cherry picked from commit b98eebbcb2)
Sporadically a6xx gpu will fail to recover causing the lava job
a660_vk_full to loop on error messages for three hours before timing
out.
A few sporadic error messages may still be recoverable, but when multiple
errors occur over a short period, successful recovery is unlikely. Parse
the logs to look for repeated error messages within a short time period.
If found, cancel the lava job and rerun it.
Also add unit tests for this behaviour.
cc: mesa-stable
Reported-by: Valentine Burley <valentine.burley@gmail.com>
Acked-by: Daniel Stone <daniel.stone@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30032>
(cherry picked from commit 72c182f873)
If you have something like this :
con 32 %66 = @load_reg (%62) (base=0, legacy_fabs=0, legacy_fneg=0)
con 32 %27 = @resource_intel (%22 (0xdeaddead), %66, %67, %17 (0x0)) (desc_set=2, binding=96, resource_intel=0, resource_block_intel=-1)
Just copying the brw_reg in ssa_values[] is not enough for the
load_reg intrinsic. We need to call get_nir_src() to force some logic
to create the register correct.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b8209d69ff ("intel/fs: Add support for new-style registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30050>
(cherry picked from commit 67b778445a)
This commit is fixing an issue where all AV1 slice data offsets always point
into the first slice data buffer, even when multiple slice data buffers
are submitted. However, after b0d6e58d88 ("Reapply "radeonsi/vcn: AV1 skip the redundant bs resize"")
only the first slice data buffer will be sent to decoder and the incorrect
behavior is required, so this commit also needs to be reverted.
This reverts commit 6746d4df6e.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30255>
Event::set_status only called the cbs for the passed in status, but we
need it to call all cbs up to the passed in status. This simplifies error
handling.
Cc: mesa-stable
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30215>
(cherry picked from commit d2d3f8e446)
nouveau uses the OS page size which is almost always 4096. The next
patch will make this properly queried but this version is back-portable.
Fixes: 58181b7bbc ("nvk: Bump the sparse alignment requirement on buffers to 64K")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30138>
(cherry picked from commit 68c06558be)
We need to call iris_wait_syncobj() on both release and debug builds,
so take it out of the assert() call. Only assert the result.
With this patch, gnome-session finally works for me. Also steam.
Fixes: 665d30b544 ("iris: Wait for drm_xe_exec_queue to be idle before destroying it")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30195>
(cherry picked from commit 22fe73a86a)
* swrast allocates images aligned to 64x64 tiles, which results in images
that are larger than the window. PutImage requests must be clamped on
the y-axis to avoid uploading/damaging out-of-bounds regions
* winsys coords are y-inverted
cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29910>
(cherry picked from commit 6088a0bf51)
Commit 94989b45a5 ("anv,driconf: Add fake non device local memory WA
for Total War: Warhammer 3") implemented a workaround to make
Warhammer 3 work on ADL, but the game still doesn't work on LNL, which
uses xe.ko, and MTL, which uses i915.ko: it still fails at launch
claiming it couldn't allocate memory.
So in this implementation, instead of clearing DEVICE_LOCAL_BIT we
just duplicate our memory types, one having the bit and one not
having.
v2:
- Check for VK_MAX_MEMORY_TYPES (José)
- Invert the order of the memory types (José)
- Fix white space issue (José)
v3:
- Comment our non-spec-compliance (José)
- Remove useless lines (José)
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8721
Fixes: 94989b45a5 ("anv,driconf: Add fake non device local memory WA for Total War: Warhammer 3")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30162>
(cherry picked from commit 241585667f)
if the src for a replace_buffer call is mapped after replacement:
* avoid clearing access flags
* update valid range
the pointer access here is always safe because the only case in which
this scenario can occur is if tc is forced to sync immediately after
creating a replaceent buffer, and the replacement buffer's lifetime
will always be exceeded by the lifetime of the real buffer
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30107>
(cherry picked from commit 70b40fd2a0)
when tc replaces a buffer in subdata, it may subsequently perform subdata calls
on the replacement if it is forced to sync during map, e.g.,
* bind_vbo(dst)
* draw
* subdata(src)
* buffer replacement
* map
* tc sync
* replace_buffer(dst, src)
* memcpy <- broken
* draw
in this scenario, src may not have data at the time of replacement,
but it will get data soon after, and this buffer range is the real one
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30107>
(cherry picked from commit 76da22bfc2)
Commit f695a9fed2 moved the 64-bit float <-> 16-bit float conversion
splitting into a core NIR pass, so the code remaining here is only
needed for 64-bit integer types.
Presumably in an attempt to remove the float handling, it replaced
simple bit_size == 64 checks with this expression:
(full_type & (nir_type_int64 | nir_type_uint64))
I believe that the intended expression was:
(full_type == nir_type_int64 || full_type == nir_type_uint64)
Unfortunately, the former is incorrect. Any integer or unsigned
NIR type would trigger the former expression. For example:
nir_type_uint32 & (nir_type_int64 | nir_type_uint64) => nir_type_uint
This meant that we were splitting e.g. u2f16 on 32-bit unsigned types
into u2f32 and f2f16, when we can easily natively handle that case.
To fix this, we go back to simple bit_size == 64 checks. This pass is
already run after nir_lower_fp16_casts which will split the float case,
so we will never see it here.
fossil-db on Alchemist shows a -1.14% reduction in affected shaders for
google-meet-clvk shaders. In another ChromeOS workload, it improves
performance by around 8% on Meteorlake.
Thanks to Sushma Venkatesh Reddy for finding this performance issue!
Fixes: f695a9fed2 ("intel/compiler: use nir_lower_fp16_casts")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30091>
(cherry picked from commit 837c441acb)
`with_platform_windows` depends on the `platforms` option,
and is an inaccurate proxy check for `host_machine.system() == 'windows'`.
The ICD path and shared library name are dependent on the host system,
not whether or not Lavapipe is built with `WIN32` platform support.
In fact Lavapipe can be built with no platforms (`-Dplatforms=`)
if you only need headless (then this check would be incorrect)
fixes: e030ab5163
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29900>
(cherry picked from commit 94379377c4)
Previously we tried to map GPU resources directly wherever we could,
however this was always causing random issues and was overall not very
robust.
Now we just allocate a staging buffer on the host to copy into, with some
short-cut for host_ptr allocations.
Fixes the following tests across various drivers:
1Dbuffer tests (radeonsi, zink)
buffers map_read_* (zink)
multiple_device_context context_* (zink)
thread_dimensions quick_* (zink)
math_brute_force (zink)
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30082>
(cherry picked from commit 7b22bc617b)
[Eric: also drop `can_map_directly` and `has_user_shadow_buffer` as
they're now dead code, which was breaking the build]
Don't multiply by 4 twice. While we're here, fix the in-bounds check on
a6xx, since we need to account for the multiplying by 4. This was done
correctly on a7xx but the commit below didn't correctly port it to a6xx
when adding the multiply on a6xx.
Fixes: 01bac643f6 ("freedreno/ir3: Fix ldg/stg offset")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30046>
(cherry picked from commit 2c462fe9cc)
si_set_patch_vertices was only called if tcs.current was non-NULL but
this condition is not enough for GFX9+ since vs is used as ls.
Add a check in si_update_tess_io_layout_state instead, and set
sctx->do_update_shaders for case where the ls_current is not yet
available.
This fix crashes on GFX6.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29876>
(cherry picked from commit a7a1e3d329)
buffer_size is not the full buffer size, but the part of the
buffer that is accessed by the compute shader.
This fixes the assert hit in si_set_shader_buffer.
Fixes: 1a99f50c7f ("radeonsi: use a compute shader to convert unsupported indices format")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29876>
(cherry picked from commit 84a563cf6f)
We select the feedback tranche according to the image format but not the
actual format that we will use for presentation. This breaks direct
scanout in cases where the application selected a visual with an alpha
channel but using EGL_EXT_present_opaque which previously worked as
expected.
Fix this by selecting the feedback tranche according to the actual
presentation format.
Fixes: 9ea9a963aa ("egl/wayland: Fix EGL_EXT_present_opaque")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28153>
(cherry picked from commit e5e108706c)
Instead of trusting in nil::Image::align_B, force it to the sparse
binding size because we know we're going to try and sparse bind it.
Otherwise, small sparse images could fail to bind at the bind step.
Fixes: 7321d151a9 ("nvk: Add support for sparse images")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30033>
(cherry picked from commit 04bdbb71de)
This is an issue happening on radeonsi after the commit
6f8e6fb99c "mesa/st: use compute pbo download for readpixels".
This commit enables a new code path which has this memory leak.
For instance, this issue is triggered on radeonsi with
"piglit/bin/glsl-fs-raytrace-bug27060 -auto -fbo":
Too many leaks! Only the first 5000 leaks encountered will be reported.
Indirect leak of 327424 byte(s) in 5582 object(s) allocated from:
#0 0x7fe27fa4d7ef in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb17ef)
#1 0x7fe2717fd4df in ralloc_size ../src/util/ralloc.c:118
#2 0x7fe272dbbae4 in calc_dom_children ../src/compiler/nir/nir_dominance.c:138
#3 0x7fe272dbbae4 in nir_calc_dominance_impl ../src/compiler/nir/nir_dominance.c:192
#4 0x7fe272b7f4a2 in nir_metadata_require ../src/compiler/nir/nir_metadata.c:40
#5 0x7fe272ba0d50 in nir_opt_cse_impl ../src/compiler/nir/nir_opt_cse.c:43
#6 0x7fe272ba0d50 in nir_opt_cse ../src/compiler/nir/nir_opt_cse.c:67
#7 0x7fe272686f83 in gl_nir_opts ../src/compiler/glsl/gl_nir_linker.c:92
#8 0x7fe271a31e69 in create_conversion_shader ../src/mesa/state_tracker/st_pbo_compute.c:701
#9 0x7fe271a353fa in create_conversion_shader_async ../src/mesa/state_tracker/st_pbo_compute.c:810
#10 0x7fe271806ea8 in util_queue_thread_func ../src/util/u_queue.c:309
#11 0x7fe27186379a in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
#12 0x7fe27ea9a7c3 (/lib64/libc.so.6+0x867c3)
...
SUMMARY: AddressSanitizer: 1384704 byte(s) leaked in 17291 allocation(s).
Fixes: 5dab7673e1 ("mesa/st: add specialized pbo download shaders")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30068>
(cherry picked from commit f3c9ea9b8d)
These flags are set at the end of the job based on its buffer usage and
then checked by following jobs.
If an application toggles stencil and depth tests alternatigly in
a sequence of jobs while also relying on previous contents to render,
with the current assignment the reload flags for depth or stencil may
be cleared incorrectly and a depth/stencil buffer may not be properly
reloaded.
Cc: mesa-stable
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30065>
(cherry picked from commit aa4d0836fe)
The NVIDIA proprietary driver exposes a DRM device these days and this
can trip up NVK as it advertises an NVIDIA device id. We fail to
enumerate but the check for nouveau happens too late and we throw a
warning. This means tha if NVK is even installed side-by-side with the
proprietary driver, we spam warnings on every device enumeration. It's
better to fail silently.
Fixes: 83786bf1c9 ("nvk: add vulkan skeleton")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11441
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30035>
(cherry picked from commit 73ec9f0183)
This adds a magic number and the device name to the binary in order to
verify we indeed have a binary we can parse and matches the device.
Also save the binary header explicitly in little-endian order, so that we
at least make sure that's always the same.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29946>
(cherry picked from commit f08f770f16)
Coverity has spotted a place where we could in theory overflow. In
reality it wont happen as the potential overflow is a bitfield with a
maximum of two values. Add an `assume()` statement to help out the
compiler and document our assumption.
fixes: dc1aedef2b
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29825>
(cherry picked from commit dc604f340a)
812b3415 added rules for upcasts with comparisons with a variety of
types. The float & unsigned rules should be ok, but the signed integer rules are
unsound as currently implemented. This can cause end-to-end miscompiles.
I originally hit this issue while debugging a large real world OpenCL kernel. I
found the bug symptoms changed when disabling loop unrolling, which tipped me
off to a compiler bug. I've reduced it to a minimal test case. Imagine my
surprise when I find out the NIR my backend ingested was already constant folded
to be wrong.
In the minimal test case, during optimization we have NIR:
32 %6 = ....
64 %9 = i2i64 %6
64 %44 = load_const (0x0000000000000001)
1 %45 = ilt %9, %44 (0x1)
This is a simple check (int64_t)%6 < 1.
nir_opt_algebraic turns this into:
32 %6 = ...
64 %9 = i2i64 %6
64 %44 = load_const (0x0000000000000001)
64 %55 = load_const (0x0000000080000000 = 2147483648)
1 %56 = ilt %55 (0x80000000), %44 (0x1)
64 %57 = load_const (0x000000007fffffff = 2147483647)
1 %58 = ilt %57 (0x7fffffff), %44 (0x1)
32 %59 = i2i32 %44 (0x1)
1 %60 = ilt %6, %59
1 %61 = ior %58, %60
1 %62 = iand %56, %61
This pile of math constant-folds to an unconditional "false"! The problem is
%56. At first glance, INT32_MIN < 1 is true so %56 should be true. Indeed, it
should. But here's the kicker: both constants are 64-bit here, so the ilt
operation is a 64-bit comparison -- that left-hand side is INT32_MIN
zero-extended to 64-bit for the signed comparison at 64-bit. So in fact, it
evaluates to false, causing the whole expression to go false. If we're going to
do a 64-bit comparison for %56, then we need to sign-extend the bound. So we'll
just adjust the Python and be on our way, right?
Unfortunately the issue is deeper. According to the comment in the generated
nir_opt_algebraic.c file, the guilty algebraic rule is:
('ilt', ('i2i64', 'a@32'), '#b') =>
('iand', ('ilt', -2147483648, 'b'), ('ior', ('ilt', 2147483647, 'b'), ('ilt', 'a', ('i2i32', 'b'))))
From a Python perspective? That rule is correct. -2147483648 < 1 is a true
statement. Adjusting the Python rule is not the appropriate solution here, since
the issue is more fundamental and might affect other rules. The real problem is
the translation of that Python replacement tree into C, incorrectly
zero-extending -2147483648 into 0x0000000080000000 instead of sign-extending to
0xffffffff80000000.
Crawling down the rabbit hole of the generated algebraic file, we see the
constant encoded as:
{ .constant = {
{ nir_search_value_constant, 64 },
nir_type_int, { -0x80000000 /* -2147483648 */ },
} },
NIR correctly translates the negative constant to a C level negate operation of
its absolute value. This maps to the correct sign-extension...
...for all constants except for INT_MIN. Because that constant lacks a ULL
suffix, it is a 32-bit integer. And for this integer (only), negating it hits
signed integer overflow (UB!) and then we end up with an effective
zero-extension when going to 64-bit.
This patch fixes the end-to-end miscompile.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Closes: #11402
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29952>
(cherry picked from commit 270446ee21)
This is not safe when we have conditional spills since we could be
spilling disabled lanes with undefined values that could overwrite
valid data for those lanes from a previous spill of the same temp
that was unconditional (or that condionally enabled those same
lanes).
Fixes some Piglit OpenCL tests as well as the following OpenCL tests:
integer_divideAssign
integer_moduloAssign
integer_mad_sat
integer_ops integer_divideAssign
integer_ops integer_mad_sat
integer_ops integer_moduloAssign
integer_ops quick_char_math
integer_ops quick_short_math
math_brute_force half_powr
math_brute_force pow
math_brute_force pown
math_brute_force powr
math_brute_force rootn
Fixes: 597560e27c ('broadcom/compiler: always enable per-quad on spill operations')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29909>
(cherry picked from commit d1f8351f3c)
The multop instruction implicitly writes rtop which is not preserved
acrosss thread switches. We can spill the sources of the multop
(since these would happen before multop) and the destination of
umul24 (since that would happen after umul24).
Fixes some OpenCL tests when V3D_DEBUG=opt_compile_time is used to
choose a different compile configuration.
cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29909>
(cherry picked from commit 38b7f411a1)
In a scenario where two non-concurrent cmdbufs are submitted to the
compute queue and with the second one using DGCC, the driver would have
chained the CS of the first cmdbuf to the new IB created right after
the DGC IB is executed.
Found while working on DGC task shader with vkd3d-proton.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29913>
(cherry picked from commit fec9b56f17)
nir_opt_preamble sometimes adds useless expressions, in which case we
may have stc instructions and no corresponding use of the constant.
Things can go sideways when these aren't included in the constlen, so
far only observed when earlypreamble is enabled.
Fixes: ccc64b7e00 ("ir3: Plumb through store_uniform_ir3 intrinsic")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29903>
(cherry picked from commit c42f6597f9)
radv has a comment in radv_meta_dcc_retile.c:
* BPE is always 4 at the moment and the rest is derived from the tilemode.
radeonsi has in si_retile_dcc:
/* We have only 1 variant per bpp for now, so expect 32 bpp. */
assert(tex->surface.bpe == 4);
This fixes ext_image_dma_buf_import-modifiers for radeonsi.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29612>
(cherry picked from commit abd048124a)
We shouldn't use this extension at all if we're not using the HTML
builder. This should hopefully fix this issue a bit more fundamentally.
This caused issues when using the spelling extension, something I do
locally from time to time.
Fixes: f72033bb70 ("docs: add bootstrap extension")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29888>
(cherry picked from commit f4e7204e73)
This will prevent the dynamic loader to pick the wrong function once we
rename things to the proper API names.
In any case, this should have been done all along anyway.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29855>
(cherry picked from commit be090abf2e)
In this assert we want to enforce that if a cached buffer is created
it is a cached+coherent as Xe KMD don't support cached+incoherent.
Did not caught this issue because it only reproduces in platforms with
GPU outside of LLC.
Fixes: 9d8d5cf8c9 ("anv: Remove block promoting non CPU mapped bos to coherent")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29826>
(cherry picked from commit 73ce3143a8)
The intention of this block was to set one of the flags that is used
to select a PAT index but this was doing more than that.
It was promoting WB+0 way coherency BOs to WC+1 way coherency possibly
causing regression in platforms without LLC.
anv_device_get_pat_entry() return WC/writecombining if no flags is
set so we don't need this block after all.
Reported-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Fixes: a65e982b44 ("anv: Split ANV_BO_ALLOC_HOST_CACHED_COHERENT into two actual flags")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29769>
(cherry picked from commit 9d8d5cf8c9)
CTS tests both layered and separate DPB, but radv wasn't handling
layered properly when used with the tier 2 dpb handling.
This adjusts the addresses to use the layer index for tier2.
Fixes dEQP-VK.video.decode.*layered*
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29758>
(cherry picked from commit b888946f7a)
On newer devices where ZPASS_DONE events have sample count writing
abilities the firmware expects these events to come in begin-end pairs,
essentially corresponding to a typical occlusion query usage. Since this
event is also used in the autotuner we have to avoid event pairs to be
emitted in an interleaved fashion.
Additional renderpass state now tracks whether a given renderpass contains
an occlusion query. If so, autotuner will emit miscellaneous ZPASS_DONE
events in order to form its own begin-end pairs before and after the
renderpass commands.
Occlusion query behavior inside a renderpass doesn't change. But when used
outside of a renderpass, possible autotuner usage requires to again emit
ZPASS_DONE events that end up forming begin-end pairs of these events both
at the start and the end of the query.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 4e6a1f8852 ("tu/autotune: Use `CP_EVENT_WRITE7::ZPASS_DONE` on A7XX")
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29403>
(cherry picked from commit 5653c52151)
When both an interval and some of its children would be live-in, we used
to add phis for all of them. This could lead to cases where the pressure
after spilling was higher than before.
This happens, for example, when both a split and its parent are live-in.
Before spilling, the split wouldn't add to the pressure because its
parent had already been inserted. After spilling, since we created a phi
for the split, the link with its parent would be lost and it would add
to the pressure.
Fix this by only adding phis for top-level intervals and adding splits
after them.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
(cherry picked from commit 6bc7cd6108)
There were a few places that used an instruction pointer to decide where
new instructions should be created. NULL was used to add them at the end
of the block. While fixing a spilling bug, a new option was needed to
add instructions at the beginning of the block. This will be much easier
to implement using cursors.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
(cherry picked from commit 18cd803cef)
Whenever instructions need to be created at specific locations, ir3
often passes around an instruction pointer. When set, new instructions
are added before or after it (depending on the context). When NULL, new
instructions are added at the end of the block. This whole scheme is
confusing.
This patch adds ir3_cursor and ir3_builder structs and the associated
helper functions. The API mirrors the one from nir_cursor/nir_builder.
This patch does not refactor existing code to use the new API. This will
happen in future patches.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
(cherry picked from commit 1972db36c6)
ir3_force_merge (through merge_merge_sets) expects instructions to be
indexed. However, the instructions created during spilling would not be
automatically indexed at this point.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
(cherry picked from commit 7a5b198a44)
It might happen that a collect that cannot be coalesced with one of its
sources while spilling can be coalesced with it afterwards. In this
case, we might be able to remove it in remove_src_early during spilling
but not afterwards (because it may have a child interval). If this
happens, we could end up with a register pressure that is higher after
spilling than before. Prevent this by never removing collects early
while spilling.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
(cherry picked from commit 1bc3b819e6)
for drivers that don't support PIPE_CAP_SHAREABLE_SHADERS,
the zombie shader mechanism is used, storing shaders to delete after
the next flush
the zombie mechanism also calls bind_*_state(pipe, NULL) during deletion,
however, which breaks drivers in the following scenario:
* create_all_shaders(pipe_A)
* bind_vs(pipe_A, vs_A)
* bind_fs(pipe_A, fs_A)
* draw(pipe_A)
* makeCurrent(pipe_B)
* delete_vs(pipe_B, vs_B)
* vs_B must only be deleted on pipe_A
* zombie_shader_add(pipe_A, vs_B)
* makeCurrent(pipe_A)
* free_zombie_shaders(pipe_A)
* bind_vs(pipe_A, NULL)
* delete_vs(pipe_A, vs_B)
* draw(pipe_A)
* boom
the problem being that bind_vs(pipe_A, NULL) was called when deleting
vs_B, but it was actually vs_A which was bound
to solve this, just flag the shader state for updating and let st figure it out
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11122
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29680>
(cherry picked from commit 95828d8901)
The commit 973e6f3b implemented compaction of the stream-number space. The
functions `update_vertex_elements(_sw)` began using the post-compacted
stream-numbers/indices when maintaining the `stream_usage_mask` and
when reading from the arrays `vtxstride` and `stream_freq`.
But, the `stream_instancedata_mask`, with which the `stream_usage_mask`
is compared/bitwise-anded, maintains bits for the pre-compacted indices.
Additionally, the information within the arrays is stored using the
pre-compacted indices.
The functions have a disagreement, regarding the type (pre- vs post-
compacted) of indices, with the rest of the relevant source. This change
removes the disagreement by having them use pre-compacted indices when
maintaining the `stream_usage_mask` and when reading from the arrays.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11283
Fixes: 973e6f3b ("gallium: remove start_slot parameter from pipe_context::set_vertex_buffers")
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29704>
(cherry picked from commit 4bf330471b)
If the application binds graphics shaders and that RADV performs an
internal operation with a compute pipeline, no shader objects would be
restored because binding the internal compute pipeline resets the ESO
state.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29678>
(cherry picked from commit 07eb970d67)
It's possible if nouveau_ws_bo_destroy() races with
nouveau_ws_bo_from_dma_buf() for the BO to be found in the cache and
referenced between dropping the final reference and actually invoking
GEM_CLOSE. This would result in us having a closed BO somewhere in our
cache.
Fixes: c370260a8f ("nouveau/winsys: Add dma-buf import support")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29737>
(cherry picked from commit faaf33556e)
I've been seeing a bunch of read page faults at the end of the
shader allocation, nvk uses a full page at the end to overallocate
so align with that and see if it goes away.
ahulliet and skeggsb both said 2k was used.
Cc: mesa-stable
Reviewed-by: Arthur Huillet <ahuillet@nvidia.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29722>
(cherry picked from commit f7434d7576)
MOV_INDIRECT picks one lane from the src[0] and moves it to all lanes
in the destination. Even if we split the instruction, src[0] should
remain identical.
Noticed this while trying to use this instruction in SIMD32. All
current use cases are limited to SIMD8 shaders (or SIMD16 on Xe2). Or
maybe in SIMD32 but with a uniform src[0]. That's we think we've never
seen the issue so far.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28036>
(cherry picked from commit 13dc2a28ce)
if mesh and task shaders are bound separately, and if they have different
workgroup sizes, the setting of workgroup size will be broken if
set during shader bind
this must be deferred to draw time to pull the correct values
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29733>
(cherry picked from commit 2bb35bf489)
If we have an array of multi-planar descriptors, buffer_list was
incorrectly incremented and this could have overwritten some BO entries.
In practice, this situation should be very rare because most of the
applications enable the global BO list.
Cc: mesa-stable
Closes: #10559
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28816>
(cherry picked from commit 730ba8322f)
Fix compilation failure when image is embedded in struct when
GL_EXT_shader_image_load_formatted is enabled:
struct GpuPointShadow {
image2D RayTracedShadowMapImage;
};
layout(std140, binding = 2) uniform ShadowsUBO {
GpuPointShadow PointShadows[1];
} shadowsUBO;
Compile log:
error: image not qualified with `writeonly' must have a format layout qualifier
Fixes: 082d180a22 ("mesa, glsl: add support for EXT_shader_image_load_formatted")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29693>
(cherry picked from commit b1d0ecd00d)
The Vulkan spec says:
"If pColorAttachmentInputIndices is NULL, it is equivalent to setting
each element to its index within the array."
Fix updated dEQP-VK.dynamic_rendering.primary_cmd_buff.local_read.*.
v2: Fix it correctly (Samuel)
Fixes: 03490ec019 ("vulkan/runtime: rework VK_KHR_dynamic_rendering_local_read state tracking")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29703>
(cherry picked from commit 51f410f621)
According to PRMs:
"All parameters are of type IEEE_Float, except those in the The ld*,
resinfo, and the offu, offv of the gather4_po[_c] instruction message
types, which are of type signed integer."
Currently, we load parameters with the correct types, but use them as send
sources with the default float type, which may confuse passes downstream.
Fix this by actually storing the retyped sources.
Cc: mesa-stable
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29581>
(cherry picked from commit 5ca51156e2)
This AUX-TT is only updated on the CPU since ee6e2bc4a3 ("anv: Place
images into the aux-map when safe to do so"). So the only really
important invalidation that needs to happens is on the beginning of a
primary command buffer.
We are required to idle the pipes prior invalidation the AUX-TT. This
might not be happening when the invalidation is put at the beginning
of the secondary command buffers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29671>
(cherry picked from commit 1851629407)
This change updates the affected calls to the proper function
which is radeon_set_config_reg().
For instance, this issue is triggered with
"piglit/bin/textureSize tes isampler2DMSArray -auto -fbo":
vertex-program-two-side: ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:4981: void si_emit_spi_ge_ring_state(si_context*, unsigned int): Assertion `(0x008988) >= CIK_UCONFIG_REG_OFFSET && (0x008988) < CIK_UCONFIG_REG_END' failed.
Fixes: bd71d62b8f ("radeonsi: program tessellation rings right before draws")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29645>
(cherry picked from commit 301a3bacce)
7278 is the chip on the rpi3, while the rpi4 that made it to market has
the 2711 chip.
When this was introduced (82bf1979), the rpi4 was probably still in
flux, which is why the rpi3 chip was put there (and v3d doesn't care
about that, but v3dv does).
cc: mesa-stable
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29584>
(cherry picked from commit 46247b3827)
It was set to "always run" for amd common files changes when I obviously
meant for it to be manual and messed up my copy/paste when I wrote that.
Fixes: ebaede788e ("amd/ci: limit radv jobs to radv + aco files changes")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29550>
(cherry picked from commit 47bd1cff4b)
- mesa: check for enabled extensions for \*UID enums
- zink: lower 64 bit find_lsb, ufind_msb and bit_count
- zink: lower 8/16 bit alu ops vk spirv doesn't allow
- rusticl/kernel: properly respect device thread limits per dimension
- rusticl/memory: Fix memory unmaps after rework
- rusticl/image: take pitches into account when allocating memory for maps
- rusticl/image: properly sync mappings content for 1Dbuffer images
- rusticl/queue: add clSetCommandQueueProperty
- util/u_printf: do not double print format string with unused arugments
- rusticl/memory: fix sampler argument size check
Konstantin Seurer (1):
- aco: print s_delay_alu INSTSKIP>3 correctly
Lionel Landwerlin (6):
- anv: fix check on pipeline mode to track buffer writes
- vulkan/runtime: allow null/empty debug names
- anv: reuse object string for RMV token
- anv: add missing MEDIA_STATE_FLUSH for internal shaders
- anv/blorp: force CC_VIEWPORT reallocation when programming 3DSTATE_VIEWPORT_STATE_POINTERS_CC
- brw/rt: fix ray_object_(direction|origin) for closest-hit shaders
Marek Olšák (2):
- nir/opt_algebraic: use fmulz for fpow lowering to fix incorrect rendering
- radeonsi: fix buffer coherency issues on gfx6-8,12 due to missing PFP->ME sync
Matt Turner (2):
- util: Add ATTRIBUTE_OPTIMIZE(flags)
- util: Force emission of stack frame in stack unit test
Mike Blumenkrantz (7):
- dri: link with libloader
- kopper: check swapchain size after possible loader image resize
- pipe-loader: fix driconf memory management
- egl: fix zink init
- dri: fix kms_swrast screen fail
- egl/wayland: bail on zink init in non-sw mode if extension check fails
- zink: fix partial update handling
Pavel Ondračka (1):
- r300: bias presubtract fix
Rhys Perry (1):
- docs: update ACO_DEBUG documentation for scheduler options
Rob Clark (2):
- tu: Fix issues with 16k (or larger) page sizes
- freedreno/drm/virtio: Fix issues with 16k (or larger) page sizes
Sil Vilerino (1):
- Revert "d3d12: Video Encode - Remove PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE as not supported" This reverts commit d6bb4ddc638f3ee37fbbe066c631dad80aaeb2d3. Fixes: d6bb4ddc638 ("d3d12: Video Encode - Remove PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE as not supported")
Tapani Pälli (1):
- anv: fix a cmd_buffer reference in simple shader
Timothy Arceri (3):
- nir: set disallow_undef_to_nan for legacy ARB asm programs
- glsl: fix glsl to nir support for lower precision builtins
- glsl: always copy bindless sampler packing constructors to a temp
Valentine Burley (2):
- vulkan/wsi: Refactor can_present_on_device
- tu: Always report that we can present on kgsl
WANG Xuerui (1):
- meson: Additionally probe -mtls-dialect=desc for TLSDESC support
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.