Until now the RD dumps were stored in files on a per-device basis, using
the device index but assuming only one Vulkan instance is active. With
multiple active instances, different devices separated across those
instances could end up storing RD dumps into files with the same name.
tu_instance struct now has an index member variable that's assigned upon
creation with an incrementally-increasing global counter value. RD dump
output name now also contains this instance index, avoiding the described
naming collisions.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: f9c4e25483 ("freedreno: add fd_rd_output facilities for gzip-compressed RD dumps")
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30977>
(cherry picked from commit 4c359eae01)
In commit fe3d90aedf ("intel/fs/xe2+: Fix calculation of spill message
width for Xe2 regs.") we aligned the width of scratch messages to
physical register sizes (32B prior to Xe2, 64B for Xe2+).
But our spilling offsets are computed using the register allocations
sizes which are in units of 32B. That means on Xe2, you can end up
spilling a virtual register allocated at 32B (which we use for surface
state computations with exec_all) and then the spilling of that
register will be emitted in SIMD16, having the upper 8 lanes
overwriting the next spilled register.
We could potentially limit spills to SIMD8 messages on Xe2 (only
writing 32B of data), but we're also unlikely to have all 32B virtual
register spilled next to one another. And if not tightly packed, we
would have 64B registers stored on 2 different cachelines which sounds
inefficient.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: fe3d90aedf ("intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs.")
Backport-to: 24.2
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30983>
(cherry picked from commit aa494cbacf)
The mutex needs to be locked before accessing the handle table.
After 64ca0fd2f2 ("frontends/va: Allocate surface buffers on demand")
the issue is now much more likely to happen and can be reproduced when
transcoding using ffmpeg.
Cc: mesa-stable
Reviewed-By: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30935>
(cherry picked from commit fccf31c231)
Conflicts:
src/gallium/frontends/va/image.c
The bug report has a sequence that looks like this:
* set tex as framebuffer
* dispatch a compute shader that doesn't use tex
* dispatch a compute shader that uses it
Since we were updating the counters at step 2, step 3 failed to realize
that calling si_make_CB_shader_coherent was needed.
While at it, this commit splits the draw call tracking counter in 2: one
for CB, one for DB.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11638
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30591>
(cherry picked from commit bfcee149ed)
This reverts 2fc396ae75 ("intel/dev: Disable LNL PCI IDs on Mesa 24.2
(require INTEL_FORCE_PROBE)") from the 24.2 branch.
Mesa's 2fc396ae75 was needed because, although we were compatible
with Linux 6.11, this kernel change which will appear in Linux 6.12
would break our previous LNL support, causing the driver to fail to
load:
7108b4a589cd ("drm/xe/uapi: Expose SIMD16 EU mask in topology query")
Since d8b5ee8d65 ("intel/dev: Support new topology type with SIMD16
EUs"), this issue was resolved, but we were told we had to wait for
the kernel to remove force_probe before they would guarantee backwards
compatibility with uapi changes.
Since drm-next 6d0ebb390485 ("Merge tag 'drm-intel-next-2024-08-29' of
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next"), the
upstream drm kernel has now removed force_probe for LNL. Ref:
9c57bc08652a ("drm/xe/lnl: Drop force_probe requirement")
Tested with upstream drm kernel:
commit 6d0ebb3904853d18eeec7af5e8b4ca351b6f9025
Merge: 8bdb468dd7a5 b5d4657e192b
Author: Dave Airlie <airlied@redhat.com>
Merge tag 'drm-intel-next-2024-08-29' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30959>
Don't claim to support extendedDynamicState3SampleLocationsEnable on pre-A650 GPUs,
which can't advertise VK_EXT_sample_locations.
Fixes dEQP-VK.info.device_mandatory_features on A6xx Gen 1 and Gen 2.
Fixes: 84726da2f4 ("tu: Implement extendedDynamicState3SampleLocationsEnable")
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30730>
(cherry picked from commit 98d52cf292)
The storage allocated was always the same for both the ordinary texture
result data as well as the residency info. However, the former can be
float vector, whereas the latter is always int vector.
At least some llvm versions/builds will assert on this mismatch when
storing the data.
While here, also cut unnecessary zero initialization (lp_build_alloca()
already explicitly does this).
Fixes: 6168317b84 (lavapipe: Implement shaderResourceResidency)
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Brian Paul <brian.paul@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30878>
(cherry picked from commit 9b717596b2)
We put minSampleShading in the nvk_shader and [de]serialize that to/from
the binary so it also needs to go in the hash. We could also plumb the
pipeline state through to the deserialize callback but that's quite a
stretch and this literally only affects minSampleShading which is a
rarely used feature.
Fixes: 813b253939 ("nvk: Switch to shader objects")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30914>
(cherry picked from commit 5f402f3aae)
Setting it to the same value as (or higher than) the job timeout
effectively bypasses the safety mechanism.
Let's change it to `job timeout - 5min`.
Fixes: f39ffc6911 ("ci/etnaviv: Get the gc2000_piglit manual job mostly working.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30800>
(cherry picked from commit b978d3eb54)
It was set to 170min, which made sense when the job timeout was 3h, but
then 4bb564f40d ("broadcom/ci: add more jobs to test with rpi5")
lowered the job timeout to 2h without lowering the test timeout to match.
Fixes: 4bb564f40d ("broadcom/ci: add more jobs to test with rpi5")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30800>
(cherry picked from commit aac9c74a83)
This function never expands a type - it only narrows it. As such, we
don't need to ever sign extend to fill additional new bits. I think
this code was left over from earlier versions of my optimization pass
that was buggy and trying to handle cases it should not have.
Fixes: 580e1c592d ("intel/brw: Introduce a new SSA-based copy propagation pass")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30884>
(cherry picked from commit 51c85e0363)
This should be done after reads are checked and
sgpr_read_by_valu_as_lanemask_then_wr_by_salu is reset. The old version
also skipped checking the reads if the write check passed.
fossil-db (navi31):
Totals from 193 (0.24% of 79395) affected shaders:
Instrs: 3212435 -> 3212735 (+0.01%)
CodeSize: 16462868 -> 16463848 (+0.01%); split: -0.00%, +0.01%
Latency: 19492377 -> 19492462 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 4419705 -> 4419718 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Backport-to: 24.1
Backport-to: 24.2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30818>
(cherry picked from commit 61e73c2323)
We observed that Venus on ANV on JSL platform has some cacheline flush
issue. The overflush shows up as:
1. There're 2 threads venus bliting the feedback buffers suballocated
from the same backing device memory, back to back.
2. On thread A, flushing the feedback buffer for cpu read is placed
behind flushing a shader storage buffer for cpu read.
3. On thread B, flushing a different feedback buffer with the same
backing device memory (different offset bound to) can kick the
feedback buffer flush in (2) earlier than it should be flushed.
4. As a result, CPU polling thread for thread B results would see venus
feedback buffer update earlier than shader storage buffer results
being updated, breaking Venus sync primitives optimization.
During investigation, a solid workaround for JSL platform is to force
Venus to align up to 128 bytes for feedback buffer suballocation while
the default is at 64 bytes.
Cc: mesa-stable
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30879>
(cherry picked from commit 7941d705c3)
If p_atomic_cmpxchg doesn't set the ray_query_shadow_bos[bucket] to new_bo
allocated by this thread, it returns the bucket BO allocated by the other
thread and we use it. But due to a mistake, we also release that BO, not
the candidate just allocated by this thread and never used again.
Fixes: 5d3e4193 ("anv: enable ray queries")
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30581>
(cherry picked from commit 1904fe1186)
We can't exceed c_int::MAX, because the CTS casts to ints in a few places.
We also need to take into account max pixel size when restricting to
max_mem_alloc as this cap is pixel based, not byte based.
Cc: mesa-stable
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30739>
(cherry picked from commit eef1af8128)
vkcube, vkgears and vkmark are crashing with the following error/segfault:
$ NVK_I_WANT_A_BROKEN_VULKAN_DRIVER=1 vkcube
WARNING: NVK is not a conformant Vulkan implementation, testing use only.
Selected GPU 0: GeForce GT 640 (NVK GK107), type: DiscreteGpu
ERROR: couldn't get DataFile for op ldc_nv
Segmentation fault (core dumped)
Handling nir_intrinsic_ldc_nv as per nir_intrinsic_load_ubo in Converter::getFile()
allows to run vkcube, vkgears and vkmark on Nvidia GT640
Fixes: dc99d9b2 ("nvk,nak: Switch to nir_intrinsic_ldc_nv")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30832>
(cherry picked from commit 7b32df696e)
Make sure to preserve the depth or stencil components of D24S8 using the
fixed codepath just added. While we're here, fix the detection of
whether an attachment is bound.
Fixes: cb0f414b ("tu: Add support for suspending and resuming renderpasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26154>
(cherry picked from commit 812c8f6abe)
We need to make sure that we don't trash a passthrough depth/stencil
aspect if we need to store the whole attachment by loading it
beforehand.
Fixes: cb0f414b ("tu: Add support for suspending and resuming renderpasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26154>
(cherry picked from commit 5377219ca0)
At least one ray tracing shader in cp2077 is over 4MB on Xe2. There
isn't a memory pool large enough for the allocation, so the driver
crashes instead. This commit adds 8MB and 16MB pools.
I intend this as a stop gap fix. I would prefer to figure out why this
shader is so much larger than on previous platforms. The shader in
question has 3824 spills and 8625 fills. That is not good. I suspect
dealing with that will also solve the problem, but that will require a
bit more time.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11739
Suggested-by: Lionel Landwerlin
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30751>
(cherry picked from commit 09cf9fe8ab)
We were currently emitting logical atomic instructions with a packed
destination region for sub-dword LSC atomics, along the lines of:
> untyped_atomic_logical(32) dst<1>:HF, ...
However, these instructions use an LSC data size D16U32, which means
that the 16b data on the return payload is expanded to 32b by the LSC
shared function, so we were lying to the compiler about the location
of the individual channels on the return payload, its execution
masking, etc. This is why the hacks that manually set the
'inst->size_written' of the instruction were required.
In some cases this worked, but any non-trivial manipulation of the
instruction destination by lowering or optimization passes could have
led to corruption, as has been reproduced in deqp-vk during
lower_simd_width() for shaders that use 16-bit atomics in SIMD32
dispatch mode.
Note that LSC sub-dword reads aren't affected by this because they use
raw UD destinations and specify the actual bit size of the operation
datatype as the immediate SURFACE_LOGICAL_SRC_IMM_ARG, which doesn't
work for atomic operations since that immediate specifies the atomic
opcode.
Instead, have the logical operation implement the behavior of 16-bit
destinations correctly instead of silently replacing the 16-bit region
with an inconsistent 32-bit region -- This is done by emitting the MOV
instructions used to pack the data from the UD temporary into the
packed destination from the lower_logical_sends() pass instead of from
the NIR translation pass.
Fixes: 43169dbbe5 ("intel/compiler: Support 16 bit float ops")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30683>
(cherry picked from commit 71ca8529c5)
this pbo shader works by iterating over the framebuffer size
and storing a value to an offset for each source pixel. if the
number of pixels being written out does not correspond to fragcoord
to the extent that certain source pixels are not written at all, however,
then this method should not be used in order to avoid giving broken results
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30689>
(cherry picked from commit c2dcecffc5)
SBID SET can only be used on SEND, SENDC, or DPAS instructions. The
existing code was handling SET for SEND/SENDC, but was using the wrong
encoding for DPAS. Add a new case to handle that and make it clear that
the existing code is only for SEND/SENDC.
While here, rewrite the encoder to use 2-bit binary immediates shifted
up into the mode [9:8] field, rather than pre-shifted hex values. This
matches the documentation better and is a little easier to follow.
On the decode side, we were incorrectly decoding MATH instructions.
Because they're marked is_unordered, we were hitting the SEND/SENDC
decoding, which is incorrect for MATH.
Fixes 22 cooperative matrix tests on Lunar Lake.
Huge thanks to Paulo Zanoni for bisecting failures to one of my commits,
then analyzing shaders and experimenting to discover that the failure
was really an unrelated bug, just being provoked by different choices of
registers. His work narrowing the problem down made it much easier to
discover and fix this bug.
Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>
(cherry picked from commit d22d6d814d)
This commit solves the shortage-problem at the blit-functions by
checking the number of fence-registers after updating the batch.
If too many registers are used,
the batch-entries and relocs for the current blit function are
removed by setting batch->ptr and reloc_count to value before
the blit call and calling drm_intel_gem_bo_clear_relocs.
This truncated batch is flushed,
and the batch is updated again for the current blit function.
Cc: mesa-stable
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26769>
(cherry picked from commit ed2123158d)
out_args->scratch_offset and in_wg_id_x will alias on <gfx9.
To avoid the conversion code reading a garbage WG ID, move the
scratch/ring offset writing to the very end.
Fixes: 1e354172 ("radv,aco: Convert 1D ray launches to 2D")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30707>
(cherry picked from commit bd525f4282)
Make sure to take new GRF size into consideration and adjust the
indirect offset according to new size so that when we do the indirect
load with address register, we load right values.
This helps pass the following tests:
- dEQP-VK.binding_model.descriptor_buffer.mutable_descriptor.*geom*
- dEQP-VK.ray_query.*geometry_shader.*
Backport-to: 24.2
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30679>
(cherry picked from commit c4f2a8d984)
Similar to commit 6cef804067 ("nir/opt_if: fix opt_if_merge when
destination branch has a jump"), we shouldn't combine if statements when
the second if-then-else has a block that ends in a jump.
This fixes a case where opt_if_merge combines
if (cond) {
[then-block-1]
} else {
[else-block-1]
}
if (cond) {
[then-block-2]
} else {
[else-block-2]
}
where `then-block-2` or `else-block-2` ends in a jump. The phi nodes
following the control flow will be incorrectly updated to have an input
from a block that is not a predecessor.
Fixes: 4d3f6cb973 ("nir: merge some basic consecutive ifs")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30629>
(cherry picked from commit d2e6be94ae)
Before we were using direct CP_LOAD_STATE, which is broken with multiple
back-to-back draws. This caused regressions in some DX11 traces when
enabling early preamble. We still need to use indirect CP_LOAD_STATE for
VS params, which are sometimes written by the CP, however for everything
else we should use the new UBO path instead.
Fixes: 76e417ca59 ("turnip,ir3/a750: Implement consts loading via preamble")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30675>
(cherry picked from commit 850f2aab03)
It's one header dword and 5 payload dwords. This was papered over by us
not actually using the UBO path for one of the loads, but that's changed
in the next commit.
Fixes: 76e417ca59 ("turnip,ir3/a750: Implement consts loading via preamble")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30675>
(cherry picked from commit 4f2b5442a6)
After spilling during regular RA, merge sets need to be fixed up. To
find all merge sets, fixup_merge_sets used ra_foreach_dst. However,
after shared RA has run, shared dsts wouldn't have the IR3_REG_SSA flag
set anymore leaving their merge sets lingering. This patch fixes this by
using foreach_dst instead.
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
(cherry picked from commit 28d2a27030)
The preferred register for merge sets was not updated after allocating
one. This caused a new merge set to be allocated for every register it
contains. This patch fixes this by reusing the update function from the
standard RA.
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
(cherry picked from commit 9013e11d8c)
Bspec 57508: Structure_SIMD16TraceRayMessage:: RayQuery Enable
"When this bit is set in the header, Trace Ray Message behaves like a
Ray Query. This message requires a write-back message indicating
RayQuery for all valid Rays (SIMD lanes) have completed."
If we don't pass the write-back register, somehow it was stepping on
over R0 register and can mess up the scratch space accesses which could
potentially lead to GPU hang. It can be noticed while running it under
simulator trace.
send.rta (16|M0) null r124 r126:1 0x0 0x02000100 {$15} // wr:1+1, rd:0; simd16 trace ray
R0 = 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>
(cherry picked from commit c3c62e493f)
Previously only `-mtls-dialect=gnu2` was probed, which was appropriate
for arm, x86 and x86_64, but not for newer architectures such as
aarch64, loongarch64 and riscv64 which all use `-mtls-dialect=desc`
instead. Because the driver option is not consistent across
architectures (and probably will not), try both variants and choose the
first one working.
While at it, rename "gnu2_*" variables to "tlsdesc_*" respectively, for
clarity.
Cc: mesa-stable
Reviewed-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Yukari Chiba <i@0x7f.cc>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30599>
(cherry picked from commit cc2dbb8ea5)
Although the ORCJIT codepath is fresh and relatively less tested, this
is still better than no llvmpipe at all for those newer architectures
that will not gain MCJIT support, such as LoongArch or RISC-V.
Fixes: 6f02ec5ed1 ("llvmpipe: add an implementation with llvm orcjit")
Reviewed-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Yukari Chiba <i@0x7f.cc>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30599>
(cherry picked from commit 56f38672a2)
This is unneeded in some environments, like ChromeOS and Android. And
for CrOS it specifically causes problems with the gpu sandbox rules.. we
don't want to have to update the sandbox rules for each new mesa
version.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30579>
(cherry picked from commit 19ff16387a)
The ownership of the TargetMachine object is released when LPJit
singleton is constructed, leads to a slight memory loss detectable.
Keep the ownership by saving the unique pointer as another class member
named tm_unique.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3423e73cec)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30645>
The `capture_not_overwritten` unit test captures and compares two
backtraces -- one from inside a call to `func_c` and one outside -- and
confirms that they are not identical. That is, that `func_c` is in the
backtrace.
On 32-bit x86, without `-fno-omit-frame-pointer`, the function will not
emit a stack frame. As a result, the unit test fails.
The fix is to compile `func_c` with the flag `-fno-omit-frame-pointer`
to prevent the compiler from optimizing out the stack frame which is
otherwise unneeded.
Bug: https://bugs.gentoo.org/823774
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4091
Fixes: d0d14f3f64 ("util: Add unit test for stack backtrace caputure")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30622>
(cherry picked from commit 05dc4eb536)
the CL CTS added a new test being printf("\n", "foo"), but we ended up
printing the new line twice. If we can't find a specifier anymore, ignore
the argument as after the loop processing all arguments we'll print the
remaining format string anyway.
Cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30574>
(cherry picked from commit 4080269845)
Instead of aligning offsets, we just align the size every time we query
it. This simplifies our offset and size calculations later since we can
always just add up descriptor buffer sizes and know that we'll be okay.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: 7ab5c5d36d ("zink: use EXT_descriptor_buffer with ZINK_DESCRIPTORS=db")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30580>
(cherry picked from commit 0f8f407e57)
There are two problems.
1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a
different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'.
2. Ignoring the first problem, this only produces the desired flags
for LE and G. All other cases can produce the wrong result.
shader-db:
All Intel platforms had similar results. (Broadwell shown)
total instructions in shared programs: 18282314 -> 18282316 (<.01%)
instructions in affected programs: 78 -> 80 (2.56%)
helped: 0
HURT: 2
total cycles in shared programs: 952924234 -> 952924252 (<.01%)
cycles in affected programs: 584 -> 602 (3.08%)
helped: 0
HURT: 2
Fixes: e6022281f2 ("intel/elk: Rename files to use elk prefix")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
(cherry picked from commit 9125b7c1b4)
There are two problems.
1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a
different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'.
2. Ignoring the first problem, this only produces the desired flags
for LE and G. All other cases can produce the wrong result.
For example, batman_arkham_city_goty.foz 6a63c4caacaa0dae has the
following code:
mad.ge.f0.0(8) g51<1>F g50<8,8,1>F g46<8,8,1>F g11<1,1,1>F
mov.sat(8) g52<1>F g51<1,1,0>F
...
(+f0.0) sel(8) g54<1>UD g53<8,8,1>UD 0x3f000000UD
Without this commit, the saturate is incorrectly propagated to the MAD.
A similar case exists in witcher_3_dxvk_g2.foz 5b03243be667a275.
There are even worse cases like total_war_warhammer3.dx12vk-g6.foz
78328466761ef7ab and ee920491573860fc. The former has the following
code (and the latter has very similar code):
mad.l.f0.0(16) g95<1>F g93<8,8,1>F g62<8,8,1>F g68<1,1,1>F
...
mov.sat(16) g109<1>F -g95<1,1,0>F
...
(+f0.0) sel(16) g68<1>UD g111<1,1,0>UD g54<1,1,0>UD
(+f0.0) sel(16) g70<1>UD g113<1,1,0>UD g56<1,1,0>UD
(+f0.0) sel(16) g72<1>UD g115<1,1,0>UD g58<1,1,0>UD
Saturate propagation makes a hash of this code:
mad.sat.l.f0.0(16) g106<1>F -g93<8,8,1>F -g62<8,8,1>F g68<1,1,1>F
...
(+f0.0) sel(16) g70<1>UD g110<1,1,0>UD g56<1,1,0>UD
(+f0.0) sel(16) g72<1>UD g112<1,1,0>UD g58<1,1,0>UD
(+f0.0) sel(16) g68<1>UD g108<1,1,0>UD g54<1,1,0>UD
Not only is the saturate incorrectly applied to the MAD, but the MAD
result is negated without changing the conditional modifier to G!
NOTE: Backports of this commit to stable branches may need to be more
like the following commit to elk.
shader-db:
All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19729375 -> 19729377 (<.01%)
instructions in affected programs: 112 -> 114 (1.79%)
helped: 0
HURT: 2
total cycles in shared programs: 916234266 -> 916234288 (<.01%)
cycles in affected programs: 636 -> 658 (3.46%)
helped: 0
HURT: 2
fossil-db:
All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 151531594 -> 151531601 (+0.00%)
Cycle count: 17209107419 -> 17209107474 (+0.00%); split: -0.00%, +0.00%
Totals from 6 (0.00% of 630198) affected shaders:
Instrs: 4550 -> 4557 (+0.15%)
Cycle count: 194629 -> 194684 (+0.03%); split: -0.00%, +0.03%
Fixes: 947c828d5c ("i965/fs: Add a saturation propagation optimization pass.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
(cherry picked from commit 3d8fea0e09)
Using ior here is equivalent to using uadd_sat, but works for every driver
and shouldn't hurt anywhere.
I forgot to fix this up when fixing up some vvl errors with zink.
Fixes crashes with the integer_ctz CL CTS tests in zink.
Fixes: 39ec184db6 ("zink: lower 64 bit find_lsb, ufind_msb and bit_count")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30535>
(cherry picked from commit 48acf9d358)
* close(fd) requires also resetting the fd=-1 or else boom
* checking just driver_name is broken because loader_get_driver_for_fd()
uses MESA_LOADER_DRIVER_OVERRIDE, so there's no way to differentiate
an inferred load
Fixes: b907eb4750 ("egl: don't bind zink under dri2/3")
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30556>
(cherry picked from commit 1a579552af)
The CL CTS started to call this API, luckily we don't have to actually
implement it, because we don't intent to support CL 1.0 only devices in
the first place (probably).
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30575>
(cherry picked from commit cd2dc4f70c)
For some purposes (e.g. advanced blending) we need a non-zero alpha
value returned from reads. This is only guaranteed on Bifrost if
we explicitly request RGB1 component ordering. The default is to use
RGBA component ordering, which for R5G6B5 causes 0 to be read for
alpha.
A complication is that the Mali fixed function hardware requires
four components (which implies RGBA rather than RGB1). If fixed
function blending is in use, we modify the pixel format back to
RGBA when building the blend descriptor.
Cc: mesa-stable
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29606>
(cherry picked from commit 004e0eb3ab)
We have to swizzle the border color in order to offset the
automatic swizzling introduced to compensate for limited
component order support in AFBC/AFRC. However, the border color
format is only available if the `TEXTURE_BORDER_COLOR_QUIRK` is
enabled, so set that for v10 (it was already set for v7).
While testing, we uncovered another issue: valhall introduces a
swizzle for depth+stencil formats that isn't present for bifrost, and
also isn't needed (or wanted) for the border color. So ignore the
border color swizzle for depth+stencil on valhall (on bifrost the
swizzle is a no-op anyway).
Fixes: 87aad0a5e4 ("panfrost: encode component order as an inverted swizzle (v10)")
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30542>
(cherry picked from commit 3135f76331)
Before the patch, intel_device_info_get_max_preferred_slm_size()
returns values in kilobytes, but then
intel_device_info_get_max_slm_size() is multiplying it by 1024.
As a result, LNL is reporting maxComputeSharedMemorySize to be
134217728, which is 128mb.
Fix this by making intel_device_info_get_max_slm_size() not multiply
it by 1024.
This should fix at least the following dEQP tests:
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.1
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.128
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.16
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.2
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.4
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.64
Some tests were failing with:
deqp-vk: ../../src/intel/common/intel_compute_slm.c:24: slm_encode_lookup: Assertion `kbytes <= table[table_len - 1].size_in_kb' failed.
while other tests were triggering the OOM.
v2:
- Make everybody return sizes in bytes (José).
v3:
- Rename variable to bytes (José, Jordan).
Fixes: fd368f5521 ("anv: Set maxComputeSharedMemorySize value for Xe2 platforms")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30541>
(cherry picked from commit 0e38b794e2)
This reverts commit d6bb4ddc63.
Fixes: d6bb4ddc63 ("d3d12: Video Encode - Remove PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE as not supported")
PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE is necessary for some scenarios like the example below
described in https://github.com/microsoft/WSL/issues/11838
gst-launch-1.0 -v videotestsrc num-buffers=250 !
video/x-raw,width=1920,height=1200 !
vaapipostproc !
vaapih264enc !
filesink location=~/wsl_test.h264
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30548>
(cherry picked from commit a0f1a708c4)
This is more correct than the previous code and the CL CTS relies on edge
case behavior here, e.g. for 1Dbuffer images.
I think part of that is not actually required by the spec, but whatever.
Fixes: 7b22bc617b ("rusticl/memory: complete rework on how mapping is implemented")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30528>
(cherry picked from commit 2484331e82)
An application could map and unmap a host ptr allocation multiple times,
but because how the refcounting works, we might never ended up syncing the
written data to the mapped region.
This moves the refcounting out of the event processing.
Fixes: 7b22bc617b ("rusticl/memory: complete rework on how mapping is implemented")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30528>
(cherry picked from commit 1fa288b224)
If TraceRay() is called with the TerminateOnFirstHit flag, we need to
terminate the ray on the first confirmed intersection. This is handled
by the lowering of accept_ray_intersection and it's working fine for the
case of multiple instances of the intersection shader being called.
But if the shader calls reportIntersection() more than once, we were
handling them all and accepting the closest one regardless of the flag.
Check for the flag on every confirmed intersection and, if set, accept
it right there. The subsequent lowering will take care of terminating
handling the ray termination if necessary.
Fixes new test dEQP-VK.ray_tracing_pipeline.amber.flags-accept-first
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30418>
(cherry picked from commit f8553f56ac)
this had a number of issues:
* pipe_loader_get_driinfo_xml() frees driver_driconf immediately,
except the driOptionCache struct string pointers are all just copied
in merge_driconf instead of having the strings copied, which means any
subsequent access of driver_driconf strings is invalid access
* pipe_loader_drm_get_driconf_by_name() is a disaster that only happened
to work because the dlopen here is the same lib that gets opened elsewhere
by mesa and not closed. if the lib here is actually closed, then all
the statically allocated strings become invalid, which means they need to
be manually copied
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30494>
(cherry picked from commit 0c220741e6)
Consolidate the various uses of CP_EVENT_WRITE into helpers, and use use
fd_gpu_event to manage the differences between a6xx and a7xx. This is a
bit churny as it spreading a fair bit of the CHIP template param around.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30304>
(cherry picked from commit 1f41d59059)
Previously this check would create a mask of the bytes used in the
grf, and then shift the mask. This worked well when there was 32 bytes
in the register because a 64-bit uint64_t could easily detect that
bytes were used in the next regiter. (The next register was the high
32-bits of the `access_mask` variable.)
With Xe2, the register size becomes 64 bytes, meaning this strategy
doesn't work. Instead of a mask, we can just check to see if more than
1 grfs are used during each loop iteration. (Suggested by Ken.) This
will make it easier to extend for Xe2 in a follow on commit.
Verified this with
dEQP-VK.subgroups.arithmetic.compute.subgroupexclusivemul_u64vec4_requiredsubgroupsize
on Xe2, which otherwise would cause the program to fail to validate
because it assumed a grf was 32 bytes.
Backport-to: 24.2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>
(cherry picked from commit f2800deacb)
previously the size was checked at the top of the function, but this
ignored cases where the loader might end up resizing the drawable,
resulting in an attempted 0x0 swapchain creation based on stale
geometry
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30487>
(cherry picked from commit a6d97b0afe)
The user is allowed to pass a list of modifiers including
DRM_FORMAT_MOD_INVALID, meaning that the user is OK with implicit
modifiers. Since merging the DRI interfaces, this assert that we are
never returning an implicit modifier is unnecessary and also wrong. It
was originally added to be super-safe, but we now know that our drivers
work very well with modifiers, so don't need it.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Fixes: 0b16d7ebb9 ("dri: Allow INVALID for modifier-less drivers")
Closes: mesa/mesa#11591
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30464>
(cherry picked from commit cd961a7e3f)
The iova allocations need to be CPU page aligned. (The GPU itself
always supports 4k mappings regardless of the smallest CPU page size,
but GEM buffer allocations must be an integer number of CPU pages.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Fixes: 63904240f2 ("tu: Re-enable bufferDeviceAddressCaptureReplay")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30431>
(cherry picked from commit 7fe3529715)
The condition of flat ccs and vram_only checker causes different
aux usage at binding stage. The current design is reusing CCS_E
on Xe2, so we want both Xe2 integrated and discreted GPUs behave
the same way.
Xe2 shouldn't need any special setup of CCS in the loop.
Backport-to: 24.2
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30111>
(cherry picked from commit c5ee7e9bdc)
On pre-Xe2 platforms, the compression on these modifiers that
don't support compression are enabled. The compressed will be
resolved when needed. On Xe2+ we haven't support explicit
resolve, so all the paths to resolves are prohibited now. But
the code is still doing it, causing an assertion failure:
Fixes: vkcube
src/intel/vulkan/anv_private.h:5467:
anv_image_get_fast_clear_type_addr: Assertion
`device->info->ver < 20' failed.
Backport-to: 24.2
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30111>
(cherry picked from commit e054068787)
Users get the deprecation warning but didn't do anything, they left
things to `auto` and we pick the deprecated `swrast`? Hardly seems fair!
(I forgot to do this when I added the deprecation warning to ajax's commit)
Fixes: 010b2f9497 ("gallium/meson: Deconflate swrast/softpipe/llvmpipe")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30366>
(cherry picked from commit 77b69cdbc3)
`st_context_invalidate_state` call is required when changing buffer attachments.
Including header with BBitmap class definition is required to properly
call C++ destructor.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30372>
(cherry picked from commit 828c3cf002)
If the user passes in DRM_FORMAT_MOD_INVALID as an acceptable modifier,
we can progress with implicit modifiers. Add this to a more
comprehensive special case along with linear to make sure that we can
still allocate when users pass in a modifier list to a driver which
doesn't support modifiers.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Fixes: 361f362258 ("dri: Unify createImage and createImageWithModifiers")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30383>
(cherry picked from commit 0b16d7ebb9)
Contrary to what the original commit said, this is actually still used
(see .gitlab-ci/bare-metal/poe-powered.sh:205), and the boot retry logic
has been broken ever since, exacerbating the rpi farm boot problems.
Fixes: 97b2afa16a ("ci/bare-metal: Drop the 2 vs 1 exit code from poe_run.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30340>
(cherry picked from commit 2bc82b7147)
At update_map at i915_state_sampler.c max_lod is no longer set to 1
for npots. This almost totally disabled mipmapping.
Max_lod should still be set to 1, but only if it is still 0,
because no mipmap-levels are present.
According to existing comment at update_map this is needed, to
avoid problems at sampling,
if MIN_FILTER and MAX_FILTER differ.
Cc: mesa-stable
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28638>
(cherry picked from commit ad02bfe41d)
Remove at i945_texture_layout_2d() call of util_next_power_of_two(),
which oversized the npot-blocks for every level to get power of 2
for width and height. Hardware doesnot expect these oversized
npot-blocks, causing mangled mipmapping.
This also is done at i915_texture_layout_2d(), which is
used by older gen3-gpus.
Cc: mesa-stable
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28638>
(cherry picked from commit bb95d744ca)
After using sparse array to manager virtgpu bo, we set gem_handle to 0
to indicate that the bo is invalid. However, the gem handle gets closed
before that and can be reused by another newly created bo, leading to
the tracked gem handle being unexpectedly zero'ed out.
Fixes: 88f481dd74 ("venus: make sure gem_handle and vn_renderer_bo are 1:1")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30362>
(cherry picked from commit f788c87d02)
the original implementation of config selection had a number of flaws:
* using eglChooseConfigs with lots of loops, which was okay for filtering but
also added considerable complexity and made it difficult to correctly
get all the configs
* not adding enough configs; there were a lot more color and zs formats
which weren't in the base config list
* double buffer configs were never created
* srgb configs were also never created
there will now be fewer configs than there were pre-DRIL, but this is only
because accum buffers are now gone and not because anything of value is
missing
Fixes: 3de62b2f9a ("gallium/dril: Compatibility stub for the legacy DRI loader interface")
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30311>
(cherry picked from commit ec7afd2c24)
From all the many possible errors returned by the vm_bind ioctl, some
can actually happen in the wild when the system is under memory
pressure. Thomas Hellström pointed to us that, due to its asynchronous
nature, the vm_bind ioctl itself has to pin some memory, so if the
number of bind operations passed is too big, there is a probability
that it may run out of memory.
Previously the Kernel would return ENOMEM when this condition
happened. Since commit e8babb280b5e ("drm/xe: Convert multiple bind
ops into single job") the Kernel has started returning ENOBUFS when it
doesn't have enough memory to do what it wants but thinks we'd succeed
if we tried to do one bind operation at a time (instead of doing
multiple operations in the same ioctl), and ENOMEM in some other
situations. Still-uncommitted commit "drm/xe: Return -ENOBUFS if a
kmalloc fails which is tied to an array of binds" proposes converting
a few more ENOMEM cases no ENOBUFS.
Still, even ENOMEM situations could in theory be possible to recover
from, because if we wait some amount of time, resources that may have
been consuming memory could end up being freed by other threads or
processes, allowing the operations to succeed. So our main idea in
this patch is that we treat both ENOMEM and ENOBUFS in the same way,
so our implementation can work with any xe.ko driver regardless of
having or not having the commits mentioned above.
So in this patch, when we detect the system is under memory pressure
(i.e., the vm_bind() function returns VK_ERROR_OUT_OF_HOST_MEMORY), we
throw away our performance expectations and try to go slowly and
steady. First we wait everything we're supposed to wait (hoping that
this alone could also help to alleviate the memory pressure), and then
we synchronously bind one piece at a time (as this will ensure ENOBUFS
can't be returned), hoping that this won't cause the Kernel to try to
reserve too much memory. All this while also hoping that whatever
thing that may be eating all the memory goes away in the meantime. If
even this fails, we give up and hope the upper layer will be able to
figure out what to do.
This fixes a bunch of LNL failures and flaky tests (as LNL is our
first officially supported xe.ko platform). This can be seen in dEQP
but only if multiple tests are being run parallel. Happens in multiple
tests, some of which may include:
- dEQP-VK.sparse_resources.image_sparse_binding.2d_array.rgba8_snorm.1024_128_8
- dEQP-VK.sparse_resources.image_sparse_binding.3d.rgba16_snorm.1024_128_8
- dEQP-VK.sparse_resources.image_sparse_binding.3d.rgba16ui.512_256_6
I don't ever see these errors when running Alchemist/DG2 with xe.ko.
Fixes: e9f63df2f2 ("intel/dev: Enable LNL PCI IDs without INTEL_FORCE_PROBE")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30276>
(cherry picked from commit dd5362c78a)
We cannot always use /dev/dri/card0.
As a matter of fact, on systems with SimpleDRM enabled /dev/dri/card0
will be created by it and removed once a GPU driver has loaded.
In any case we shouldn't hard-code the device number and instead walk
the device list to find the first suitable device.
This issue is trivially reproducible with `eglinfo -B -p gbm` on
Ubuntu 24.04 or Fedora 40
Fixes: 32f4cf3808 ("egl/gbm: Fix EGL_DEFAULT_DISPLAY")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30325>
(cherry picked from commit 7949471716)
Installing this private library into the default library
search path avoids needing to rely on -Wl,-rpath,
which is inconsistently implemented as either DT_RUNPATH
or DT_RPATH on different distributions; in particular,
on distributions that implement it as DT_RPATH,
it interferes with use of LD_LIBRARY_PATH and has semantics
that are difficult to reason about, and is incompatible with
Steam's container runtime (which has the known limitation that
it only implements DT_RUNPATH and not DT_RPATH).
To avoid third-party developers being tempted to link to the
unstable libgallium, give it a name that varies with each Mesa release,
so that there is no obvious way for third-party software to link to it.
This is similar to the way the proprietary Nvidia driver sets up its similar
implementation-detail libraries such as libnvidia-glcore.so.535.183.01.
Fixes: 50fc7cc2 ("glx: directly link to gallium")
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30328>
(cherry picked from commit 9b7bb6cc9f)
Thanks to LCSSA, we can absolutely have phis with only one source and we
need to handle those in spilling. Fortunately, there's nothing really
special about that case. I was just prematurely optimizing.
Fixes: bcad2add47 ("nak: Add a spilling pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28084>
(cherry picked from commit 8bf3213a54)
So that the depfile contains a reference to the original source rather
than the copied one. This is necessary to avoid ninja not finding the
copy and causing spurious rebuilds when the copy has been removed, as
well as correctly tracking changes to the input files.
fixes: 46644ba371
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30132>
(cherry picked from commit 36160c967c)
Fix to call vn_image_bind_wsi_memory as long as the image is a wsi
image. This is needed so that we track the wsi memory in the wsi image
so that creating from swapchain info works normally on x11/wayland
platforms. This change also make it clear that ANB image owns the wsi
memory
Fixes: c4b30b604f ("venus: support VK_ANDROID_NATIVE_BUFFER_SPEC_VERSION 8")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30278>
(cherry picked from commit a27e3c5078)
Even though we don't advertise the sparseResidency feature, a bunch of
CTS tests just call GetPhysicalDeviceImageFormatProperties2() with
SPARSE_RESIDENCY_BIT and see if that fails.
Fixes: d2177f4764 ("nvk: Don't advertise sparse residency on Maxwell A")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30303>
(cherry picked from commit 68d6cdfbc5)
This fixes sporadic rendering corruption reported on MTL with ChromeOS
in cases where multiple processes including Chrome were utilizing the
GPU concurrently, and one of the processes happened to submit a
BLORP-only batch buffer right after a switch from a different context.
In such a scenario we would fail to add the BO that holds the pixel
hashing tables to the execbuf IOCTL for the BLORP batch, because it
was being pinned from iris_restore_render_saved_bos() which isn't
called for BLORP operations, potentially causing it to use garbage as
pixel pipe hashing tables, which led to corruption of the BLORP
rendering.
Technically this could have affected DG2 as well, but it has only been
reported on MTL so far.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30274>
(cherry picked from commit 49b433d5e7)
It doesn't do anything substantial yet, but it ignores enough so internal
shaders won't generate warnings.
I've also added ByVal parsing, because I need this one to actually fix a
correctness issue in a later patch.
Cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29896>
(cherry picked from commit 9b55dcca54)
We want to use actual sparse-capable queues as the default
trtt->queue, not copy queues that may have a companion_rcs_batch.
Before this patch, if we expose more than one queue *and* the
application creates a copy queue first, we'll end up setting
trtt->queue as the copy queue, which will GPU hang when we submit the
TR-TT batches as they don't support the pipe_control commands we
issue.
The trtt->queue queue is used for binding/unbinding buffers in code
paths where there's no specific queue coming from user space, such as
when we're creating or destroying a sparse resource.
This is not a problem yet on i915.ko since we are exposing
only a single queue, and it is not a problem for xe.ko since TR-TT is
not the default there. This is also not a problem in applications
that create the render or compute queue first. We plan to expose more
queues when using TR-TT, so this would become a problem without this
patch.
None of VK-GL-CTS seems to exercise that, and none of the Steam games
I tested exercise that as well. I was able to reproduce this issue
using our internal tracing tool.
v2: New implementation that doesn't break when we only have a compute
queue (Lionel).
Fixes: 04bfe828db ("anv/sparse: allow sparse resouces to use TR-TT as its backend")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30252>
(cherry picked from commit 3ab8ff99fa)
On MTL ChromeOS boards, during AI based video conference, we were
observing a lot of overhead from invalidations. Upon debug, it was found
that we were using clflush in this function and that isn't efficient.
With this change, while executing compute workloads like zoo models, we
are getting ~25% performance improvements in a best case scenario.
Rework:
* Jordan: Call intel_clflushopt_range() rather than
__builtin_ia32_clflushopt() because intel_mem.c is not compiled
with -mclflushopt.
Backport-to: 24.1 24.2
Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30238>
(cherry picked from commit 2f6919e6c2)
Despite its name, egl_dri2 works under plain DRI without DRI2, and the
old autotools build system built it when $enable_dri = yes, with no
check for DRI2. This fixes the build for GNU/Hurd, which supports DRI,
but doesn't have DRM and thus no DRI2 support.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/587>
(cherry picked from commit 149e8bff52)
This reverts commit ad862c36e5.
This change does not work, because libdrm is required if with_dri2 is
true. Moreover, we don't want all of DRI2 on Hurd, we just want the
egl_dri2 driver, as done by autotools. So first revert this to stop
trying to build all of DRI2.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/587>
(cherry picked from commit ec55a6c329)
This reverts commit 2fd85105c6.
Despite its name, egl_dri2 works under plain DRI without DRI2, and the
old autotools build system built it when $enable_dri = yes, with no
check for DRI2. A future commit will adapt meson.build to follow that
approach rather than this hackier one.
Note that the case removed in the second hunk is already dead code,
since system_has_kms_drm is false on GNU/Hurd, and could have been
dropped as part of 66d2ae0386 ("meson: forcefully disable libdrm when
host doesn't have it").
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/587>
(cherry picked from commit 8461776a09)
This implements an undocumented workaround for a hardware bug that
affects draw calls with a pixel shader that has 0 push constant cycles
when TBIMR is enabled, which has been seen to lead to a hang with
Fallout 3 and Metal Gear Rising Revengeance. This hardware bug has
been reported as HSDES#22020184996 which is still pending a resolution
by the hardware team. However since this workaround found empirically
has been confirmed to fix the issue reliably and it's relatively
harmless it seems worth checking in already even though no final W/A
number is available nor has the W/A json file been updated.
To avoid the issue we simply pad the push constant payload to be at
least 1 register. This is enabled via a brw_wm_prog_key since the
driver needs to be in agreement with the compiler on whether the dummy
push constant cycle is present, and it can be avoided in cases where
the driver knows that TBIMR will be disabled (e.g. for BLORP).
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10728
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11399
Fixes: 57decad976 ("intel/xehp: Enable TBIMR by default.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30031>
(cherry picked from commit b98eebbcb2)
Sporadically a6xx gpu will fail to recover causing the lava job
a660_vk_full to loop on error messages for three hours before timing
out.
A few sporadic error messages may still be recoverable, but when multiple
errors occur over a short period, successful recovery is unlikely. Parse
the logs to look for repeated error messages within a short time period.
If found, cancel the lava job and rerun it.
Also add unit tests for this behaviour.
cc: mesa-stable
Reported-by: Valentine Burley <valentine.burley@gmail.com>
Acked-by: Daniel Stone <daniel.stone@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30032>
(cherry picked from commit 72c182f873)
# Bring artifacts back from the NFS dir to the build dir where gitlab-runner
# will look for them.
cp -Rp /nfs/results/. results/
section_end dut_cleanup
if[ -f "${STRUCTURED_LOG_FILE}"];then
cp -p ${STRUCTURED_LOG_FILE} results/
echo"Structured log file is available at https://${CI_PROJECT_ROOT_NAMESPACE}.pages.freedesktop.org/-/${CI_PROJECT_NAME}/-/jobs/${CI_JOB_ID}/artifacts/results/${STRUCTURED_LOG_FILE}"
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.