With arrays we really have to use the correct size for the base
mipmap to get the right array pitch. In particular, using
surf_pitch results in pitch that is bigger than the base mipmap
and hence results in wrong pitches computed by the HW.
It seems that on GFX9 this has mostly been hidden by the epitch
provided in the descriptor but this is not something we do on
GFX10 anymore.
Now this has some draw-backs:
1. normalized coordinates don't work
2. Bounds checking uses slightly bigger bounds.
2 mostly is not an issue as we still ensure that they're within
the texture memory and not overlapping other layers/mips, but
we can't properly ignore writes.
1 is kinda dead in the water ... On the other hand I'd argue that
using normalized coords & a filter for sampling a block view of
a compressed format is extraordinarily useless.
The old method we employed already had these drawbacks for everything
except the base miplevel of the imageview.
AFAICT this is the same tradeoff AMDVLK makes and no CTS test hits
this. (once it does I think the HW is dead in the water ... Only
workaround I can think of is shader processing which is hard because
we don't know texture formats at compile time.)
I also removed the extra calculations when the image has only 1 mip
level because they ended up being a no-op in that case.
CC: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2292
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2266
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2483
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2906
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3607
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7090>
(cherry picked from commit 1fb3e1fb70)
Fixes rendering corruption in the shadowmappingcascade Sascha Willems
Vulkan demo. To see the corruption, I adjusted the demo options as
follows:
1. Enable "Display depth map"
2. Set "Split lambda" to 0.100
3. Make "Cascade" non-zero.
Fixes: 80ffbe915f ("anv: Add support for HiZ+CCS")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7046>
(cherry picked from commit cce6fc3b5c)
In 53bfcdeecf, we added load/store_scratch instructions which deviate
a little bit from most memory load/store instructions in that we can't
use the normal untyped read/write instructions which can read and write
up to a vec4 at a time. Instead, we have to use the DWORD scattered
read/write instructions which are scalar. To handle this, we added code
to brw_nir_lower_mem_access_bit_sizes to cause them to be scalarized.
However, one case was missing: the load-as-larger-vector case. In this
case, we take small bit-sized constant-offset loads replace it with a
32-bit load and shuffle the result around as needed.
For scratch, this case is much trickier to get right because it often
emits vec2 or wider which we would then have to lower again. We did
this for other load and store ops because, for lower bit-sizes we have
to scalarize thanks to the byte scattered read/write instructions being
scalar. However, for scratch we're not losing as much because we can't
vectorize 32-bit loads and stores either. It's easier to just disallow
it whenever we have to scalarize.
Fixes: 53bfcdeecf "intel/fs: Implement the new load/store_scratch..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6872>
(cherry picked from commit fd04f858b0)
When support for multi-slice fast-clears was introduced for color
surfaces, an existing optimization for skipping fast-clears was not
updated (this optimization assumed single-slice fast-clears). As a
result, the driver began to skip multi-layer fast-clears if just the
first slice was in the CLEAR state (ignoring the state of the others).
A Civilization VI trace was the only workload I found to make use of
this optimization and it did so for 2D, non-array textures. Therefore,
this fix simply checks that the depth of the clear box is 1. It also
moves the single-slice aux-state query closer to the optimization to
clarify the need for the depth check.
Enables iris to pass a case of the fcc-write-after-clear piglit test,
[fast-clear tracking across layers 0 -> 1 -> (0,1)].
Fixes: 393f659ed8 ("iris: Enable fast clears on other miplevels and layers than 0.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6973>
(cherry picked from commit 3f3a5f3489)
When I copied and pasted the code from MOV_INDIRECT for handling the
dependency controls, I missed a subtle difference between MOV_INDIRECT
and SHUFFLE. Specifically, MOV_INDIRECT gets lowered to a narrow
instruction on Gen7 by the SIMD width lowering whereas SHUFFLE has to
split it in the generator. Therefore, the check safety check for
whether or not we can use dependency control has to be based on the
lowered width rather than the width of the original instruction.
Fixes: a8ac61b0ee "intel/fs: NoMask initialize the address..."
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3593
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6989>
(cherry picked from commit 8427e56067)
The volatile pattern gives me flaky results for 32-bit builds on
ChromeOS Android. This is because on 32-bit the volatile 64-bit
loads gets split into 2 32-bit loads each.
So if we read the lower dword first and then the upper dword, it
can happen that the upper dword is already changed but the lower
dword isn't yet. In particular for occlusion queries this gives
false readings, as the upper dword commonly only constains the
ready bit.
With the GCC atomic intrinsics we get a call to __atomic_load_8
in libatomic.so which does the right thing.
An alternative fix would be to explicitly split the 32-bit loads
in the right order and do a bunch of retries if things change, though
that gets messy quickly and for 32-bit builds only doesn't feel worth
it that much.
CC: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6933>
(cherry picked from commit 7568c97df1)
The linker was adding all state vars as uniforms, doubling the storage size
for shaders using only builtin uniforms, which increased CPU overhead for
constant buffer uploads.
When this code was originally ported from the GLSL IR linker we forgot
to exclude builtins because the check was not done in the
add_uniform_to_shader class but rather a check was done when passing
variables to this class for processing.
Fixes: 664e4a610d ("glsl/nir: Fill in the Parameters in NIR linker")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Tested-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6958>
(cherry picked from commit 038fcbcaed)
Workaround # 22011374674
Applied to i965, iris and anv drivers
No performance impact is observed with WA.
Cc: mesa-stable
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 545d852a7a)
After disable SDMA on Arcturus(gfx9), dead lock with aux_context_lock is
detected since si_screen_clear_buffer is called recursively before
release lock.
The call trace is:
si_clear_render_target->si_compute_clear_render_target->
si_launch_grid_internal->si_launch_grid->si_emit_cache_flush->
si_prim_discard_signal_next_compute_ib_start->u_suballocator_alloc->
si_resource_create->si_buffer_create->si_alloc_resource->
si_screen_clear_buffer->simple_mtx_lock->
si_sdma_clear_buffer->si_pipe_clear_buffer->
si_clear_buffer->si_compute_do_clear_or_copy->
si_launch_grid_internal->si_launch_grid->si_emit_cache_flush->
si_prim_discard_signal_next_compute_ib_start->u_suballocator_alloc->
si_resource_create->si_buffer_create->si_alloc_resource->
si_screen_clear_buffer->simple_mtx_lock
Fixes: 07a49bf597 "radeonsi: disable SDMA on gfx9"
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6941>
(cherry picked from commit 5e8791a0bf)
In the case where end was a instruction-based cursor, we would mix up
our blocks and end up with block_begin pointing after the second split.
This causes a segfault as the cf_node list walk at the end of the
function never terminates properly. There's also a possibility of
mix-up if begin is an instruction-based cursor which was found by
inspection.
Fixes: fc7f2d2364 "nir/cf: add new control modification API's"
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6866>
(cherry picked from commit 7dbb1f7462)
vl_mpeg12_decoder needs to override the chroma_format value to get the
correct size calculated (chroma_format is used by vl_video_buffer_adjust_size).
I'm not sure why it's needed, but this is needed to get correct mpeg decode.
Fixes: 24f2b0a856 ("gallium/video: remove pipe_video_buffer.chroma_format")
Acked-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6817>
(cherry picked from commit 2584d48b2c)
As shown in the valid SPIR-V below, if one switch case statement
directly jumps to the merge block, it has no branches at all and
we have to reset the fall variable. Otherwise, it creates an
unintentional fallthrough.
OpSelectionMerge %97 None
OpSwitch %96 %97 1 %99 2 %100
%100 = OpLabel
%102 = OpAccessChain %_ptr_StorageBuffer_v4float %86 %uint_0 %uint_37
%103 = OpLoad %v4float %102
%104 = OpBitcast %v4uint %103
%105 = OpCompositeExtract %uint %104 0
%106 = OpShiftLeftLogical %uint %105 %uint_1
OpBranch %97
%99 = OpLabel
OpBranch %97
%97 = OpLabel
%107 = OpPhi %uint %uint_4 %75 %uint_5 %99 %106 %100
This fixes serious corruption in Horizon Zero Dawn.
v2: Changed the code to skip the entire if-block instead of resetting
the fallthrough variable.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3460
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6590>
(cherry picked from commit 57fba85da4)
Scratch stores are being lowered to the instructions with side-effects,
however they should be enabled in fs helper invocations, since they
are produced from operations which don't imply side-effects.
To fix this - we move the decision of whether the sample mask predication
is enable to the point where logical brw instructions are created.
GLSL example of the issue:
int tmp[1024];
...
do {
// changes to tmp
} while (some_condition(tmp))
If `tmp` is lowered to scrach memory, `some_condition` would be
undefined if scratch write is predicated on sample mask, making
possible for the while loop to become infinite and hang the GPU.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3256
Fixes: 53bfcdeecf
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 77486db867)
Without this, we end up throwing errors on code along these lines when
rendering using single-buffering:
GLint att;
glGetIntegerv(GL_READ_BUFFER, &att);
glGetFramebufferAttachmentParameteriv(GL_READ_FRAMEBUFFER, att, ...);
This is because we internally translate GL_BACK (which is what
glGetIntegerv returned) to GL_FRONT, which we don't handle in the
Desktop GL case. So let's start handling it.
This fixes the GLTF-GL33.gtf21.GL2FixedTests.buffer_color.blend_color
test for me.
Fixes: e6ca6e587e ("mesa: Handle pbuffers in desktop GL framebuffer attachment queries")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6815>
(cherry picked from commit 9e13a16c97)
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
"In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior.... Out-of-bounds reads return undefined values, which
include values from other variables of the active program or zero."
Robustness extensions suggest to return zero on out-of-bounds
accesses, however it's not applicable to the arrays of samplers,
so just clamp the index.
Otherwise instr->sampler_index or instr->texture_index would be out
of bounds, and they are used as an index to arrays of driver state.
E.g. this fixes such dereference:
if (options->lower_tex_packing[tex->sampler_index] !=
in nir_lower_tex.c
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6428>
(cherry picked from commit f2b17dec12)
Out-of-bounds writes could be eliminated per spec:
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
"In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior.... Out-of-bounds writes may be discarded or overwrite
other variables of the active program."
Fixes: 1235850522
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6428>
(cherry picked from commit 0ba82f78a5)
Out-of-bounds writes could be eliminated per spec:
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
"In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior....
Out-of-bounds writes may be discarded or overwrite
other variables of the active program.
Out-of-bounds reads return undefined values, which
include values from other variables of the active program or zero."
GL_KHR_robustness and GL_ARB_robustness encourage us to return zero
for reads.
Otherwise get_io_offset would return out-of-bound offset which may
result in out-of-bound loading/storing of inputs/outputs,
that could cause issues in drivers down the line.
E.g. this fixes such dereference:
int vue_slot = vue_map->varying_to_slot[intrin->const_index[0]];
in brw_nir.c
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6428>
(cherry picked from commit 66669eb529)
It's not really unordered in the sense that it can still stall on
ordered things and we don't need a SYNC_NOP for that because it is a
SYNC_NOP. However, it also doesn't count when computing instruction
distances.
Fixes: 18e72ee210 "intel/fs: Add FS_OPCODE_SCHEDULING_FENCE"
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6781>
(cherry picked from commit f63ffc18e7)
However complicated DCC addressing is it is still based on tiles.
If we have the intra-tile offsets + tile dimensions we can expand
that to the full image ourselves.
Behavior around ~1080p on a 2500U:
old:
30-60 ms on every miss
new:
5 ms initally (miss in the tile cache)
<0.5 ms afterwards
The most common case is that the tile cache only contains data for
2 tiles, which for Raven/Renoir/Navi14 will be 4 KiB each, so the
size increase is fairly modest.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5865>
(cherry picked from commit a37aeb128d)
syncobj wait takes int64_t timeout and won't clamp it
in kernel code, so we have to pass in INT64_MAX instead
of OS_TIMEOUT_INFINITE which is UINT64_MAX. Otherwise
syncobj wait with OS_TIMEOUT_INFINITE case just return
fail.
Fixes: c638301b42 "radeonsi: fix syncobj wait timeout"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6676>
(cherry picked from commit ef980ac0c1)
SPIR-V's OpKill is a control-flow instruction but NIR's discard is not.
Therefore, it can be valid SPIR-V to have
if (...) {
foo = /* something */
} else {
discard;
}
use(foo);
without any phi between the definition of foo and its use. This is not
true in NIR, however, because NIR's discard isn't considered
control-flow. Arguably, this is a NIR bug but making discard control-
flow is a very deep change that can have serious ans subtle
side-effects. The easier thing to do is just fix up the SSA in case we
have an OpKill which might have gotten us into the above case.
Fixes dEQP-VK.graphicsfuzz.vectors-and-discard-in-function with the new
NIR dominance validation pass enabled.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5288>
(cherry picked from commit 7cedc4128a)
The result of 0xf << 28 is a signed integer and hence overflows into the sign
bit. In practice compilers did the right thing here, since the intent of the
code was unsigned arithmetic anyway.
Cc: mesa-stable
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6568>
(cherry picked from commit 93c8777ace)
The result of 0xf << 28 is a signed integer and hence overflows into the sign
bit. In practice compilers did the right thing here, since the intent of the
code was unsigned arithmetic anyway.
These conditions were observed in:
* dEQP-VK.pipeline.image.suballocation.sampling_type.combined.view_type.1d.format.r4g4b4a4_unorm_pack16.count_8.size.512x1
* dEQP-VK.binding_model.descriptorset_random.sets32.noarray.ubolimitlow.sbolimitlow.sampledimglow.outimgonly.noiub.nouab.frag.ialimithigh.0
Cc: mesa-stable
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6568>
(cherry picked from commit f18fc34c4d)
If decrement == 0 then:
- it isn't safe to access the submission
- even if it is, checking that the result of the atomic_sub is 0
doesn't given an unique owner anymore.
So skip it. The submission always starts out with refcount >= 1,
so first one to decrement to 0 still get dibs on executing it.
Fixes: 4aa75bb3bd "radv: Add wait-before-submit support for timelines."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6478>
(cherry picked from commit 6b75262941)
We can get duplicate declarations for an index (for example dvec3 + float
packed into 2 vec4s, the second one won't pack into the first's array
decl), and we'd end up stepping by the wrong amount in GS vtx/prim emit.
Fixes vs-gs-fs-double, sso-vs-gs-fs-array-interleave piglit tests.
Fixes: 49155c3264 ("draw/tgsi: fix geometry shader input/output swizzling")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6567>
(cherry picked from commit 329dee1455)
Only advertise VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT if CLOCK_MONOTONIC_RAW
is defined. Fixes the build on OpenBSD which has CLOCK_MONOTONIC but not
CLOCK_MONOTONIC_RAW.
Fixes: 67a2c1493c ("vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6517>
(cherry picked from commit 4500e6e460)
The raw query is meant to be used with MDAPI [1]. When using this
metric without this library, we usually selected the TestOa metric to
provide some default sensible values (instead of undefined).
Historically this TestOa metric lived in the kernel at ID=1. We
removed all metrics from the kernel in kernel commit 9aba9c188da136
("drm/i915/perf: remove generated code").
This fixes the Mesa code to use a valid metric set ID (1 could work
some of the time, but not guaranteed).
[1] : https://github.com/intel/metrics-discovery
v2: Store fallback metric at init time
v3: Drop TestOa lookout
v4: Skip the existing queries (Marcin)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
CC: <mesa-stable@lists.freedesktop.org>
Tested-by: Marcin Ślusarz <marcin.slusarz@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6438>
(cherry-picked from commit ec1fa1d51f)
The immediate case is pretty uncommon to see but it can happen, in
theory. BROADCAST is typically used to uniformize values and those are
usually 32-bit. However, it does come up in some subgroup ops.
Fixes: 49c21802cb "intel/compiler: Split has_64bit_types into float/int"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6211>
(cherry picked from commit cccb497d3c)
I have no idea how this pass ever worked. I guess it worked ok on the
one or two piglit tests but the whole thing seemed very fragile. It
makes a number of undocumented and unasserted assumptions and they
aren't always valid. This rewrite makes a number of changes:
1. It now properly handles the case where the gl_SampleMask write comes
before the gl_FragColor or gl_FragData[0] write.
2. It should early-exit faster because it now looks at bits in
shader_info::outputs_written instead of looking for variables.
3. Instead of the fragile variable lookup where we try to look the
variable up by both location and driver_location and match, we just
use the driver_location calculations used by brw_fs_nir.
4. It asserts that the index parameter to store_output is a constant
instead of silently failing if it isn't.
5. We now actually assert the implicit assumption that the two writes
are in the same block. We go even further and assert that they are
in the last block in the shader.
6. In the case where 3 or fewer components of the output are written,
we explicitly choose to leave the sample mask alone.
Fixes: 7ecfbd4f6d "nir: Add alpha_to_coverage lowering pass"
Closes: #3166
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6233>
(cherry picked from commit b6fdb1405e)
In FIFO presentation mode we block either on our present-queue or on Present
events after an image was transmitted.
In case we receive completion events without idle events at some point we
exhaust our acquire-queue and can not block anymore on present-queue.
Ensure that the consumer has at least one image to acquire before blocking
again on present-queue. Otherwise wait for one from the X server.
CC: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3344
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6513>
(cherry picked from commit ec5e918ef4)
Mesa builds with -std=c99 but uses timespec_get() a c11 function.
Build with _ISOC11_SOURCE for c11 visibility when -std is specified.
On linux c11 visibility comes from defining _GNU_SOURCE.
Fixes: e3a8013de8 ("util/u_queue: add util_queue_fence_wait_timeout")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5630>
(cherry picked from commit f9a7e6e854)
Since cbee1bfb34 endian.h is unconditionally
used if available.
glibc has byte order defines with two leading underscores. OpenBSD
has private defines with a single leading underscore in machine/endian.h
and public defines in endian.h with no underscore.
The code under the endian.h block did not check if symbols were
defined before equating them so '#if __BYTE_ORDER == __LITTLE_ENDIAN'
would turn into '#if 0 == 0' which is always true.
Fixes: cbee1bfb34 ("meson/configure: detect endian.h instead of trying to guess when it's available")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5630>
(cherry picked from commit 7eab6845e9)
PTB assumes that base instance to be 0 at start of tile, but hw would
not do that, we need to set it. It is worth to note that the opcode
name is somewhat confusing as what it really sets is the base
instance. We could rename the opcode, but then the name would be
different to the original Broadcom name, so confusing in any case.
This fixes several dEQP-GLES3 and dEQP-GLES31 tests that passes
individually, but started to fail depending on other tests running
before using base instance different to zero.
This is the backport of a Vulkan patch that fixed some Vulkan CTS
tests that start to fails after some other tests used an instance id.
CC: 20.2 20.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6447>
(cherry picked from commit 05a0349949)
If you have a sequence where there is a single buffer associated with
the current render target, and then you end up shadowing it on the 3d
pipe (u_blitter), because of how we swap the new shadow and rsc before
the back-blit, you could end up confusing things into thinking that
the blitters framebuffer state is the same as the current framebuffer
state.
Re-organizing the sequence to swap after the blit is complicated when
also having to deal with CPU memcpy blit path, and the batch/rsc
accounting. So instead just detect this case and flush if we need to.
Fixes:
dEQP-GLES31.functional.stencil_texturing.render.depth24_stencil8_clear
dEQP-GLES31.functional.stencil_texturing.render.depth24_stencil8_draw
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6434>
(cherry picked from commit 1fa43a4a8e)
Several optimization paths, including constant folding, can lead to
indexing vector with an out of bounds index.
Out-of-bounds writes could be eliminated per spec:
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
"In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior.... Out-of-bounds writes may be discarded or overwrite
other variables of the active program."
Fixes piglit tests:
spec@glsl-1.20@execution@vector-out-of-bounds-access@fs-vec4-out-of-bounds-1
spec@glsl-1.20@execution@vector-out-of-bounds-access@fs-vec4-out-of-bounds-6
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6363>
(cherry picked from commit 5922d57a18)
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
"In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior.... Out-of-bounds writes may be discarded or overwrite
other variables of the active program."
Fixes crashes when dereferencing gl_ClipDistance and gl_TessLevel*, e.g:
int index = -1;
gl_ClipDistance[index] = -1;
When LowerCombinedClipCullDistance is true.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6363>
(cherry picked from commit e802bff69e)
On glmark2-es2 -bterrain:
594.05KB leaked over 9282 calls from:
panfrost_batch_update_bo_access
at ../src/gallium/drivers/panfrost/pan_job.c:462
in /home/alyssa/rockchip_dri.so
panfrost_batch_add_bo
at ../src/gallium/drivers/panfrost/pan_job.c:560
panfrost_batch_add_bo
at ../src/gallium/drivers/panfrost/pan_job.c:519
in /home/alyssa/rockchip_dri.so
panfrost_batch_add_resource_bos
at ../src/gallium/drivers/panfrost/pan_job.c:569
panfrost_batch_add_fbo_bos
at ../src/gallium/drivers/panfrost/pan_job.c:588
in /home/alyssa/rockchip_dri.so
panfrost_create_batch
at ../src/gallium/drivers/panfrost/pan_job.c:126
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: mesa-stable
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6373>
(cherry picked from commit 1cb47f8eea)
8.74KB leaked over 52 calls from:
0xffffbb5b9fc3
in ??
_mesa_hash_table_init
at ../src/util/hash_table.c:163
in /home/alyssa/rockchip_dri.so
_mesa_hash_table_create
at ../src/util/hash_table.c:186
_mesa_hash_table_u64_create
at ../src/util/hash_table.c:701
in /home/alyssa/rockchip_dri.so
panfrost_nir_assign_sysvals
at ../src/panfrost/util/pan_sysval.c:130
in /home/alyssa/rockchip_dri.so
midgard_compile_shader_nir
at ../src/panfrost/midgard/midgard_compile.c:2905
in /home/alyssa/rockchip_dri.so
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: mesa-stable
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6373>
(cherry picked from commit 680fb05f99)
Before we drop the reference.
160 calls with 0B peak consumption from:
0xffffbd9d2fc3
in ??
pan_compute_liveness
at ../src/panfrost/util/pan_liveness.c:127
in /home/alyssa/rockchip_dri.so
mir_compute_liveness
at ../src/panfrost/midgard/midgard_liveness.c:55
in /home/alyssa/rockchip_dri.so
midgard_opt_dead_code_eliminate
at ../src/panfrost/midgard/midgard_opt_dce.c:118
in /home/alyssa/rockchip_dri.so
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: mesa-stable
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6373>
(cherry picked from commit 8dd38e5a3e)
No need to put it on the context, we can keep it local in mir_squeeze
and drop when we're done.
15.77KB leaked over 85 calls from:
0xffffaed3bfc3
in ??
_mesa_hash_table_rehash
at ../src/util/hash_table.c:368
in /home/alyssa/rockchip_dri.so
hash_table_insert
at ../src/util/hash_table.c:403
in /home/alyssa/rockchip_dri.so
find_or_allocate_temp
at ../src/panfrost/midgard/mir_squeeze.c:48
in /home/alyssa/rockchip_dri.so
find_or_allocate_temp
at ../src/panfrost/midgard/mir_squeeze.c:35
in /home/alyssa/rockchip_dri.so
mir_squeeze_index
at ../src/panfrost/midgard/mir_squeeze.c:76
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: mesa-stable
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6373>
(cherry picked from commit 62637a913a)
We really need to update the enum for consistency, but that involves
a bunch of GL & bitfield work which is error-prone, so since this is
a fix for stable lets do the simple things.
Confirmed that nothing in radv/aco/nir/spirv uses MAX_VERT_ATTRIB
except the one thing I bumped.
CC: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6120>
(cherry picked from commit f038b3a136)
KHR-GL45.robust_buffer_access_behavior.vertex_buffer_objects
on the CTS 4.6.0 branch and this fixes it.
Roland identified that if the vertex format doesn't contain channels
then we shouldn't be overriding them to 0, so RGB fetch out of bounds
should return 0 for RGB, but the A channel should still be getting back
1.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6287>
(cherry picked from commit 430e3310e2)
We cannot decompress from the compute queue. While I'm pretty sure
VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL is only useful on the
graphics queue, I cannot find a VU that prevents the transition
from happening on another queue, so we need to be careful here.
This patch ensures we do the decompression on the barrier that changes
the queue ownership.
Another problem was that DCC images were considered fast-clearable
when not DCC compressed, which resulted in a mess with concurrent
queue ownership.
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3387
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6252>
(cherry picked from commit e362ccb20c)
Currently Linux kernel gathers performance counters at fixed
intervals (~5-10ms), so if application uses AMD_performance_monitor
extension and immediately after glFinish() asks GL driver for HW
performance counter values it might not get any data (values == 0).
Fix this by moving the "read counters from kernel" code from
"is query ready" to "get counter values" callback with a loop around
it. Unfortunately it means that the "read counters from kernel"
code can spin for up to 10ms.
Ideally kernel should gather performance counters whenever we ask
it for counter values, but for now we have deal with what we have.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5788>
(cherry picked from commit 2fbab5a1b3)
The code applied the restriction too late, which could overflow LDS size,
which started happening more often after the minimum vertex count was
increased for Sienna.
Incorporate the clamping into the previous code for rounding up the counts.
Now the LDS size can never overflow, but it may use vector lanes less
efficiently (max_gsprims can be decreased more), which will be addressed
in the next commit.
Fixes: 4ecc39e1aa ("radeonsi/gfx10: NGG geometry shader PM4 and upload")
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6137>
(cherry picked from commit 64c741ffb7)
One day, we may want copy_prop_vars or other passes to be able to see
through certain types of casts such as when someone casts a uint64_t to
a uvec2. However, for now we should just avoid casts all together.
Fixes: d8e3edb784 "nir/deref: Support casts and ptr_as_array in..."
Tested-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6072>
(cherry picked from commit 611f654fcf)
It's possible an SSA value depends on a register; in this case, chasing
the source would result in a crash as the chase helper in NIR asserts
is_ssa. Instead we should check a priori that all the argments are in
fact SSA, bailing otherwise.
In the piglit shader exhibiting this bug (by looping over the index),
bailing on the ishl instruction is -necessary-. This is not merely us
being cowardly to avoid seeing through the registers; indeed, if we
wrote away the ishl instruction, the shift itself would have to be
stored in a load/store register (r26/r27) which would preclude reading
it in the loop, creating a register allocation failure later in the
compile. So this is the correct solution due to the restricted
semantics.
Closes#3286
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reported-by: Icecream95 <ixn@keemail.me>
Fixes: f5401cb886 ("pan/midgard: Add address analysis framework")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6144>
(cherry picked from commit b2f475251e)
This effectively reverts part of 2907faee, which changed dri2_make_current() to
always take a dri2_dpy reference regardless of whether or not a new context or
surface(s) were being bound. This led to a reference count imbalance as there
was no corresponding code added to drop a reference on the dri2_dpy. As a
consequence, any application that called eglInitialize() on a default/native
display after having called eglTerminate() would always get back the old
dri2_dpy, inheriting its previous state.
As the reference count is there to prevent the dri2_dpy from being destroyed
between eglTerminate() and eglInitialize() calls when a context is still bound,
a reference should only be taken when a successful call to
dri2_dpy->core->bindContext() has been made. Fix the issue by restoring the old
reference counting behaviour.
Fixes: 4e8f95f64d ("egl_dri2: Always unbind old contexts")
Fixes: 2907faee7a ("egl/dri2: try to bind old context if bindContext failed")
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Nicolas Cortes <nicolas.g.cortes@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3328
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6105>
(cherry picked from commit d0e32e5f81)
We were space-leaking iris_compiled_shader objects, leaving them around
basically forever - long after the associated iris_uncompiled_shader was
deleted. Perhaps even more importantly, this left the BO containing the
assembly referenced, meaning those were never reclaimed either. For
long running applications, this can leak quite a bit of memory.
Now, when freeing iris_uncompiled_shader, we hunt down any associated
iris_compiled_shader objects and pitch those (and their BO) as well.
One issue is that the shader variants can still be bound, because we
haven't done a draw that updates the compiled shaders yet. This can
cause issues because state changes want to look at the old program to
know what to flag dirty. It's a bit tricky to get right, so instead
we defer variant deletion until the shaders are properly unbound, by
stashing them on a "dead" list and tidying that each time we try and
delete some shader variants.
This ensures long running programs delete their shaders eventually.
Fixes: ed4ffb9715 ("iris: rework program cache interface")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6075>
(cherry picked from commit 128cbcd3a7)
Fixes some dEQP-VK.renderpass2.* flakes. Valgrind:
Test case 'dEQP-VK.renderpass2.dedicated_allocation.attachment.8.724'..
==754520== Conditional jump or move depends on uninitialised value(s)
==754520== at 0x575B21C: radv_layout_is_htile_compressed (radv_image.c:1690)
==754520== by 0x572F470: radv_handle_depth_image_transition (radv_cmd_buffer.c:5855)
==754520== by 0x572F2F2: radv_handle_image_transition (radv_cmd_buffer.c:6123)
==754520== by 0x572EEC6: radv_handle_subpass_image_transition (radv_cmd_buffer.c:3385)
==754520== by 0x572A104: radv_cmd_buffer_begin_subpass (radv_cmd_buffer.c:4843)
==754520== by 0x572A007: radv_CmdBeginRenderPass (radv_cmd_buffer.c:4913)
==754520== by 0x572A197: radv_CmdBeginRenderPass2 (radv_cmd_buffer.c:4921)
Why false?
A renderloop happens when the same attachment is both used as input
attachment and output (color, ds) attachment in a subpass. Of course
this doesn't happen outside of a renderpass and hence we can initialize
it to false at the start of the renderpass.
Fixes: 66131ceb8b "radv: Pass through render loop detection to internal layout decisions."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3074
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6068>
(cherry picked from commit 18fe130ec9)
This avoids some performance regressions on Gen12 platforms caused by
SIMD32 fragment shaders reported in titles like Dota2, TF2, Xonotic,
and GFXBench5 Car Chase and Aztec Ruins.
The most obvious pattern in the regressing shaders I identified among
these workloads is that they all had non-uniform discard statements,
which are handled rather optimistically by the current IR analysis
pass: No penalty is currently applied to the SIMD32 variant of the
shader in the form of differing branching weights like we do for other
control flow instructions in order to account for the greater
likelihood of divergence of a SIMD32 shader.
Simply changing that by giving the same treatment to discard
statements as we give to other branching instructions seemed to hurt
more than it helped on platforms earlier than Gen12, since it reversed
most of the improvement obtained from SIMD32 fragment shaders in
Manhattan for no measurable benefit in other workloads (Manhattan has
a handful of shaders with statically non-uniform discard statements
which actually perform better in SIMD32 mode due to their approximate
dynamic uniformity). For that reason this change is applied to Gen12+
platforms only.
I've been running a number of tests trying to understand the
difference in behavior between Gen12 and earlier platforms, and most
of the evidence I've gathered seems to point at EU fusion being the
culprit: Unlike previous generations, on Gen12 EUs are arranged in
pairs which execute instructions in lockstep, giving an effective warp
size of 64 threads in SIMD32 mode, which seems to increase the
likelihood for control flow divergence in some of the affected shaders
significantly.
Fixes: 188a3659ae "intel/ir: Import shader performance analysis pass."
Reported-by: Caleb Callaway <caleb.callaway@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5910>
(cherry picked from commit 4d73988f6f)
When creating a sampler-type, we need to pass the correct vaclue for
the "is_shadow"-parameter to glsl_sampler_type(), otherwise the compiler
backend will have no clue about this being a shadow-sampler.
Fixes: 1c0f92d8a8 ("nir: Create sampler variables in prog_to_nir.")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5986>
(cherry picked from commit c33e8d7d52)
This change mostly touches error handling code paths, where a
bug was found when the DRI driver failed to bind a new DRI
context. Specifically, the reason for it to fail was the window
system unable (for whatever reason) to provide the DRI drawable
with a buffer. In this instance, Mesa un-does the EGL bindings,
but doesn't restore the old DRI context, hence remaining in a
funny state. It's worth mentioning that despite trying, there
is no guarantee that the old DRI context can be restored,
depending on the runtime.
Before this change, if bindContext() failed then
dri2_make_current() would rebind the old EGL context and
surfaces and return EGL_BAD_MATCH. However, it wouldn't rebind
the DRI context and surfaces, thus leaving it in an
inconsistent and unrecoverable state.
After this change, dri2_make_current() tries to bind the old
DRI context and surfaces when bindContext() failed. If unable
to do so, it leaves EGL and the DRI driver in a consistent
state, it reports an error and returns EGL_BAD_MATCH.
Fixes: 4e8f95f64d ("egl_dri2: Always unbind old contexts")
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5707>
(cherry picked from commit 2907faee7a)
Whenever a struct type is decorated Block or BufferBlock we turn that
into a GLSL_TYPE_INTERFACE. Since these decorations can end up random
places, we should allow them for constants.
Closes: #3252
Fixes: 9d0ae777dd "spirv: Use interface type for block and buffer..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5855>
(cherry picked from commit 351b5137d7)
EXT_shader_image_load_store says:
The format of the image unit must be in the "1x32" equivalence class
otherwise the atomic operation is invalid.
ARB_shader_image_load_store says:
We will only support 32-bit atomic operations on images
Fixes: fc0a2e5d01 ("glsl: add EXT_shader_image_load_store new image functions")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5688>
(cherry picked from commit 1e3aeda528)
ntq_setup_fs_inputs and ntq_setup_gs_inputs sort the inputs according to
the driver location. This input array is then used to calculate the VPM
offset for the outputs in the previous stage. However, it wasn’t taking
into account variables that are packed into a single varying slot. In
that case they would have the same driver_location and are
distinguished by location_frac.
This patch makes it additionally sort by location_frac when the driver
locations are equal. This can happen when the compiler packs varyings
that are sized less than vec4. Without this fix, when the VPM is used to
transmit data free-form between the stages (such as VS->GS) then it
would end up writing to inconsistent locations.
Fixes dEQP tests such as:
dEQP-GLES31.functional.primitive_bounding_box.lines.global_state.
vertex_geometry_fragment.default_framebuffer_bbox_equal
Fixes: 5d578c27ce ("v3d: add initial compiler plumbing for geometry shaders")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5787>
(cherry picked from commit deefebc55b)
If we clear depth only texture via glClearTex(Sub)Image it may cause:
../src/intel/blorp/blorp_genX_exec.h:1554: blorp_emit_surface_states: Assertion `params->depth.enabled || params->stencil.enabled' failed.
due to clear_depth_stencil calling blorp_clear_depth_stencil when
depth is already fast-cleared and there is no stencil.
Fixes piglit test: arb_clear_texture-depth
Fixes: 51638cf18a
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5770>
(cherry picked from commit 77844690be)
The offset for the VPM write for storing outputs from the geometry
shader isn’t necessarily uniform across all the lanes. This can happen
if some of the lanes don’t emit some of the vertices. In that case the
offset for the subsequent vertices will be different in each lane. In
that case we need to use the stvpmd instruction instead of stvpmv
because it will scatter the values out.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3150
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5621>
(cherry picked from commit 3b1c511b09)
Previously we would match the start of the compatible string, in
a couple of cases, in order to match compatible strings like
"qcom,adreno-630.2". But these cases would always list a more
generic compatible (ie. "qcom,adreno") as a later choice. So if
we parse all the compatible strings, we can do a more precise
exact match.
This avoids us accidentially matching on "qcom,adreno-smmu" and
the hilarity that ensues.
Fixes: 5a13507164 ("freedreno/perfcntrs: add fdperf")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5762>
(cherry picked from commit 385d036f58)
This was causing vkAcquireNextImageKHR to not signal the fences and
semaphores. In the case where the semaphore was brand new, this could
cause an unsignalled syncobj to be passed into execbuffer2 which it will
reject with -EINVAL leading to VK_ERROR_DEVICE_LOST. Thanks to Henrik
Rydgård who works on the PPSSPP project for helping me figure this out.
Fixes: ca3cfbf6f1 "vk: Add an initial implementation of the actual..."
Fixes: 778b51f491 "vulkan/wsi: Add a hooks for signaling semaphores..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5672>
(cherry picked from commit b0bbb62325)
Some Piglit tests (rightfully) fail because of min >= max when exposed
to perf counters that do not explicitly define their max value.
Failing tests:
spec/amd_performance_monitor/api/test_counter_info
spec/amd_performance_monitor/vc4/test_counter_info
u32/u64 changes are no-ops.
Fixes: 4cd1cfb983 ("st/mesa: implement GL_AMD_performance_monitor")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5473>
(cherry picked from commit 2f4a112ec4)
Dot product is multiplication followed by addition, and absolute value
does not distribute into addition.
Only vec4 platforms are affected by this change as scalar-only platforms
never have any of the fdot_replicated instructions. In the shader-db
results, below, shaders in MANY different applications are affected.
Trine, Doom3, Enemy Territory: Quake Wars, Counter Strike: Global
Offensive, Mad Max, Metro Last Light, and on and on... I'm really
shocked that there were no test regressions!
All Haswell and earlier platforms had similar results. (Haswell shown)
total instructions in shared programs: 16219743 -> 16219820 (<.01%)
instructions in affected programs: 12171 -> 12248 (0.63%)
helped: 1
HURT: 78
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.78% max: 0.78% x̄: 0.78% x̃: 0.78%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.35% max: 2.38% x̄: 0.91% x̃: 1.06%
95% mean confidence interval for instructions value: 0.92 1.03
95% mean confidence interval for instructions %-change: 0.78% 1.00%
Instructions are HURT.
total cycles in shared programs: 538481383 -> 538491045 (<.01%)
cycles in affected programs: 470796 -> 480458 (2.05%)
helped: 149
HURT: 142
helped stats (abs) min: 1 max: 1338 x̄: 71.13 x̃: 4
helped stats (rel) min: 0.06% max: 40.99% x̄: 2.76% x̃: 0.67%
HURT stats (abs) min: 1 max: 2092 x̄: 142.68 x̃: 12
HURT stats (rel) min: 0.07% max: 55.38% x̄: 5.07% x̃: 1.07%
95% mean confidence interval for cycles value: -5.28 71.69
95% mean confidence interval for cycles %-change: -0.07% 2.19%
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Closes: #3129
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5581>
(cherry picked from commit 8591adea38)
This code was broken, but it worked by accident, as the
pad and the edgeflag were reversed, however when Roland
removed the cliptest field back in 2015, he stopped copying
the pad which actually stopped copy the edgeflag.
Fix the function to actually copy the edgeflag.
Fixes: 1b22815af6 ("draw: don't pretend have_clipdist is per-vertex")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5679>
(cherry picked from commit 2bf2e6c83d)
The code was expecting the lower 32-bits of the 64-bit to be
what it wanted, don't be implicit, pull the value from the union.
This should fix rendering on big endian systems since NIR was
introduced.
Fixes: 44a6b0107b ("gallivm: add nir->llvm translation (v2)")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5677>
(cherry picked from commit 237d728418)
Now that we have scripts in place to do baremetal testing of cheza, switch
it over. As of this writing, we have 5 chezas for baremetal and 4 for the
old docker CI setup (just 2 fewer than we originally had before this work,
since some had had filesystem failures and I switched those first), and
once we are sure of this we can backport to stable branch CI and move the
rest of them to baremetal.
I've run a lot of jobs through the baremetal scripts as I worked on
sorting out vulkan CTS stability, so I feel good about the stability of
the GLES CTS here.
The options job is now split out to separate jobs, as we don't currently
have a way to stack multiple sets deqp runs with different env vars in a
single baremetal run, and just chaining cros_servo.sh invocations runs
into a lack of cleanup of the serial-watching scripts which we rely on
container exit sorting out for us. This means a little less than 2x the
artifacts downloads we had before for a630 and a few more container
instantiations.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit 6f4fc4ff71)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
This will let us:
- deploy kernels for testing code depending on new kernel featuers
- Ensure a pristine state in the HW before starting our tests
- Avoid disk rot on the chezas taking them out (we'd lost 3/9 in a few
months).
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit c89a749f66)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
This is a set of kernel options I've come up with mostly cribbing from
chrome os's kernel config snippet. We also build an lzma kernel, as
uncompressed kernel is big but lzma is the only compression supported by
the bootloader. With that image, we have to pack it into a FIT formatted
image+dtb blob.
CONFIG_SUNRPC_DEBUG is added so that you can set "nfsrootdebug" to figure
out what's going wrong with your nfs mount (mine were "both the tcp and
nfsvers options were required, and don't try to use 'default' as the root
path to defer to DHCP's answer because otherwise you get
/tftpboot/default, just use an empty root path which doesn't prepend
/tftpboot.")
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit 3a1010e21a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
When you're bringing up a new driver in CI with significant number of
failures (or when a CI run breaks a driver), the QPA extraction can easily
take the whole job timeout as we go about processing each QPA (100 of them
in my early VK CI fails) per unexpected result we're saving (50), which
involves reading and each line of the file in shell. By quickly filtering
out the QPA files not including our test, we can save all that shell
overhead, bringing QPA extract time down to a couple of minutes.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
(cherry picked from commit f0c102c075)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
We were doing sed -i /filter/p, which printed everything but printed the
filtered things twice (though they'd only get tested once). Now that the
filter works, run all the UBO tests instead of doing a 1/5 run, revealing
new failures.
(cherry picked from commit 90cf494338,
commit message and UBO fail list updated since we don't have the UBO
fixes in)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
The runner is an x86 system, so running the ARM image meant doing
everything at runtime under qemu, and for the xz of the test rootfs that
was quite expensive. Also, we can rebuild x86 images much faster than we
can rebuild arm images for container development, which will help unblock
some of the other feature parity work I have to do versus the old docker
system that cheza is using.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
(cherry picked from commit 68b3b5bcab)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
v2: Python's nitpick (Andres)
In case of an empty list of traces, the results.yml will contain an empty
curly braces. In YAML, an associative array can also be specified by text
enclosed in curly braces ({}),
This commit also adds the corresponding test to check the behavior of
tracie when no traces are added in the traces.yml file.
Signed-off-by: Pablo Saavedra <psaavedra@igalia.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 4504d6374d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
test_tracie_skips_traces_without_checksum does the logic previous to
the commit 8546d1dd78. The traces.yml includes
several traces, only the one without checksum is ignored by tracie.
As a complementary action, this change adds an new test
(test_tracie_only_traces_without_checksum) to verify the behavior for
cases where the traces.yml only contains traces without checksum.
Finally, test_tracie_skips_traces_without_checksum is renamed as
test_tracie_traces_with_and_without_checksum
Signed-off-by: Pablo Saavedra <psaavedra@igalia.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit e85dc9a240)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
v2: Verbatim translation from the original shell script
Make the corrections visible in explicit commits (Andres)
Remove redundant code (Alexandros)
Code style nitpick (Rohan)
Reimplementation of the tracie's self-tests using a pythonic test suit
(pytest).
The new tracie/test.py module is almost a direct translation of the
tests defined in the tracie/test.sh. This new implementation of the
test provides a more common framework where define the tests.
Also allows a better introspection for the tests results and/or
resulting errors.
This patch also adds python3-pytest as dependency for the built images
and adapts the tracie-runner scripts to run the self-test using pytest.
Signed-off-by: Pablo Saavedra <psaavedra@igalia.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> [v1]
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com> [v1]
(cherry picked from commit 550a4f7764)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
RESULTS_PATH and RESULTS_PATH, as variables in the module context, are
resolved one single time, only during the first module loading. If the
the Python code in execution changes the current dir at some point,
those paths are not going to be updated anymore keeping the paths
wrongly pointing to the old working dir.
This change modify the definition of those variables to use simply
relative paths.
Signed-off-by: Pablo Saavedra <psaavedra@igalia.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
(cherry picked from commit eb1f22fb01)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
All the other files in bin/ are not used by any build system and as such
cannot affect the build.
I've been working on maintainer tools lately and it's frustrating to have
the CI wait for 45 minutes to rebuild everything and not even read/run
the files in the MR when it could've just been merged and moved on to
the next MR 45 minutes ago.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 576bff5c73)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
We don't want LLVM 8 packages to be pulled in from testing though (it
would make installing llvm-8-dev for cross architectures a lot more
complicated), so explicitly select buster-backports for them (they were
already implicitly installed from there before, since they're not
available in buster proper).
(cherry picked from commit fd9b445145)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
robclark found that we needed unique IDs when multiple runners were trying
to report flakes at the same time, but it turns out due to nick limits (16
chars on freenode) we were just getting all the runners appended with
"-142" (or whatever the prefix of the pipelines are these days). And, for
the new flake reporting from baremetal, all the runners ended up being
just "google-freedreno".
Reviewed-by: Rob Clark <robdclark@chromium.org>
(cherry picked from commit 2637961d29)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
We were incorrectly taking the merge-request on non-MR pipelines (the
master build after merge) due to a missing '$'. And, for those pipelines,
it would be nice to note whether they're for master or a stable branch.
Reviewed-by: Rob Clark <robdclark@chromium.org>
(cherry picked from commit 2c50176dfe)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5548>
layout_resource_for_modifier() needs to handle DRM_FORMAT_MOD_INVALID
as well, since src/gallium/frontends/dri/dri2.c uses this to indicate
"no modifier" when it's called through the older non-modifier entry
points.
This is similar to 334788d4 ("freedreno: allow INVALID modifier") but
for the generic implementation.
Fixes: 98910626 ("freedreno/a6xx: Implement layout for DRM_FORMAT_MOD_QCOM_COMPRESSED")
Closes: #3154
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5611>
(cherry picked from commit bf92f041fe)
Previously all nir_intrinsic_load_uniform that were used as sources were
considered to be dynamically_uniform but when offsets of load_uniform
are indirect it can not be determined.
This fixes artefacts in Google Maps 3D view in V3D.
Fixes: 886d46b089 ("nir: Add a function to determine if a source is dynamically uniform")
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5587>
(cherry picked from commit ba15bb383f)
We were failing to tell the allocator about the restriction that scalar
texture instructions (allocated as scalar regs) couldn't be allocated such
that the start of the full unwritemasked vector started before r0. There
was a patch in select_reg_callback on a6xx that tried to work around that,
but you could still end up backed into a corner you shouldn't be because
we didn't tell the RA what it needed.
Fixes compiler assertion failures on a300-a400's blit_z shader, used for
Z32F gmem blits.
Looks like as a result we get tighter register allocation but more nops:
instructions in affected programs: 757945 -> 760356 (0.32%)
nops in affected programs: 317983 -> 320468 (0.78%)
non-nops in affected programs: 27525 -> 27451 (-0.27%)
mov in affected programs: 3098 -> 3023 (-2.42%)
dwords in affected programs: 109664 -> 110656 (0.90%)
last-baryf in affected programs: 112701 -> 112847 (0.13%)
full in affected programs: 4326 -> 4011 (-7.28%)
sstall in affected programs: 120550 -> 120836 (0.24%)
(ss) in affected programs: 13939 -> 13918 (-0.15%)
(sy) in affected programs: 3006 -> 2786 (-7.32%)
(cherry picked from commit b420d04e1f)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5612>
We were failing to tell the allocator about the restriction that scalar
texture instructions (allocated as scalar regs) couldn't be allocated such
that the start of the full unwritemasked vector started before r0. There
was a patch in select_reg_callback on a6xx that tried to work around that,
but you could still end up backed into a corner you shouldn't be because
we didn't tell the RA what it needed.
Fixes compiler assertion failures on a300-a400's blit_z shader, used for
Z32F gmem blits.
Looks like as a result we get tighter register allocation but more nops:
instructions in affected programs: 757945 -> 760356 (0.32%)
nops in affected programs: 317983 -> 320468 (0.78%)
non-nops in affected programs: 27525 -> 27451 (-0.27%)
mov in affected programs: 3098 -> 3023 (-2.42%)
dwords in affected programs: 109664 -> 110656 (0.90%)
last-baryf in affected programs: 112701 -> 112847 (0.13%)
full in affected programs: 4326 -> 4011 (-7.28%)
sstall in affected programs: 120550 -> 120836 (0.24%)
(ss) in affected programs: 13939 -> 13918 (-0.15%)
(sy) in affected programs: 3006 -> 2786 (-7.32%)
(cherry picked from commit b420d04e1f)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5612>
wsi_release_display() implements vkReleaseDisplayEXT() which
is supposed to return control to the lessor of an output
upon call.
We need to terminate the wsi->wait_thread when close()'ing
the wsi->fd, otherwise the wait_thread holds another reference
to the wsi->fd, keeping the lease active, and thereby the
leased output blocked, until vkDestroyInstance() is called.
This gives users their GUI back, instead of extended darkness.
Fixes: 352d320a07 ("vulkan: Add EXT_direct_mode_display [v2]")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5396>
(cherry picked from commit 2cc51b0dff)
If on a nested loop
- the outer loop needs WQM but
- the inner loop doesn't need WQM and
- the break condition of the inner loop is computed in the outer loop
then it could happen that we transitioned to Exact before entering the inner loop
which could create an empty exec mask and lead to an infinite loop.
Fixes a GPU hang with RDR2
Cc: 20.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5518>
(cherry picked from commit 3817fa7a4d)
The approach taken in this commit only works on drivers that expose
the PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT capability. For drivers
that don't, the buffer has been unmapped by the time we get to
hud_draw_colored_prims, leading to crashes.
It's not easy to fix the code, but drivers that do support coherent
mapping will most likely do the right think themseleves, so let's just
go back to using user-buffers here.
This reverts commit 4fe1fd4df4.
Fixes: 4fe1fd4df4 ("gallium/hud: don't use user vertex buffers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3106
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5417>
(cherry picked from commit 7b86920ae2)
Unreal Engine 4 has a bug in usage of glDrawRangeElements,
causing it to be called with a number of vertices in place
of "end" parameter (which specifies the maximum array index
contained in indices).
Since there is unknown amount of games affected and we
could not identify that a game is built with UE4 - we are
forced to make a blanket workaround, disregarding max_index
in range calculations. Fortunately all such calls look like:
glDrawRangeElements(GL_TRIANGLES, 0, 3, 3, ...);
So we are able to narrow down this workaround.
This was uncovered after b684030c3a
broke a bunch of UE4 games.
Cc: 20.1 <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2917
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5203>
(cherry picked from commit a751051248)
The current "ds = state->bs" seems broken, and the "vs = state->bs" is
unnecessary (already set above). Since it was added as part of a GS-related
patch, I think this is what was intended.
Note: tesselation disables GMEM rendering so we shouldn't have to worry
about hs/ds + binning interaction.
Fixes: 0eebedb619 ("freedreno/a6xx: Emit program state for GS")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5370>
(cherry picked from commit 6cc95abb27)
Tessellation shader in llvmpipe depends on llvm coroutines support. So do not
advertise tessellation shader support in llvmpipe if GALLIVM_HAVE_CORO is FALSE.
This fixes assertion in LLVMTokenTypeInContext() running tessellation shader
tests with llvm version < 6.
Fixes: eb522717 "llvmpipe: add support for tessellation shaders"
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5366>
(cherry picked from commit dd81f4853c)
the virgl CI code was using the noopt path and crashing with a
wierd can't select llvm.coro.subfn.addr error, turns out we have
to call the cleanup pass no matter what.
This enable a lot more virgl gles31 passes, but we have
to disable tessellation shaders as now they executed, they
crash due to missing OES_gpu_shader5, I should try and reenable
them when llvmpipe is further along
Fixes: d32690b43c ("gallivm: add coroutine pass manager support")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Elie Tournier <elie.tournier@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5320>
(cherry picked from commit c8c7450fc7)
The xml.etree.cElementTree module will be removed in Python 3.9. Since
Python 3.3 the xml.etree.cElementTree module has been deprecated, the
xml.etree.ElementTree module uses a fast implementation whenever
available.
Builds using Python 2.7 can still work but with the slower
implementation.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5349>
(cherry picked from commit faa339e666)
The script was failing for me (python 3.8), not sure if this is a recent
python version break or not as I don't know how often people have been
running this script:
Processing ./gen9.xml... Traceback (most recent call last):
File "./gen_sort_tags.py", line 177, in <module>
main()
File "./gen_sort_tags.py", line 170, in main
genxml[:] = enums + sorted_structs.values() + instructions + registers
TypeError: can only concatenate list (not "odict_values") to list
Turning the odict into a list fixes it for me, and the resulting xml
file are identical to before :)
Fixes: 903e142f0d ("genxml: add a sorting script")
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5352>
(cherry picked from commit 981d07c74a)
We reuse DRM file descriptors internally. Therefore when we export a
GEM handle we must do so in the file descriptor used externally.
This change also fixes a file descriptor leak of the FD given at
screen creation.
v2: Don't bother checking fd equals, they're always different
Fix dmabuf leak
Fix GEM handle leaks by tracking exported handles
v3: Check os_same_file_description error (Michel)
Don't create multiple exports for a given GEM table
v4: Add WARN_ONCE (Ken)
Rename external_fd to winsys_fd
v5: Remove export lock in favor of bufmgr's
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2882
Fixes: 7557f16059 ("iris: share buffer managers accross screens")
Tested-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4861>
(cherry picked from commit aba3aed96e)
mesa/st is initializing pipe_shader_state for user define shaders.
This patch intialized pipe_shader_state for all passthough
and transform shaders.
This fixes crashes for several opengl apps. Issue is found in vmware
internal testing
Fixes: f01c0565bb ("draw: free the NIR IR.")
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5240>
(cherry picked from commit 838666a41d)
Currently the nir linker resizes the amount of storage needed to hold
uniform information on the fly while linking. As shaders can contain
thousands of uniforms this can be very slow. For example some Godot
shaders can take 30 seconds to compile on some machines.
In this change we count the amount of storage needed before we start
processing the uniforms. This is what the GLSL IR linker does also.
Fixes: 95f555a93a ("st/glsl_to_nir: make use of nir linker for linking uniforms")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2996
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5137>
(cherry picked from commit f1acf492de)
Generally we do not completely stop compilation as soon as we see an error,
instead we continue on to attemp to find any futher errors.
This means we shouldn't be checking state->error to see if any error has
happened during the compilation process, doing so was causing
process_parameters() to fail on completely valid functions if there was
any error found in the shader previously. This then caused the valid
functions not to be found because the paramlist was considered empty,
resulting in the compiler spewing out misleading error messages.
Here we simply add the IR error value to the param list when we have
an issue with processing a parameter, this leads to much better error
messaging.
Fixes: 53e4159eaa ("glsl: stop processing function parameters if error happened")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5205>
(cherry picked from commit f6214750eb)
This was implemented in version 6 of the VK_ANDROID_native_buffer
extension and we only implement version 5. However, the Android
Vulkan loader only checks whether vkGetInstanceProcAddr for the
function is not NULL.
This all went wrong when we switched to the layer code from ANV.
Because the function may now be different per device, it adds fallback
functions that dispatch to the dispatch table. So if we didn't implement
the function we still returned a pointer to the dispatch function,
which made the Android Vulkan loader believe it was supported.
Dispatch functions:
d555794f30/src/amd/vulkan/radv_entrypoints_gen.py (L328)
Fixes: d555794f30 "radv: update entrypoints generation from ANV"
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2936
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5198>
(cherry picked from commit be784cc77b)
Fix warning reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member m_num_clip_dist is not
initialized in this constructor nor in any functions that it calls.
Fixes: f7df2c57a2 ("r600/sfn: extract class to handle the VS export to different stages")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5180>
(cherry picked from commit 73c0f60d8c)
Issue description from Matt's commit e7c376ad:
"var_range_end(v, n) loops over the n components of variable number v and
finds the maximum value, giving the last use of any component of v.
Therefore it expects v to correspond to the variable associated with the
.x channel of the VGRF.
var_from_reg() however returns the variable for the first channel of the
VGRF, post-swizzle.
So, if the last register had a swizzle with y, z, or w in the swizzle
component, we would read out of bounds. For any other register, we would
read liveness information from the next register.
The fix is to convert the src_reg to a dst_reg in order to call the
dst_reg version of var_from_reg() that doesn't consider the swizzle."
Closes: #3003
Fixes: 48dfb30f ('intel/compiler: Move all live interval analysis results into vec4_live_variables')
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrii Simiklit <asimiklit.work@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4941>
(cherry picked from commit d1b7462849)
Meson 0.55.0 will set the MESON_EXE_WRAPPER environment variable to the
joined version of that wrapper if it is needed. Our tests that take
compiled targets as arguments can use that information to run cross
built binaries, or if there isn't a wrapper and we get an ENOEXEC, we
can skip the tests gracefully.
We try to use mesonlib.split_args, which handles windows arguments
better than python's builtin shlex module, but fall back to that if the
meson module isn't available for some reason.
Cc: 20.0 20.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5103>
(cherry picked from commit 5580322486)
On EGL 1.4, one had to check for the existence of EGL_EXT_platform_base
before querying the eglGetPlatformDisplayEXT() and
eglCreatePlatformWindowSurfaceEXT() symbols, to then use them if the
EGL_EXT_platform_* extension for the given platform was exposed.
Since EGL 1.5, the platform functionality was made core, which means we
can obtain the symbols unconditionally, but we can't know the EGL
version before having created a display, at which point we've already
done a platform selection by passing an EGLNativeDisplay. The
EGL_KHR_platform_* extensions thus are used by clients to know whether
it's safe or not to dlsym() the EGL 1.5 symbols.
This commit adds those extensions when the given platform is enabled.
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5052>
(cherry picked from commit a3fb064e00)
Since commit 8b8af6d398 there is a
performance regression in dirt 4 on picasso APUs.
The game ends up feeding a large value into this which overflows on the
conversion to 16bit float. With the old implementation (which now lives
in util_float_to_half_rtz) it would be clamped to inf-1, while the new
one returns inf. This causes a performance hit somehow at some point
down the line.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: 8b8af6d398 "gallium/util: Switch util_float_to_half to _mesa_float_to_half()'s impl."
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5062>
(cherry picked from commit 78615dcca1)
If we have a separate render resource, it may contain more up-to-date
data than what is available in the base resource, so we need to retarget
the transfer to this resource. As the most likely reason for the
existence of the render resource is a multi-tiled render layout we need
to allow this transfer to go through the resolve/blit copy path, as we
can't de-/tile those layouts in software.
Fixes: b962776530 (etnaviv: rework compatible render base)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5051>
(cherry picked from commit 9d1821adf0)
The GC880 on iMX6DL indicates in it's minorFeatures2 register that it
does support SEAMLESS_CUBE_MAP, however when the TE.SAMPLER_CONFIG1
VIVS_TE_SAMPLER_CONFIG1_SEAMLESS_CUBE_MAP bit is set on GC880 on iMX6DL,
the result is corrupted image. In particular, the following ~112 dEQPs
are affected and fail:
dEQP-GLES2.functional.texture.filtering.cube.*
This only happens on MX6DL GC880, MX6Q GC2000 and STM32MP1 GC400(GCnano)
do not report the minorFeatures2 SEAMLESS_CUBE_MAP bit and ignore the
TE_SAMPLER_CONFIG1 VIVS_TE_SAMPLER_CONFIG1_SEAMLESS_CUBE_MAP bit (note
that ss->seamless_cube_map is unconditionally set by mesa at times even
PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE returns 0), so there is no visible
problem and there are no failing dEQP tests on the GC2000 and GCnano.
This might imply that the minorFeatures2 SEAMLESS_CUBE_MAP has some
different meaning on GC880 or the SEAMLESS_CUBE_MAP behaves differently
on the GC880.
This patch does not set the SEAMLESS_CUBE_MAP bit on hardware which does
not indicate support for seamless cube map and on GC880, which results
in reduction in failed dEQPs: 635 to 186 on GC880, 274 to 270 on GC2000
and no change on GC400(GCnano).
Fixes: 8dd26fa2f0 ("etnaviv: support GL_ARB_seamless_cubemap_per_texture")
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4865>
(cherry picked from commit 2b535ac61b)
We can no longer assume that `state->ranges[0]` is block 0. It *often*
is, but when we encounter a "real" ubo that we lower to `load_uniform`
before a block 0 `load_ubo`, it could end up another entry in the table.
Resulting in the second pass after gathering ubo ranges, not finding a
valid range. Which results in a `load_ubo` for a thing that is not
actually a ubo making it's way into ir3 frontend. Resulting in grabbing
what we think is a ubo address out of some unrelated const register, and
trying to dereference that. Which as you can imagine, fails in amusing
ways.
Fixes: fc850080ee ("ir3: Rewrite UBO push analysis to support bindless")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4954>
(cherry picked from commit d69f6fd852)
src/panfrost/shared is shared with lima driver, build
bifrost_compiler for lima driver is meaningless and
get link error when only lima driver is enabled.
So only build bifrost_compiler when configued with:
meson -Dtools=panfrost
Fixes: ec2a59cd7a "panfrost: Move non-Gallium files outside of Gallium"
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4960>
(cherry picked from commit 07b0fbea92)
This is currently broken hard, because this code is being used in more
places that it used to be, and fixing that is prohibitively hard right
now.
This is far from ideal, as it leaves the same inconsistency in the
EMBEDDED_DEVICE code-path. But that only used by VMWare, so it's
probably better if they fix it, as they know their requirements better
than we do.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2911
Fixes: 76f79db3f5 ("util: stop including files from mesa/main")
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4919>
(cherry picked from commit 7ba2333cc1)
DXT1_RGBA and sRGB variants of DXT[135] formats are enabled as
valid format on V3D.
Once all S3TC formats supported by V3C are enabled the following
extensions become exposed by gallium.
* GL_ANGLE_texture_compression_dxt3
* GL_ANGLE_texture_compression_dxt5,
* GL_EXT_texture_compression_dxt1
* GL_EXT_texture_compression_s3tc
* GL_S3_s3tc
* GL_EXT_texture_compression_s3tc_srgb
This enables 206 passing piglit test related to gl_compressed.*s3tc_dxt
Cc: 20.0 20.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4934>
(cherry picked from commit 905edc376d)
Swizzles were ignoring the W component of the format DXT3_RGBA and
DXT5_RGBA.
This fixes 15 piglit tests:
spec/!opengl 1.1/copyteximage 2d
spec/!opengl 1.2/copyteximage 3d
spec/arb_texture_compression/fbo-generatemipmap-formats/gl_compressed_rgba
spec/arb_texture_compression/fbo-generatemipmap-formats/gl_compressed_rgba npot
spec/arb_texture_compression/texwrap formats bordercolor-swizzled/gl_compressed_rgba, swizzled, border color only
spec/arb_texture_compression/texwrap formats bordercolor/gl_compressed_rgba, border color only
spec/arb_texture_cube_map/copyteximage cube
spec/arb_texture_cube_map/copyteximage cube samples=2
spec/arb_texture_cube_map/copyteximage cube samples=4
spec/arb_texture_rectangle/copyteximage rect
spec/arb_texture_rectangle/copyteximage rect samples=2
spec/arb_texture_rectangle/copyteximage rect samples=4
spec/ext_texture_array/copyteximage 2d_array
spec/ext_texture_array/copyteximage 2d_array samples=2
spec/ext_texture_array/copyteximage 2d_array samples=4
Fixes: 469bbd8387 "broadcom/vc5: Move the formats table to per-V3D-version compile."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4934>
(cherry picked from commit e3ecf48dda)
../src/mesa/drivers/dri/i965/brw_wm_surface_state.c:1378:32: runtime error: index 3503345872 out of bounds for type 'uint32_t [149]'
brw_assign_common_binding_table_offsets has the following comment:
"Unused groups are initialized to 0xd0d0d0d0 to make it obvious that they're
unused but also make sure that addition of small offsets to them will
trigger some of our asserts that surface indices are < BRW_MAX_SURFACES."
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4350>
(cherry picked from commit 784358bd6e)
gen12 does away with the single patch dispatch mode for tcs, and
increases some limits so that 8_patch mode can always work. Make the
necessary changes so we don't try to fall back to single patch mode.
Fixes KHR-GL46.tessellation_shader.single.max_patch_vertices and others
Fixes: 44754279ac ("intel/fs/gen12: Use TCS 8_PATCH mode.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4843>
(cherry picked from commit 65b05ebdda)
The SFID field of the SHADER_OPCODE_MEMORY_FENCE and
SHADER_OPCODE_INTERLOCK instructions now indicates the target function
of the memory fence. Account the cycle-count cost to the right shared
unit.
Fixes: f858fa26b4 ("intel/fs,vec4: Pull stall logic for memory fences up into the IR")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4817>
(cherry picked from commit 0842758ec0)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.