If we made a copy deref, then we need to do dead-write elimination for the
pervious writes or we'll just emit the same copy deref again next time
around. And, at the end of the opt loop, we need to lower copy derefs
because later passes (locals_to_regs, notably) depend on it.
Fixes infinite opt loop on fs-function-inout-array with virgl on NTT.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15899>
rusticl (and clover) would like to get a graceful fail here so they can
fall back to a shadow copy instead of us asserting. We also start
rejecting arrayed surface because isl doesn't allow selecting a QPitch
yet. Even if it did, QPitch is horribly restrictive, even for linear
surfaces, that it likely wouldn't be that useful.
Fixes: e81f3edf76 ("iris: Allow userptr on 1D and 2D images")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15903>
Even if we're the first job on some queue, there may be no wait
semaphores but we still need to ensure things happen in-order. (See
the "Implicit Synchronization Guarantees" section of the Vulkan spec.)
The client can submit back-to-back command buffers with no semaphores
between them and it needs to adt the same as if there were a semaphore.
If job->serialize is set because of a barrier or something, we still
need to synchronize across HW queues by waiting on last_job_syncs.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
In order to properly wait for a query to be complete, we need to first
wait for the end query job to flush through on the queue. Since query
end is always handled on the CPU, we can do this with a condition
variable. The 2s timeout is taken from ANV.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
Instead of having the CPU job execute the CSD job, put both jobs on the
list with the CPU job first which modifies the GPU job which gets kicked
off next. This gives the queue code more visibility into what types of
jobs are actually in the list. In particular, if an indirect compute
job is the last job in a batch buffer, it currently appears as if the
batch ends with CPU work which isn't true because it kicks off GPU work.
In that case, the last job on the list is now a GPU job, which better
matches reality.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
The v3dv kernel driver doesn't support timelines yet but we want
threaded submit and that requires WAIT_PENDING. Fortunately, it should
never sit in this loop for long in practice. The primary use-case is
sorting out dependencies and these checks will always trivially succeed
for non-shared semaphores because v3dv only has a single queue.
Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
c78be5da30 ("intel/fs: lower ray query intrinsics") introduced a
helper function using nir_(push|pop)_if which invalidated dominance &
block_index for the replacement of nir_intrinsic_rt_trace_ray.
We can still keep dominance/block_index metadata for the lowering of
nir_intrinsic_rt_execute_callable though.
This change uses 2 different lowering function with correct metadata
preservation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15910>
FLUSH_HDC is sufficient to flush things out to L3, so we'd rather
use that where possible. It's also emulated via DATA_CACHE_FLUSH
on platforms where it isn't supported, so we can use it unconditionally.
We still use DATA_CACHE_FLUSH for invalidating the data cache, and to
flush the DC-tagged cachelines in L3 to be globally-observable.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
Push constant loading is not coherent with L3 according to the document
that describes the hardware change for the vertex buffer L3 Bypass
Disable field.
If we've updated a push constant buffer with say, a blorp_buffer_copy,
we may need to flush both the render cache and the tile cache.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
When stream output is active, we need to let the cache tracker know
about any SO buffers, which we access via IRIS_DOMAIN_OTHER_WRITE.
In particular, we may have written to those buffers via another
mechanism, such as BLORP buffer copies. In that case, previous writes
happened via IRIS_DOMAIN_RENDER_WRITE, in which case we'd need to flush
both the render cache and the tile cache to make that data globally-
observable before we begin writing via streamout, which is incoherent
with the earlier mechanism.
Fixes misrendering in Ryujinx.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6085
Fixes: d8cb76211c ("iris: Fix MOCS for buffer copies")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
Most clients are L3-coherent these days. However, there are some
notable exceptions, such as push constants, stream output, and command
streamer memory reads and writes.
With the advent of the tile cache, flushing the render or depth caches
alone are no longer sufficient for memory to become globally-observable.
For those, we need to flush the tile cache as well. However, we'd like
to avoid that for L3-coherent clients, as it shouldn't be necessary,
and is expensive.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
The render, depth, sampler, and data (HDC) caches are all coherent
with L3. We consider OTHER_READ and OTHER_WRITE to be non-coherent,
as they're kitchen-sink domains which include non-L3-clients.
Starting with Tigerlake, the VF cache is coherent with L3 (because we
set the L3BypassDisable bit in the vertex/index buffer packets).
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
On Tigerlake, we use the data cache for reading indirect UBOs instead
of the sampler. But we still use the constant cache for direct UBO
access, so unfortunately we may access it through two different domains.
To work around this, we add a new domain for pull constants (UBOs),
which will be either constant+texture or constant+data.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
The bulk of IRIS_DOMAIN_OTHER_READ domain usage was the 3D sampler, but
there were also a few oddball cases like command streamer reads, blitter
access, and so on. The sampler is definitely L3 coherent, but some off
the more esoteric reads may not be, so I'd like to separate them, so
that OTHER_READ can become a non-L3-coherent kitchen-sink domain.
The sampler cases only need TEXTURE_CACHE_INVALIDATE, and can skip the
CONSTANT_CACHE_INVALIDATE we had on IRIS_DOMAIN_OTHER_READ.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
We were using IRIS_DOMAIN_OTHER_READ for read-only depth/stencil access
in an attempt to avoid unnecessary flushing; IRIS_DOMAIN_DEPTH_WRITE
could indicate read-write access.
However, IRIS_DOMAIN_OTHER_READ is clearly the wrong domain. Depth and
stencil data is read via the depth cache, while IRIS_DOMAIN_OTHER_READ
currently corresponds to the sampler cache and constant cache together
(although this will change in future patches).
It's unclear whether this hack was useful. For now, just drop it and
use the correct depth cache domain, even if it's marked as read-write.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
For texture fetches and buffer load the fix is not needed,
and the override creates faulty TGSI.
In addition remove all modifiers from the src in the additional mov
instruction.
Fixes: d1c7a7b131
virgl: Add an extra mov for int outputs from constant and immediate inputs
v2: Move workaround after the use of
virgl_tgsi_rewrite_src_for_input_temp (Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15896>
This patch adds support for transcoding ASTC to BC7 (BPTC) and prefers
it over BC3 (DXT5) when hardware supports that format.
BC7 is a much newer format (~2009 vs. ~1999) and offers higher quality
than the older BC3 format. Furthermore, our encoder seems to be faster.
Tapani put together a small benchmark for transcoding a 1024x1024 ASTC
texture, and switching from BC3 to BC7 improves performance of that
microbenchmark by 25% on my Tigerlake NUC (with hardware ASTC disabled
so we can test this path). Presumably, this isn't fundamental to the
formats, but rather reflects the speed of our in-tree compressors.
So, we should use BC7 where possible.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15875>
now zink will init using a priority system if multiple devices are available
multiple devices will ONLY be available if:
* the user does not specify VK_ICD_FILENAMES as they should
* the user does not specify LIBGL_ALWAYS_SOFTWARE
* multiple drivers exist
I've prioritized the virtualized gpu here with the assumption that if
such a thing is detected, the environment is most likely virtualized
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15857>
It can happen that a single vector constant is referenced multiple times
by the same node, with different swizzles.
This needs to be taken into account by checking and updating the
swizzles for all the srcs of a target node when inserting the const
node to the same instruction.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15726>
When emitting the AR forces splitting an ALU group, and at the same time
a new CF instruction is started, then the last instrcution in the finished
CF block might not have the "last" bit set, which results in an invalid
shader that might hang, or crash SB.
So when a new CF is started, force the last bit in the last ALU instruction.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
ALU_SRC_PARAM_BASE is an inline constant that defines the
address for pulling data from LDS memory for interpolation
and not a value from the kcache, so there is no need to
take these values into account when allocating kcache
load slots.
v2: Fix the constant range check to not exclude the translated
ranges for kcache banks 2 and 3.
v3: limit range check to only include kcache values and and
rename relevant function (Emma).
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
We don't need to disable this path if there are indirect or 8/16/64-bit
push constant loads. We can just use the default path for them.
fossil-db (Sienna Cichlid):
Totals from 21 (0.02% of 134621) affected shaders:
CodeSize: 2028 -> 1884 (-7.10%)
Instrs: 366 -> 363 (-0.82%); split: -2.46%, +1.64%
Latency: 6630 -> 6579 (-0.77%)
InvThroughput: 26520 -> 26316 (-0.77%)
Copies: 84 -> 102 (+21.43%)
PreSGPRs: 141 -> 222 (+57.45%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145>
NIR doesn't propagate precise through moves, and with NTT the
last output is usually preceded by a move, so that we no longer
see that the evaluation of some value is supposed to be exact,
and, hence we can't decorate the outputs accordingly.
Fixes with NTT:
dEQP-GLES31.functional.tessellation.common_edge.
triangles_equal_spacing_precise
triangles_fractional_odd_spacing_precise
triangles_fractional_even_spacing_precise
quads_equal_spacing_precise
quads_fractional_odd_spacing_precise
quads_fractional_even_spacing_precise
v2: Don't clear the precise flag when we hit a mov, because we may
hit a if/else construct like below and we don't track branches
IF X
TEMP[0] = OP_PRECICE ...
ELSE
TEMP[0] = MOV CONST[]
ENDIF
Thanks Emma for pointing out the problem.
v2: allocate precise handling flags to transform_prolog (Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15836>
drm_shim.c undefines the _FILE_OFFSET_BITS macro, so plain off_t might
be 32 bits, while it's 64 bits in device.c. To avoid this mismatch,
use off64_t which will always be 64 bits in both source files.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
loader_open_render_node returns the first device in /dev/dri that it
can use. To make sure the drm-shim device always gets chosen, return
the fake entries in readdir before returning the real ones.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
Will allow us to drop more GLSL IR code in future once we switch
all drivers to NIR. Also stops the need for all drivers to call
this pass to remove indirect temps that may have been added during
the NIR varying linking lowering/optimisations.
This patch fixes some tests on i915, d3d12, lima and vc4.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15871>
We've wanted to remove destructors from ralloc's API for a long time (it's
an extra storage cost per ralloc for a rarely-used feature), and for the
suballoc change we'd need to spend more storage on storing the tu_device
pointer per result since destructors don't get anything else but the
pointer passed into them.
Fixes use-after-frees:
=================================================================
==2383==ERROR: AddressSanitizer: heap-use-after-free on address 0xffff88fe1940 at pc 0xffff934f427c bp 0xfffff5481e90 sp 0xfffff5481ea8
WRITE of size 8 at 0xffff88fe1940 thread T0
#0 0xffff934f4278 in list_del ../src/util/list.h:108
#1 0xffff934f4278 in result_destructor ../src/freedreno/vulkan/tu_autotune.c:237
#2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#4 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
#5 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#6 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#7 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
[...]
0xffff88fe1940 is located 80 bytes inside of 112-byte region [0xffff88fe18f0,0xffff88fe1960)
freed by thread T0 here:
#0 0xffff9c1c90d8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
#1 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
#2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#4 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
#5 0xffff935cf2ac in tu_queue_submit_locked ../src/freedreno/vulkan/tu_drm.c:997
[...]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
We don't return unused space to the suballocator, so it's a little useful
to limit how much we overallocate to reduce memory footprint. I took a
look through the tu_cs_emit_array() calls and accounted for a couple of
them in the variant-specific space calculation, then dropped the base
allocation by factors of 2 until we started throwing asserts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
Allocating a BO for each pipeline meant that for apps with many pipelines
(such as Asphalt9 under ANGLE), we would end up spending too much time in
the kernel tracking the BO references.
Looking at CS:Source on zink, before we had 85 BOs for the pipelines for a
total of 1036 kb, and now we have 7 BOs for a total of 896 kb.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
The intel_perf_query path used for performance queries on GL was
passing a bogus "end" pointer to intel_perf_query_result_accumulate(),
causing it to accumulate garbage values. This was causing the values
of many performance counters to be corrupted.
The "end" pointer was incorrect because the current code was assuming
that different OA reports were located TOTAL_QUERY_DATA_SIZE bytes
apart, which is a hard-coded preprocessor define. However recent
(Gfx12+) hardware generations use a variable query size determined by
the query layout. Use the size derived from it instead, and remove
the stale define.
Fixes: 3c51325025 ("intel/perf: switch query code to use query layout")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15783>
gallium queries have to be mapped onto multiple vulkan queries,
this can happen for two reasons.
1. primitives generated and overflow any don't map directly, and
multiple vulkan queries are needs per iteration. These are stored
inside the "starts" as zink_vk_query ptrs.
2. suspending/resuming queries uses multiple queries, these are
the "starts". Every suspend/resume cycle adds a new start.
Vulkan also requires that multiple queries of the same time don't
execute at once, which affects the overflow any vs xfb normal
queries, so vk_query structs are refcounted and can be shared
between starts. Due to this when the draw state changes, it's
simple to just suspend/resume all queries so the shared vulkan
queries get handled properly.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15785>
This enables a lower power mode in the sampler hardware in certain
common scenarios. On Tigerlake, SAMPLER_MODE is not programmable by
userspace but the kernel already sets this bit for us.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
While this register still exists, it's no longer a per-context register.
Instead, on Gfx12+, SAMPLER_MODE exists per dual-subslice and is
accessed as a "multicast" register, where you write control which
version is accessed by the "steering control register".
At any rate, userspace cannot write it any longer, and so there's not
much point to it existing in our genxml (which was missing most of the
fields anyway).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
This allows the sampler to perform faster filtering of 8-bit UNORM
textures by filtering them at a different precision. The filtering
is intended to still be OpenGL and DirectX spec compliant.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
according to spec, a barrier is required any time the pixels of the
framebuffer are changed. since zink defers clears and runs them at
a later time, it must also be responsible for handling the required
synchronization for such operations
fixes (radv):
KHR-GL46.blend_equation_advanced.blend_all*
fixes#5572
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15831>
according to spec, for fbfetch this should match the subpass self-dependency of
* stage FRAGMENT_STAGE -> FRAGMENT_STAGE
* access 0 -> INPUT_ATTACHMENT_READ
zs fbfetch doesn't seem to be a thing, so that code is left for historical
and/or future purposes
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15831>
this is super annoying since it means that a build of zink cannot
be mix-and-matched with an existing build of lavapipe, e.g., for faster
bisecting
the env var should be sufficient to handle this, and if someone sets it
and doesn't have swrast enabled then they can deal with it
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15844>
Previously, the caller allocated storage and tgsi_transform_shader() would
emit into that, returning how many tokens it emitted. All the callers had
to guess at how much storage was necessary, trying not to over-allocate
but also getting enough that you wouldn't (effectively) silently run out
of space.
Instead, make tgsi_transform_shader() do the allocation for you, taking
just a hint of how much space you think you need, and internally double
size when necessary. Fixes failures on virgl with fp64 since we've added
more fp64 virglrenderer workarounds and its old "XXX: is this enough?"
allocation wasn't any more.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15782>
virglrenderer maps atomic accesses to atomic counter declarations using
the .Index field. We were previously emitting a .Index of 0 for array
accesses, so virglrenderer would emit
atomicIncrement(first_counter[counter_offset+array_index]). This would
mostly work because hardware doesn't care about the bounds of counter
declarations, but if the first counter was a non-array, then the [] GLSL
emit gets dropped (can't array access a scalar!) and you'd access the
non-array first_counter instead.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15824>
this is an illegal alignment, so clamp the range to the nearest
texel offset since the shader should be hitting the robustness
case for the partial texel
affects:
dEQP-GLES31.functional.texture.texture_buffer.modify.mapbuffer_readwrite.range_size_65537
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15827>
This uses WRITE_DATA on the ME engine to reset the register, to match what
PAL does on GFX9+.
This fixes
KHR-GL45.transform_feedback_overflow_query_ARB.multiple-streams-one-buffer-per-stream
on zink/radv.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15812>
The array is allocated for VkDrmFormatModifierPropertiesEXT, so
writring entried with type VkDrmFormatModifierProperties2EXT is
bogus.
It seems this was a mistake added with a change intended to get
rid of VK_OUTARRAY_MAKE, that changed the type of the write by
mistake.
Fixes: 56a2ccf058 ('v3dv: Stop using VK_OUTARRAY_MAKE()')
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15819>
There are reliability problems with the RTL8153 ethernet driver under
certain network loads, related to incompatibility of the device with
Link Power Management.
Add usbcore.quirks=0bda:8153:k to the kernel command line to enable the
USB_QUIRK_NO_LPM option.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15791>
This happens when we have a sequence of multiple beginQuery / endQuery
with the same target and query identifier.
As a BO is created on beginQuery but not free on endQuery (we need to
wait until either explicitly deleting the query or querying the
results), in the above sequence we are basically leaking the BO.
This ensures that any BO created before is unreference before creating
the new one.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15796>
We don't know if an application developer wants error-dialog by default
or not, even when using a Mesa OpenGL driver.
So let's not assume this is a good thing, and instead make an option for
this behavior that we can enable for CI testing.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15485>
These dialogs only exist on Windows, so let's not even expose a util
function for this on other platforms.
The code is only ever called from Windows specific code anyway.
While we're at it, clean up the name a bit as well.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15485>
Move one of the is_supported() check before we start filling the
structure so we don't end up with a partially filled object when
we return VK_ERROR_FORMAT_NOT_SUPPORTED (which deqp doesn't seem to like,
so it's probably coming from a spec requirement).
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15698>
VK_REMAINING_ARRAY_LAYERS maps to -1 in the D3D12 world. Let's make sure
we set WSize to -1 in that case, because the layer_count calculated by
dzn_get_layer_count() won't work for 3D images which never have more
than one layer (in case of 3D images, we treat slices as layers).
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15698>
We have been reconstructing/rematerializing uniforms for a while, but we
can do this in more scenarios, namely instructions which result is
immutable along the execution of a shader across all channels.
By doing this we gain the capacity to eliminate TMU spills which not
only are slower, but can also make us drop to a fallback compilation
strategy.
Shader-db results show a small increase in instruction counts caused
by us now being able to choose preferential compiler strategies that
are intended to reduce TMU latency. In some cases, we are now also
able to avoid dropping thread counts:
total instructions in shared programs: 12658092 -> 12659245 (<.01%)
instructions in affected programs: 75812 -> 76965 (1.52%)
helped: 55
HURT: 107
total threads in shared programs: 416286 -> 416412 (0.03%)
threads in affected programs: 126 -> 252 (100.00%)
helped: 63
HURT: 0
total uniforms in shared programs: 3716916 -> 3716396 (-0.01%)
uniforms in affected programs: 19327 -> 18807 (-2.69%)
helped: 94
HURT: 50
total max-temps in shared programs: 2161796 -> 2161578 (-0.01%)
max-temps in affected programs: 3961 -> 3743 (-5.50%)
helped: 80
HURT: 24
total spills in shared programs: 3274 -> 3266 (-0.24%)
spills in affected programs: 98 -> 90 (-8.16%)
helped: 6
HURT: 0
total fills in shared programs: 4657 -> 4642 (-0.32%)
fills in affected programs: 130 -> 115 (-11.54%)
helped: 6
HURT: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15710>
Bit 20 isn't actually MERGEDREGS, the mode for the entire geometry
pipeline is controlled by SP_VS_CTRL_REG0::MERGEDREGS and it appears to
be something preamble-related instead since writing any register in the
preamble hangs if it's set. This fixes those hangs on freedreno and
turnip since we no longer set it.
Fixes: fccc35c2de ("ir3: Add preamble optimization pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15801>
since 1.0 is used in nearly every case, drivers requiring this exporting
can avoid potential shader variants by adding a 1.0 export to the base
shader variant and the only using the ubo upload when pointsize is explicitly
set for wide point functionality
drivers can then be responsible for removing unused pointsize exports
as needed (or desired)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15699>
We don't need to restrict our timeout to 5 seconds, because the kernel's
hangcheck will ensure that the wait completes in finite time if the GPU
gets wedged. If the GPU is making progress, we don't want to time out
early and have pipe_transfer_map() return an error, causing glReadPixels()
to throw a confusing GL_OOM even though we're not out memory.
The INFINITE arg to this function isn't actually infinite, it's limited to
an hour. But an hour of GPU processing to wait on is probably plenty.
This 5s timeout has caused problems with the CTS on freedreno at high
parallelism, and I suspect is the cause of recent issues in the closed
traces replay jobs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15805>
Much less noisy, and provides a path to further improvements. There is a slight
behaviour change: int-to-float conversions now use RTE instead of RTZ. For
32-bit opcodes, this affects conversions of integers with magnitude greater than
2^23 by at most 1 ulp. As this behaviour is unspecified in GLSL, this change is
believed to be acceptable.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15187>
To prepare for defeaturing round modes, replace uses of round-to-positive with
round-to-even in our unit tests. This doesn't meaningfully impact test coverage;
there is no way to generate that round mode.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15187>
There are new data structures that we need to (dirty) track. Add the
corresponding fields so we can proceed as with Bifrost dirty tracking.
Trivial increase in memory usage, but that should not matter as batches are
few.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15797>
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. The
maximum number of gradient components and coordinate components should
be 2. In spite of this limitation, the Bspec lists a mysterious R
component before the min_lod, so the maximum coordinate components is 3.
Fixes the following Vulkan CTS failures on DG2:
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2d_fragment
The Fixes: tag below is a bit misleading. This commit fixes some test
cases similar to ones fixed by the Fixes: commit. I just want to make
sure this commit gets applied everywhere that commit was also applied.
Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15781>
Valhall removed the ability to set an explicit primitive restart index as
required by desktop OpenGL, in favour of fixed primitive restart indices only as
required by OpenGL ES.
Set the CAPs accordingly so that mesa/st lowers unusual primitive restart
indices at draw call time.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15795>
Surface descriptors have been replaced by plane descriptors, which facilitate
the intermediate layout of textures. This allows for more sophisticated handling
of texture compressions, of particular to interest to copy_image. However, it
requires a considerable amount of new logic to handle.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15795>
Issue: When a video is decoded where the max_references is updated the
decoder keeps using same old value. This results into green patches and
decoding is not proper.
Root Cause: The max_references is updated only once when the instance is
created for the first time.
Fix: Added a check along with the context->decoder to check if the
context->templat.max_references has changed. If yes, then we go ahead
and create the decoder again.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Krunal Patel <krunalkumarmukeshkumar.patel@amd.corp-partner.google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15750>
Based on hardware behaviour, it appears vertex_id is zero-based with the legacy
geometry flow but not with the new malloc IDVS flow. Since the geometry flow is
per-shader (not per-machine), there's not a good way to communicate this to NIR.
Rather than trying to shoehorn this obscure detail into NIR, just do the
lowering ourselves instead of in NIR. It's not much more code anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>
Our swizzle lowering optimizations depend on replication of scalar fp16. This
holds on Bifrost (at least for now), but not on Valhall which has proper support
for write masks. For now, enforce Bifrost-compatible behaviour as we do not make
use of the write masks on Valhall yet.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>
Varying stores was changed in Valhall. Rather than using attribute descriptors
like on Bifrost and Midgard, on Valhall we store to memory directly with
hardware-allocated buffers. This requires a new implementation of store_output,
with special provisions for writing gl_PointSize from a position shader.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>
Something an instruction has two logic flow controls, namely wait + reconverge.
These are orthogonal -- we need to insert a NOP to handle this. Add a lowering
pass that works out flow control to replace the ad hoc previous va_pack_flow.
Fixes dEQP-GLES31.functional.ssbo.layout.single_basic_type.shared.lowp_vec3.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15756>
Similar to pipeline statistics but done for a single counter.
We use REG_A6XX_RBBM_PRIMCTR_7 to get generated primitives
and not PRIMCTR_8 because PRIMCTR_7 counts pre-clipped prims
while PRIMCTR_8 counts them after clipping.
OpenGL spec for GL_PRIMITIVES_GENERATED says:
"Subsequent rendering will increment the counter once for every
vertex that is emitted from the geometry shader, or from the
vertex shader if no geometry shader is present."
Passes tests:
dEQP-VK.transform_feedback.primitives_generated_query.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15746>
This has one negative side-effect; we're no longer able to print
validation errors without dxcompiler.dll. I doubt that's a real problem,
but if it is, we should add this ability to dxil_validator instead of
having a second implementation here.
The reasons I didn't try adding this in the first place is:
1. This code seems a bit janky; it doesn't consult the "known"-variable
to figure out if the encoding is OK, and it's lacking a fallback path
in that case.
2. It seems unlikely that the compiler varies the encoding of the output
in the first place; one of the two code-paths in here is probably
untested.
3. Since dxil_validator leaves reporting to the call-site, we'd need to
either add and output-encoding to the API (yuck), or re-encode the
string to UTF-8 using WinAPI.
Right now, it seems questionable if fixing all of the above is worth it.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15751>
by utilizing a separate slot map for patch variables, the entire i/o
assignment mechanism can be simplified to accurately manage i/o for all
types of variables and avoid location conflicts
affects:
KHR-Single-GL46.enhanced_layouts*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15770>
the previous methodology triggered warnings any time a rasterizer state
was created with unsupported features without determining whether those
features would actually be used
a more optimal process is to check for missing features at pipeline creation,
as all the necessary info is now available, and spurious warnings can be avoided
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15778>
deqp-runner uprevved to reduce memory usage on HW runners, let us
experiment with shader cache on tmpfs, and hopefully provide a tool for
virgl to be able to plausibly run piglit under crosvm instead of vtest.
piglit uprevved to avoid a flake in softpipe in glx-multithread-texture,
and improve performance of the test, too. This also brings in the
fbo-blending-format-quirks fix to properly initialize the buffers, fixing
some fails/flakes.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15419>
When porting lavapipe to the common sync framework, I stole the dummy
sync signal_for_memory idea from RADV but didn't actually do it
correctly. Unless you set wsi_device::signal_semaphore_with_memory and
wsi_device::signal_fence_with_memory, it doesn't actually signal
anything. If you do set those, it works but also results in dummy
syncs being created for present fences which we see as signals. We
could choose to just skip those like RADV does but that's too magic.
Instead, have our own AcquireNextImage2() again which sets dummy syncs.
Fixes: 3b547a9b58 ("lavapipe: Switch to the common sync framework")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15774>
Cube descriptors require a different sampling instruction in shader,
however we don't know whether image is a cube or not until the start
of a renderpass. We have to patch the descriptor to make it compatible
with how it is sampled in shader.
For the reference subpassLoad is currently translated into isaml.a
Blob v615 also doesn't handle this case correctly.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15734>
When we don't have a good dxcompiler.dll that we can load IDxcLibrary
from to help with diagnostics, we currently return true for validation
even if the validation actually failed.
Let's fix that, and also add a debug-message explaining what went wrong
for those who are debugging and wondering what's up.
Fixes: 2ea15cd661 ("d3d12: introduce d3d12 gallium driver")
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15744>
In order to ensure consistent results when running performance tests,
lock the frequency for Intel GPUs to ~70% of the maximum allowed by
hardware.
This seems to offer a good balance between execution speed and results
consistency.
An increase of the frequency will also increase the rate of throttling
events, with a negative impact on consistency. Such events are logged,
as in the following example:
GPU throttling detected: act=200 min=850 cur=850 RPn=100
This shows the actual GPU frequency (200 MHz) dropped below the minimum
requested (850 MHz).
For more details about the various frequency information sources, please
see the script header comments in ".gitlab-ci/common/intel-gpu-freq.sh".
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15662>
Add script to manage Intel GPU frequencies.
It can be used for debugging performance problems or to lock a stable
frequency while executing benchmark tests.
Typical use cases:
- Get all available GPU frequency information
$ ./intel-gpu-freq.sh -g all
* Hardware capabilities
RP0: 1350 MHz
RPn: 100 MHz
RP1: 400 MHz
* Enforcements
max: 1350 MHz
min: 100 MHz
boost: 1350 MHz
* Actual
act: 100 MHz
cur: 400 MHz
- Lock frequency to 80% of the maximum allowed by hardware and enable
throttling detection
$ ./intel-gpu-freq.sh -s 80% -d
GPU throttling detected: act=1050 min=1350 cur=1350 RPn=100
GPU throttling detected: act=1100 min=1350 cur=1350 RPn=100
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15662>
Vulkan spec says:
"When an image view of a depth/stencil image is used as a depth/stencil
framebuffer attachment, the aspectMask is ignored and both depth and
stencil image subresources are used."
Since we use two planes for D32S8 format we have to add a special
case for depth in addition to already existing case for stencil.
Fixes hang in CTS:
dEQP-VK.renderpass.depth_stencil_write_conditions.stencil_kill_write_d32sf_s8ui
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15532>
- When resolving d32s8 to s8 we stored stencil with a wrong format.
- For unaligned multi-sample store we used wrong gmem offset for stencil.
If unaligined store is forced this change fixes a hang in:
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_32_32.samples_2.d32_sfloat_s8_uint_separate_layouts.compatibility_depth_zero_stencil_zero_testing_stencil
Fixes: b157a5d0d6
("tu: Implement non-aligned multisample GMEM STORE_OP_STORE")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15532>
this interaction requires read-only sampler access from
depth component with writes to the stencil component, which can only
be done in the GENERAL layout
affects:
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_color_and_stencil_blit
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15716>
ssbo binds are always at least readable, so without deeper shader
inspection, never assume that write binds mean no read access
this is different than image descriptors which can explicitly get
write-only access set
cc: mesa-stable
fixes#6231
THANKS TO @daniel-schuermann FOR SOLVING THIS WITH HIS INCREDIBLE TALENT AND WIT
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15697>
The common Vulkan sync framework will do most of the queueing for us.
It will even sort out timeline semaphore dependencies and ensure
everything executes in-order. All we have to do is make sure our
vk_sync type implements VK_SYNC_FEATURE_WAIT_PENDING. This lets us get
rid of a big pile of code.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15651>
dEQP-VK.binding_model.buffer_device_address.set3.depth3.basessbo.convertuvec2.nostore.multi.scalar.vert
runtime -24.4002% +/- 1.94375% (n=7). The win (I think) is in LLVM not
having to chew through handling the extra loops on every constant-offset
SSBO load, not in actual rendering time.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14999>
If we know the offset is constant, we don't have ask LLVM to loop over the
elements pulling the same value out over and over.
This doesn't seem to have produced a win in the testcase I was looking at,
but it was an easier entrypoint to figuring out how to do scalar memory
access than load_memory, and will probably affect some workload.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14999>
In a fragment shader or inside of control flow, invocation 0 might be
inactive, and so our use-first-invocation-and-broadcast optimizations
would be invalid, and the loop logic of an emit_read_invocation would
defeat the point of these optimized paths.
For load_kernel_input, I didn't guard the uniform path with the check
because some CL tests that are currently passing are doing the
load_kernel_input under (presumably) uniform control flow. Instead, I
dropped in a once warning for the next person to be debugging CL.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14999>
You're not supposed to include a '\n' in mesa_log*() messages because
android logging will log what you provide on its own line anyway, so each
mesa_log() should be the body of a log line. But also, getting everyone
to consistently not do that is hopeless because we're all so trained by
printf(). So, just detect an existing \n and don't add a new one.
Cleans up deqp-vk debug output a bunch from turnip.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15332>
We do require C++20 still, because designated initializers is part of
that standard. This is almost a revert, but conditionally selecting
between c++latest or c++20 when available, as that's what we really want.
Fixes: 55ca1c8db3 ("vulkan/microsoft: Remove `override_options: ['cpp_std=c++latest']` option for visual studio")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15706>
Designated initializers are a C++20 feature, but we don't use C++20. In
fact, enabling C++20 for ACO triggers new compiler errors due to some
equality semantics details.
So let's instead stop using designated initializers here.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15706>
We can now use the same cache tracking mechanism for synchronizing QBO
writes instead of the unconditional PIPE_CONTROL performed currently,
which is unable to invalidate any incoherent caches which may contain
stale data for the buffer object.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15738>
The unconditional flushing performed by
iris_flush_and_dirty_for_history() is now redundant with the memory
barriers introduced previously in this series, which should be in a
better position to determine from which domain the buffer will
actually be used in the future, and whether an additional flush or
invalidation is required or redundant with other PIPE_CONTROL commands
emitted elsewhere.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15738>
We don't support 'Update After Bind', however, the limits for this
model also include the ones without it. See the with or without remark
in the spec below:
"maxPerStageDescriptorUpdateAfterBindInlineUniformBlocks is similar to
maxPerStageDescriptorInlineUniformBlocks but counts descriptor bindings
from descriptor sets created with or without the
VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT bit set."
Fixes:
dEQP-VK.api.info.vulkan1p2_limits_validation.ext_inline_uniform_block
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15732>
Since the packing functions generated by csbgen use a void pointer
for the buffer in which to pack, it's possible to easily write out
of bounds. This commits attempts to reduce the chances by
having the pack macro check that the pointer passed points to an
element sized equally to the state word being packed. Catching
these errors earlier.
As can be seen in this commit, there already was a case of this:
"pds_ctrl". The word size is meant to be 64 bits but the pointer
was pointing to a 32 bit field.
Although it's fine for the word size to be smaller than the
storage pointed to by the pointer, this is not allowed just to
be extra careful.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15687>
If we know the height is 1, then it would be a waste to align each
miplevel to tile height. For non-mipmapped textures, it doesn't save us
memory (since you still align to 4 on the last miplevel), but it should be
better cache locality by not loading those unused lines.
Incidentally, this gets us some more coverage of swap != WZYX cases in CTS
tests, which often use optimal tiling without also testing linear.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15293>
This field does appear to work as expected: with 1D/1DArray turnip storage
images switched to be always linear, it fixes the dEQP-VK.image.*store*
tests using a color swapped format (once we allow color swap).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15293>
This reports all of our storage formats as supporting read/write without
format, since we don't have any in-shader format conversions. Similarly,
shadow comparisons were already supported on all the depth formats.
This extension is required for VK 1.3.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15293>
This is a potential performance optimization, from the docs:
The PVS Instruction which updates the clip coordinatea position for
the last time. This value is used to lower the processing priority
while trivial clip and back-face culling decisions are made. This
field must be set to valid instruction.
Right now it is set at the last instruction. However, there is no
measurable performance effect in either Unigine Sanctuary, Lightsmark
or GLmark.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6045
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15677>
This ensures that any channels used for helper invocations are
also spilled/filled correctly.
Alternatively, we could recursively track all temps that get
involved in computing values that are then used in explicit
(dfdx,dfdy) or implicit (texture coordinates for mipmap or
anisotropic filtering, etc) derivatives, and only enable
per-quad on these (or disable spilling of any of these
values).
Fixes:
dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15705>
This extension has been promoted to Vulkan 1.2 which means it has been
silently enabled when we implemented Vulkan 1.2.
Enable it explicitely to make mesamatrix happy and also for consistency.
This extension was designed for potential performance improvements of
MSAA depth/stencil images but it's currently a no-op in RADV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15665>
The current code was allowing multiple query pools of the same type to be
active on the same commmand buffer which creates validation warnings.
Restructure the query handling, so the query pools are allocated on the
context on first use, then have queries allocate query_id from those pools.
This approach creates a new start dynamic array that contains each time a
vulkan query is started for a gallium query, and holds the flags for it rather
than storing them in a massive array.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15702>
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. Cube
maps and 3D surfaces were already handled, but 1D array and 2D array
surfaces were not.
Fixes the following Vulkan CTS failures on DG2:
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2darray_fragment
The Fixes: tag below is a bit misleading. This commit adds another
lowering, similar to the one in the Fixes: commit, that probably should
have been added at the same time. I just want to make sure this commit
gets applied everywhere that commit was also applied.
Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15681>
Moving duplicate vk_format helper functions to common
vulkan/util/vk_format.h and also renaming
vk_format_get_component_size_in_bits to match how amd and
freedreno name the same function. Not moving this function
to common code as freedreno's implementation is a bit different.
Signed-off-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15696>
In cases when no immutable samplers were present but sampler
descriptor set layout bindings were, a seg fault was being caused
by an attempt to get the immutable sampler from within the
immutable sampler array while the array was not allocated.
This commit also remove the binding type check since only those
specific types can have immutable samplers. The check is covered
by the descriptor set layout creation and assignment of
has_immutable_samplers.
This commit also makes the immutable samplers const, since they're
meant to be immutable.
This commit also adds has_immutable_samplers field to descriptor
set layout. Previously to check whether immutable
samplers were present or not you'd check for the layout binding's
descriptor count to be 0 and the immutable sampler offset to be 0.
This doesn't tell you everything. If you have a descriptor of the
appropriate type (VK_DESCRIPTOR_TYPE_{COMBINED_,}IMAGE_SAMPLER).
So descriptor count of >1. The offset can be 0 if you don't have
immutable sampler, or have them at the beginning of the immutable
samplers array. So you can't determine if you really have immutable
samplers or not. One could attempt to perform a NULL check on the
array but this would not work in cases where you have following set
layout bindings with immutable samplers as the array stores the
whole layout's immutable samplers.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15688>
radv_image_queue_family_mask was still using queue family index values
for the special cases, while being passes a radv_queue_family enum.
This adds the bits to the enum and vk_queue_to_radv so we can implement
radv_image_queue_family_mask completely in terms of the enum.
Fixes: 1ec4e568 ("radv: abstract queue family away from queue family index.")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15423>
This is for compatibility with loaders that don't know about
__DRI_DRIVER_GET_EXTENSIONS. xserver has supported that since 1.15.0,
which is almost nine years old now.
While we're about it, fix a comment in meson.build that used to be about
megadrivers to reflect reality.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15649>
In the case where u_index_generator returns zero new vertices, we never
filled tmp_indices before trying to duplicate the last veretx. This
causes us to read unitialized memory.
This fixes a Valgrind issue triggering in glxgears on Zink:
---8<---
==296461== Invalid read of size 2
==296461== at 0x570F335: compile_vertex_list (vbo_save_api.c:733)
==296461== by 0x570FEFB: wrap_buffers (vbo_save_api.c:1021)
==296461== by 0x571050A: upgrade_vertex (vbo_save_api.c:1134)
==296461== by 0x571050A: fixup_vertex (vbo_save_api.c:1251)
==296461== by 0x57114D1: _save_Normal3f (vbo_attrib_tmp.h:315)
==296461== by 0x10B750: ??? (in /usr/bin/glxgears)
==296461== by 0x10A2CC: ??? (in /usr/bin/glxgears)
==296461== by 0x4B3F30F: (below main) (in /usr/lib/libc.so.6)
==296461== Address 0x11ca23de is 2 bytes before a block of size 1,968 alloc'd
==296461== at 0x4845899: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==296461== by 0x570E647: compile_vertex_list (vbo_save_api.c:604)
==296461== by 0x570FEFB: wrap_buffers (vbo_save_api.c:1021)
==296461== by 0x571050A: upgrade_vertex (vbo_save_api.c:1134)
==296461== by 0x571050A: fixup_vertex (vbo_save_api.c:1251)
==296461== by 0x57114D1: _save_Normal3f (vbo_attrib_tmp.h:315)
==296461== by 0x10B750: ??? (in /usr/bin/glxgears)
==296461== by 0x10A2CC: ??? (in /usr/bin/glxgears)
==296461== by 0x4B3F30F: (below main) (in /usr/lib/libc.so.6)
---8<---
Fixes: dcbf2423d2 ("vbo/dlist: add vertices to incomplete primitives")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15633>
Now that we have a threading mode in the device, we can set that based
on the environment variable instead of delaying it to submit time. This
allows us to avoid the static variable trickery we use to avoid reading
environment variables over and over again. We also move the enabling of
the submit thread up a level or two and give it a bit more obvious
condition.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15566>
Fossilize used the git:// protocol for fetching submodules
but as of January 11, 2022, GitHub has tempirarily disabled
acceptance of the Git protocol until March 15, 2022 whereby
it will be permanantly disabled.
This patch uprevs to the Fossilize commit that switches
submodule URLs to https:// instead. Otherwise, we would get
an error stating that "The unauthenticated git protocol on
port 9418 is no longer supported." when trying to clone
submodules.
See https://github.blog/2021-09-01-improving-git-protocol-security-github
for more info.
Signed-off-by: Omar Akkila <omar.akkila@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15626>
if the driver requires pointsize uploads, only flag the last vertex
stage for updates, not all vertex stages
this should be functionally equivalent but without the unnecessary overhead
of also scanning the other stages
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15590>
the vulkan spec doesn't explicitly state whether this function progresses
a given semaphore's timeline, and apparently there are some cases where
it's assumed that progress occurs if this function is called in a loop
instead of WaitSemaphores, so check the current fence for completion
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15453>
according to KHR_gl_texture_3D_image:
If <target> is EGL_GL_TEXTURE_3D_KHR, <buffer> must be the name of a
complete, nonzero, GL_TEXTURE_3D (or equivalent in GL extensions) target
texture object, cast
into the type EGLClientBuffer. <attr_list> should specify the mipmap
level (EGL_GL_TEXTURE_LEVEL_KHR) and z-offset (EGL_GL_TEXTURE_ZOFFSET_KHR)
which will be used as the EGLImage source; the specified mipmap level must
be part of <buffer>, and the specified z-offset must be smaller than the
depth of the specified mipmap level.
thus a 2d view of a 3d surface is not only legal, it's part of the spec and
must be supported when available
cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15584>
Right now we always copy pos to wpos regardless of if the fragment
shader needs it or not. Instead just do it only for the shaders
that really need it.
Shader-db is not able to measure preciselly the effect as we
build only the non-wpos variants, but given that majority of fragment
shaders don't needs the wpos, the real effect should be close:
total instructions in shared programs: 105427 -> 104313 (-1.06%)
instructions in affected programs: 53098 -> 51984 (-2.10%)
total temps in shared programs: 13991 -> 13958 (-0.24%)
temps in affected programs: 114 -> 81 (-28.95%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15321>
Make the r300_vertex_shader structure resemble the r300_framgment_shader
in order to allow support for shader variants. There is no functional
change, but it will be used in the next commit to only output wpos
when needed from vertex shaders.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15321>
Right now we always write to a temp and than emit movs to both
POS and WPOS.
When the result is trivial, like:
MOV temp[0], in[0]
MOV output[0], temp[0]
MOV output[1], temp[0]
the dataflow analysis passes can take care of it, this is however
one of the last optimizations we need the pass for. Additionally
it fails even for some still trivial cases like:
MAD temp[2], const[3], temp[0].wwww, temp[1];
MOV output[0], temp[2];
MOV output[2], temp[2];
This patch will just duplicate the output instuction when there
is only a single output write.
The long-term plan would be to just trust NIR, do as little in
the backend as possible and optimize the remaining backend passes
to not need any additional cleanups.
total instructions in shared programs: 105862 -> 105427 (-0.41%)
instructions in affected programs: 20527 -> 20092 (-2.12%)
total temps in shared programs: 14029 -> 13991 (-0.27%)
temps in affected programs: 152 -> 114 (-25.00%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15321>
Instead just emit both outputs as soon as possible.
If the last write is inside a loop or a branch, emit it after
the ENDLOOP or ENDIF. This saves some temps and also allows us
to potentially benefit from R300_PVS_XYZW_VALID_INST as right
now the position output write is always penultimate with the
WPOS output being the last.
total temps in shared programs: 14101 -> 14029 (-0.51%)
temps in affected programs: 435 -> 363 (-16.55%)
helped: 72
HURT: 0
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15321>
Now that we're using 3DSTATE_BINDING_TABLE_POOL_ALLOC to set the base
address for the binding table pool separately from surface states, we
don't actually need to update surface state base address anymore.
Instead, we can just set STATE_BASE_ADDRESS once at context creation,
and never bother updating it again, saving some heavyweight flushes
and freeing us from the need for address offsetting trickery.
This patch was originally written by Jason Ekstrand, with fixes from
Lionel Landwerlin, but was targeting Icelake. Doing it there requires
additional changes (15:5 -> 18:8 binding table pointer formats) which
also involve some trade-offs, whereas the XeHP change is purely a win,
so we'll do it here first.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15616>
You should probably resize the sources array before accessing entries
that might be out of bounds. inst->resize_sources() always allocates
enough space for at least 3 sources, so this is really only an issue
when there are 4+ sources.
Fixes: a920979d4f ("intel/fs: Use split sends for surface writes on gen9+")
Fixes: 4f86a70599 ("intel/fs: Lower DW untyped r/w messages to LSC when available")
Fixes: d372abe397 ("intel/fs: Add surface OWORD BLOCK opcodes")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15632>
Thix fixes two bugs. First, we stop leaking in/out fences with
multisync. Because the in_syncs and out_syncs parameters to
set_multisync were arrays and not pointers to arrays, the caller's
in_syncs and out_syncs pointers never got set and remained NULL so
multisync_free() always sees to NULL pointers and does nothing, leaking
both arrays. Not sure how this isn't showing up in the dEQP leak check
tests.
Second, the struct drm_v3d_multi_sync was in the scope of the then
clause of the `if (device->pdevice->caps.multisync)` so it goes out of
scope before the ioctl. This is, effectively, a use-after-free and,
depending on stack allocation details, may result in the multisync
extension struct getting stompped before the ioctl.
Fixes: ff8586c345 ("v3dv: enable multiple semaphores on cl submission")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15512>
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that
operate on descriptor sets allocated using that layout, and those
descriptor sets must not be updated with vkUpdateDescriptorSets
after the descriptor set layout has been destroyed. Otherwise,
descriptor set layouts can be destroyed any time they are not in
use by an API command."
Based on a similar fix for RADV.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5893
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15634>
iris may rely on 3DSTATE_BINDING_TABLE_POOL_ALLOC's address being
inherited from a previous batch. So, we need to tell the decoder
about the address in case it sees binding tables to decode before
it encounters such a command in the batch.
We used to update surface_base, now we update bt_pool_base instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15625>
Use a unified si_create_copy_image_cs() to generate both the 1D and 2D
copy shaders; the shaders themselves remain separate.
Also:
- add a helper deref_ssa() for nir deref
- add a helper set_work_size() for setting workgroup and grid sizes
- pass si_context (instead of pipe_context) to the copy_image generator
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15268>
Fix defect reported by Coverity Scan.
Evaluation order violation (EVALUATION_ORDER)
write_write_typo: In queue_create = queue_create = &pCreateInfo->pQueueCreateInfos[0], queue_create is written twice with the same value.
Fixes: 8991e64641 ("pvr: Add a Vulkan driver for Imagination Technologies PowerVR Rogue GPUs")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15604>
It seems that a660 has the same bug. Without the workaround there
are a lot of flakes with depth-stencil tests, e.g. in:
dEQP-VK.pipeline.extended_dynamic_state.*
dEQP-VK.renderpass.depth_stencil_write_conditions.*
dEQP-VK.pipeline.stencil.format.d24_unorm_s8_uint.states.*
Or guaranteed failures like of:
dEQP-VK.pipeline.render_to_image.core.2d.huge.width.r8g8b8a8_unorm_d32_sfloat_s8_uint
Enabling the workaround fixes all of them.
cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15548>
in scenarios like:
vec1 32 ssa_151 = deref_var &shadow_map (uniform sampler2D)
vec1 32 ssa_152 = deref_var &shadow_map (uniform sampler2D)
vec2 32 ssa_153 = vec2 ssa_151, ssa_152
vec1 32 ssa_154 = deref_var ¶m@4 (function_temp uvec2)
intrinsic store_deref (ssa_154, ssa_153) (wrmask=xy /*3*/, access=0)
vec1 32 ssa_160 = deref_var ¶m@4 (function_temp uvec2)
vec2 32 ssa_164 = intrinsic load_deref (ssa_160) (access=0)
vec1 32 ssa_167 = mov ssa_164.x
vec1 32 ssa_168 = deref_cast (texture2D *)ssa_167 (uniform texture2D) /* ptr_stride=0, align_mul=0, align_offset=0 */
vec1 32 ssa_169 = mov ssa_164.y
vec1 32 ssa_170 = deref_cast (sampler *)ssa_169 (uniform sampler) /* ptr_stride=0, align_mul=0, align_offset=0 */
vec1 32 ssa_172 = (float32)tex ssa_168 (texture_deref), ssa_170 (sampler_deref), ssa_171 (coord), ssa_166 (comparator)
the real variable is stored to a function_temp and then loaded back again,
which means it isn't a direct deref and lower_vri_instr_tex_deref() will
crash because the variable can't be found
BUT running only the passes needed to eliminate derefs will break other tests,
so just run the whole optimize loop again here to avoid such issues
for #5945
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15511>
Previously we were just doing a group barrier for both membar and barrier.
This sometimes worked out, because atomics and reads waited for ack
already, but writes were not waiting for ack. Use the need_wait_ack
pattern that scratch writes used, with a little refactoring for
reusability.
The refactor also incidentally fixes the atomics waiting for outstanding
acks to be > 1 instead of > 0.
Cc: mesa-stable
Fixes: #6028
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14429>
Prevents several regressions when NIR-to-TGSI is enabled where it was
allocating arrays on top of each other.
Fixes vec3 fails on RV770,
dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_1 and 2
in general, and fixes another piglit but breaks two others. Still, this
seems to be a win.
Cc: mesa-stable
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14429>
The two types of instructions get added to the same CF list, but not the
same instr list within the CF list. So, if you SSBO fetched your
texcoord, the emission of the SSBO fetch would come *after* the texcoord
fetch.
Avoids regressions when NIR-to-TGSI starts optimizing more.
Cc: mesa-stable
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14429>
Avoids a regression when enabling shader precompilation, where the
precompile would happen with MSAA disabled (so no sample mask export) but
we'd never catch up to the shader being rendered with MSAA.
Doesn't fix any current testcases, though.
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14427>
This test was never supposed to be skipped, and the referenced commit
just exposed a bug in turnip fixed by the previous commit. It was
hanging due to a CTS bug making the submit take way too long, which will
be fixed once the CTS change lands.
Also, add it to the a630 skips.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15563>
In this case we should relax checks based on the format, since the user
will be responsible for them when creating an image view.
This gets dEQP-VK.image.sample_texture.*_bit_compressed_format_* not
skipping again after VK-GL-CTS 736eec57dc0c ("Fix checkSupport in
compressed texture sampling tests").
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15563>
We have to add WFM to pending bits when we are flushing into CP
for indirect draw to know when they should apply WFM workaround.
Fixes CTS tests:
dEQP-VK.draw.renderpass.indirect_draw.*_data_from_compute.indirect_draw_count*
Fixes: abf0ae014a
("tu: Properly handle waiting on an earlier pipeline stage")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15577>
We handle uniforms by copying them into the uniform stream to be
consumed with ldunif when they have a constant offset. Otherwise
we fallback to general TMU access, which has more latency.
However, just like we did for UBOs and read-only SSBOs, we can
also try to use the unifa mechanism to handle indirect accesses
in certain cases instead of the TMU fallback.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>
Inline uniform blocks store their contents in pool memory rather
than a separate buffer, and are intended to provide a way in which
some platforms may provide more efficient access to the uniform
data, similar to push constants but with more flexible size
constraints.
We implement these in a similar way as push constants: for constant
access we copy the data in the uniform stream (using the new
QUNIFORM_UNIFORM_UBO_*) enums to identify the inline buffer from
which we need to copy and for indirect access we fallback to
regular UBO access.
Because at NIR level there is no distinction between inline and
regular UBOs and the compiler isn't aware of Vulkan descriptor
sets, we use the UBO index on UBO load intrinsics to identify
inline UBOs, just like we do for push constants. Particularly,
we reserve indices 1..MAX_INLINE_UNIFORM_BUFFERS for this,
however, unlike push constants, inline buffers are accessed
through descriptor sets, and therefore we need to make sure
they are located in the first slots of the UBO descriptor map.
This means we store them in the first MAX_INLINE_UNIFORM_BUFFERS
slots of the map, with regular UBOs always coming after these
slots.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>
shader_storage_blocks_write_access was computed using the buffer indices
in the program but ShaderStorageBlocksWriteAccess is used with the shader
buffers.
So if a VS had 3 SSBOs and a FS had 4, the mask for VS was 0x3 (correct) but
the mask for the FS was 0x78 instead of 0x15.
Fix this by substracting the index of the first shader buffer in the program's
buffers.
Fixes: 79127f8d5b ("glsl: set ShaderStorageBlocksWriteAccess in the nir linker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6184
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15552>
In general, an atomic intrinsic may perform separate atomics for every
enabled SIMD channel, as each channel may operate on different memory.
However, an extremely common case is for all channels to access the same
memory location. In this case, we can simply perform a reduction/scan
across the subgroup, and perform one atomic for the whole subgroup,
rather than one per channel. For example, if an intrinsic says to take
the minimum value of the existing memory and the value in each channel,
we can do a thread-local minimum of all enabled channels, then do a
single atomic to take the minimum of that and the existing memory.
Our hardware doesn't optimize the case where multiple channels ask for
atomics on the same memory location; it assumes the compiler will do so.
nir_opt_uniform_atomics() uses divergence analysis to detect this case,
adds the necessary subgroup operations, and moves the atomic inside a
conditional that disables all but a single invocation. It even detects
cases where the shader code already performs this kind of optimization,
and avoids doing it a second time.
This may not be the optimal solution for us. In the backend, we could
detect this case and emit send(1) instructions with NoMask, rather than
generating if...send(16)...endif, and a lot of unnecessary ALU ops. But
it's simple to do, reuses the same path as ACO, and still provides most
of the benefit by cutting up to 16x atomics down to a single atomic,
which is more merciful to the memory bus.
Improves performance of Shadow of the Tomb Raider by 5.5% on XeHP.
Improves performance of a customer-internal benchmark on XeHP at
3840x2160 and low settings by approximately 30%.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
We haven't exposed this intrinsic as it doesn't directly correspond to
anything in SPIR-V. However, it's used internally by some NIR passes,
namely nir_opt_uniform_atomics().
We reuse most of the infrastructure in brw_find_live_channel, but with
LZD/ADD instead of FBL. A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is
like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
- load_reloc_const is just an immediate constant load, it's convergent.
- nir_intrinsic_load_global_const_block_intel should be convergent,
it says the address must be uniform, and we uniformize the predicate
- Lowered image intrinsics: image_deref_load_param_intel just reads
information about an image, as long as the image variable is
convergent it should be too. load_raw_intel...if the address we
come up with is convergent, it ought to be as well.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
Instead of reusing the in/out slot mechanism, use a separated NIR
variable mode. This will make easier later to implement staging the
output in shared memory (and storing all at the end to the URB).
Note to get 64-bit type support we currently rely on the
brw_nir_lower_mem_access_bit_sizes() pass.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15022>
The only ways that function can return NULL are:
- the xcb connection was closed
- the window for the swapchain was destroyed
- the special event listener was unregistered from another thread
- malloc failure
All of these are permanent errors, the swapchain is no longer in a
usable state, so we should treat this as VK_ERROR_SURFACE_LOST_KHR.
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15558>
Logic is lifted from bi_layout.c, adapted to work on instructions (not
clauses) and for Valhall's off-by-one semantic which is annoyingly
different than Bifrost. (But the same as Midgard -- Bifrost was
annoyingly different than Midgard!)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15223>
We already collect enums in the ISA description XML. Export them for use in the
compiler backend, particularly the packing code.
Usually we'd use Mako for templating. In this case, the script is so trivial a
template engine didn't seem worth it. (The obvious version with Mako was about
10 lines longer than just prints and f-strings used here.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15223>
If there are modifiers only used by pseudo instructions, not the real
instructions, bi_packer can get out-of-sync with bi_opcodes, causing
hard-to-debug issues. Do the stupid-simple thing to ensure this doesn't happen.
This may be a temporary issue, depending whether ISA.xml and the IR get split
out for better Valhall support.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15223>
NIR load_consts/inputs tend to happen together at the top of the program.
In the TGSI backend the loads got emitted at use time, while the NIR
backend was emitting the loads at load intrinsic time. By sinking the
intrinsics, we can greatly reduce register pressure.
nv92 NIR results:
total local in shared programs: 2024 -> 2020 (-0.20%)
local in affected programs: 4 -> 0
total gpr in shared programs: 790424 -> 735455 (-6.95%)
gpr in affected programs: 215968 -> 160999 (-25.45%)
total instructions in shared programs: 6058339 -> 6051208 (-0.12%)
instructions in affected programs: 410795 -> 403664 (-1.74%)
total bytes in shared programs: 41820104 -> 41660304 (-0.38%)
bytes in affected programs: 7147296 -> 6987496 (-2.24%)
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15542>
Fix 'error C4576: a parenthesized type followed by an initializer
list is a non-standard explicit type conversion syntax' errors by
declaring an actual variable and returning it in
vk_image_view_subresource_range().
All those MSVC/c++ related-constraints are quite annoying to be honest,
but it looks like the D3D12 headers have been updated to plain C
recently, which will allow us to write the driver in C, and hopefully
get all this sort of issues behind us.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14766>
Implements vkCreateSampler and vkDestroySampler APIs.
Also fixes maxSamplerLodBias value from 15.0f to 16.0f
as it's the max supported by our hardware.
Also changing maxSamplerAnisotropy from 16.0f to 1.0f to
temporarily disable anisotropy as we are missing software
support for it.
Signed-off-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15557>
the previous method of using affected_states to trigger constant updates
was ineffectual in the scenario where a ubo pointsize was needed on
the first time a non-precompiled shader was used after being the not-last
vertex stage:
* have vs+gs -> gs precompiles with pointsize lowering -> gs constants get updated
* remove gs -> vs was precompiled without pointsize lowering -> vs constants broken
now just do a quick check as in st_atom_shader.c and set the flag manually to
ensure the update is done correctly every time
cc: mesa-stable
fixes#6207
fixes (radv):
KHR-GL46.texture_cube_map_array.image_op_fragment_sh
KHR-GL46.texture_cube_map_array.sampling
KHR-GL46.texture_cube_map_array.texture_size_fragment_sh
KHR-GL46.constant_expressions*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15570>
Freedreno will check if the virtgpu supports the pass-thru context, and
if not will bail, falling back to virgl.
TODO this requires that virgl is also enabled in the mesa build, even if
it is not needed.. maybe there is a better way to handle this?
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
Add a new backend to enable using native driver in a VM guest, via a new
virtgpu context type which (indirectly) makes host kernel interface
available in guest and handles the details of mapping buffers to guest,
etc.
Note that fence-fd's are currently a bit awkward, in that they get
signaled by the guest kernel driver (drm/virtio) once virglrenderer in
the host has processed the execbuf, not when host kernel has signaled
the submit fence. For passing buffers to the host (virtio-wl) the egl
context in virglrenderer is used to create a fence on the host side.
But use of out-fence-fd's in guest could have slightly unexpected
results. For this reason we limit all submitqueues to default priority
(so they cannot be preepmted by host egl context). AFAICT virgl and
venus have a similar problem, which will eventually be solveable once we
have RESOURCE_CREATE_SYNC.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
We are going to want basically the identical thing, other than
flush_submit_list, for virtio backend. Now that we've moved various
other dependencies into the base classes, extract out an abstract base
class for submit/ringbuffer.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
Add a hint for buffers that we won't need to mmap. With the virtio
backend, virglrenderer needs to create a dmabuf fd for mapping into
the host, which we want to avoid when possible.
Low hanging fruit is to use this hint for anything tiled/ubwc. There
are probably more bo's that can be flagged as such.
TODO add fd_bo_upload() for memcpy to bo.. this would be useful for
uploads, for example, shaders which we just write once and never touch
again.. for virtio this could be implemented with a TRANSFER_TO_HOST
ioctl.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
Decoupling handle and fd_bo creation simplifies things for "normal" drm
drivers, avoiding duplication for the create vs import paths. But this
is awkward for the virtio backend when wants to do multiple things in
the same guest<->host round trip.
So instead, split the paths in the interface backend and move the code
sharing for the two different paths into the msm backend itself.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14900>
Based on profiling, these 10us sleeps are behaving closer to ~60us
and causing higher-than-necessary CPU overhead. 120us seems like a
good balance between latency management and overhead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15565>
This transitions the resource to TRANSFER_SRC_OPTIMAL, but
never updates the res->layout field, so subsequent transitions
are wrong and throw validation errors.
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x6660f60, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x6660f60[] expects VkImage 0x2f000000002f[] (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15545>
This patch adds the PowerVR driver to the following shared builds:
* debian-arm64
* debian-clang
* debian-release
* debian-vulkan
* fedora-release
It also adds the associated "tools" to these builds:
* debian-arm64-build-test
* debian-release (already uses -Dtools=all)
* fedora-release
And removes them from these builds by expanding -Dtools=all:
* debian-gallium
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15284>
unreachable() doesn't lead to executing the code that follows it,
neither in debug nor release builds. So falling through doesn't make any
sense.
This fixes a compile-error on clang.
Let's move the default-block to the end to make it clearer that there's
no intended fallthrough.
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15531>
unreachable() doesn't lead to executing the code that follows it,
neither in debug nor release builds. So falling through doesn't make any
sense.
This fixes a compile-error on clang.
Let's move the default-block to the end to make it clearer that there's
no intended fallthrough.
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15531>
First, there is a problem if you do the following
vkCmdSetColorWriteEnableEXT(attachmentCount = 8)
vkCmdBindPipeline(GFX, with attachmentCount = 4)
vkCmdDraw()
vkCmdBindPipeline(GFX, with attachmentCount = 8)
vkCmdDraw()
Because in the dynamic state emission code we rely on the first
pipeline to figure the number of BLEND_STATE entries to prepare. This
is wrong, we should fill all entries so that the dynamic state works
regardless of the number of attachments in the pipeline. With regard
to the dynamic values, we should retain enable/disable values that do
not concern the current pipeline.
Second, 3DSTATE_WM was not always reemitted when the pipeline changed.
But since it is not emitted as part of the pipeline, this results in
inconsistent state being programmed.
Third, we end up disabling the fragment stage completely in some
cases. And that is programming the pipeline inconsistently and
triggering a hang on TGL.
v2: Fix comment (Tapani)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b15bfe92f7 ("anv: implement VK_EXT_color_write_enable")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
The problem is that we missed looking at pipeline changes. Pipelines
hold bits of dynamic states and when it changes we might need to
reemit a packet.
v2: fix comment (Tapani)
Add missing anv_cmd_buffer_needs_dynamic_state() use (Tapani)
Cc: mesa-stable
Fixes: 505d176a8e ("anv: disable baked in pipeline bits from dynamic emission path")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
error log:
```
[2/43] Compiling C object src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj
FAILED: src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj
"cc" "-Isrc/gallium/drivers/llvmpipe/libllvmpipe.a.p" "-Isrc/gallium/drivers/llvmpipe" "-I../../src/gallium/drivers/llvmpipe" "-I../../src/gallium/include" "-Isrc/gallium/auxiliary" "-I../../src/gallium/auxiliary" "-Iinclude" "-I../../include" "-Isrc" "-I../../src" "-Isrc/compiler/nir" "-I../../src/compiler/nir" "-Isrc/util" "-I../../src/util" "-IC:/CI-Tools/msys64/clang64/include" "-fvisibility=hidden" "-fcolor-diagnostics" "-Wall" "-Winvalid-pch" "-std=c11" "-O0" "-g" "-DPACKAGE_VERSION=\"22.0.0-devel\"" "-DPACKAGE_BUGREPORT=\"https://gitlab.freedesktop.org/mesa/mesa/-/issues\"" "-DHAVE_WINDOWS_PLATFORM" "-DHAVE_SURFACELESS_PLATFORM" "-DUSE_ELF_TLS" "-DUSE_TLS_BEHIND_FUNCTIONS" "-DENABLE_ST_OMX_BELLAGIO=0" "-DENABLE_ST_OMX_TIZONIA=0" "-DEGL_NO_X11" "-DDEBUG" "-DHAVE___BUILTIN_BSWAP32" "-DHAVE___BUILTIN_BSWAP64" "-DHAVE___BUILTIN_CLZ" "-DHAVE___BUILTIN_CLZLL" "-DHAVE___BUILTIN_CTZ" "-DHAVE___BUILTIN_EXPECT" "-DHAVE___BUILTIN_FFS" "-DHAVE___BUILTIN_FFSLL" "-DHAVE___BUILTIN_POPCOUNT" "-DHAVE___BUILTIN_POPCOUNTLL" "-DHAVE___BUILTIN_UNREACHABLE" "-DHAVE___BUILTIN_TYPES_COMPATIBLE_P" "-DHAVE_FUNC_ATTRIBUTE_CONST" "-DHAVE_FUNC_ATTRIBUTE_FLATTEN" "-DHAVE_FUNC_ATTRIBUTE_MALLOC" "-DHAVE_FUNC_ATTRIBUTE_PURE" "-DHAVE_FUNC_ATTRIBUTE_UNUSED" "-DHAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT" "-DHAVE_FUNC_ATTRIBUTE_WEAK" "-DHAVE_FUNC_ATTRIBUTE_FORMAT" "-DHAVE_FUNC_ATTRIBUTE_PACKED" "-DHAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL" "-DHAVE_FUNC_ATTRIBUTE_ALIAS" "-DHAVE_FUNC_ATTRIBUTE_NORETURN" "-DHAVE_FUNC_ATTRIBUTE_VISIBILITY" "-DHAVE_UINT128" "-D_WINDOWS" "-D_WIN32_WINNT=0x0A00" "-DWINVER=0x0A00" "-DPIPE_SUBSYSTEM_WINDOWS_USER" "-D_USE_MATH_DEFINES" "-DUSE_SSE41" "-DUSE_GCC_ATOMIC_BUILTINS" "-DHAS_SCHED_H" "-DHAVE_CET_H" "-DHAVE_STRTOF" "-DHAVE_STRTOK_R" "-DHAVE_QSORT_S" "-DHAVE_ZLIB" "-DHAVE_ZSTD" "-DHAVE_COMPRESSION" "-DLLVM_AVAILABLE" "-DMESA_LLVM_VERSION_STRING=\"13.0.0\"" "-DLLVM_IS_SHARED=1" "-DDRAW_LLVM_AVAILABLE" "-DMESA_EXECMEM" "-DVK_USE_PLATFORM_WIN32_KHR" "-Werror=implicit-function-declaration" "-Werror=missing-prototypes" "-Werror=return-type" "-Werror=empty-body" "-Werror=incompatible-pointer-types" "-Werror=int-conversion" "-Wimplicit-fallthrough" "-Wno-missing-field-initializers" "-fno-math-errno" "-fno-trapping-math" "-Qunused-arguments" "-fno-common" "-Wno-microsoft-enum-value" "-Werror=format" "-Wformat-security" "-Werror=thread-safety" "-ffunction-sections" "-fdata-sections" "-pthread" "-D_FILE_OFFSET_BITS=64" "-D__STDC_CONSTANT_MACROS" "-D__STDC_FORMAT_MACROS" "-D__STDC_LIMIT_MACROS" "-Werror=pointer-arith" "-Werror=gnu-empty-initializer" -MD -MQ src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj -MF "src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj.d" -o src/gallium/drivers/llvmpipe/libllvmpipe.a.p/lp_setup_tri.c.obj "-c" ../../src/gallium/drivers/llvmpipe/lp_setup_tri.c
In file included from ../../src/gallium/drivers/llvmpipe/lp_setup_tri.c:37:
In file included from ../../src/gallium/drivers/llvmpipe/lp_setup_context.h:38:
In file included from ../../src/gallium/drivers/llvmpipe/lp_setup.h:31:
In file included from ../../src/gallium/drivers/llvmpipe/lp_jit.h:40:
In file included from ../../src/gallium/auxiliary/gallivm/lp_bld_limits.h:37:
In file included from ../../src/util/u_cpu_detect.h:41:
In file included from ../../src/util/u_thread.h:35:
In file included from ../../include/c11/threads.h:64:
In file included from ../../include/c11/threads_win32.h:58:
In file included from C:/CI-Tools/msys64/clang64/x86_64-w64-mingw32/include/windows.h:69:
In file included from C:/CI-Tools/msys64/clang64/x86_64-w64-mingw32/include/windef.h:9:
In file included from C:/CI-Tools/msys64/clang64/x86_64-w64-mingw32/include/minwindef.h:163:
In file included from C:/CI-Tools/msys64/clang64/x86_64-w64-mingw32/include/winnt.h:1555:
In file included from C:/CI-Tools/msys64/clang64/lib/clang/13.0.0/include/x86intrin.h:15:
In file included from C:/CI-Tools/msys64/clang64/lib/clang/13.0.0/include/immintrin.h:37:
C:/CI-Tools/msys64/clang64/lib/clang/13.0.0/include/tmmintrin.h:582:1: error: redefinition of '_mm_shuffle_epi8'
_mm_shuffle_epi8(__m128i __a, __m128i __b)
^
../../src/gallium/auxiliary/util/u_sse.h:159:1: note: previous definition is here
_mm_shuffle_epi8(__m128i a, __m128i mask)
^
1 error generated.
```
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14077>
Because of incomplete codepaths further down in this function, we end up
trying to use this variable without initialization. Let's initialize it
to zero, just to keep the compiler happy.
This avoids a compile-time warning, which would get treated as an error
on CI.
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15525>
There's no guarantee that we don't have more than 128 PHI values either,
so let's stop asuming so.
We do this by changing dxil_phi_set_incoming to dxil_phi_add_incoming,
which lets us add more incoming phi-values to the current one instead of
setting a new set of them.
This also lets us reduce these stack-arrays a bit, down to something
much more reasonable.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15519>
As this function doesn't check for any control-flow
dependence, it only returns true for statically
(or globally) uniform values.
The same holds true for is_binding_dynamically_uniform()
in nir_opt_gcm().
Rename to better reflect that property.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14994>
the fastpath here can only be taken if there is exactly one stream active,
as this will otherwise break nonzero stream primitives generated queries
in truth, this num_vertex_streams thing should be a bitmask so that the case
of num_streams=1,stream_id!=0 could also be fastpathed, but the complexity
probably isn't worth it given the infrequency of use
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15506>
this can't be determined from pipe_shader_state::stream_output,
as this only contains xfb info, which is not the same as the vertex
stream info, and may break primitives generated queries
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15506>
When using cubemaps and the u/v values are 0, then this point
can be arrived at with rho = nan, and if rho is NaN, then lod
calculations end up at the max lod, whereas the spec suggests
they should end up at the most negative lod.
This fixes
dEQP-VK.glsl.texture_functions.query.texturequerylod.samplercube_float_zero_uv_width_fragment
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15335>
Otherwise, if batch->scissor_culls_everything is set for a single draw,
every draw after it in the batch will be skipped because the new
scissor/viewport state will never be processed. Process scissor state
early in draw_vbo to fix this interaction.
We do need to be careful: setting something on the batch can only happen when
we've decided on a batch. If we have to select a fresh batch due to too many
draws, that must happen first. This is pretty clear in the code but worth noting
for the diff.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Icecream95 <ixn@disroot.org>
Reviewed-by: Icecream95 <ixn@disroot.org>
Fixes: 79356b2e ("panfrost: Skip rasterizer discard draws without side effects")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5839
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6136
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15365>
There is an out-of-sync approach regarding the location of the results
folder: some scripts refer to it via $CI_PROJECT_DIR/results, while
others just assume it is located in the current working directory.
Usually $PWD points to $CI_PROJECT_DIR, but in some cases this is not
the case, hence let's ensure the 'results' folder can always be found
in the current working directory.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
In order to run a VM (e.g. crosvm) through HWCI_TEST_SCRIPT on a LAVA
target, it's necessary to download a kernel image on the target device.
When HWCI_KVM is set to 'true', we can safely assume HWCI_TEST_SCRIPT
contains a command or the path to a script which expects the kernel
image to be available under /lava-files/${KERNEL_IMAGE_NAME}.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
Having CI_JOB_JWT_FILE pointing to path on /tmp makes it difficult to be
managed in VM contexts (e.g. crosvm) because the /tmp mountpoint usually
refers to a local filesystem rather than the host one.
Additionally, there is another restriction for 'piglit-traces-test' job
to have the file available on the root filesystem.
To avoid amending all the jobs that might be affected, let's just set
the variable to a fixed path '/minio_jwt'. Note we also need to do this
in the 'variables:' section instead of 'before_script:' in order to be
able to reference it in CI job variables, e.g. PIGLIT_REPLAY_EXTRA_ARGS
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
Use a dedicated DEQP_RUNNER_CARGO_ARGS variable instead of
EXTRA_CARGO_ARGS in build-deqp-runner.sh to pass custom arguments when
invoking 'cargo install'.
This is to avoid modifications of EXTRA_CARGO_ARGS which might have
negative side-effects in the scripts which rely on this variable and
import build-deqp-runner.sh instead of executing it in a subshell.
Fixes: 8729c6e981 ("ci: Support building and installing deqp-runner from source")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
R8G8 have a different block width/height and height alignment from other
formats that would normally be compatible (like R16), and so if we are
trying to, for example, sample R16 as R8G8 we need to demote to linear.
Follows the fix in Freedreno: b97e3bb2e1
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15465>
Replaced the existing internal TGSI compute shader, which clears
a read-modify-write buffer, with its NIR equivalent. The disassembly
shader generated by the new NIR variant is identical to the previous
implementation. These changes remove the additional conversion step
from TGSI to NIR for the shader at runtime. Tested on a Navi 23 card.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15356>
When CCS compression first came out on Skylake, we referred to it as
"renderbuffer compression", or RBC for short. However, that name has
long since fallen out of favor, and we refer to it as CCS nearly
everywhere.
This patch renames DEBUG_NO_RBC to DEBUG_NO_CCS inside the codebase
for clarity, and adds INTEL_DEBUG=noccs. The legacy INTEL_DEBUG=norbc
name continues to work, because it's one line of code and having both
names makes our lives easier in the interim.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15447>
by disabling color and depth write, the side effects of force-disabling discard can
be mitigated
fixes:
KHR-GL46.tessellation_shader.single.isolines_tessellation
KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.data_pass_through
KHR-GL46.tessellation_shader.tessellation_invariance.invariance_rule3
KHR-GL46.tessellation_shader.tessellation_shader_point_mode.points_verification
KHR-GL46.tessellation_shader.tessellation_shader_quads_tessellation.degenerate_case
KHR-GL46.tessellation_shader.tessellation_shader_quads_tessellation.inner_tessellation_level_rounding
KHR-GL46.tessellation_shader.tessellation_shader_tessellation.gl_InvocationID_PatchVerticesIn_PrimitiveID
KHR-GL46.tessellation_shader.vertex.vertex_spacing
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15392>
From docs:
The PVS Instruction which uses the Input Vertex Memory for the last
time. This value is used to free up the Input Vertex Slots ASAP.
This field must be set to a valid instruction.
Right now it is set to the last instruction. When the last read is
inside a loop, set it on the outhermost ENDLOOP. This could in theory
help performance, but none of my usual benchmarks including GLmark,
Unigine Sanctuary or Lightsmark show any measurable performance difference.
Suggested in: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6045
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15252>
this is a lot of machinery to propagate the block sizes down from the
descriptor layout to the pipeline layout to the rendering_state
block data is appended to ubo0 immediately following push constant
data (if it exists), which requires that a new buffer be created and
filled any time either type of data changes
shader handling is done by propagating the offset of each block relative
to the start of its descriptor set, then accumulating the sizes of
every uniform block in each preceding descriptor set into the offset,
then adding on the push constant size, and finally adding that on to
the existing load_ubo deref offset
update-after-bind is no longer an issue since each instance of pc+block
data is its own immutable buffer that can never be modified
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15457>
now instead of having static per-stage buffer regions and letting llvmpipe
do the upload, lavapipe creates a new pipe_resource and chucks it away
with take_ownership=true to allow it to be destroyed once it's no longer
in use
this also alters ubo0 mechanics such that the buffer is now sized exactly to
the size of the push constants in the pipeline and push constants are only
updated when the appropriate shader stage is flagged
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15457>
the converted number of vertices is 2x that of the "real" number of vertices,
so divide the results by 2
this works nicely since zink performs cpu readback for all queries, and shader
aggregation should be simple in the future as well
fixes:
KHR-GL46.pipeline_statistics_query_tests_ARB.functional_primitives_vertices_submitted_and_clipping_input_output_primitives
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15401>
Without this, 128-bit or 64-bit register spills can generate unaligned
loads and stores if tlsBase is unaligned.
Fixes glsl-1.50/execution/variable-indexing/gs-input-array-vec3-index-rd
with NV50_PROG_USE_NIR=1 on kepler
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13693>
ir3_compile_shader_nir() should set local_size[] and local_size_variable
fields not only for compute shaders, but for the OpenCL kernels too.
v2: use gl_shader_stage_is_compute() instead of explicit comparison with
MESA_SHADER_[COMPUTE,KERNEL].
Signed-off-by: Andrey Konovalov <andrey.konovalov@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14863>
bare-metal can reboot boards into an existing rootfs on intermittent
device failure, but traces-db doesn't do any sanity-checking of the local
downloads of traces and would proceed to just trying to replay them.
Nuke any existing trace db so that it re-downloads every time, same as
LAVA or docker container tests do.
Fixes: #5585
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15440>
We get a lot of useful coverage from running graphicsfuzz with spilling
enabled, but it's also pretty slow and can cause intermittent hangcheck
failures. I thought I'd categorized them when merging !14839 (device loss
on reset), but it looks like not all of them and we're now more likely to
have flakes take out the whole test run when a single flake makes the rest
of the caselist a flake.
This is a little unfortunate in that it means our test environment is not
the same as a stock system you would want to run deqp on to submit
conformance, but I think it's an improvement in the test maintenance work
vs needing to fix things up later.
We have some other tests besides turnip that can trigger hangchecks which
we might also like this increase for (some disabled traces, for example).
However, freedreno GL has a 5-second timeout waiting for idle when
mapping, and a couple of 2-second timeouts in a row can result in spurious
failures in other tests!
Fixes: #6163
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15435>
We use bi_dontcare() to specify any encoding where we don't care about
the value, with a preference for power-efficient encodings. On Bifrost,
a (possibly nonexistant) FAU read is the best encoding. On Valhall, that
encoding doesn't exist so just use a zero. That should be good enough in
practice.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15461>
When emitting code during or after register allocation, we need to be able to
emit constants without running the constant->{LUT, move, uniform} pass running
after. In particular, we need to access the constant 0 to implement spill code.
Add a Valhall-specific zero for this purpose.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15461>
Instead of one MANUAL_COMMANDS, we now have two deny-lists:
MANUAL_COMMANDS and NO_ENQUEUE_COMMANDS. The former is for things which
have a manually typed implementation in vk_cmd_enqueue.c and the later
is for things we want to ignore entirely. This lets us auto-generate
vk_cmd_enqueue_unless_primary_Cmd* entrypoints for the manually typed
vk_cmd_enqueue_Cmd* entrypoints.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14406>
This fixes bugs with lavapipe's hand-rolled pCount handling. The driver
is supposed to set *pCount to the number of queues actually written in
the case where it's initialized to a value that's too large. It's also
supposed to handle *pCount being too small.
Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15459>
The Vulkan spec was recently clerified to say that transitions only
happen to the bound layers:
"Automatic layout transitions apply to the entire image subresource
attached to the framebuffer. If multiview is not enabled and the
attachment is a view of a 1D or 2D image, the automatic layout
transitions apply to the number of layers specified by
VkFramebufferCreateInfo::layers. If multiview is enabled and the
attachment is a view of a 1D or 2D image, the automatic layout
transitions apply to the layers corresponding to views which are
used by some subpass in the render pass, even if that subpass does
not reference the given attachment."
This is in the context of render passes but it applies to dynamic
rendering because the implicit layout transition stuff is a Mesa pseudo-
extension and inherits those rules.
For clears, the Vulkan spec says:
"renderArea is the render area that is affected by the render pass
instance. The effects of attachment load, store and multisample
resolve operations are restricted to the pixels whose x and y
coordinates fall within the render area on all attachments. The
render area extends to all layers of framebuffer."
Again, this is in the context of render passes but the same principals
apply to dynamic rendering where the layerCount and renderArea are
specified as part of the vkCmdBeginRendering() call.
Fixes: 3501a3f9ed ("anv: Convert to 100% dynamic rendering")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15441>
Any thread we create may end up creating/submitting at least a
noop job, which is a shared object. Before multisync, this was
an issue only for the creation of the job itself, but with
multisync we can also modify parameters of the noop job
every time it is used (for signaling and serialization
configuration).
This change adds a noop mutex that all threads (main, wait and
master) take before submitting a noop job to ensure concurrent
access is not an issue.
Fixes flakyness observed with multisync with the following test:
dEQP-VK.api.command_buffers.secondary_execute_twice
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
If a CPU job comes first in a command buffer with a semaphore wait operation
we need to wait on the CPU for the semaphore to be signaled before we process
the job.
We have been doing this with a WaitForIdle operation, but that only works
if the semaphore has been submitted for signaling from the same instance
of the driver. If we have an imported payload from another instance in our
semaphore however, waitForIdle may return too early since the submission
to signal the semaphore may have been submitted by a different instance
of the driver as well, and our wait for idle checks only know about this
instance submissions.
To fix this, we always submit a noop job from our instance that waits on
the semaphores on the GPU and follow up with WaitForIdle to wait for that
to complete.
Fixes test failures and/or assert crashes in:
dEQP-VK.synchronization.cross_instance.*
(when enabling support for semaphore imports)
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
When we have a wait thread we can't ensure that the last job in the last
command buffer will be the one to signal semaphores because in this case
there is no gurantee that jobs from command buffers in the batch will be
submitted to the GPU in order, as those put in a wait thread will be
submitted later when the event wait operation is completed.
Instead, we need to wait for all outstanding wait threads to complete
and only then we should signal any semaphores or fences.
This also fixes a bug where the wait for events was the last job in
the command buffer. In this case, once the event wait is completed
we have no additional jobs to submit and thus would never try to
signal semaphores or fences.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
This is preparatory work to expose support for importing semaphores, which
was waiting on kernel multisync support.
When we implemented user-space multisync support we didn't handle
temporary fence/semaphore payload imports at all, so we fix that here.
Also, we add a has_temp boolean flag to identify the case where we have
a temporary payload in a fence/sempahore instead of just checking if
temp_sync is not 0. This is necessary to support semaphore imports
(for which we are not exposing support yet) because these need to drop
the temporary payload when they are used as wait semaphores in a submit,
but we can't destroy the underlying temp_sync at that point because it
needs to survive at least until the submit is finished, so instead
we use a flag to tell if we have an active temporary payload or not,
and we simply destroy any temp_sync on a semaphore destroy or any new
import on the same semaphore. We only strictly need this flag for
semaphores because fences drop the temporary payload when they are
reset, which happens in the CPU and can only be done if the GPU is not
using the fence, but we add the same flag for the fence for consistency.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
This path uses a shader blit to implement the copy which is only
supported for tiled images (except 1D). While blit_shader() already
checks for this, this path does a lot of heavy lifting to prepare for
the blit_shader call so we rather avoid that if possible when we know
blit_shader won't be able to implement the blit.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
We had some code that considered the possibility that the destination
might be linear when configuring TFU jobs, but we never actually allow
for this to happen since we avoid hitting these paths in that case, as
the TFU always produces UIF results. Instead, add an assert when
producing the TFU packet to ensure we are expecting a UIF result.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
this fixes desync+crash when:
1. usage is added for bs A
2. tracking is added for bs B
3. tracking is removed for bs B
4. context is destroyed
5. usage A is now dangling and will crash if accessed
as seen in glmark2
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15429>
In addition to use the helper, we also remove some of the lowering we
had at preprocess_nir, as they are called now by the helper.
As we are here we also move the call to nir_lower_sysvals_to_varyings,
that for some reason we were calling it before preprocess_nir.
It is worth to note that with this change we lose the ability to debug
the NIR just after spirv_to_nir using V3D_DEBUG, as now this is done
on vk_spirv_to_nir, and as mentioned that includes several lowerings
now. The workaround to that is to use NIR_DEBUG.
We also needed to change how to check the entrypoint on the broadcom
compiler, checking just if it is an entrypoint, instead of assuming
that the name will be "main".
v2: tweak comment, squash v3dv and compiler change (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15449>
This fixes the test_resolve_non_issued_query_data vkd3d-proton test.
This change is required on TGL+ (maybe ICL?) because on all platforms
3D pipeline writes are not coherent with CS. On previous platform we
fixed this by flushing the render cache to make sure data is visble to
CS before it writes to memory. But on more recently platforms,
flushing the render cache leaves the data in the tile cache which is
still not coherent with CS, so we need to flush that one too.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14552>
It stores the compiled shaders on disk so further executions will load
them from disk instead of compiling, improving the overall performance.
This is noticeable in games where they suddenly get stuck for a while
because they start to compile one or more shaders.
v2:
- Remove comment (Alejandro)
- Use malloc() + helper to simplify code (Alejandro)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15380>
This has been implemented for a while but we could not expose it on
Vulkan 1.0 because the extension declares a dependency on
VK_KHR_sampler_ycbcr_conversion, which we don't implement, and
CTS would complain.
On Vulkan 1.1 however, VK_KHR_sampler_ycbcr_conversion was promoted
to core as an optional feature, and this is enough for the the
dependency to be satisfied, even if the feature is not supported,
meaning that we can now expose the extension.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15426>
Most of the time, this doesn't matter. On the versions with _sat, if
the destination type is incorrect, the clamping will not happen
correctly.
Fixes the following CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_uu_v4i8_out32
v2: Update anv-tgl-fails.txt.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Fixes: 0f809dbf40 ("intel/compiler: Basic support for DP4A instruction")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15417>
The previous alignment of 64 bytes, which we got from the blob,
indicates that single-texel alignment isn't supported. So just do a
trivial no-op implementation that returns the same alignment as before.
This matches what newer blobs that expose this extension do.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15427>
Now all special FAU handling is unified, which makes both assembler and
disassembler considerably nicer. This adds some more special FAU indices from
page 0 that were previously missing, allowing them to be assembled and
disasembled.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15364>
FAU pages do not need to be specified explicitly in the assembly. Rather, they
should be inferred by the assembler by the instructions used. Rewrite the code
handling this in alignment with new information about the hardware.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15364>
This reuses the same UBO analysis to do the pushing in the shader
preamble via the ldc.k instruction instead of in the driver via
CP_LOAD_STATE6. The const_data UBO is exempted as it uses a different
codepath that isn't as critical.
Don't do this on gallium because there are some regressions. Aztec Ruins
in particular regresses a bit, and nothing I've benchmarked benefits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
Since the NIR pass runs very late, it needs to be aware of preambles,
and when creating the instruction we need to move it to the start block
so that RA doesn't overwrite it in the preamble.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
Add in the type, even though it turns out to not be that useful. Add
in support for assembling it. Add some notes based on computerator
experiments. And add support for the indirect a1.x mode that's needed
for storing c64.x and later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
These will be used to implement the ir3-specific shader preamble
lowering in NIR. shps is conceptually similar to getone (although it
technically can't be duplicated) and shpe is similar to other barriers,
since it has to happen after any stores to the constant file in the
preamble. Add NIR intrinsics and plumbs them through ir3.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
Previously we included the reserved user consts (for Vulkan push
constants) as part of the pushed UBO contents, but that led to a problem
because when calculating the worst-case space for UBOs we didn't factor
in the reserved user consts. We'll have the same problem when doing the
same thing in the preamble optimization pass. Stop including the
reserved size in ubo_state::size, and have ir3_setup_consts() add it in
instead, so we won't forget to add it anywhere.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
This pass tries to move computations that are uniform for the entire
draw to the preamble. There's also an API for backends to insert their
own instructions into the preamble, for porting existing UBO pushing
passes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
These are functions that run before the entrypoint at least once per
draw and write their results via store_preamble, and then are loaded in
the rest of the shader via load_preamble.
We will add users in the following commits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13148>
If you hadn't already called wsi_GetPhysicalDeviceDisplayProperties2KHR or
wsi_GetDrmDisplayEXT before calling
GetPhysicalDeviceDisplayPlaneProperties2KHR, then the connectors list
wouldn't be populated and you'd get no plane properties. Fixes failure of
dEQP-VK.wsi.display.get_display_plane_capabilities when run on its own.
Fixes: #4575
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15353>
ANGLE-on-venus-on-turnip and zink-on-turnip want real data here for EGL's
reset tests.
This required moving the remaining GPU-reset-causing tests from flakes or
xfails to skips. Otherwise, the rest of the caselist associated with them
ends up being marked as fails as well. The alternative would be to put
these tests in their own test groups with tests_per_group = 1, but that
didn't seem worth the effort. Or, we could finally do something with
https://gitlab.freedesktop.org/anholt/deqp-runner/-/issues/14.
Fixes: #5955
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14839>
It's tricky to always get the render area to the viewport code. In
particular, it's not provided to secondary command buffers as part of
the inheritance info so we have to bend over backwards and look for a
framebuffer. With VK_KHR_dynamic_rendering, there is no framebuffer and
this approach won't work and we'll need something better if we want
competent guardbands in secondary command buffers.
The good news is that any client that's sloppily rendering and trusting
the clipper to keep things inside the render area will set a scissor and
that's something they have to set inside the secondary. We can dig
through the scissor state and also include the corresponding scissor (if
any) and use that for our render area. This should give us the same
secondary command buffer performance with VK_KHR_dynamic_rendering.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
This is about the only "small" change we can make in the process of
converting from render-pass-based to dynamic-rendering-based. Make
everything in pipeline creation work in terms of dynamic rendering and
create the dynamic rendering structs from the render pass as-needed.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
This was ill-conceived at best. Yes, it checks for a few error
conditions but it doesn't check much and what checks it has are very far
away from the code that relies on those invariants. If we care about
these invariants, we should add asserts near the code that makes those
assumptions rather than pretending to be the validation layers.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
These helpers are used by vkCreateGraphicsPipelines to get the
VkPipelineRenderingCreateInfo and in vkCmdBeginCommandBuffer to get the
VkCommandBufferInheritanceRenderingInfo. This is required because the
Vulkan runtime code can't yet hook and modify calls made to driver-
provided functions. Instead, we just provide a helper to be used in leu
of vk_find_struct_const(). The structs themselves are stored in the
render pass so we can pass back a pointer and there's no need to
construct one on the stack or stuff it in the pipeline.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
This implements vkCmdBeginRenderPass, vkCmdEndRenderPass, and
vkCmdNextSubpass in terms of the new vkCmdBegin/EndRendering included in
VK_KHR_dynamic_rendering and Vulkan 1.3. All subpass dependencies and
implicit layout transitions are turned into actual barriers. It does
require VK_KHR_synchronization2 because it always uses the 64-bit
version of the pipeline stage and access bitfields.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
this draw mode in particular requires driver-specific conversions
for queries (e.g., number of vertices), so pass that info through
the only limitation is that it doesn't work for dlists,
but I have yet to see a real use case of a statistics query being used with dlists
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15326>
If the base register is killed, it may be reused as the destination of a
ldp. In that case we should just skip resetting it afterwards.
Fixes regressions in dEQP-VK.ssbo.layout.random.scalar.38 later.
Fixes: 9912c61362 ("ir3/spill: Support larger spill slot offset")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15288>
We need to prepare for storage buffers having different sizes from
uniform buffers. This switches dynamic_offset_offset to have units of
bytes, the same as offset, and as a nice bonus we can more easily
combine the dynamic and non-dynamic paths in various different places.
This also entails rewriting the code that patches dynamic descriptors,
since we can no longer assume a linear mapping between indices in
dynamicOffsets and descriptor locations which the previous approach
heavily relied on.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15288>
From the Vulkan spec:
"drmFormatModifierTilingFeatures is a bitmask of
VkFormatFeatureFlagBits that are supported by any image created
with format and drmFormatModifier. The returned
drmFormatModifierTilingFeatures must contain at least one bit."
This fixes recent CTS dEQP-VK.drm_format_modifiers.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15383>
We're nowhere close to even having Vulkan 1.0 working yet, there's no
reason to get too excited about 1.1. It just means piles more test
crashes for features we're claiming to support but don't. If we want to
enable more tests, we can turn on the extensions for those features once
we actually have them working.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15334>
cmd_buffer->pool should never be NULL. Even if AllocateCommandBuffers()
fails, the successfully created cmdbuffers would have it set correctly.
From the Vulkan spec:
"VUID-vkFreeCommandBuffers-pCommandBuffers-parent
Each element of pCommandBuffers that is a valid handle must have
been created, allocated, or retrieved from commandPool."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15164>
For weird reasons, the COPY_DATA packet doesn't seem to copy anything
while on the compute queue. Instead, use PKT3_LOAD_SH_REG_INDEX which
seems to work as expected.
Note that LOAD_SH_REG_INDEX on the compute queue is only supported by
the CP on GFX10.3, so we need to implement a different solution (load
from the indirect BO in the shader) for older generations.
This should fix the Control RT GPU hang.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15053>
Handle arrays generically by using the last component of the coordinate source
as the array index. That works for both 2D arrays and cube arrays, fixing cube
arrays. Cube arrays were already handled correctly in core Panfrost code.
This code path is not tested in dEQP-GLES31 without exposing OES_cube_map_array,
which depends on OES_geometry_shader, which we don't have. Yet we do expose
PIPE_CAP_CUBE_ARRAY, so ARB_cube_map_array is exposed.
Disabling PIPE_CAP_CUBE_ARRAY would be an easy band-aid fix, but it's easy
enough to handle correctly.
dEQP-GLES31 passes with a hack enabling OES_cube_map_array [without geometry
shaders].
Also fixes 1D arrays on Bifrost for the same reasons.
Fixes: 70d6c5675d ("pan/bi: Emit TEXC with builder")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15254>
Hardware support was removed with Midgard. Use mesa/st to emulate GL_CLAMP with
nir_lower_tex automatically (the Zink lowering), and disable GL_MIRROR_CLAMP
which isn't lowered correctly.
Fixes *texwrap* Piglit tests on G52.
Fixes: f9ceab7b23 ("panfrost: Fix CLAMP wrap mode")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15253>
Instead of two magic ternary operations, define a new `axis` temporary
which is 1 for X and 2 for Y. Then define everything else in terms of
this variable. In particular, the mask operation we do on LANE_ID is a
mask so it makes more sense to use ~axis than 1/2 but in the other
order.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15352>
The Vulkan spec for VK_KHR_depth_stencil_resolve allows a format
mismatch between the primary attachment and the resolve attachment
within certain limits. In particular,
VUID-VkSubpassDescriptionDepthStencilResolve-pDepthStencilResolveAttachment-03181
If pDepthStencilResolveAttachment is not NULL and does not have the
value VK_ATTACHMENT_UNUSED and VkFormat of
pDepthStencilResolveAttachment has a depth component, then the
VkFormat of pDepthStencilAttachment must have a depth component with
the same number of bits and numerical type
VUID-VkSubpassDescriptionDepthStencilResolve-pDepthStencilResolveAttachment-03182
If pDepthStencilResolveAttachment is not NULL and does not have the
value VK_ATTACHMENT_UNUSED, and VkFormat of
pDepthStencilResolveAttachment has a stencil component, then the
VkFormat of pDepthStencilAttachment must have a stencil component
with the same number of bits and numerical type
So you can resolve from a depth/stencil format to a depth-only or
stencil-only format so long as the number of bits matches.
Unfortunately, this has never been tested because the CTS tests which
purport to test this are broken and actually test with a destination
combined depth/stencil format.
Fixes: 5e4f9ea363 ("anv: Implement VK_KHR_depth_stencil_resolve")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15333>
This changes our error handling to be compatible with upcoming
specification change that unifies glTex[Sub]Image error handling
between OpenGL ES 2.0 vs ES 3.0+ specifications, see:
https://gitlab.khronos.org/opengl/API/-/issues/147
OpenGL ES 2.0.25 spec states:
"Specifying a value for internalformat that is not one of
the above values generates the error INVALID_VALUE. If
internalformat does not match format, the error
INVALID_OPERATION is generated."
This fixes following new tests:
KHR-GLES31.core.compressed_format.*
KHR-GLES32.core.compressed_format.*
v2: GL_INVALID_OPERATION -> GL_INVALID_VALUE in extension
checks, remove (now overlapping) extension checks from
_mesa_gles_error_check_format_and_type (Eric Anholt)
v3: take GLES version in to account in internalformat checks
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12936>
In 4001d9ce1a we started lazily allocating surface states in the
descriptor sets rather than upfront in the descriptor pool. This was
to workaround vkd3d-proton allocating more than we could handle at the
HW level.
The issue introduced in that change is that we didn't protect the
descriptor pool free list as well as the anv_state_stream which are
now potentially used from different threads through the descriptor set
write functions.
This reverts the lazy allocation part of that change. Host only
descriptor sets changes remain.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4001d9ce1a ("anv: Handle VK_DESCRIPTOR_POOL_CREATE_HOST_ONLY_BIT_VALVE for descriptor sets")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15241>
When I switched iris over to use 3DSTATE_BINDING_TABLE_POOL_ALLOC, I
stopped flagging things dirty when allocating a new binder, because
the contents of the binding table were still valid, thanks to us not
having to subtract Surface State Base Address anymore.
This unfortunately missed the point that the old binding table is in the
old buffer, which is no longer what the binder pool base address points
to. So we'd either need to copy it over, or just flag it dirty and
re-emit it on the next draw.
Fixes misrendering in Ryujinx.
Fixes: 8b9045e7a4 ("intel: Use 3DSTATE_BINDING_TABLE_POOL_ALLOC exclusively on Gfx11+")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15314>
If we introduce another queue type (video decode) we can have a
disconnect between the RADV_QUEUE_ enum and the API queue_family_index.
currently the driver has
GENERAL, COMPUTE, TRANSFER which would end up at QFI 0, 1, <nothing>
since we don't create transfer.
Now if I add VDEC we get
GENERAL, COMPUTE, TRANSFER, VDEC at QFI 0, 1, <nothing>, 2
or if you do nocompute
GENERAL, COMPUTE, TRANSFER, VDEC at QFI 0, <nothing>, <nothing>, 1
This means we have to add a remapping table between the API qfi
and the internal qf.
This patches tries to do that, in theory right now it just adds
overhead, but I'd like to exercise these paths.
v2: add radv_queue_ring abstraction, and pass physical device in,
as it makes adding uvd later easier.
v3: rename, and drop one direction as unneeded now, drop queue_family_index
from cmd_buffers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13687>
in the initial implementation, a stream like:
* CmdBeginTransformFeedbackEXT
* CmdSetRasterizerDiscardEnableEXT
* CmdDraw
* CmdEndTransformFeedbackEXT
* CmdBeginTransformFeedbackEXT
* CmdDraw
* CmdEndTransformFeedbackEXT
would never enable transform feedback, as it only checked for the change
in rasterizer_discard state
Fixes: 4d531c67df ("anv: support rasterizer discard dynamic state")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15269>
for genuine early depth tests, the samplecount must be updated after depth
test but before samplemask is applied
for inferred-early or regular depth tests, the samplemask can be applied
before the depth test
Fixes: d9276ae965 ("llvmpipe: handle gl_SampleMask writing.")
fixes:
dEQP-VK.fragment_operations.early_fragment.sample_count_early_fragment_tests_depth_samples_4
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15319>
If a driver sets driver_data but not driver_free_cb, driver_data will
get freed along with the command. If a driver sets driver_free_cb,
driver_data will not get automatically freed but the callback will get
called before the rest of the data structure is freed.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15329>
we can effectively skip any kind of checks here and just assume that one
of two scenarios is in effect:
* the user is about to attempt some incredibly illegal behavior that VVL will catch
* the user is about to attempt a pro gamer move and we'll be fine
in either case, it's EXTENDED_USAGE, so hopefully we're about to make a texture
view from a compatible and supported format
cc: mesa-stable
fixes:
dEQP-VK.image.extended_usage_bit_compatibility.image_format_properties*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15320>
subpass->color_count is (obviously) not set yet, so this would just clobber
the color attachments any time resolves were used
Fixes: 8a6160a354 ("lavapipe: VK_KHR_dynamic_rendering")
fixes:
dEQP-VK.draw.dynamic_rendering.multiple_interpolation.structured.with_sample_decoration.4_samples
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15330>
Normally this wouldn't matter, but it will matter for the upcoming scan
macro because the running tally is communicated through a shared
register across a physical edge. It may also matter if a live-range
split occurs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
We weren't considering the other destinations when allocating a
destination, so we could allocate overlapping destinations. This wasn't
done before because we never had a need for it, but the subgroup
reduction macros will need it.
The trickiest part of this is that we have to rewrite the
compress_regs_left fallback, because we may have to move around the
other already-allocated destinations. We now have a list of destinations
to (re)allocate in addition to the popped live intervals. For the rest
of the destination handling, we can just bail out if the proposed spot
for something overlaps another destination, but for the fallback we have
to handle all the cases gracefully. I also added support for odd
combinations of multiple destinations where some of them are tied, which
we'll use in the next commit to handle early-clobber destinations and
which will actually be used because one of the destinations of the
subgroup reduction macro will be early-clobber. The result is that the
order of intervals to allocate is now a lot more complicated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
For pcopies we only care about the register's type, i.e. whether its a
half-register and whether it's an array (plus its size). Copying over
other flags like IR3_REG_RELATIV just leads to sadness and validator
assertions.
Fixes: 0ffcb19b9d ("ir3: Rewrite register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
Before, we were careful to
1. Get the source physreg.
2. Allocate the destination.
3. Insert a copy with the source being the physreg from step 1.
and this guaranteed that if the tied source were moved in step 2 we'd
still insert a copy from the correct place. However this won't work with
multiple destinations because an earlier destination could've already
moved the tied source around. Instead flip steps 2 and 3 (we'll insert
the copy before we allocate the interval, but that's ok) and run the
first two steps in a separate loop before any destinations are
allocated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
Note: this is a behavior change for arrays, because it will count the
entire array instead of just the components written in the register
pressure calculation. However this is more accurate since this matches
how RA works.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14107>
Re-use auto-generated vk_cmd_enqueue entrypoints instead of generating
our own version doing the same thing. In order to effectively do this,
we also add an allow-list of which entrypoints lavapipe actually handles
to avoid issues where the autogenerated one stomps a vkCmdFoo2 wrapper.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15311>
The test suite is full of flakes around transform feedback, atomics, and
tess. But, I hope it can be useful for regression testing core Mesa
reworks.
This required updating the kernel to 5.16.12 to get a more stable boot
process. That kernel rebuild caused an update of the container with
piglit which that was missed in a previous MR, so we got new xfails in x86
swrast.
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> (nouveau)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15201>
This required updating the kernel to 5.16.12 to get a more stable boot
process. That kernel rebuild caused an update of the container with
piglit which that was missed in a previous MR, so we got new xfails in x86
swrast. Also, including modules on arm64 exposed a bug in v3d's
poe-powered.sh rsyncing of modules.
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15201>
this hardware won't return the correct value from dmod instructions,
so lower it to ensure that cts passes
nobody else will ever hit this, so perf isn't an issue and regular fmod
can be left alone
fixes (amd):
KHR-GL46.gpu_shader_fp64.builtin.mod_d*
Fixes: 5fae35fb17 ('zink: fix 64bit float shader ops ')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15306>
From the VK_KHR_shader_float_controls extension:
"5) Do any of the “Pack” GLSL.std.450 instructions count as
conversion instructions and have the rounding mode applied?"
"RESOLVED: No, only instructions listed in “section 3.32.11.
Conversion Instructions” of the SPIR-V specification count as
conversion instructions."
This is also the same logic as the LLVM backend.
No fossils-db changes on Sienna Cichlid.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15301>
The multiViewport feature isn't required for GL 4.3, it's required for
GL 4.1. Technically speaking, we could have just dropped it because we
already list the maxViewports requirement. But it seems better to be
very clear here to me.
Fixes: 29f8f21bff ("docs: document zink GL 4.3 requirements")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15300>
We have twice the registers in this case so it makes sense to double
this as well. While this causes slight regressions in shader-db
stats (due to additional register pressure), it helps us hide latency
of memory reads better on 2-thread compiles, where the thread switch
mechanism will be less effective. This shows a ~3% performance
improvement on the UE4 SunTemple demo.
total instructions in shared programs: 12642413 -> 12656164 (0.11%)
instructions in affected programs: 2272652 -> 2286403 (0.61%)
helped: 2924
HURT: 3389
total uniforms in shared programs: 3703861 -> 3704776 (0.02%)
uniforms in affected programs: 213729 -> 214644 (0.43%)
helped: 823
HURT: 1272
total max-temps in shared programs: 2150686 -> 2153505 (0.13%)
max-temps in affected programs: 191332 -> 194151 (1.47%)
helped: 1900
HURT: 1891
total spills in shared programs: 3255 -> 3274 (0.58%)
spills in affected programs: 166 -> 185 (11.45%)
helped: 3
HURT: 6
total fills in shared programs: 4630 -> 4656 (0.56%)
fills in affected programs: 367 -> 393 (7.08%)
helped: 7
HURT: 15
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
This can add quite a bit of register pressure so it makes sense to disable it
to prevent us from dropping to 2 threads or increase spills:
total instructions in shared programs: 12672813 -> 12642413 (-0.24%)
instructions in affected programs: 256721 -> 226321 (-11.84%)
helped: 719
HURT: 77
total threads in shared programs: 415534 -> 416322 (0.19%)
threads in affected programs: 788 -> 1576 (100.00%)
helped: 394
HURT: 0
total uniforms in shared programs: 3711370 -> 3703861 (-0.20%)
uniforms in affected programs: 28859 -> 21350 (-26.02%)
helped: 204
HURT: 455
total max-temps in shared programs: 2159439 -> 2150686 (-0.41%)
max-temps in affected programs: 32945 -> 24192 (-26.57%)
helped: 585
HURT: 47
total spills in shared programs: 5966 -> 3255 (-45.44%)
spills in affected programs: 2933 -> 222 (-92.43%)
helped: 192
HURT: 4
total fills in shared programs: 9328 -> 4630 (-50.36%)
fills in affected programs: 5184 -> 486 (-90.62%)
helped: 196
HURT: 0
Compared to the stats before adding scheduling of non-filtered
memory reads we see we that we have now gotten back all that was
lost and then some:
total instructions in shared programs: 12663186 -> 12642413 (-0.16%)
instructions in affected programs: 2051803 -> 2031030 (-1.01%)
helped: 4885
HURT: 3338
total threads in shared programs: 415870 -> 416322 (0.11%)
threads in affected programs: 896 -> 1348 (50.45%)
helped: 300
HURT: 74
total uniforms in shared programs: 3711629 -> 3703861 (-0.21%)
uniforms in affected programs: 158766 -> 150998 (-4.89%)
helped: 1973
HURT: 499
total max-temps in shared programs: 2138857 -> 2150686 (0.55%)
max-temps in affected programs: 177920 -> 189749 (6.65%)
helped: 2666
HURT: 2035
total spills in shared programs: 3860 -> 3255 (-15.67%)
spills in affected programs: 2653 -> 2048 (-22.80%)
helped: 77
HURT: 21
total fills in shared programs: 5573 -> 4630 (-16.92%)
fills in affected programs: 3839 -> 2896 (-24.56%)
helped: 81
HURT: 15
total sfu-stalls in shared programs: 39583 -> 38154 (-3.61%)
sfu-stalls in affected programs: 8993 -> 7564 (-15.89%)
helped: 1808
HURT: 1038
total nops in shared programs: 324894 -> 323685 (-0.37%)
nops in affected programs: 30362 -> 29153 (-3.98%)
helped: 2513
HURT: 2077
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
We do a few changes over NIR's defaults:
1. Lower delay for texture reads. Empirically, we don't observe any
benefits with delays over 50 and since this delay value is still
used by the scheduler in the "favor register pressure" case it is
benefitial to avoid overestimating it too much.
2. Adjust delay for non-filtered TMU reads to the delay selected for
texture reads.
3. In our case, UBO reads from dynamically uniform addresses don't
use the TMU and have a latency of 1 instruction in the best case
scenario or 4 at worse, so we go with 1 so we don't try to move
this early.
This helps us get back some of what we lost when updating the
default scheduler configuration to add a delay for non-filtered
memory reads:
total instructions in shared programs: 13126587 -> 12671765 (-3.46%)
instructions in affected programs: 3764097 -> 3309275 (-12.08%)
helped: 14664
HURT: 4244
total threads in shared programs: 407208 -> 415522 (2.04%)
threads in affected programs: 8716 -> 17030 (95.39%)
helped: 4224
HURT: 67
total uniforms in shared programs: 3812698 -> 3711224 (-2.66%)
uniforms in affected programs: 335170 -> 233696 (-30.28%)
helped: 2816
HURT: 3551
total max-temps in shared programs: 2318430 -> 2159345 (-6.86%)
max-temps in affected programs: 539991 -> 380906 (-29.46%)
helped: 13173
HURT: 1440
total spills in shared programs: 49086 -> 5966 (-87.85%)
spills in affected programs: 48306 -> 5186 (-89.26%)
helped: 1655
HURT: 28
total fills in shared programs: 55810 -> 9328 (-83.29%)
fills in affected programs: 54821 -> 8339 (-84.79%)
helped: 1659
HURT: 22
LOST: 0
GAINED: 3
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
This has been pending for a long time. It is not very consistent to
add a significant delay for textures and not do it for UBOs, etc
The reason we have not been doing this so far is the accumulated effect
on register pressure for V3D as shown by shader-db results below, but
from the point of view of a generic scheduler it makes sense to do this.
Later patches will address V3D specific issues with register pressure
derived from this by letting the driver control its instruction delay
settings.
total instructions in shared programs: 12662138 -> 13126587 (3.67%)
instructions in affected programs: 1813091 -> 2277540 (25.62%)
helped: 2410
HURT: 10499
total threads in shared programs: 415858 -> 407208 (-2.08%)
threads in affected programs: 17348 -> 8698 (-49.86%)
helped: 8
HURT: 4333
total uniforms in shared programs: 3711483 -> 3812698 (2.73%)
uniforms in affected programs: 128012 -> 229227 (79.07%)
helped: 3474
HURT: 2143
total max-temps in shared programs: 2138763 -> 2318430 (8.40%)
max-temps in affected programs: 318780 -> 498447 (56.36%)
helped: 588
HURT: 11997
total spills in shared programs: 3860 -> 49086 (1171.66%)
spills in affected programs: 709 -> 45935 (6378.84%)
helped: 23
HURT: 1595
total fills in shared programs: 5573 -> 55810 (901.44%)
fills in affected programs: 1067 -> 51304 (4708.25%)
helped: 23
HURT: 1595
LOST: 3
GAINED: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
We can get a generic nir_intrinsic_memory_barrier to represent a
barrier involving multiple semantics (instead of getting individual
specific barriers for each semantic). This means that we need to
consider these as potentially affecting shared memory access as well.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
This doesn't have any significant impact shader-db stats and would
reduce our capacity to hide latency from the loads, so it is probably
undesirable:
total instructions in shared programs: 12663189 -> 12663186 (<.01%)
instructions in affected programs: 4222 -> 4219 (-0.07%)
helped: 9
HURT: 4
total uniforms in shared programs: 3711624 -> 3711629 (<.01%)
uniforms in affected programs: 186 -> 191 (2.69%)
helped: 0
HURT: 2
total max-temps in shared programs: 2138822 -> 2138857 (<.01%)
max-temps in affected programs: 569 -> 604 (6.15%)
helped: 1
HURT: 9
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
On Icelake and later, we can use a new 3DSTATE_BINDING_TABLE_POOL_ALLOC
command to update the location of the binder (buffer containing binding
table entries), rather than having to move Surface State Base Address
via a STATE_BASE_ADDRESS command. This has less stalling and also means
our surface addresses can remain relative to a fixed 4GB address range,
meaning we don't have to re-stream them any time the binder changes.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
This workaround is needed on all Gfx12.0 parts, but doesn't appear to be
necessary on XeHP. The other drivers do not appear to be applying this
workaround on those parts. As further evidence, we accidentally added
the 3DSTATE_BINDING_TABLE_POOL_ALLOC commands after switching back to
GPGPU mode, which would be an incorrect way to implement the workaround,
and things seem to be working.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
On Gfx11+, we're going to stop changing Surface State Base Address
and instead start changing the Binding Table Pool address instead.
So, rename a few things to track the last binder address, which is
what we're actually changing, regardless of how we program it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
Skylake and older use a 15:5 binding table pointer format, which means
our binder can be at most 64kB in size. Each binding table within the
binder must be aligned to 32B.
XeHP uses a new 20:5 binding table format, which allows us to increase
the binder size to 1MB while retaining the nice 32B alignment. Larger
binders mean fewer stalls as we update the base address for the binder.
Icelake and Tigerlake can either use the 15:5 format or an 18:8 format.
18:8 mode requires the base of each binding table to be aligned to 256B
instead of 32B, but it gives us a maximum binder size of 512kB.
We can store 64 binding table entries in a 256B chunk (256B / 4B = 64),
but only 8 entries in a 32B chunk (32B / 4B = 8). Assuming that most
binding tables have fewer than 64 entries, this means that with the 18:8
format, we're likely to be able to fit 2048 (512KB / 256B) tables into a
a buffer before needing to allocate a new one and stall.
Technically, the old format could also store 2048 binding tables per
buffer as well (64KB / 32B = 2048). However, tables that needed more
than 8 entries would need multiple 32B chunks. A single table would
take multiple aligned chunks, while with the larger 256B format, it
could fit in a single one.
This cuts binder resets by 6.3% on a Shadow of Mordor benchmark trace.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
this comes in two flavors:
* streamout of array<struct/block>
* partial streamout of array/struct/block
for the former:
* arrays of structs can just be blasted out in the initial var declaration (easy)
* arrays of blocks must be output to separate xfb buffers for each array block,
which requires skipping initial xfb blast-off and instead propagating the values
using tmp variables at a later point
for the latter:
* the optimal way to do this is to unwrap the struct first to figure out what's being
emitted, at which point the value can be extracted and exported
fixes the rest of spec@arb_gl_spirv@execution@xfb
...except spec@arb_gl_spirv@execution@xfb@vs_block_array, which I'm suspecting is broken
due to vtn bugs
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>
this now handles inlining of stupid types (dvec3, dmatX) and complex
types (goku) as seen in cts
fixes:
KHR-Single-GL46.enhanced_layouts.xfb_explicit_location
KHR-Single-GL46.enhanced_layouts.xfb_struct_explicit_location
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>
This commit fixes the following flaws in the implementation:
* when a resource was re-allocated, the guest side storage
was also allocated
* when a source needs a readback before being written to, then
the call would go through vws->transfer_get, thereby bypassing the
staging resource, and this would fail on the host, because no
the allocated IOV was too small (just one byte)
* if the texture write would need neither flush nor readback, the
old code path would be used expecting that guest side backing stogage
for the texture.
v2: - actually do a readback to the stageing resource when it is required
- fix typo (Lepton)
v3: Don't use stageing transfers if the host can't read back the data
by rendering to an FBO or calling getTexImage, because in this case
we rely on the IOV to hold the date.
v4: Also don't use staging transfers if the format is no readback
format. Otherwise we have to deal with the resolve blit, and
this is currently not working correctly.
v5: add a new flag that indicates whether non-renderable textures can
be read back (either via glGetTexImage or GBM)
v6: Restrict the use of staging texture transfers to textures that can
be read back, and on GLES also if the they are bound to scanout and
the host uses minigbm to allocate such textures.
For that replace the flag indicating the capability to read back
non-renderable textures with a cap that indicates whether scanout
textures can be read back.
v7: update virglrenderer version in the CI
v8: update use of stageing (Chia-I)
v9: remove superflous check and assignment (Chia-I)
v10: disable stageing textures for arrays with stencil format. This is a
workaround for failures of the CI.
Fixes: cdc480585c
virgl/drm: New optimization for uploading textures
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14495>
this was being set from back before zink actually supported 64bit
natively and only 32bit was functional, but it breaks 64bit support
cc: mesa-stable
fixes (lavapipe):
KHR-GL46.gpu_shader_fp64.builtin.mod_dvec2
KHR-GL46.gpu_shader_fp64.builtin.mod_dvec3
KHR-GL46.gpu_shader_fp64.builtin.mod_dvec4
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15274>
Run crosvm as a background process in order to allow intercepting
interrupt signals (INT, TERM) and properly release/cleanup any allocated
resources.
This is particularly helpful when one or more crosvm tasks hang, which
will eventually prevent subsequent instances to be started - currently
we can handle up to 128 concurrent crosvm instances per runner.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15238>
When using indirect textures, some lanes may not be active,
particularly in a loop, so as with some other areas, extracting
the correct lane is needed here. This extracts the last valid one.
KHR-GL45.texture_barrier.* on zink.
Fixes: e168d148d7 ("gallivm/nir: handle non-uniform texture offsets")
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15259>
Experiments suggest that e.g.
add.u r0.y, hr0.x, hr0.y
will result in the summed value in both the high and low words of r0.y.
This only happens with odd registers, not even ones (r0.x works fine).
Seen in the bit_count lowering (which turns out to be unnecessary, but
this is still a larger problem).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15251>
Along with fixing the grammar, this allows it to be deduplicated since
the properly worded description exists in later generations' XMLs.
Cuts 96 B from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
917613 0 0 917613 e006d meson-generated_.._intel_perf_metrics.c.o (before)
917511 0 0 917511 e0007 meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14131044 365708 210004 14706756 e06844 iris_dri.so (before)
14130948 365708 210004 14706660 e067e4 iris_dri.so (after)
text data bss dec hex filename
8124321 214264 22820 8361405 7f95bd libvulkan_intel.so (before)
8124225 214264 22820 8361309 7f955d libvulkan_intel.so (after)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
The compiler does a good job of deduplicating strings already, but we
can eliminate the pointers to each string by combining the strings into
a single char array and storing only an index into that array.
The longest of the char arrays is the descriptions array, which is a
little over 45 KiB, so still under MSVC's 64 KiB string literal limit
[0]. Because the string length is under 64 KiB we can use uint16_t as
the index type, which roughly doubles our savings as compared to an int.
This cuts 77 KiB from iris_dri.so (0.5%) and libvulkan_intel.so (0.9%).
text data bss dec hex filename
926811 25920 0 952731 e899b meson-generated_.._intel_perf_metrics.c.o (before)
924401 0 0 924401 e1af1 meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14190852 391628 210004 14792484 e1b724 iris_dri.so (before)
14137732 365708 210004 14713444 e08264 iris_dri.so (after)
text data bss dec hex filename
8184097 240184 22820 8447101 80e47d libvulkan_intel.so (before)
8131009 214264 22820 8368093 7fafdd libvulkan_intel.so (after)
relinfo:
iris_dri.so (before): 17765 relocations, 17545 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
iris_dri.so (after) : 15605 relocations, 15385 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (before): 10720 relocations, 6989 relative (65%), 355 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (after) : 8560 relocations, 4829 relative (56%), 355 PLT entries, 1 for local syms (0%), 0 users
[0] https://docs.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-170&viewFallbackFrom=vs-2019
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
intel_perf_query_counter contains fields for things we can't or don't
want to store in our static data (like runtime-determined max values) or
oa_read_counter function pointers which are dependent on the GPU gen and
would make deduplication very ineffective.
Cuts 16 KiB from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
926811 43200 0 970011 ecd1b meson-generated_.._intel_perf_metrics.c.o (before)
926811 25920 0 952731 e899b meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14190852 408908 210004 14809764 e1faa4 iris_dri.so (before)
14190852 391628 210004 14792484 e1b724 iris_dri.so (after)
text data bss dec hex filename
8184097 257464 22820 8464381 8127fd libvulkan_intel.so (before)
8184097 240184 22820 8447101 80e47d libvulkan_intel.so (after)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
And specifically mark it with ATTRIBUTE_NOINLINE. Otherwise it will be
inlined and actually slightly increase code size.
Cuts 505 KiB from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
1538720 0 0 1538720 177aa0 meson-generated_.._intel_perf_metrics.c.o (before)
926811 43200 0 970011 ecd1b meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14751700 365708 210004 15327412 e9e0b4 iris_dri.so (before)
14190852 408908 210004 14809764 e1faa4 iris_dri.so (after)
text data bss dec hex filename
8744913 214264 22820 8981997 890ded libvulkan_intel.so (before)
8184097 257464 22820 8464381 8127fd libvulkan_intel.so (after)
Relocations increase because the counter initializations are moved from
code (in .text) to pointers (in .text) to .rodata, which require
relocations.
relinfo:
iris_dri.so (before): 15605 relocations, 15385 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
iris_dri.so (after) : 17765 relocations, 17545 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (before): 8560 relocations, 4829 relative (56%), 355 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (after) : 10720 relocations, 6989 relative (65%), 355 PLT entries, 1 for local syms (0%), 0 users
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
No changes in resulting code (yes, seriously!). GCC constant propagates
the static const arrays into the code, yielding bit for bit identical
results. This will however enable further cleanups.
Before this patch, we emit 11916 different initializations of
intel_perf_query_counter. With this patch we emit an array of 539 and
initialize the intel_perf_query_counters in terms of those.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
Instead of allocating cpu_storage in threaded_resource_init, defer the
allocation to first use (in tc_buffer_map).
This avoids needless memory allocation if tc_buffer_disable_cpu_storage is
called before tc_buffer_map.
map_buffer_alignment is stored and serves as a "can cpu_storage be used" flag.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15074>
When "dri2_wl_formats_init" fails in "dri2_initialize_wayland_swrast",
the "dri2_display_destroy" function is called for clean up. However, the
"dri2_egl_display" was not associated with the display in its
"DriverData" field yet.
The following cast in "dri2_display_destroy":
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
Expands to:
_EGL_DRIVER_TYPECAST(drvname ## _display, _EGLDisplay, obj->DriverData)
Crashing.
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13972>
When "dri2_wl_formats_init" fails in "dri2_initialize_wayland_drm", the
"dri2_display_destroy" function is called for clean up. However, the
"dri2_egl_display" was not associated with the display in its
"DriverData" field yet.
The following cast in "dri2_display_destroy":
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
Expands to:
_EGL_DRIVER_TYPECAST(drvname ## _display, _EGLDisplay, obj->DriverData)
Crashing.
Addresses-Coverity-ID: 1494541 ("Resource leak")
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13972>
when migrating a recycled set here, the set was previously invalid and
in the recycled table, meaning it can be reused directly so long as
it's first invalidated
the previous code would instead pop a different set off the allocation array,
leaking this one
cc: mesa-stable
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15226>
this got mixed up during some refactor and started indexing based on the
number of bindings instead of the number of descriptors, which means
that array descriptor bindings would have overlapping array memory
cc: mesa-stable
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15226>
the variant data may have changed in the meanwhile, so do a pass to
ensure that the correct variant is used
cc: mesa-stable
fixes caselist:
dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_two_samples.multisample_texture_4
dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_two_samples.singlesample_rbo
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15226>
Another instruction might write to the second source, and then an
INSTR_INVALID_ENC fault will be raised because the tuple will write to
and read from the register at the same time.
Fixes: 795638767d ("pan/bi: Use fused dual source blending")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
BITSET_MASK returns ~0 when given an input of zero, when we need it to
return 0 instead.
Fixes shaders with only sysvals but no UBOs when push constants are
disabled.
This breaks when 31 or 32 UBOs are used, but PAN_MAX_CONST_BUFFERS is
currently set to 16.
Fixes: c246af0dd8 ("panfrost: Only upload UBOs when needed")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
This could be made slightly more efficient by only setting the dirty
state that is needed, but eventually you reach a point where it's
cheaper to re-emit everything than work out what can or can't be kept.
Fixes rendering issues in Duckstation.
Fixes: cd2c1ef9da ("panfrost: Dirty track textures/samplers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
TEXC can have two destinations; the value for neither of them can be
used in the same bundle, so extend the code to check for this to
iterate over both destinations.
Fixes artefacts in the game "LIMBO".
Fixes: a303076c1a ("pan/bi: Add bi_instr_schedulable predicate")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
Trying to write to overlapping register ranges from a single
instruction is undefined behaviour, so add interference between the
nodes to avoid this.
Hit in a dual-texture instruction in LIMBO.
Fixes: 9146bafbb4 ("pan/bi: Add dual texture fusing pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
When discarding the whole resource to create a new one, if this resource
is used by a sampler view, a rebind must be done to use the new
resource.
But this must be done when setting the sampler views, because we don't
have access to those samplers before.
v2:
- Pack shader state on setting sampler views (Iago)
- Use a serial ID to know when to rebind sampler views (Juan)
v3:
- Move check to caller (Iago)
- Keep rebind sampler view on BO change (Iago)
v4:
- Rename "serial_bo" to "serial_id" (Iago)
- Add comments (Iago)
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6027
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15171>
The Gallium pipe video "frame_num" variable is internally used as a
counter of elapsed reference frames since the last IDR. The incoming
frame_num field from VA picture parameters is not equivalent; the VA
value may wrap to zero prematurely, as it is a 16-bit struct field with
a documented max value of 2^(log2_max_frame_num_minus4 + 4)-1.
This change improves "infinite GOP" single-client live streaming, where
it is reasonable for the server to desire an endless series of P-frames
without IDR. Without this change, it is difficult/impossible for an
application to encode a P- or B-frame after the VA frame_num field wraps
around to zero, depending on the backend encoder implementation.
This change has no effect on existing applications that always signal an
IDR frame and reset the VA frame_num to zero before it wraps around. For
example, the FFmpeg vaapi encoder ignores the VA documentation and sends
an un-wrapped VA frame_num, which results in identical computation of
the internal frame_num (as long as each GOP is less than 65536 frames).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5768
Reviewed-by: Thong Thai <thong.thai@amd.com>
patch revision 3: correctly avoid incrementing frame_num when the encoded
frame is not a reference, per h264 spec and ffmpeg behavior
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14332>
Only allow this in situations where we know it's safe. In particular, this
stops removal of unconditional branches like with
block_kind_continue_or_break.
Fixes dEQP-VK.graphicsfuzz.fragcoord-control-flow hang.
fossil-db (Sienna Cichlid):
Totals from 34 (0.02% of 162293) affected shaders:
Instrs: 84115 -> 84178 (+0.07%); split: -0.00%, +0.08%
CodeSize: 463372 -> 463624 (+0.05%); split: -0.00%, +0.06%
Latency: 3467316 -> 3467652 (+0.01%)
InvThroughput: 3085493 -> 3085578 (+0.00%)
Branches: 3221 -> 3284 (+1.96%); split: -0.03%, +1.99%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: f030b75b7d ("aco: relax condition to remove branches in case of few instructions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15214>
vecN instructions which are only used by other ALU
will now get duplicate channels removed.
i915g:
total instructions in shared programs: 396309 -> 396294 (<.01%)
instructions in affected programs: 186 -> 171 (-8.06%)
r300:
total instructions in shared programs: 1165059 -> 1164354 (-0.06%)
instructions in affected programs: 35884 -> 35179 (-1.96%)
total temps in shared programs: 165497 -> 165326 (-0.10%)
temps in affected programs: 2990 -> 2819 (-5.72%)
softpipe:
total instructions in shared programs: 2860028 -> 2859084 (-0.03%)
instructions in affected programs: 55539 -> 54595 (-1.70%)
total temps in shared programs: 516939 -> 516546 (-0.08%)
temps in affected programs: 6623 -> 6230 (-5.93%)
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12468>
This gives
MESA-VIRTIO: debug: stuck in ring seqno wait with iter at 4096
MESA-VIRTIO: debug: stuck in ring seqno wait with iter at 8192
MESA-VIRTIO: debug: stuck in ring seqno wait with iter at 12288
MESA-VIRTIO: debug: stuck in ring seqno wait with iter at 16384
MESA-VIRTIO: debug: aborting
Aborted
which should be more friendly than printing the messages forever.
On my i7-7820HQ, this aborts after roughly 4+8+16+32=60 seconds
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15200>
NIR-to-TGSI produces partial output writes contrary to the old paths
that always wrote the full outputs. Therefore if there is now a partial
output write ready to be scheduled and nothing else besides a tex
is ready, we would schedule the output write first. This was not a
problem before as usually at last some component of the full output write
depended on the tex result.
This is not optimal from the performance point of view and resulted in
~20% slowdown in the Unigine demos. The docs say:
The first OUTPUT instruction will reserve space in the output register
fifo. This space is limited, therefore issuing an OUTPUT earlier than
necessary may cause threads to stall earlier than necessary. You
should not set an ALU instruction as type OUTPUT unless it is actually
writing to an output register, or it is the last instruction of
the program.
Fix it by explicitly prefering a TEX before OUT and restore the
performance: 9.66 -> 12.12 fps (as compared to 11.83 with the old
glsl-to-TGSI path) in Unigine Sanctuary. No change in Lightsmark or
GLmark.
This is also a win from the intructions point of view as we are usually
able to schedule the partial output writes in a single pair at the end.
total instructions in shared programs: 106009 -> 105891 (-0.11%)
instructions in affected programs: 10153 -> 10035 (-1.16%)
helped: 118
HURT: 0
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5840
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15165>
The max_score == -1 condition is already before so this
will never trigger. Its unclear what was the intention anyway. Now we
emit either:
- if we have accumulated enough tex intructions for a full block
- if we have nothing else to emit
- or if we can emit all remaining tex instructions already.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15165>
New extensions properties/feature are being put in the `vn_physical_device`
which is not ideal from an organization point of view.
Here the `vn_physical_device_{features,properties}` are two new struct to
help the `vn_physical_device` organzation.
Signed-off-by: Igor Torrente <igor.torrente@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15170>
Something is slightly off in the integer values returned. It passes many
tests without the fixup, but the dEQP-GLES31 tests complain. The blob
ends up doing 3x gathers, and selects between them based on getinfo
results. Since we already have a per-sampler key with some spare bits,
just stick the bit-size info in there. And we can derive signedness from
the associated type info.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14670>
It can't be done. This just provides bad results. The blob had a
comparable approach where they fixed up coordinates, but that also can't
work with a separate texture definition with nearest filtering. By then,
might as well provide a unswizzled variant instead, and using native
functionality.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14670>
From Section 4.4.1 (Input Layout Qualifiers) of the GLSL 4.50 spec:
"For some blocks declared as arrays, the location can only be applied
at the block level: When a block is declared as an array where
additional locations are needed for each member for each block array
element, it is a compile-time error to specify locations on the block
members. That is, when locations would be under specified by applying
them on block members, they are not allowed on block members. For
arrayed interfaces (those generally having an extra level of
arrayness due to interface expansion), the outer array is stripped
before applying this rule"
From Section 1.2.1 (Changes from Revision 6 of GLSL Version) of the GLSL 4.50 spec:
"Private Bug 15678: Don’t allow location = on block members where
the block needs an array of locations"
From Section 4.4.1 (Input Layout Qualifiers) of the GLSL ES 3.20 spec
"If an input is declared as an array of blocks, excluding per-vertex-arrays
as required for tessellation, it is an error to declare a member of
the block with a location qualifier"
From Section 1.1.3 (Changes from GLSL ES 3.2 revision 3) of the GLSL ES 3.20 spec:
"Arrayed blocks cannot have layout location qualifiers on members"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11522>
if one of these states change then it affects which result needs to be
used for that query, so split it up over multiple query ids to make sure
the correct result is obtained
fixes (lavapipe):
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_pause_resume
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_states
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15227>
When the AOS/linear code was added it only worked with TGSI which
meant nothing in mesa upstream was really using it.
This adds support to analyse NIR shaders, and adds aos support
to the backend.
AOS support is limited to mov,vec,fmul,tex sampling in order to
accelerate mostly compositing operations. I've tested weston uses
the fast path. gnome-shell can't use it yet as we can't optimise
the depth test paths.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15140>
For Bifrost, we model load/store segments, for example for thread local storage.
We need something similar on Valhall -- access modifiers. There are four access
modifiers on Valhall, controlling memory subsystem optimizations for the access:
none: Nothing may be assumed. Corresponds to "global".
istream: Internally streaming within the GPU. Corresponds to "pos", as it's
used for position stores.
estream: Externally streaming outside the GPU. Corresponds to "vary", as it's
used for varying stores.
force: Force access in discarded threads. Corresponds to "tl", as it's required
for correct behaviour of helper invocations that use the stack.
If these access modifiers end up being useful outside these fixed purposes, we
may need to rework this part of the IR. For now, this should suffice.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15216>
This can avoid some cases where a constant has to be loaded into a
temporary register.
v2: Update i915-g33-fails.txt.
total instructions in shared programs: 788625 -> 782376 (-0.79%)
instructions in affected programs: 166269 -> 160020 (-3.76%)
helped: 1578
HURT: 0
helped stats (abs) min: 3 max: 21 x̄: 3.96 x̃: 3
helped stats (rel) min: 1.56% max: 33.33% x̄: 4.82% x̃: 3.45%
95% mean confidence interval for instructions value: -4.06 -3.86
95% mean confidence interval for instructions %-change: -5.00% -4.64%
Instructions are helped.
LOST: 0
GAINED: 35
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15210>
I noticed the SGE case while looking at the output of
shaders/closed/steam/trine-2/fp-3.shader_test on i915g. These are
especially bad on i915 that needs two instructions to implement SNE.
An alternative would be to duplicate the sne(sXX(a, b), 0.0) rules in an
algebraic pass that occurs after bool_to_float. Doing the work earlier
seems preferable.
i915
total instructions in shared programs: 788274 -> 788223 (<.01%)
instructions in affected programs: 666 -> 615 (-7.66%)
helped: 5
HURT: 0
helped stats (abs) min: 9 max: 12 x̄: 10.20 x̃: 9
helped stats (rel) min: 5.00% max: 11.11% x̄: 8.12% x̃: 8.16%
95% mean confidence interval for instructions value: -12.24 -8.16
95% mean confidence interval for instructions %-change: -10.81% -5.43%
Instructions are helped.
LOST: 0
GAINED: 2
The two gained shaders are assembly fragment programs in Euro Truck
Simulator 2.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15210>
Many users of nir_vec() do so by nir_channel()-ing a new ssa defs as movs
from other vectors to put the new vector together, which then just have to
get copy-propagated into the ALU srcs and DCEed away the temporary movs.
If they instead take nir_ssa_scalar, we can avoid that extra work.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14865>
"queueFamilyIndexCount is the number of queue families having access to the image(s) of the
swapchain when imageSharingMode is VK_SHARING_MODE_CONCURRENT.
pQueueFamilyIndices is a pointer to an array of queue family indices having access to the
images(s) of the swapchain when imageSharingMode is VK_SHARING_MODE_CONCURRENT."
If the type isn't concurrent, don't attempt to access the arrays.
dEQP-VK.wsi.xlib.swapchain.create.exclusive_nonzero_queues on lavapipe.
Fixes: 5b13d74583 ("vulkan/wsi/drm: Break create_native_image in pieces")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15101>
We were just emitting the bad reloc (either an assert fail on a debug
build or for a release build likely a GPU hang from the resulting fault).
Given that the GLES 3.2 spec's robust context requirement says we should
return undefined data but not terminate for element indices outside of the
VB, ignoring the offset in this case seems like a better behavior to have
in all cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15198>
We only have a single prog_data::total_scratch for all shader variants
(SIMD 8, 16, 32). Therefore we should always max the total_scratch on
top of existing variant.
We probably haven't run into that issue before because we compile by
increasing SIMD size and higher SIMD size is more likely to spill. But
for bindless shaders with return shaders, if the last return part
doesn't spill, we completely ignore the previous parts' scratch
computation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15193>
Now that we don't sort our nodes we can arrange them so we can
easily translate between nodes and temps without a mapping table,
just applying an offset.
To do this we have a single array of nodes where twe put first the nodes
for accumulators and then the nodes for temps. With this setup we can
ensure that for any given temp T, its node is always T + ACC_COUNT.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
Nodes are allocated in order to registers so initially sorting
was used to ensure that nodes with smaller life ranges would
be assigned first and therefore be more likely to get
accumulators.
However, since d81a6e5f1d now we don't rely on order to make
decisions about accumulators and instead we make policy decisions
based on actual liveness, so sorting is no longer strictly
relevant to this decision.
Furthermore, we are not re-sorting nodes after each spill either,
since that would probably require that we rebuild the interference
graph after each spill (the graph identifies nodes by their index).
Shader-db results show a significant improvement in instruction
counts, due to more optimal accumulator assignments. The reason for
this is that we use a round-robin policy for choosing the next
accumulator to assign. The idea behind this is preventing nearby
temps to be assigned to the same accumulator so that QPU scheduling
is more flexible, but if we sort our nodes, we are basically not
assigning temps in program order any more and the round-robin policy
becomes less effective:
total instructions in shared programs: 13000420 -> 12663189 (-2.59%)
instructions in affected programs: 11791267 -> 11454036 (-2.86%)
helped: 62890
HURT: 19987
total threads in shared programs: 415874 -> 415870 (<.01%)
threads in affected programs: 20 -> 16 (-20.00%)
helped: 2
HURT: 4
total uniforms in shared programs: 3711652 -> 3711624 (<.01%)
uniforms in affected programs: 43430 -> 43402 (-0.06%)
helped: 134
HURT: 173
total max-temps in shared programs: 2144876 -> 2138822 (-0.28%)
max-temps in affected programs: 123334 -> 117280 (-4.91%)
helped: 4112
HURT: 1195
total spills in shared programs: 3870 -> 3860 (-0.26%)
spills in affected programs: 1013 -> 1003 (-0.99%)
helped: 14
HURT: 12
total fills in shared programs: 5560 -> 5573 (0.23%)
fills in affected programs: 1765 -> 1778 (0.74%)
helped: 14
HURT: 17
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
For us they are basically uniforms too so we want to make their
lifespans short to facilitate allocating them to accumulators.
total instructions in shared programs: 13043585 -> 13015385 (-0.22%)
instructions in affected programs: 8326040 -> 8297840 (-0.34%)
helped: 24939
HURT: 19894
total threads in shared programs: 415860 -> 415858 (<.01%)
threads in affected programs: 4 -> 2 (-50.00%)
helped: 0
HURT: 1
total uniforms in shared programs: 3721953 -> 3720451 (-0.04%)
uniforms in affected programs: 96134 -> 94632 (-1.56%)
helped: 744
HURT: 435
total max-temps in shared programs: 2173431 -> 2154260 (-0.88%)
max-temps in affected programs: 264598 -> 245427 (-7.25%)
helped: 10858
HURT: 841
total spills in shared programs: 4005 -> 4010 (0.12%)
spills in affected programs: 700 -> 705 (0.71%)
helped: 5
HURT: 10
total fills in shared programs: 5801 -> 5817 (0.28%)
fills in affected programs: 1346 -> 1362 (1.19%)
helped: 6
HURT: 11
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
If we are compiling with a strategy that does not allow TMU spills
we should not allow spilling anything that is not a uniform.
Otherwise the RA cost/benefit algorithm may choose to spill a
temp that is not uniform and that will cause us to immediately
fail the strategy and fallback to the next one, even if we
could've instead chosen to spill more uniforms to compile the
program successfully with that strategy.
Some relevant shader-db stats:
total instructions in shared programs: 13040711 -> 13043585 (0.02%)
instructions in affected programs: 234238 -> 237112 (1.23%)
helped: 73
HURT: 172
total threads in shared programs: 415664 -> 415860 (0.05%)
threads in affected programs: 196 -> 392 (100.00%)
helped: 98
HURT: 0
total uniforms in shared programs: 3717266 -> 3721953 (0.13%)
uniforms in affected programs: 12831 -> 17518 (36.53%)
helped: 6
HURT: 100
total max-temps in shared programs: 2174177 -> 2173431 (-0.03%)
max-temps in affected programs: 4597 -> 3851 (-16.23%)
helped: 79
HURT: 21
total spills in shared programs: 4010 -> 4005 (-0.12%)
spills in affected programs: 55 -> 50 (-9.09%)
helped: 5
HURT: 0
total fills in shared programs: 5820 -> 5801 (-0.33%)
fills in affected programs: 186 -> 167 (-10.22%)
helped: 5
HURT: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
Our cost was 5 which matches the number of instructions we have to
add for a TMU spill (a fill is 4 instructions).
Uniform spills on the other hand add an extra instruction for each
fill and remove one instruction for the spill itself. These have
a cost of 1.
Therefore, if we have a single spill+fill, we end up with +9
instructions if it is a TMU spill and +0 instructions with a uniform
spill, so making the former only 5 times more costly is probably
not a good idea, and this is without even considering the added
latency of the TMU accesses.
Relevant shader-db changes show this causes as a marginal instruction
count increase in a few shaders but better thread counts and lower
TMU spilling overall:
total instructions in shared programs: 13037315 -> 13040711 (0.03%)
instructions in affected programs: 370106 -> 373502 (0.92%)
helped: 187
HURT: 321
total threads in shared programs: 415090 -> 415664 (0.14%)
threads in affected programs: 574 -> 1148 (100.00%)
helped: 287
HURT: 0
total uniforms in shared programs: 3706674 -> 3717266 (0.29%)
uniforms in affected programs: 63075 -> 73667 (16.79%)
helped: 40
HURT: 395
total max-temps in shared programs: 2176080 -> 2174177 (-0.09%)
max-temps in affected programs: 15838 -> 13935 (-12.02%)
helped: 316
HURT: 34
total spills in shared programs: 4247 -> 4010 (-5.58%)
spills in affected programs: 2599 -> 2362 (-9.12%)
helped: 107
HURT: 14
total fills in shared programs: 6121 -> 5820 (-4.92%)
fills in affected programs: 3622 -> 3321 (-8.31%)
helped: 108
HURT: 13
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
This is for drivers that have separate store instructions for varyings,
system value outputs (such as clip distances), and transform feedback.
The flags tell the driver not to store the output to those locations.
This will be used by radeonsi initially, and then maybe by a new linker.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14388>
NIR now fully contains pipe_stream_output_info in shader_info and IO
intrinsics if lower_io_variables is true. radeonsi will not use
pipe_stream_output_info after this, and other drivers are free to follow
that.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14388>
This will allow compaction of transform feedback varyings because they
are no longer tied to varying slots with this information.
It will also make transform feedback info available to all NIR passes
after IO is lowered. It's meant to replace pipe_stream_output_info.
Other intrinsics are not used with transform feedback.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14388>
These are unified in the hardware, so let's unify them in pan_shader_info.
Hoisting this logic to pan_shader.c avoids the need to duplicate this logic for
Midgard/Bifrost (RSD packing) and Valhall (SPD packing).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15204>
Instead of specifying the tiling on the texture descriptor, Valhall specifies it
on the plane descriptor. There is a new flag on the texture descriptor
specifying only whether the planes are interleaved or not.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15204>
Add support of GUEST_VRAM type of blob. These are dedicated heap memory
allocations required for vk support on hypervisors that don't support
runtime injections of host memory into guest physical address space.
The flow of usage:
1) Host VM reserves dedicated heap memory
2) Device get info about memory reservations and report it to guest
using mmio registers
3) Guest virtio-gpu driver on starts checks mmio registers for
physical address and length of reserved region. Then it reserves it
in guest.
4) On each call of vkAllocateMemory() guest driver gets chunk of
required memory and send it to host using sg list. It uses one sg
entry for 1 blob call. Heap is managed on guest using drm memory
manager (drm_mm).
Signed-off-by: Oleksandr.Gabrylchuk <Oleksandr.Gabrylchuk@opensynergy.com>
Signed-off-by: Andrii Pauk <Andrii.Pauk@opensynergy.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14536>
v2.
- Build the runner image as part of the CI for the boot2container
project, rather than as a manually step using the build instructions
in valve-trigger.dockerfile.
- Depend on a non-default kernel build hosted in the valve-infra
package repository. This does reduce the current caching feature of
local artifacts, but makes it easier to chop and change kernels on a
per-project or even per-test basis.
v3.
- Depend on a kernel built and stored in the valve-infra generic
package repo.
- Build the runner container using ci-templates as part of the CI in
valve-infra.
- Now that the runner container is built in the valve-infra CI, I
dropped the source import of client.py and message.py. They are
built in the runner container.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14660>
- Add the scripts to the prepared Mesa artifacts for use in later
runner stages.
- Add a template generator (generate_b2c.py) which reads and
validates (very lightly for now) the Gitlab job environment and then
spits out a YAML file describing the necessary test workload to be
sent to a Valve CI gateway.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14660>
nir_var_shader_out writes are only used for later TES invocations, so I
don't think there's any need for the TCS workgroup to wait for them.
fossil-db (Sienna Cichlid):
Totals from 1691 (1.04% of 162293) affected shaders:
Instrs: 710699 -> 709008 (-0.24%)
CodeSize: 3830168 -> 3823404 (-0.18%)
Latency: 3396997 -> 3007934 (-11.45%)
InvThroughput: 1212094 -> 1082823 (-10.67%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15195>
This reverts commit 382f6ccda8.
The spec requires that all color images created with the same tiling
(and a few other properties) support the same memoryTypeBits. So this
wasn't a valid change. It also wasn't necessary - we already have a
mechanism in anv_BindImageMemory2 for disabling compression if the BO
doesn't support it.
With this, XeHP passes the tests in
dEQP-VK.memory.requirements.*tiling_optimal
Fixes: 382f6ccd ("anv: Require the local heap for CCS on XeHP")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15068>
When an image configured for HIZ_CCS/HIZ_CCS_WT is bound to a BO lacking
implicit CCS, we disable any compression it may have had. Such images
are still compatible with ISL_AUX_USAGE_HIZ however. Fall back to that
aux usage to retain the performance benefit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15068>
When an image is bound to a BO lacking implicit CCS, we disable any
compression it may have had. This is unnecessary in the cases where the
compression type doesn't depend on the BO having implicit CCS support.
Avoid this disabling for ISL_AUX_USAGE_MCS and ISL_AUX_USAGE_HIZ.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15068>
In the bindless case, we don't have to keep any shadow descriptors and
can just reuse the IBO descriptor as a texture descriptor. Now that
we're emitting the swizzle we can just flip this on.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
ncoords includes the array index, and the NIR source has the array index
as its last component, so we have to insert the extra y coordinate in
the middle in this case.
Fixes: 0bb0cac ("freedreno/ir3: handle image buffer")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
Since these were reverse-engineered, it's become clear that IBO
descriptors are just a subset of texture descriptors, and bindless reads
of readonly images actually use isam on the IBO descriptor, further
confirming that the two are always compatible, even if not all of the
texture fields exist for IBOs. It's pointless to have a separate type
for IBOs, and just leads to things getting out-of-sync unnecessarily
which has already happened. Just remove it and use TEX_CONST insted.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
CAN_REORDER takes volatile into account, and is closer to what we
actually require to use texture instructions, which is that we can
arbitrarily reorder loads.
Fixes: aa93896 ("freedreno/ir3: adjust condition for when to use ldib")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15114>
From Vulkan spec for VkDrmFormatModifierProperties2EXT:
"drmFormatModifierTilingFeatures is a bitmask of VkFormatFeatureFlagBits
that are supported by any image created with format and drmFormatModifier."
"The returned drmFormatModifierTilingFeatures must contain at least one bit."
"Therefore, if the returned drmFormatModifier is DRM_FORMAT_MOD_LINEAR,
then drmFormatModifierPlaneCount must equal the format planecount, and
drmFormatModifierTilingFeatures must be identical to the
VkFormatProperties2::linearTilingFeatures returned in the same pNext chain."
Relevant tests: dEQP-VK.drm_format_modifiers.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15032>
now when gl_PointSize and gl_PointSizeMESA are both present, the former
will be used for xfb with a new location and the latter will be
exported by the shader
fixes (zink):
GTF-GL46.gtf30.GL3Tests.transform_feedback.transform_feedback_builtins
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15184>
creating this at the start of the shader means it will get optimized out
when the pass is used to overwrite existing psiz values, and creating it
at the end means it will get optimized out in geometry shaders, so instead
just walk the instructions and create another store right after the existing one
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15184>
On mips64, the compiler does not allow use of non-zero argument with
__builtin_frame_address(). However, the returned frame address is only
used when PIPE_ARCH_X86 is defined. The compile error can be avoided
by making #ifdef PIPE_ARCH_X86 cover the getting of frame address too.
The argument checking of __builtin_frame_address() has been present
as a debug assert in clang 8. In clang 10, there is a proper runtime
check for the argument. This is why the build has not failed before.
Fixes: dc94a0506f ("gallium: Do not add -Wframe-address option for gcc <= 4.4.")
from Visa Hankala
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6511>
Make this build on clang architectures that don't have 64-bit atomic
instructions. Clang doesn't allow redeclaration (and therefore
redefinition) of the __sync_* builtins. Use #pragma redefine_extname
to work around that restriction. Clang also turns __sync_add_and_fetch
into __sync_fetch_and_add (and __sync_sub_and_fetch into
__sync_fetch_and_sub) in certain cases, so provide these functions as
well.
Fixes: a6a38a038b ("util/u_atomic: provide 64bit atomics where they're missing")
patch from Mark Kettenis
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6511>
The structure wrapped around the rb tree node was being freed, but not the node
itself, which caused a segmentation fault when accessing its parent node.
Add rb tree node remove call to fix it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15188>
Since this text was written, VirGL has become a shipping, production
quality solution. It's no longer a research project. Let's update the
text to reflect that.
While we're at it, let's drop the project from the page title, as this
is no longer the docs for the entire project.
Acked-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13836>
need to iterate over the descriptors in the binding to invalidate the whole
thing here
=================================================================
==546534==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a0000ae6c0 at pc 0x7fe20e26fd9d bp 0x7ffd92be6bc0 sp 0x7ffd92be6bb8
READ of size 8 at 0x61a0000ae6c0 thread T0
#0 0x7fe20e26fd9c in zink_descriptor_set_refs_clear ../src/gallium/drivers/zink/zink_descriptors.c:950
#1 0x7fe20e401304 in zink_destroy_surface ../src/gallium/drivers/zink/zink_surface.c:340
#2 0x7fe20e21311b in zink_surface_reference ../src/gallium/drivers/zink/zink_surface.h:106
#3 0x7fe20e21a5b9 in zink_sampler_view_destroy ../src/gallium/drivers/zink/zink_context.c:835
#4 0x7fe20c41d35f in tc_sampler_view_destroy ../src/gallium/auxiliary/util/u_threaded_context.c:1848
#5 0x7fe20e210ff7 in pipe_sampler_view_reference ../src/gallium/auxiliary/util/u_inlines.h:216
#6 0x7fe20e22d592 in zink_set_sampler_views ../src/gallium/drivers/zink/zink_context.c:1532
#7 0x7fe20c41a3d8 in tc_call_set_sampler_views ../src/gallium/auxiliary/util/u_threaded_context.c:1393
#8 0x7fe20c411706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
#9 0x7fe20c4124ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
#10 0x7fe20c42b728 in tc_destroy ../src/gallium/auxiliary/util/u_threaded_context.c:4250
#11 0x7fe20b65176a in st_destroy_context_priv ../src/mesa/state_tracker/st_context.c:387
#12 0x7fe20b65669f in st_destroy_context ../src/mesa/state_tracker/st_context.c:1009
#13 0x7fe20b7055ab in st_context_destroy ../src/mesa/state_tracker/st_manager.c:944
#14 0x7fe20a9c75bd in dri_destroy_context ../src/gallium/frontends/dri/dri_context.c:256
#15 0x7fe20a9d4bef in driDestroyContext ../src/gallium/frontends/dri/dri_util.c:534
#16 0x7fe22361f25c in drisw_destroy_context ../src/glx/drisw_glx.c:429
#17 0x7fe223625d95 in glXDestroyContext ../src/glx/glxcmds.c:523
#18 0x7fe22636aaeb in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GLX/libglx.c:332
#19 0x7fe2269d9e7d in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GL/g_libglglxwrapper.c:384
#20 0x41b88a in tcu::lnx::x11::glx::GlxRenderContext::~GlxRenderContext() /home/zmike/src/VK-GL-CTS/framework/platform/lnx/X11/tcuLnxX11GlxPlatform.cpp:734
#21 0x41b8e9 in tcu::lnx::x11::glx::GlxRenderContext::~GlxRenderContext() /home/zmike/src/VK-GL-CTS/framework/platform/lnx/X11/tcuLnxX11GlxPlatform.cpp:735
#22 0x2323aa7 in deqp::gles31::Context::destroyRenderContext() /home/zmike/src/VK-GL-CTS/modules/gles31/tes31Context.cpp:77
#23 0x2323969 in deqp::gles31::Context::~Context() /home/zmike/src/VK-GL-CTS/modules/gles31/tes31Context.cpp:55
#24 0x232278e in deqp::gles31::TestPackage::deinit() /home/zmike/src/VK-GL-CTS/modules/gles31/tes31TestPackage.cpp:102
#25 0x2c866c2 in tcu::DefaultHierarchyInflater::leaveTestPackage(tcu::TestPackage*) /home/zmike/src/VK-GL-CTS/framework/common/tcuTestHierarchyIterator.cpp:75
#26 0x2c87058 in tcu::TestHierarchyIterator::next() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestHierarchyIterator.cpp:252
#27 0x2c365da in tcu::TestSessionExecutor::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:122
#28 0x2c00b0c in tcu::App::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuApp.cpp:221
#29 0x4141b7 in main /home/zmike/src/VK-GL-CTS/framework/platform/tcuMain.cpp:58
#30 0x7fe2263e155f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
#31 0x7fe2263e160b in __libc_start_main_impl (/lib64/libc.so.6+0x2d60b)
#32 0x413fa4 in _start (/home/zmike/src/VK-GL-CTS/build/external/openglcts/modules/glcts+0x413fa4)
0x61a0000ae6c0 is located 64 bytes inside of 1328-byte region [0x61a0000ae680,0x61a0000aebb0)
freed by thread T0 here:
#0 0x7fe226cb6627 in free (/usr/lib64/libasan.so.6+0xae627)
#1 0x7fe20aab1751 in unsafe_free ../src/util/ralloc.c:302
#2 0x7fe20aab16c8 in unsafe_free ../src/util/ralloc.c:295
#3 0x7fe20aab13c3 in ralloc_free ../src/util/ralloc.c:265
#4 0x7fe20e269234 in descriptor_pool_free ../src/gallium/drivers/zink/zink_descriptors.c:286
#5 0x7fe20e26937d in descriptor_pool_delete ../src/gallium/drivers/zink/zink_descriptors.c:296
#6 0x7fe20e26ff53 in zink_descriptor_pool_reference ../src/gallium/drivers/zink/zink_descriptors.c:967
#7 0x7fe20e270db2 in zink_descriptor_program_deinit ../src/gallium/drivers/zink/zink_descriptors.c:1071
#8 0x7fe20e3b6536 in zink_destroy_gfx_program ../src/gallium/drivers/zink/zink_program.c:695
#9 0x7fe20e1eaaf9 in zink_gfx_program_reference ../src/gallium/drivers/zink/zink_program.h:242
#10 0x7fe20e20d386 in zink_shader_free ../src/gallium/drivers/zink/zink_compiler.c:2099
#11 0x7fe20e3b9f0b in zink_delete_shader_state ../src/gallium/drivers/zink/zink_program.c:1074
#12 0x7fe20c3e29ad in util_shader_reference ../src/gallium/auxiliary/util/u_live_shader_cache.c:188
#13 0x7fe20e3ba11e in zink_delete_cached_shader_state ../src/gallium/drivers/zink/zink_program.c:1093
#14 0x7fe20c41709e in tc_call_delete_fs_state ../src/gallium/auxiliary/util/u_threaded_context.c:998
#15 0x7fe20c411706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
#16 0x7fe20c4124ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
#17 0x7fe20c423683 in tc_flush ../src/gallium/auxiliary/util/u_threaded_context.c:3003
#18 0x7fe20b62d996 in st_flush ../src/mesa/state_tracker/st_cb_flush.c:60
#19 0x7fe20b62dbe3 in st_glFlush ../src/mesa/state_tracker/st_cb_flush.c:94
#20 0x7fe20ae4bded in _mesa_make_current ../src/mesa/main/context.c:1493
#21 0x7fe20ae49702 in _mesa_free_context_data ../src/mesa/main/context.c:1187
#22 0x7fe20b65668b in st_destroy_context ../src/mesa/state_tracker/st_context.c:1005
#23 0x7fe20b7055ab in st_context_destroy ../src/mesa/state_tracker/st_manager.c:944
#24 0x7fe20a9c75bd in dri_destroy_context ../src/gallium/frontends/dri/dri_context.c:256
#25 0x7fe20a9d4bef in driDestroyContext ../src/gallium/frontends/dri/dri_util.c:534
#26 0x7fe22361f25c in drisw_destroy_context ../src/glx/drisw_glx.c:429
#27 0x7fe223625d95 in glXDestroyContext ../src/glx/glxcmds.c:523
#28 0x7fe22636aaeb in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GLX/libglx.c:332
#29 0x7fe2269d9e7d in glXDestroyContext /home/zmike/src/libglvnd-v1.3.2/src/GL/g_libglglxwrapper.c:384
previously allocated by thread T0 here:
#0 0x7fe226cb691f in __interceptor_malloc (/usr/lib64/libasan.so.6+0xae91f)
#1 0x7fe20aab0c81 in ralloc_size ../src/util/ralloc.c:120
#2 0x7fe20aab0e33 in rzalloc_size ../src/util/ralloc.c:153
#3 0x7fe20aab12c8 in rzalloc_array_size ../src/util/ralloc.c:233
#4 0x7fe20e26c76d in allocate_desc_set ../src/gallium/drivers/zink/zink_descriptors.c:657
#5 0x7fe20e26e9cb in zink_descriptor_set_get ../src/gallium/drivers/zink/zink_descriptors.c:840
#6 0x7fe20e2747aa in zink_descriptors_update ../src/gallium/drivers/zink/zink_descriptors.c:1424
#7 0x7fe20e36fc48 in void zink_draw<(zink_multidraw)1, (zink_dynamic_state)2, true, false>(pipe_context*, pipe_draw_info const*, unsigned int, pipe_draw_indirect_info const*, pipe_draw_start_count_bias const*, unsigned int, pipe_vertex_state*, unsigned int) ../src/gallium/drivers/zink/zink_draw.cpp:788
#8 0x7fe20e29166d in zink_draw_vbo<(zink_multidraw)1, (zink_dynamic_state)2, true> ../src/gallium/drivers/zink/zink_draw.cpp:907
#9 0x7fe20c424982 in tc_call_draw_single ../src/gallium/auxiliary/util/u_threaded_context.c:3155
#10 0x7fe20c411706 in tc_batch_execute ../src/gallium/auxiliary/util/u_threaded_context.c:211
#11 0x7fe20c4124ba in _tc_sync ../src/gallium/auxiliary/util/u_threaded_context.c:362
#12 0x7fe20c41f7a9 in tc_texture_map ../src/gallium/auxiliary/util/u_threaded_context.c:2279
#13 0x7fe20b630757 in pipe_texture_map_3d ../src/gallium/auxiliary/util/u_inlines.h:572
#14 0x7fe20b6341f6 in st_ReadPixels ../src/mesa/state_tracker/st_cb_readpixels.c:546
#15 0x7fe20b42fea7 in read_pixels ../src/mesa/main/readpix.c:1178
#16 0x7fe20b42fea7 in _mesa_ReadnPixelsARB ../src/mesa/main/readpix.c:1195
#17 0x7fe20b42ffc0 in _mesa_ReadPixels ../src/mesa/main/readpix.c:1210
#18 0x2a6d094 in glu::readPixels(glu::RenderContext const&, int, int, tcu::PixelBufferAccess const&) /home/zmike/src/VK-GL-CTS/framework/opengl/gluPixelTransfer.cpp:61
#19 0x29eaa06 in deqp::gls::ShaderExecUtil::FragmentOutExecutor::execute(int, void const* const*, void* const*) /home/zmike/src/VK-GL-CTS/modules/glshared/glsShaderExecUtil.cpp:677
#20 0x25a600b in iterate /home/zmike/src/VK-GL-CTS/modules/gles31/functional/es31fOpaqueTypeIndexingTests.cpp:585
#21 0x2322b53 in deqp::gles31::TestCaseWrapper<deqp::gles31::TestPackage>::iterate(tcu::TestCase*) /home/zmike/src/VK-GL-CTS/modules/gles31/tes31TestCaseWrapper.hpp:86
#22 0x2c376fd in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:302
#23 0x2c366e3 in tcu::TestSessionExecutor::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuTestSessionExecutor.cpp:139
#24 0x2c00b0c in tcu::App::iterate() /home/zmike/src/VK-GL-CTS/framework/common/tcuApp.cpp:221
#25 0x4141b7 in main /home/zmike/src/VK-GL-CTS/framework/platform/tcuMain.cpp:58
#26 0x7fe2263e155f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
cc: mesa-stable
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15173>
this ensures that swapchain images will have been acquired before potentially
accessing swapchain images bound as descriptors
fixes caselist like:
dEQP-GLES31.functional.fbo.color.texcubearray.r8ui
dEQP-GLES31.functional.primitive_bounding_box.blit_fbo.blit_default_to_fbo
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15173>
Correct type for sysctl argument to fix the build.
../src/util/u_cpu_detect.c:631:29: error: incompatible pointer types passing 'int *' to parameter of type 'size_t *' (aka 'unsigned long *') [-Werror,-Wincompatible-pointer-types]
sysctl(mib, 2, &ncpu, &len, NULL, 0);
^~~~
Fixes: 5623c75e40 ("util: Fix setting nr_cpus on some BSD variants")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13448>
move include so va_list will be picked up via stdarg.h
In file included from ../src/util/u_printf.cpp:24:
../src/util/u_printf.h:43:41: error: unknown type name 'va_list'; did you mean '__va_list'?
size_t u_printf_length(const char *fmt, va_list untouched_args);
^~~~~~~
__va_list
/usr/include/machine/_types.h:126:27: note: '__va_list' declared here
typedef __builtin_va_list __va_list;
^
and add includes to u_printf.h as suggested by Ilia Mirkin
stdarg.h for va_list and stddef.h for size_t
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13448>
This required relaxing a core NIR assertion which I don't think is doing
any important validation.
The shader-db effects here are small, but they're important for avoiding a
regression when we start doing per-component DCE in opt_shrink_vectors
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12468)
softpipe shader-db:
total instructions in shared programs: 2859777 -> 2859454 (-0.01%)
instructions in affected programs: 18881 -> 18558 (-1.71%)
total temps in shared programs: 293994 -> 293914 (-0.03%)
temps in affected programs: 418 -> 338 (-19.14%)
i915g:
total instructions in shared programs: 407562 -> 407544 (<.01%)
instructions in affected programs: 570 -> 552 (-3.16%)
r300:
total instructions in shared programs: 1414450 -> 1414459 (<.01%)
instructions in affected programs: 44494 -> 44503 (0.02%)
total vinst in shared programs: 473782 -> 473727 (-0.01%)
vinst in affected programs: 1102 -> 1047 (-4.99%)
total sinst in shared programs: 231224 -> 231216 (<.01%)
sinst in affected programs: 432 -> 424 (-1.85%)
total temps in shared programs: 197605 -> 197607 (<.01%)
temps in affected programs: 103 -> 105 (1.94%)
crocus hsw:
total instructions in shared programs: 8158185 -> 8158134 (<.01%)
instructions in affected programs: 10927 -> 10876 (-0.47%)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15178>
Because we have to maintain two different packages of Mesa, one
specific to RADV and another one for RadeonSI and such, it's a bit
annoying to have to synchronize the drirc entries. Currently, only our
Mesa package installs 00-mesa-defaults.conf which means we have to
backport the drirc RADV changes.
This splits 00-mesa-defaults.conf in two to move the drirc RADV entries
to src/amd/vulkan/00-radv-defaults.conf. Meson will install the file
only if RADV is built.
There is still a caveat for common drirc workarounds like for WSI but
they are rare enough and we could still duplicate them if needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15152>
Task shader outputs work differently than other shaders, so they
need special consideration. Essentially, they have two kinds of
outputs:
1. Number of mesh shader workgroups to launch.
Will be still represented by a shader output.
2. Optional payload of up to (at least) 16K bytes.
These payload variables behave similarly to shared memory, but
the spec doesn't actually define them as shared memory (also, they
may be implemented differently by each backend), so we need to add
a new NIR variable mode for them.
These payload variables can't be represented by shader outputs
because the 16K bytes don't fit the 32x vec4 model that NIR uses
for its output variables.
This patch adds a new NIR variable mode: nir_var_mem_task_payload
and corresponding explicit I/O intrinsics, as well as support for
this new mode in nir_lower_io.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14930>
The problem is that the real workgroup launched on NGG HW
can be larger than the size specified by the API, and the
extra waves need to keep up with barriers in the API waves.
There are 2 different cases:
1. The whole API workgroup fits in a single wave.
We can shrink the barriers to subgroup scope and
don't need to insert any extra ones.
2. The API workgroup occupies multiple waves, but not
all. In this case, we emit code that consumes every
barrier on the extra waves.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15034>
This commit adds support for Vulkan backend on a630_skqp job.
= Needed changes
- Needed to install libvulkan-dev package on system
- Refactored the way the available skqp reports are printed
tested in development builds with skia tools
Piglit expectations had to be updated in various drivers due to !14750 not
having bumped the tags when it tried to uprev.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14686>
The Android CTS 10 version is relative old when compared with skia main
branch, which was being used before. Some modifications in the skqp
build/runner scripts were needed to make it run on CI.
- skqp versions from android-cts have already all assets inside
platform_tools folder.
- along with the assets, are the render and unit files which are
expected to pass in the Android CTS execution.
- removed custom test files from the a630 folder, to make it comply
with the CTS expectations.
- include new patches to remove Python2 dependencies and avoid the
installation of it in rootfs.
- strip binariesthe built binaries `skqp` and `list_gpu_unit_tests`, as
`is_debug = false` gn argument did not work, maybe it is not well
tested in development builds with skia tools
- use Clang instead of GCC. The GCC support is not so graceful as it is
in the skia main branch, some NEON instructions needs to be turned off
in the GCC compilation, causing different tests result. This change
does not imply a bigger rootfs, since the built skqp binary uses GCC
libc++ and other library runtimes. So clang is just a build
dependency.
= Changes in skqp results =
Some errors were found for GL backend and unit tests. GLES and VK tests are green.
All the failed tests were classified as expected to fail in the render and unit tests list.
```
gl_blur2rectsnonninepatch
gl_bug339297_as_clip
gl_bug6083
gl_dashtextcaps
```
```
SRGBReadWritePixels (../../tests/SRGBReadWritePixelsTest.cpp:214 Could not create sRGB surface context. [OpenGL])
```
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14686>
This extension isn't wired up in Gallium, so there's just no way a
Gallium driver like Panfrost exposes it.
While there were support in i965 for this for the cancelled Broxton GPU,
thre's no such support in the Iris driver. And since Broxton has been
cancelled, it's unlikely to be wired up any time soon.
Fixes: da23a31726 ("docs/features: Update ASTC entries for Panfrost")
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15145>
When we shadow a resource, the backing BO is changed; as such,
existing references to the resource become invalid. So batches accessing the
resource need to be flushed (or otherwise have their references invalidated).
The wrong behaviour change (not flushing) was introduced when we started
tracking resources instead of BOs. The issue manifested as a severe performance
regression in glmark2's -bbuffer test, particular the subdata subtest. The issue
is magnified on slow CPUs; without the fix, the test becomes completely CPU
bound
Relevant glmark2 -bbuffer test from 43fps to 84fps.
Apparently, this causes functional issues too -- this performance-minded change
also fixes a few piglits.
Fixes: cecb889481 ("panfrost: Do tracking of resources, not BOs")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Chris Healy <cphealy@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13502>
The MI_FLUSH_DW post-sync operation uses the same encoding as the
PIPE_CONTROL one so we can use the same helper. Write PS Depth Count
is not supported, of course, as the blitter has no depth pipeline.
This means that we can write the timestamp register from the blitter.
Fixes: 604d97671b ("iris: Add support for flushing the blitter (hackily)")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15157>
To make sure it is applied in the cases we expect it to be, to avoid code
generation regressions. Functional regressions are expected to be caught by
integration-testing, so that is not focused on here.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9438>
If a message-passing instruction like LD_VAR is preloaded, it will no longer be
counted in the shader cycle counts. Add a special message preload counter that
approximates the cost of preloading, so this information doesn't get a lost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9438>
The compiler will soon produce preloaded messages, but it should not pack them
itself, as this would require depending on GenXML or handcoding bitfields / bit
packs in the compiler. Instead, add a struct encoding the unpacked form of the
message, used as ABI between the compiler and the common driver.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9438>
On V3D the quality of the code we generate is significantly affected by
how we decide to assign accumulators during register allocation, which
is determined by liveness, favoring short-lived temps.
There are many shaders that end up doing a whole lot of uniform loads
first, and using them later, which is very inconvenient for our register
allocation process because this increases uniform liveness and causes
us to use accumulators less efficientely, leading to significant churn.
To fix this, we move uniforms right before their first use in the same
block, but we need to do this after NIR scheduling, which means we are
doing it in non-SSA form, since the scheduler has a tendency to undo
this optimization and it is not easy to modify it to avoid it, since it
works in more abstract terms, using instruction dependencies, estimated
register pressure and instruction delay information to do its work,
which are very different concepts.
total instructions in shared programs: 13316738 -> 13033613 (-2.13%)
instructions in affected programs: 10389172 -> 10106047 (-2.73%)
helped: 55442
HURT: 16144
total threads in shared programs: 413722 -> 415048 (0.32%)
threads in affected programs: 1428 -> 2754 (92.86%)
helped: 680
HURT: 17
total loops in shared programs: 1716 -> 1690 (-1.52%)
loops in affected programs: 26 -> 0
helped: 26
HURT: 0
total uniforms in shared programs: 3704313 -> 3705181 (0.02%)
uniforms in affected programs: 687730 -> 688598 (0.13%)
helped: 2920
HURT: 7384
total max-temps in shared programs: 2364785 -> 2175190 (-8.02%)
max-temps in affected programs: 1215387 -> 1025792 (-15.60%)
helped: 49667
HURT: 1556
total spills in shared programs: 4241 -> 4248 (0.17%)
spills in affected programs: 642 -> 649 (1.09%)
helped: 11
HURT: 19
total fills in shared programs: 6115 -> 6125 (0.16%)
fills in affected programs: 1276 -> 1286 (0.78%)
helped: 11
HURT: 21
total sfu-stalls in shared programs: 34381 -> 36578 (6.39%)
sfu-stalls in affected programs: 16055 -> 18252 (13.68%)
helped: 3647
HURT: 5206
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15056>
On Bifrost and Valhall, push uniforms are loaded into Fast Access Uniform
Random Access Memory (FAU-RAM). FAU-RAM is organized as an array of 64-bit
slots. A given tuple (Bifrost) or instruction (Valhall) may access at most a
single 64-bit slot. If an instruction requires uniforms from multiple 64-bit
slots, a uniform-to-register move must be inserted to avoid the hazard. However,
if an instruction requires a pair of 32-bit uniforms from the same 64-bit slot,
no move is required.
To reduce the number of moves we emit, this commit adds an optimization pass
that reorders pushed uniforms, trying to group uniforms used by the same
instruction. The pass works by creating a graph of pushed uniforms, where edges
denote the "both 32-bit uniforms required by the same instruction" relationship.
We perform depth-first search on this graph to find the connected components,
where each connected component is a cluster of uniforms that are used together.
We then select pairs of uniforms from each connected component. The remaining
unpaired uniforms (from components of odd sizes) are paired together
arbitrarily.
In principle, we should weight the graph by number of occurences and choose
pairs that maximize the total selected edge weight. This is left for
future work, as it is nontrivial -- selecting these edges optimally appears to
be NP-hard at first blush.
Implementation note: As position and varying shaders share FAU on Bifrost, extra
care is taken with a `push_offset` shader stage info parameter that ensures
varying shaders do not reorder uniforms selected by the previous position
shader.
total instructions in shared programs: 2503343 -> 2451758 (-2.06%)
instructions in affected programs: 1553309 -> 1501724 (-3.32%)
helped: 14256
HURT: 8
helped stats (abs) min: 1.0 max: 80.0 x̄: 3.62 x̃: 3
helped stats (rel) min: 0.06% max: 36.36% x̄: 7.31% x̃: 6.67%
HURT stats (abs) min: 1.0 max: 2.0 x̄: 1.38 x̃: 1
HURT stats (rel) min: 1.30% max: 12.50% x̄: 4.99% x̃: 3.85%
95% mean confidence interval for instructions value: -3.66 -3.58
95% mean confidence interval for instructions %-change: -7.41% -7.20%
Instructions are helped.
total tuples in shared programs: 2008399 -> 1969627 (-1.93%)
tuples in affected programs: 1146344 -> 1107572 (-3.38%)
helped: 12867
HURT: 147
helped stats (abs) min: 1.0 max: 61.0 x̄: 3.03 x̃: 2
helped stats (rel) min: 0.17% max: 42.86% x̄: 6.79% x̃: 4.65%
HURT stats (abs) min: 1.0 max: 3.0 x̄: 1.20 x̃: 1
HURT stats (rel) min: 0.29% max: 20.00% x̄: 2.12% x̃: 1.19%
95% mean confidence interval for tuples value: -3.03 -2.93
95% mean confidence interval for tuples %-change: -6.82% -6.57%
Tuples are helped.
total clauses in shared programs: 408005 -> 401708 (-1.54%)
clauses in affected programs: 90760 -> 84463 (-6.94%)
helped: 6006
HURT: 164
helped stats (abs) min: 1.0 max: 9.0 x̄: 1.08 x̃: 1
helped stats (rel) min: 0.45% max: 33.33% x̄: 12.44% x̃: 14.29%
HURT stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.64% max: 25.00% x̄: 9.81% x̃: 5.26%
95% mean confidence interval for clauses value: -1.03 -1.01
95% mean confidence interval for clauses %-change: -12.03% -11.66%
Clauses are helped.
total cycles in shared programs: 203308.37 -> 202737.83 (-0.28%)
cycles in affected programs: 19264.71 -> 18694.17 (-2.96%)
helped: 3024
HURT: 41
helped stats (abs) min: 0.041665999999999315 max: 2.5416680000000014 x̄: 0.19 x̃: 0
helped stats (rel) min: 0.17% max: 33.33% x̄: 3.83% x̃: 2.83%
HURT stats (abs) min: 0.041665999999999315 max: 0.125 x̄: 0.06 x̃: 0
HURT stats (rel) min: 0.30% max: 5.88% x̄: 1.41% x̃: 0.93%
95% mean confidence interval for cycles value: -0.19 -0.18
95% mean confidence interval for cycles %-change: -3.89% -3.64%
Cycles are helped.
total arith in shared programs: 76265.67 -> 74669.25 (-2.09%)
arith in affected programs: 45001.50 -> 43405.08 (-3.55%)
helped: 12945
HURT: 97
helped stats (abs) min: 0.041665999999999315 max: 2.5416680000000014 x̄: 0.12 x̃: 0
helped stats (rel) min: 0.17% max: 50.00% x̄: 8.06% x̃: 4.88%
HURT stats (abs) min: 0.041665999999999315 max: 0.125 x̄: 0.05 x̃: 0
HURT stats (rel) min: 0.21% max: 33.33% x̄: 2.16% x̃: 0.96%
95% mean confidence interval for arith value: -0.12 -0.12
95% mean confidence interval for arith %-change: -8.16% -7.81%
Arith are helped.
total quadwords in shared programs: 1796563 -> 1766803 (-1.66%)
quadwords in affected programs: 948830 -> 919070 (-3.14%)
helped: 12078
HURT: 219
helped stats (abs) min: 1.0 max: 42.0 x̄: 2.49 x̃: 2
helped stats (rel) min: 0.10% max: 33.33% x̄: 5.57% x̃: 5.26%
HURT stats (abs) min: 1.0 max: 4.0 x̄: 1.21 x̃: 1
HURT stats (rel) min: 0.33% max: 6.67% x̄: 2.00% x̃: 1.14%
95% mean confidence interval for quadwords value: -2.46 -2.38
95% mean confidence interval for quadwords %-change: -5.52% -5.36%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14163>
Gives us memory back faster which is useful for pathalogical CTS
tests.
The GLSL IR was previously used after converting to NIR for things
like building the GL resource list but we have had a NIR version
for this for some time and I don't believe there are any other
use cases left for keeping the old IR hanging around this long.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15127>
These runners are configured to have a single job take up the whole
runner, which means we get to use threads to our hearts content. The pile
of cores means we don't need to spawn separate jobs to try to load-balance
across fdo's shared runner capacity. Having dedicated runners means we
won't get our MRs blocked as much waiting on non-Mesa testing happening on
fd.o.
We manage to complete all of this llvmpipe testing in about 6:15.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14962>
When handle->type is WINSYS_HANDLE_TYPE_FD, the caller wants a file descriptor
for the BO backing the resource. We previously had two paths for this:
1. If rsrc->scanout is available, we prime the GEM handle from the KMS device
(rsrc->scanout->handle) to a file descriptor via the KMS device.
2. If rsrc->scanout is not available, we prime the GEM handle from the GPU
(bo->gem_handle) to a file descriptor via the GPU device.
In both cases, the caller passes in a resource (with BO) and expects out a file
descriptor. There are no direct GEM handles in the function signature; the
caller doesn't care which GEM handle we prime to get the file descriptor. In
principle, both paths produce the same file descriptor for the same BO, since
both GEM handles represent the same underlying resource (viewed from different
devices).
On grounds of redundancy alone, it makes sense to remove the rsrc->scanout path.
Why have a path that only works sometimes, when we have another path that works
always?
In fact, the issues with the rsrc->scanout path are deeper. rsrc->scanout is
populated by renderonly_create_gpu_import_for_resource, which does the
following:
1. Get a file descriptor for the resource by resource_get_handle with
WINSYS_HANDLE_TYPE_FD
2. Prime the file descriptor to a GEM handle via the KMS device.
Here comes strike number 2: in order to get a file descriptor via the KMS
device, we had to /already/ get a file descriptor via the GPU device. If we go
down the KMS device path, we effectively round trip:
GPU handle -> fd -> KMS handle -> fd
There is no good reason to do this; if everything works, the fd is the same in
each case. If everything works. If.
The lifetimes of the GPU handle and the KMS handle are not necessarily bound. In
principle, a resource can be created with scanout (constructing a KMS handle).
Then the KMS view can be destroyed (invalidating the GEM handle for the KMS
device), even though the underlying resource is still valid. Notice the GPU
handle is still valid; its lifetime is tied to the resource itself. Then a
caller can ask for the FD for the resource; as the resource is still valid, this
is sensible. Under the scanout path, we try to get the FD by priming the GEM
handle on the KMS device... but that GEM handle is no longer valid, causing the
PRIME ioctl to fail with ENOENT. On the other hand, if we primed the GPU GEM
handle, everything works as expected.
These edge cases are not theoretical; recent versions of Xwayland trigger this
ENOENT, causing issue #5758 on all Panfrost devices. As far as I can tell, no
other kmsro driver has this 'special' kmsro path; the only part of
resource_get_handle that needs special handling for kmsro is getting a KMS
handle.
Let's remove the broken, useless path, fix Xwayland, bring us in line with other
drivers, and delete some code.
Thank you for coming to my ted talk.
Closes: #5758
Fixes: 7da251fc72 ("panfrost: Check in sources for command stream")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-and-tested-by: Jan Palus <jpalus@fastmail.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: James Jones <jajones@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Tested-by: Dan Johansen <strit@manjaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15120>
We always blit from a particular level, so it's a waste to compute the LOD.
This corresponds to a simple texture instruction with implement 0 LOD, which is
the optimal texturing path on Bifrost -- it maps to TEXS_2D but does not require
helper invocations.
Functional change on Bifrost: Blit shaders no longer set .computed_lod or
shader_contains_barrier.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15123>
There are always set to true. Don't pollute the driver code with them, make
their existence a local detail to pre-Valhall XML and that's it.
Functional change: "four components per vertex" is now set on vertex job DCDs.
This should be a no-op.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15123>
This reduces code duplication and will ease Valhall porting. Functional changes
on v7:
* Shader contains barrier is now set (perf loss, fixed later in series)
* Shader register allocation is now set (perf win)
* Point sprite inverted, no-op for blit shaders
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15123>
Trivially implemented by using A6XX_GRAS_SC_CNTL_SINGLE_PRIM_MODE.
This extension is useful for emulators e.g. AetherSX2 PS2 emulator and
could drastically improve performance when blending is emulated.
Relevant tests:
dEQP-VK.rasterization.rasterization_order_attachment_access.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15106>
Otherwise a shader invocation would read the value which should have
been set AFTER this shader invocation.
Fixes tests:
dEQP-VK.rasterization.rasterization_order_attachment_access.depth.samples_1.multi_draw_barriers
dEQP-VK.rasterization.rasterization_order_attachment_access.stencil.samples_1.multi_draw_barriers
Fixes: 71595a189a
("tu: Fix feedback loops in sysmem mode")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15106>
XeHP scratch space is handled differently. Commit ae18e1e707
implemented support for it, but handled it differently between render
and compute shaders: it calculates scratch_addr differently and
doesn't pin the buffer on compute. Make it work on compute shaders by
calling pin_scratch_space() from iris_compute_walker(), which fixes
both the address and the pinning.
This commit can be verified by the two-year-old-but-still-unreviewed
Piglit MR 234. You can also verify this by running a very simple
compute shader with INTEL_DEBUG=spill_fs.
References: https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/234
Fixes: ae18e1e707 ("iris: Add support for scratch on XeHP")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15070>
This raises our vertex input bindings limit from 28 to 31, and our
vertex input attribute limit from 28 to 29. We could theoretically
go higher, but it will take additional work.
The 3DSTATE_VERTEX_BUFFERS and 3DSTATE_VERTEX_ELEMENTS limits are 33
vertex buffers, and 34 vertex elements. But we need up to two vertex
elements for system values (FirstVertex, BaseVertex, BaseInstance,
DrawID), and we currently use two vertex bindings for those.
There is another hidden limit: our compiler backend only supports the
push model for VS inputs currently. 3DSTATE_VS only allows URB Read
Lengths between [0, 15], which is measured in pairs of inputs, which
means we can theoretically push no more than 32 vertex elements. This
is no artifical limit either, as a vec4 element takes up 4 registers
in the payload, and 32 * 4 = 128, the entire size of our register file.
Plus, the VS Thread payload needs at least g0 and g1 for other things,
so we can really only push 31.
We can theoretically support one additional binding, by combining our
two SGV bindings into a single upload. In order to support additional
vertex elements, we would need to add support to the backend compiler
for the pull model for VS inputs.
References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5917
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14991>
I got confused while writing this somehow because of the null descriptor
feature, which enables drivers to consume a null descriptor, which has no
relation to a descriptor layout containing no descriptors
failing to accurately use zero descriptors can put layouts over the maximum
per-stage limits, which causes tests to crash
fixes (lavapipe):
KHR-GL46.shading_language_420pack.binding_uniform_block_array
KHR-GL46.multi_bind.dispatch_bind_buffers_base
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15067>
I chose to implement this as a global flag in the device, because
otherwise we would end up with extra draw overhead trying to avoid it in
the implicit-sync WSI case, and you're probably going to end up needing
implicit sync anyway because you used one of the BOs in any of the
submitted cmdbufs. To do better than this, we would probably want a
skip-implicit-sync flag on the BOs in the BO list, rather than global on
the submit.
Reports about venus on turnip say that this flag reduces worst-case
QueueSubmit time in a game workload from ~10ms to ~4ms.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14838>
Bifrost post-RA dead code elimination can cull the destinations of
regular ALU instructions, by weakening from a register write to a
temporary write. However, there is no way to suppress staging writes, so
culling the destinations will result in invalid code generation.
Fixes a regression in
dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_static_vertex
with scoreboarding. The root cause there is the backend dead code
elimination not being sufficiently aggressive in the presence of control
flow. Usually this does not matter, since the backend optimizations are
intended to be local with global optimizations happening in NIR.
Unfortunately, our implementation of IDVS hits this hard. That will need
to be optimized (probably by specializing IDVS shaders in NIR instead of
the backend). In the mean time, let's fix the actual bug affecting
scoreboarding.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14298>
"raw" (IDXEN=0) and "structured" (IDXEN=1) do bounds checking differently.
From `si_make_buffer_descriptor`:
* - For VMEM and inst.IDXEN == 0 or STRIDE == 0, it's in byte units.
* - For VMEM and inst.IDXEN == 1 and STRIDE != 0, it's in units of STRIDE.
so there is a difference between setting vindex = i32_0 and vindex = NULL.
Instead of having the `structured` flag, we can just check if vindex is NULL.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15098>
"raw" (IDXEN=0) and "structured" (IDXEN=1) do bounds checking differently.
From `si_make_buffer_descriptor`:
* - For VMEM and inst.IDXEN == 0 or STRIDE == 0, it's in byte units.
* - For VMEM and inst.IDXEN == 1 and STRIDE != 0, it's in units of STRIDE.
so there is a difference between setting vindex = i32_0 and vindex = NULL.
Instead of having the `structured` flag, we can just check if vindex is NULL.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15098>
If we have a postponed spill, the temp we create at ip is no longer
the spilled temp and therefore is affected by the thrsw injection.
Fixes corruption in the additive blending animation demo from
Three.js.
Fixes: f3c3228522 ('broadcom/compiler: do not rebuild the interference graph after each spill')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15112>
Not all Vulkan implementations allows rendering to linear images, so in
order to support scanning out from these on Windows we might have to copy
through a buffer like we do in the PRIME path.
To avoid reimplementing the same, let's instead generalize the code a
bit so it doesn't have to specfy any PRIME-specific details.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12210>
Application may exit without freeing created display list, this
may leave the cache not empty.
This is triggered by Abaqus which just close X11 display without
calling any of GLX cleanup functions like glXDestroyContext. But
GLX hook to X11 display close function to free GLX screen resource.
So display list as a context resource has not been freed, but
cache as a screen resource is freed.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14926>
DRI3 window back buffer is a client resource, so it's destroyed
when context switch drawable for native window.
But some application like Abaqus may leave a dirty back buffer
and reuse it when switch back. So add a driconf option for these
kind of app to keep the entire GLX drawable for native window.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14926>
When we spill we add new temps. We should be careful not to access
liveness for these until we have re-computed it after all spills and
fill for that the spilled temp have been processed so as to avoid
out-of-bounds accesses to the c->temp_start and c->temp_end arrays.
This fixes a crash in a Three.js demo when we try to patch register
classes after a TMU spill that was caused because we would incorrectly
try to patch the same temps we had just added for the spill itself,
which is not only unnecessary but also incorrect since we these temps
would not have liveness information available yet and thus would
cause out of bounds accesses.
Fixes: f3c3228522 ('broadcom/compiler: do not rebuild the interference graph after each spill')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15107>
We need to take ownership of shader keys before we can insert them into
the hash table. Caught by ASan.
==6343==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x00016bc51410 at pc 0x00010498d6cc bp 0x00016bc50240 sp 0x00016bc4f9d0
READ of size 592 at 0x00016bc51410 thread T0
#0 0x10498d6c8 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long)+0x208 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x196c8)
#1 0x10498da08 in wrap_memcmp+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x19a08)
#2 0x10b7f3f18 in asahi_shader_key_equal agx_state.c:867
#3 0x10a482e7c in hash_table_search hash_table.c:325
#4 0x10b7f4e94 in agx_update_shader agx_state.c:899
#5 0x10b7f0dc4 in agx_draw_vbo agx_state.c:1590
#6 0x10a7c28c4 in u_vbuf_draw_vbo u_vbuf.c:1498
#7 0x10a5db03c in cso_multi_draw cso_context.c:1639
#8 0x10aed03d0 in _mesa_validated_drawrangeelements draw.c:1812
#9 0x10aed08d4 in _mesa_DrawElements draw.c:1945
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15109>
This removes the block at the end of fragment processing and allow
multiple scenes to be queued to the rasterizer up to MAX_SCENES.
Running heaven with this shows up to 39 scenes been allocated in the
first few renders, but it also removes all stalls in rendering until
the present stalls for completion. It reduces frame rendering times
(not enough to make it > 1 fps) but looks to be about 10% increase.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
When llvmpipe is allowed execute fragment shaders overlapping
with other stuff, we have to start using the pipe fences.
With presentation, the acquire path needs to signal a semaphore
that can be waited on by the user, so add support for passing
signal/wait semaphores for non-timeline in, and just put a
fence pointer in the semaphore for that case.
This fixes rendering once we allow overlapping rasterization.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
Currently neither llvmpipe or softpipe ever leave any drawing in
the pipeline, but I'd like to change that for llvmpipe.
This makes drisw block for completed rendering before sending
data to the X server.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
If the luminance formats aren't renderable, they back out to R*
formats, but those will end up with a 1 in alpha rather than 0 when
textured. So instead make them explicitly renderable, which will cause
the correct texture format swizzle to be applied.
Fixes query-rgba-signed-components and probably others.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15097>
Until now we have lived without a refcount mechanism in the driver
because in Vulkan the user is responsible for handling the life
span of memory allocations for all Vulkan objects, however,
imported BOs are tricky because the kernel doesn't refcount
so user-space needs to make sure that:
1. When importing a BO into the same device used to create it
(self-importing) it does not double free the same BO.
2. Frees imported BOs that were not allocated through the same
device.
Our initial implementation always freed BOs when requested,
so we handled 2) correctly but not 1) on drm and we would
double-free self-imported BOs because kernel doesn't return
a unique gem_handle on each import.
Beside this the submit ioctl checks for duplicates in the
BO list and returns an error if there is one.
This fixes the problem for good by adding refcounts to BOs
so that self-imported BOs have a refcnt > 1 and are only freed
when all references are freed.
KGSL on the other hand does not have the same problems,
at least not with ION buffers which are used for exportable
BOs on pre 5.10 android kernels.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5936
Fixes CTS tests: dEQP-VK.drm_format_modifiers.export_import.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15031>
It is good to know that we run out of ring space and have to wait. This
happens easily with fossilize-replay because encoding a
vkCreateGraphicsPipeline takes microseconds while executing it can take
milliseconds, >100ms sometimes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14966>
This reverts commit 29d319c767.
Now that we use nir_lower_bool_to_bitsize, we don't see 1-bit booleans
anymore, so the issue this fixed doesn't apply. Actually, that issue was
(in part) why I started looking into boolean handling in the first
place.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14576>
This lets us avoid generating SWZ instructions. Those instructions could
be constant folded but that complicates the replication analysis
introduced in the next commit.
Almost no shader-db changes.
quadwords HURT: shaders/glmark/1-22.shader_test MESA_SHADER_FRAGMENT: 718 -> 722 (0.56%)
total quadwords in shared programs: 68169 -> 68173 (<.01%)
quadwords in affected programs: 718 -> 722 (0.56%)
helped: 0
HURT: 1
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14576>
We've got lots of VK coverage on 618, so take some of the load off (but
leave a little bit of testing just to make sure we don't totally break
630). This should help with our Marge times since we've added some other
coverage to 630 that's started overloading the runners.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15085>
The old calculation for mat3 was clever, but it turns out that a
straightforward application of subdeterminants similar to how mat4 is
handled is more efficient: on a scalar architecture with some sort of
combined multiply+add instruction with a negate modifier (both fairly
common), the new determinant is 9 instructions vs. 15 for the old one,
and without the multiply-add it's 14 instructions vs. 18 for the old
one. When used as a routine for inverse() the savings are compounded,
because we now use the same method as used to compute the adjucate
matrix and so CSE can combine most of the calculations with the adjucate
matrix ones.
Once mat3 and mat4 use the same method for computing determinants, we
can combine them into a single recursive function. I also pulled up the
mat_subdet() function because it was doing basically what we need, so
it's now shared between determinant and inverse. This shrinks the
implementation significantly, as can be seen from the diffstat.
The real reason I want to change this, though, is that it fixes
dEQP-VK.glsl.builtin.precision_fp16_storage16b.inverse.compute.mat3 with
turnip. Qualcomm uses round-to-zero for 16-bit frcp, which combined with
some inaccuracy in the old method of calculating the determinant led us
to fail. Qualcomm's driver uses something like the new method to
calculate the determinant in the inverse. We could argue that Mesa's
method should be allowed, because round-to-zero for floating-point
division is within spec and there are no precision guarantees given for
determinant() or inverse(). However we might as well use the more
efficient method.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14652>
It won't work if the blob is fixed-size and we overrun the size, which
will be the case with the Vulkan pipeline cache.
This gets a bit tricky for the repeated-header optimization, because we
can't read the header from the blob. Instead we have to store the header
itself.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15028>
Label IDVS variants as being MESA_SHADER_{POSITION, VARYING} stages;
reserve the MESA_SHADER_VERTEX label for non-IDVS shaders. This reduces
confusion where a single shader compiles to two MESA_SHADER_VERTEX
shaders with different stats.
While we're at it, de-vendor the blend shader stage name; these stats
are internal anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15086>
C++ requires explicit casts from integers to enums. Fixes errors like
the following when trying to use Asahi GenXML from a GTest unit test.
src/asahi/lib/agx_pack.h:554:23: error: assigning to 'enum agx_channels' from incompatible type 'uint64_t' (aka 'unsigned long long')
values->channels = __gen_unpack_uint(cl, 0, 6);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
The texture descriptor we construct for reloading needs to respect the
surface's texture/layer selection. Fix exactly the same bug as
b8c31ac06d ("lima: fix glCopyTexSubImage2D").
Fixes:
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.2d_rgb
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.2d_rgba
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.cube_rgb
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.cube_rgba
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
We can only use linear for 2D images, not even 2D arrays. Even for 2D
images, we only want to use linear if:
* We are required to use linear due to window system requirements.
* The texture is streaming.
Otherwise, we want to use tiled textures. (Or better, compressed, but we
don't support that yet.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
nir_alu_instr_is_comparison needs to consider all comparison opcodes regardless
of size. Otherwise, they will be missed by nir_opt_move/sink.
Without this change, lowering booleans to integers regresses register
pressure (and spills/fills) significantly in certain shaders on Panfrost,
like android/com.miHoYo.GenshinImpact/1420.shader_test.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15073>
The Vulkan 1.3 spec says:
"The implementation-dependent limit bufferImageGranularity specifies
a page-like granularity at which linear and non-linear resources
must be placed in adjacent memory locations to avoid aliasing. Two
resources which do not satisfy this granularity requirement are said
to alias. bufferImageGranularity is specified in bytes, and must be
a power of two. Implementations which do not impose a granularity
restriction may report a bufferImageGranularity value of one.
Note: Despite its name, bufferImageGranularity is really a
granularity between "linear" and "non-linear" resources."
We set this limit to 64 bytes (a cacheline) at the dawn of time, without
any real rationale attached. There shouldn't be any restrictions here.
Our tile sizes are typically 4K, and tiled resource addresses are
aligned to the tile size, and the extent is also a multiple of the tile
sized. So if a linear resource occurs before a tiled one, there will
naturally be some space due to the alignment of the tiled resource's
starting address. If a linear resource occurs after a tiled one, the
tiled resource's ending address is already 4K aligned, which is already
guaranteeing that they won't share a cacheline.
So I think it should be fine to reduce this to 1. The other Vulkan
driver for our hardware seems to advertise 1 here as well.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15066>
Instead, we only recompute liveness and we add new nodes and
interferences to the graph manually (we also need to patch
register classes in some cases).
To assist in this process, we also add an ip counter to our
instructions that we also recompute after each spill, which we use
to identify registers that cross thrsw boundries introduced with
TMU spills and fills and adjust their register classes accordingly
(removing their capacity to use accumulators).
This significantly reduces the CPU cost of spills. Using
shaders/closed/gputest/piano/7.shader_test as reference:
Compile time up to the first successful compile strategy in main is
~24s and with this change it is ~11s. With this speed up, we can now
try all 2-thread compile strategies (including the fallback scheduler)
in only ~15s.
A full shader-db run results in:
Total CPU time (seconds): 9904.67 -> 9087.98 (-8.25%)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
We may be pipelining TMU writes and reads, in which case we can
see both TMUWT and LDTMU at the end of a TMU sequence, so we should
not assume that a TMUWT always terminates a sequence.
Also, we had a bug where we were using inst instead of scan_inst
to check if we find another TMUWT after the curent instruction.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Instead of whether they are allowed to spill or not. This is more flexible.
Also, while we are not currently enabling spilling on any 4-thread strategies,
should we do that in the future, always prefer a 4-thread compile.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Until now we would only allow spilling as a last resort in the
last 2 strategies, however, it is possible that in some cases
earlier strategies may produce less spills if we allowed spilling
on them.
Likewise, the fallback scheduler can sometimes produce less spills
than 2 threads with optimizations disabled.
With this change, we start allowing all our 2-thread strategies to
spill, and instead of choosing the first strategy that is successful,
we choose the one that doesn't spill or the one with the least amount
of spilling.
It should be noted that this may incur in a significant increase
of compile times. We will address this in a follow-up patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Commit 38800b38 changed nir_opcodes.py, but that doesn't seem to have
triggered nir_opt_algebraic.py. The change in 75ef5991 depends on
opt_algebraic lowering 16-bit versions of slt, but if opt_algebraic is
not rebuilt, this may not happen. This resulted in some people seeing
assertion failures in, for example,
dEQP-VK.spirv_assembly.instruction.compute.float16.arithmetic_3.step,
due to the backend seeing nir_op_slt that it didn't know how to handle.
v2: Add nir_opcodes.py to nir_algebraic_py so that all the per-driver
algebraic passes pick up the dependency too. Rename it to
nir_algebraic_depends. Suggested by Emma.
Closes: #6047
Fixes: d1992255bb ("meson: Add build Intel "anv" vulkan driver")
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15050>
memcpy is divided into chunks that are vec4 sized max. The problem
here happens with a structure of 24 bytes :
struct {
float3 a;
float3 b;
}
If you memcpy that struct, the lowering will emit 2 load/store, one of
sized 8, next one sized 16. But both end up located at offset 0, so we
effectively drop 2 floats.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a3177cca99 ("nir: Add a lowering pass to lower memcpy")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15049>
The mechanism currently used to pass data from the dEQP child process
executed in a crosvm guest environment towards the deqp-runner wrapper
script that starts the crosvm instance is based on creating, writing
and reading regular files.
In addition to the main drawback of using the storage, this approach
is potentially unreliable because the data cannot be transferred in
real-time and there is no control on ending the transmission. It also
requires a forced sleep for syncing the content, while the minimum
amount of time necessary to wait cannot be easily and safely
determined.
Replace this with an IPC based on the virtio transport for virtual
sockets (virtio-vsock).
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14995>
The img->deferred_info will out-live vn_CreateImage, so we need a deep
copy of the VkImageFormatListCreateInfo struct.
This change also avoids tracking VkImageFormatListCreateInfo struct with
a zero viewFormatCount.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15017>
lava_job_submitter.py unit tests are written in pytest and uses
freezegun in order to simulate timeouts in some tests scenarios. So,
this commit adds the packages `python3-pytest` and `python3-freezegun`
to fulfill this dependencies.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14876>
When the lava_job_submitter.py retry loop finishes normally (without
falling through break-loop) it means that the submitter has exceeded the
retry count limit. However, when it happens the script
finishes normally. This patch adds a treatment to this case, warning the
user what happened and forcing the job to fail.
Moreover, this commit will make retry configurations configurable by
CI job, as it can take the default value from the following variables:
- LAVA_DEVICE_HANGING_TIMEOUT_SEC
- LAVA_WAIT_FOR_DEVICE_POLLING_TIME_SEC
- LAVA_LOG_POLLING_TIME_SEC
- LAVA_NUMBER_OF_RETRIES_TIMEOUT_DETECTION
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14876>
If a secondary command buffer is used and the client provides a
framebuffer and that framebuffer has a stencil-only attchment, we would
try to get the aux usage for the depth component of that attachment and
crash. Check the aspects of the image before looking at aux usage.
This fixes at least the following SkQP tests on my Tigerlake:
- vk_circular-clips
- vk_filterfastbounds
- vk_innershapes_bw
- vk_lineclosepath
- vk_multipicturedraw_rrectclip_simple
- vk_pathinvfill
- vk_quadclosepath
- vk_rrect_clip_bw
- vk_windowrectangles
Fixes: 0d8b9c529c ("anv: Allow PMA optimization to be enabled in secondary command buffers")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15048>
Valhall has a new twist on Mali's task splitting voodoo, plus compute offset
support.
On Bifrost + Vulkan, compute offsets needed lowering on Bifrost (gl_GlobalID).
Valhall saves a few instructions here.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15047>
I was doing some RE on freedreno and we had some questions about when the
hardware might need non-uniform or non-constant array access for various
descriptor types, so let's leave some notes for the next person.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13621>
shader-db results (note that we expect many more loops unrolled, so instr
count is probably understating the win):
nv30:
total instructions in shared programs: 16535069 -> 14299105 (-13.52%)
instructions in affected programs: 16377286 -> 14141322 (-13.65%)
total gpr in shared programs: 81255 -> 67268 (-17.21%)
gpr in affected programs: 56714 -> 42727 (-24.66%)
LOST: 0
GAINED: 824
nv40:
total instructions in shared programs: 20907673 -> 18428749 (-11.86%)
instructions in affected programs: 20755510 -> 18276586 (-11.94%)
total gpr in shared programs: 104200 -> 82831 (-20.51%)
gpr in affected programs: 80278 -> 58909 (-26.62%)
14130
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14130>
With the recent addition of the shortcuts aiming to avoid atomic
operations, the reference count on resources can become unbalanced
in the Tegra driver since they are wrapped and then proxied to the
Nouveau driver.
Fix this by keeping a private reference count.
Fixes: 7688b8ae98 ("st/mesa: eliminate all atomic ops when setting vertex buffers")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
With the recent addition of the shortcuts aiming to avoid atomic
operations, the reference count on sampler views can become unbalanced
in the Tegra driver since they are wrapped and then proxied to the
Nouveau driver.
Fix this by keeping a private reference count.
Fixes: ef5d427413 ("st/mesa: add a mechanism to bypass atomics when binding sampler views")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
I had missed a int -> enum conversion in one recently added function and
it's probably nice to also dump the target value also in
trace_dump_resource_template() so let's do just that.
Signed-off-by: Matti Hamalainen <ccr@tnsp.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14980>
Once we simplified a phi node, we never updated the definition it points
to, which meant that it could become out of date if that definition were
also simplified, and we didn't check that when rewriting sources. That
could happen when there are multiple nested loops with phi nodes at the
header.
Fix it by updating the phi's pointer. Since we always update sources
after visiting the definition it points to, when we go to rewrite a
source, if that source points to a simplified phi, the phi's pointer
can't be pointing to a simplified phi because we already visited the phi
earlier in the pass and updated it, or else it's been simplified in the
meantime and this isn't the last pass. This way we don't need to
keep recursing when rewriting sources.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15035>
This fixes artifacts seen in games when using ASTC transcoding,
we need to use DXT5 for proper alpha channel support.
Number of components is a block specific property, there is no easy
way to see if we will require >1bit alpha support or not, so simply
use DXT5 to have support in place.
Fixes: 91cbe8d855 ("gallium: Add a transcode_astc driconf option")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15029>
This introduces inotify support in RADV to handle changes from the
RADV_FORCE_VRS_CONFIG_FILE. This is similar to MangoHUD. I'm personally
not sure it's the best solution but let's try this way and change it
later if we have issues (or if we have a lightweight solution).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14713>
When I originally added vk_image_view, I was overly clever when it came
to the format field. I decided to make it only contain the bits of the
format contained in the selected aspects. However, this is confusing
(not generally a good thing) and it's also not always what you want.
The Vulkan 1.3.204 spec says:
"When using an image view of a depth/stencil image to populate a
descriptor set (e.g. for sampling in the shader, or for use as an
input attachment), the aspectMask must only include one bit, which
selects whether the image view is used for depth reads (i.e. using a
floating-point sampler or input attachment in the shader) or stencil
reads (i.e. using an unsigned integer sampler or input attachment in
the shader). When an image view of a depth/stencil image is used as
a depth/stencil framebuffer attachment, the aspectMask is ignored
and both depth and stencil image subresources are used."
So, while the restricted format makes sense for texturing, it doesn't
for when the image is being used as an attachment. What we probably
actually want is both versions of the format. We'll call the one given
by the VkImageViewCreateInfo vk_image_view::format and the restricted
one vk_image_view::view_format.
This is just the first commit which switches format to view_format so
the compiler will make sure we get them all. The next commit will
re-add vk_image_view::format but this time unmodified.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15007>
I suspect this double-check and comment was due to originally using ir.nir
as the condition, which might be uninitialized if !IR_NIR. You could only
take the branch if IR_NIR was set, and you should always not take if it
!IR_NIR, so it worked out in the end, but it would cause spurious valgrind
warnings if you hadn't zeroed out your TGSI shader's struct.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14896>
There were two paths in this code: One was transform_loops, which would
try to detect loops with an iteration count it understood and unroll that
many times. The other was emulate_loops, which would just figure out how
many instructions the program could have and still compile (hopefully),
and unroll this loop however many times would fit in that.
The transform_loops had no analysis as good as GLSL or NIR loop unrolling
have, so it shouldn't be missed -- any opportunity it found would only be
due to bugs in the unrolling code.
The emulate_loops path had an issue with computing the number of times it
should try to unroll -- if you had more instrs than ALUs available
already, you'd overflow and unroll approximately infinitely many times,
OOMing the system. But, also, it's better to throw a compiler error about
unsupported loops than to run the loop an incorrect number of times and
call it a success.
Fixes: #5883, #6018
Reviewed-by: Filip Gawin <filip.gawin@zoho.com>
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15004>
this otherwise treates begin/end/begin the same as begin/pause/resume
cc: mesa-stable
fixes:
KHR-GL46.texture_view.view_classes
KHR-GL46.transform_feedback.capture_geometry_separate_test
KHR-GL46.transform_feedback.capture_vertex_separate_test
KHR-GL46.transform_feedback.query_geometry_separate_test
KHR-GL46.transform_feedback.query_vertex_separate_test
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15020>
virglrenderer makes invalid shaders when faced with vector 64-bit
instructions, which GLSL-to-TGSI never produced. While this doesn't fix
everything, it does get more tests running, and virgl probably the primary
consumer of 64-bit TGSI. virgl may be deprecating its host 64-bit
support, at which point we can drop this workaround.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15014>
Prior to the noted MR, virglrenderer encoded the tex operands in a
limited-size temp buffer which we could easily overflow with TGSI
immediates. GLSL-to-TGSI always emitted an extra MOV, so keep that
behavior when nir-to-tgsi feeds us more optimized TGSI.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15014>
GLSL-to-TGSI would emit some MOVs that made things OK, but NTT
successfully copy-propagates the inputs and sysvals more, triggering bugs
in virglrenderer when the signedness of the input is different from that
of the ALU op.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15014>
Various workaround paths in virglrenderer assume that outputs are written
with a full writemask, which is not required by TGSI. To work around
that, store affected outputs in a temp and do a full write after each
writemasked write.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15014>
We really need offsets to be in dwords, not in vec4s.
The bug manifests as random failure of func.mesh.clipdistance.5 crucible
test, where stores to gl_MeshVerticesNV[x].gl_ClipDistance[4+n] actually write to
gl_MeshVerticesNV[x].gl_ClipDistance[1+n].
Fixes: 1f438eb033 ("intel/compiler: Implement Mesh Output")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14997>
Contains stuff needed for layered rendering. Unfortunately, there's no more
provoking vertex per draw -- ugh! That's fine for Vulkan (just don't set
provokingVertexModePerPipeline), but requires inserting extra flushes on desktop
OpenGL.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15003>
For some games that take like 400 MiB of shader binaries, the
number of shader arenas ends up going >1500. Cut that down a bit
by using larger arenas.
8 MiB should still be decent with small BAR and should still cut
things down from ~1500 to ~50 buffers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14591>
Some functions in ppir iterate the ppir_op_info slots arrays looking
for the PPIR_INSTR_SLOT_END token. The dummy/undef internal ops may
appear in the scheduling code and their slots arrays did not contain
that token, which could result in invalid array reads.
Reported by gcc -fsanitize=address.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Cc: 22.0 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14894>
Reported by gcc -fsanitize=address, sometimes gpir regalloc attempts to
handle an uninitialized node->value_reg (containing the value -1), which
results in an invalid array access.
Avoid it for now to prevent crashes, but more investigation may be
required later on.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Cc: 22.0 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14894>
Static analysis complains that spill_costs might be accessed in
non-initialized positions.
It does not seem to be an issue with the current code which initializes
it for every relevant register index, but we can also just initialize it
to not have to worry about that.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14894>
This was needed, so that in case of active helper lanes,
these contain the correct value. It is now handled implicitly.
Totals from 1004 (0.74% of 134913) affected shaders: (GFX10.3)
CodeSize: 7581020 -> 7580892 (-0.00%); split: -0.00%, +0.00%
Instrs: 1454940 -> 1454908 (-0.00%); split: -0.00%, +0.00%
Latency: 12984953 -> 12984894 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 3173037 -> 3173049 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 47498 -> 47273 (-0.47%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
Platforms that don't have flow control also don't have anything that
could be written that has a side effect. It should be safe to implement
these condition writes as
foo = csel(condition, bar, foo);
This should eliminate the last thing in the GLSL compiler that can
create new conditions on assignments. Everything else that can store
something in ir_assignment::condition derives it from a pre-existing
condition.
v2: Fix bad rebase.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14573>
When there are four or fewer elements left in the array partition, the
strategy changes from a binary search of nested flow control to sequence
of conditional assignments like
(assign, dest, src[constant_i+0], index == constant_i+0)
(assign, dest, src[constant_i+1], index == constant_i+1)
(assign, dest, src[constant_i+2], index == constant_i+2)
(assign, dest, src[constant_i+3], index == constant_i+3)
or
(assign, dest[constant_i+0], src, index == constant_i+0)
(assign, dest[constant_i+1], src, index == constant_i+1)
(assign, dest[constant_i+2], src, index == constant_i+2)
(assign, dest[constant_i+3], src, index == constant_i+3)
Realistically, the first case should use ir_triop_csel instead.
The second case will either get turned back into flow control like
if (index == constant_i+0)
(assign, dest[constant_i+0], src)
if (index == constant_i+1)
(assign, dest[constant_i+1], src)
if (index == constant_i+2)
(assign, dest[constant_i+2], src)
if (index == constant_i+3)
(assign, dest[constant_i+3], src)
or a sequence of conditional selects like
(assign, dest[constant_i+0], (csel, index == constant_i+0, src, dest[constant_i+0]))
(assign, dest[constant_i+1], (csel, index == constant_i+1, src, dest[constant_i+1]))
(assign, dest[constant_i+2], (csel, index == constant_i+2, src, dest[constant_i+2]))
(assign, dest[constant_i+3], (csel, index == constant_i+3, src, dest[constant_i+3]))
The former case should continue to use the binary search. The later
case could be generated from the binary search by other lowering passes.
At the end of the day, conditional assignments don't really help
anything here, so stop using them.
Radeon R430
total instructions in shared programs: 2398683 -> 2398419 (-0.01%)
instructions in affected programs: 5143 -> 4879 (-5.13%)
helped: 9
HURT: 8
total vinst in shared programs: 616292 -> 616010 (-0.05%)
vinst in affected programs: 4467 -> 4185 (-6.31%)
helped: 9
HURT: 8
total sinst in shared programs: 315417 -> 315667 (0.08%)
sinst in affected programs: 2568 -> 2818 (9.74%)
helped: 2
HURT: 15
total flowcontrol in shared programs: 1049 -> 1048 (-0.10%)
flowcontrol in affected programs: 7 -> 6 (-14.29%)
helped: 1
HURT: 0
total presub in shared programs: 47027 -> 47027 (0.00%)
presub in affected programs: 127 -> 127 (0.00%)
helped: 1
HURT: 1
total omod in shared programs: 3618 -> 3615 (-0.08%)
omod in affected programs: 8 -> 5 (-37.50%)
helped: 3
HURT: 0
total temps in shared programs: 450757 -> 451312 (0.12%)
temps in affected programs: 837 -> 1392 (66.31%)
helped: 8
HURT: 6
total consts in shared programs: 1031928 -> 1031920 (<.01%)
consts in affected programs: 1211 -> 1203 (-0.66%)
helped: 6
HURT: 7
The shaders that were hurt for temps... are all lies. None of those
shaders should have compiled as all 6 had more than 32 temps to begin
with.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14573>
Use if-statements instead. Any hardware that supports this sort of
tessellation has flow control, so it will probably emit the conditional
assignment using an if-statement anyway. This is definitely what
st_glsl_to_nir does.
v2: Fix copy-and-paste bug in the ir_type_swizzle handling. This bug
caused segfaults in tests/spec/arb_tessellation_shader/execution/variable-indexing/tcs-patch-vec4-swiz-index-wr.shader_test.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14573>
The nir approximation is a bit more precise so there is one more
instruction for the scalar version but if the shader actually uses
vector one or some other stuff like sin(x) followed by cos(x) we
save more.
This nir approximation importantly seems to have better precision
so this should also fix some piglits/dEQPs.
With my shader-db and faked R300:
total instructions in shared programs: 67751 -> 65978 (-2.62%)
instructions in affected programs: 8637 -> 6864 (-20.53%)
total temps in shared programs: 9191 -> 9137 (-0.59%)
temps in affected programs: 486 -> 432 (-11.11%)
total consts in shared programs: 45427 -> 45412 (-0.03%)
consts in affected programs: 856 -> 841 (-1.75%)
total lits in shared programs: 2317 -> 2346 (1.25%)
lits in affected programs: 69 -> 98 (42.03%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14957>
This patch moves the shrinking of store sources into
a separate pass.
The reasoning behind this is that this pass usually only
needs to be called once while nir_shrink_vectors might
better be called several times. This allows to move
the pass(es) out of the optimization loops.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14480>
This wasn't much of a problem before because vk_command_buffer_finish()
doesn't do much on an empty command buffer. However, it's about to be
responsible for managing the pool's list of command buffers so it will
be critical to get this right.
Fixes: c9189f4813 ("anv: Use a common vk_command_buffer structure")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14917>
Now that we have no overlapping memory addresses for different
contexts we can go ahead and also share the same VM for every context.
This should make VM binding much easier to implement once we have
Kernel support for it.
v2: We now have engines_ctx, set it there too.
v3: Fix indenting (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12028>
Have a single border color pool per bufmgr instead of per context.
We want to have a single VM shared among every context and the border
color pool is the last feature preventing us from having that.
Previously we had 1024 colors per context but once the buffer was full
we just waited for the buffer to be unused and restarted it. After
this patch we have 4096 colors for every single context and we can't
just flush buffers if they are full, so we simply return black.
There are many strategies we could try to implement to help alleviate
this new 4096 limit, none of which are implemented by this patch:
- We could just expand the buffer to the full 16MB we can use,
allowing 262144 colors.
- We could use multiple buffers and make the contexts refcount them,
so eventually older buffers would reach zero references and be
recycled, moving us to a working set maximum from a lifetime
maximum.
- We could also make the border color pool be a standard memzone and
then give smaller buffers to each context when they need, so the
limit would be in the number of contexts that can use border color
pools. This was my first implementation but Ken suggested I switch
to the one provided by this patch, which is simpler.
Keep it like this since border colors don't seem to be used very much
and other Mesa drivers such as radeonsi also seem to employ the
"return black once we reach the limit" strategy.
As a last note, we could also move the contents of iris_border_color.c
to iris_bufmgr.c in order to avoid breaking some abstractions we have
in Iris, like we do with iris_bufmgr_get_border_color_pool(). I can do
this in case we want it.
v2: Switch from standard memzone to a per-screen thing (see above).
v3: Actually make it per bufmgr. Just making it per screen is not
enough, since screens can share the same VM, an example being the
gputest benchmark suite.
v4: Rebase.
v5: Remove dead code, lock around hash table lookup (Ken).
v6: Simple rebase.
v7: Another rebase (for_each_batch).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12028>
We're moving towards a path where all contexts share the same virtual
memory - because this will make implementing vm_bind much easier - ,
and to achieve that we need to rework the binder memzone. As it is,
different contexts will choose overlapping addresses. So in this patch
we adjust the Binder to be 1GB - per Ken's suggestion - and use a real
vma_heap for it. As a bonus the code gets simpler since it just reuses
the same pattern we already have for the other memzones.
Credits to Kenneth Granunke for helping me with this change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12028>
this codepath needs to update literally one spirv word to get the right shader
(why can't this be a spec constant?)
so just update that word instead of running all the nir passes and doing a
full recompile
fixes:
KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_MaxPatchVertices_Position_PointSize
KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14976>
by initializing this on context creation, we can ensure that the correct
value is always here
cc: mesa-stable
fixes:
dEQP-GLES31.functional.texture.multisample.samples_1.sample_mask_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_1.sample_mask_and_sample_coverage
dEQP-GLES31.functional.texture.multisample.samples_1.sample_mask_and_sample_coverage_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_1.sample_mask_only
dEQP-GLES31.functional.texture.multisample.samples_2.sample_mask_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_2.sample_mask_and_sample_coverage
dEQP-GLES31.functional.texture.multisample.samples_2.sample_mask_and_sample_coverage_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_2.sample_mask_only
dEQP-GLES31.functional.texture.multisample.samples_3.sample_mask_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_3.sample_mask_and_sample_coverage
dEQP-GLES31.functional.texture.multisample.samples_3.sample_mask_and_sample_coverage_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_3.sample_mask_only
dEQP-GLES31.functional.texture.multisample.samples_4.sample_mask_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_4.sample_mask_and_sample_coverage
dEQP-GLES31.functional.texture.multisample.samples_4.sample_mask_and_sample_coverage_and_alpha_to_coverage
dEQP-GLES31.functional.texture.multisample.samples_4.sample_mask_only
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14974>
There's already a single command queue for the screen, meaning that
all commands are being serialized implicitly into that queue. There's
no need to have separate fences for parallel contexts when those
fences would all share the same underlying timeline.
This adds an explicit lock to expand the scope of the implicit screen
command queue ordering to include fence signals.
Each context still gets its own submit sequence, which is used for 1
purpose right now: A uniqueness check in the state manager to see
if states are coming from separate command lists, to apply promotion
and decay logic.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14959>
The no-wait condition was wrong before. If the query was used in the
current batch (query->fence_value == context->fence_value), we'd
continue on with the operation instead of returning false. Then the
buffer map would see that the bo is referenced in the current batch,
and would flush and wait, even though we were asked not to wait.
This fixes the condition by simply using the (correct) buffer map
logic.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14959>
Properly handling NaN adversely affects several hundred shaders in
shader-db (lots of Skia and a few others from various synthetic
benchmarks) and fossil-db (mostly Talos and some Doom 2016). Only apply
the NaN handling work-around when the shader demands it.
v2: Add comment explaining the 1.0*y_over_x. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 2098ae16c8 ("nir/builder: Move nir_atan and nir_atan2 from SPIR-V translator")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
frexp_sig of ±0, ±Inf, or NaN should just return the input unmodified.
frexp_exp of ±Inf or NaN is undefined, and frexp_exp of ±0 should return
the input unmodified. This seems to already work.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 23d30f4099 ("spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
NOTE: This commit needs "nir: All set-on-comparison opcodes can take all
float types" or regressions will occur in other Vulkan SPIR-V tests.
No shader-db changes on any Intel platform.
NOTE: This commit depends on "nir: All set-on-comparison opcodes can
take all float types".
v2: Fix handling 16-bit (and presumably 64-bit) values.
About 280 shaders in Talos are hurt by a few instructions, and a couple
shaders in Doom 2016 are hurt by a few instructions.
Tiger Lake
Instructions in all programs: 159893290 -> 159895026 (+0.0%)
SENDs in all programs: 6936431 -> 6936431 (+0.0%)
Loops in all programs: 38385 -> 38385 (+0.0%)
Cycles in all programs: 7019260087 -> 7019254134 (-0.0%)
Spills in all programs: 101389 -> 101389 (+0.0%)
Fills in all programs: 131532 -> 131532 (+0.0%)
Ice Lake
Instructions in all programs: 143624235 -> 143625691 (+0.0%)
SENDs in all programs: 6980289 -> 6980289 (+0.0%)
Loops in all programs: 38383 -> 38383 (+0.0%)
Cycles in all programs: 8440083238 -> 8440090702 (+0.0%)
Spills in all programs: 102246 -> 102246 (+0.0%)
Fills in all programs: 131908 -> 131908 (+0.0%)
Skylake
Instructions in all programs: 134185495 -> 134186618 (+0.0%)
SENDs in all programs: 6938790 -> 6938790 (+0.0%)
Loops in all programs: 38356 -> 38356 (+0.0%)
Cycles in all programs: 8222366923 -> 8222365826 (-0.0%)
Spills in all programs: 98821 -> 98821 (+0.0%)
Fills in all programs: 125218 -> 125218 (+0.0%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 1feeee9cf4 ("nir/spirv: Add initial support for GLSL 4.50 builtins")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
This (sort of) matches the behavior of nir_opt_algebraic. This ensures
that subnormal values are properly flushed to zero.
With the aid of "nir/search: Float sources of texture instructions are
float users" and "nir/search: Transitively apply is_only_used_as_float",
there would have been no shader-db regressions on Intel platforms.
However, those caused a significant increase in compile time. Since the
instruction regressions were so small, I just dropped those commits
rather than improve them.
All Haswell and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20125042 -> 20125094 (<.01%)
instructions in affected programs: 7184 -> 7236 (0.72%)
helped: 0
HURT: 32
HURT stats (abs) min: 1 max: 4 x̄: 1.62 x̃: 2
HURT stats (rel) min: 0.11% max: 1.49% x̄: 0.85% x̃: 0.78%
95% mean confidence interval for instructions value: 1.39 1.86
95% mean confidence interval for instructions %-change: 0.74% 0.96%
Instructions are HURT.
total cycles in shared programs: 862745586 -> 862746551 (<.01%)
cycles in affected programs: 109872 -> 110837 (0.88%)
helped: 12
HURT: 23
helped stats (abs) min: 2 max: 774 x̄: 90.83 x̃: 19
helped stats (rel) min: 0.07% max: 25.23% x̄: 3.06% x̃: 0.40%
HURT stats (abs) min: 2 max: 1106 x̄: 89.35 x̃: 12
HURT stats (rel) min: 0.08% max: 45.40% x̄: 3.01% x̃: 0.47%
95% mean confidence interval for cycles value: -60.09 115.23
95% mean confidence interval for cycles %-change: -2.21% 4.07%
Inconclusive result (value mean confidence interval includes 0).
All of the shaders hurt are in either UE4 shooter-game or shooter_demo.
Tiger Lake
Instructions in all programs: 159893213 -> 159893290 (+0.0%)
SENDs in all programs: 6936431 -> 6936431 (+0.0%)
Loops in all programs: 38385 -> 38385 (+0.0%)
Cycles in all programs: 7019259514 -> 7019260087 (+0.0%)
Spills in all programs: 101389 -> 101389 (+0.0%)
Fills in all programs: 131532 -> 131532 (+0.0%)
Ice Lake
Instructions in all programs: 143624164 -> 143624235 (+0.0%)
SENDs in all programs: 6980289 -> 6980289 (+0.0%)
Loops in all programs: 38383 -> 38383 (+0.0%)
Cycles in all programs: 8440082767 -> 8440083238 (+0.0%)
Spills in all programs: 102246 -> 102246 (+0.0%)
Fills in all programs: 131908 -> 131908 (+0.0%)
Skylake
Instructions in all programs: 134185424 -> 134185495 (+0.0%)
SENDs in all programs: 6938790 -> 6938790 (+0.0%)
Loops in all programs: 38356 -> 38356 (+0.0%)
Cycles in all programs: 8222366529 -> 8222366923 (+0.0%)
Spills in all programs: 98821 -> 98821 (+0.0%)
Fills in all programs: 125218 -> 125218 (+0.0%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: f5dd6dfe01 ("anv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controls")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
Extend 4195a9450b so that the next poor fool doesn't come along and
say, "sge does the right thing for 16-bit sources, but slt gives a NIR
validation failure. What the deuce?"
NOTE: This commit is necessary to prevent regressions in GLSLstd450Step
tests of 16-bit sources at "spriv: Produce correct result for
GLSLstd450Step with NaN".
Fixes: 4195a9450b ("nir: sge operation is defined for floating-point types")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
When slots is 64 only the first bit was being set, instead of setting
all 64 bits of the variable, so for that case the function
get_variable_io_mask() always returned 0.
This behaviour caused variables that are being used both on producer and
consumer to be considered unused and thus being removed on
nir_remove_unused_io_vars().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14955>
Expose most of the counters exposed by blob. By faking the value of
counters returned from kgsl I found the exact underlying counters and
constant coefficients being used.
Note, coefficients for counters that depend on time are NOT verified.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14323>
Although a resource may support CCS with its original format, a texture
view of that resource may have a format that doesn't support
compression. Don't create CCS surface states for such texture views.
This change affects iris' behavior when running piglit's
arb_texture_view-rendering-formats_gles3 test on SKL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14806>
Fast clear with the resource format instead. This is safe to do because
can_fast_clear_color ensures that the clear color generates the same
pixel with either the view format or the resource format.
On SKL, this prevents us from using an invalid surface state. This platform
doesn't support CCS_E with sRGB formats, but prior to this patch we allowed
fast-clearing with this combination. Piglit's fcc-write-after-clear test
can trigger this.
Fixes: 230952c210 ("iris: Don't support sRGB + Y_TILED_CCS on gen9")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14806>
the idx param for LLVMBuildInsertElement is zero-indexed based on the
value of 'vector_length' (always 4), and the vector length is (obviously)
sized to 'vector_length', so this should be the member of the vec that is being
inserted, not the invocation index
cc: mesa-stable
fixes (zink, but only on my one machine):
KHR-GL46.tessellation_shader.single.max_patch_vertices
KHR-GL46.tessellation_shader.tessellation_shader_tc_barriers.barrier_guarded_read_write_calls
dEQP-GLES31.functional.tessellation.shader_input_output.barrier
dEQP-GLES31.functional.tessellation.shader_input_output.patch_vertices_5_in_10_out
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_isolines_geometry_output_points
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_isolines_point_mode_geometry_output_triangles
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_quads_geometry_output_points
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_quads_point_mode_geometry_output_lines
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_triangles_geometry_output_points
dEQP-GLES31.functional.tessellation_geometry_interaction.feedback.tessellation_output_triangles_point_mode_geometry_output_lines
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14949>
All of the opcodes in nir_opt_algebraic_late are the unsized (1-bit)
versions. If the lowering to int32 happens first, many of the
optimizations and lowerings won't happen.
Of particular importance is the lowering of fisfinite. If a shader
happens to contain fisfinite of an fp16 value, it will assert later
during compliation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 78b4e417d4 ("gallivm: handle fisfinite/fisnormal")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14942>
Allocating NIR registers ends up being required for drivers like r600 and
nv30, which don't do their own allocation (except in some cases on r600
where sb is used).
Rather than add a NIR register liveness impl
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14158), switch
from NIR-based liveness to just doing the same channel-based liveness
logic that the NIR registers needed at the TGSI level. The actual
liveness code here basically comes straight out of
brw_vec4_live_variables.cpp.
Since we do the liveness in TGSI now, it also means we don't need to be
careful about not reading SSA values from later TGSI instructions (which
may be useful for doing some greedy instruction selection in generating
TGSI instructions).
i915g:
total instructions in shared programs: 400719 -> 380730 (-4.99%)
instructions in affected programs: 284760 -> 264771 (-7.02%)
total tex_indirect in shared programs: 12289 -> 12290 (<.01%)
tex_indirect in affected programs: 4 -> 5 (25.00%)
total temps in shared programs: 32172 -> 22086 (-31.35%)
temps in affected programs: 30647 -> 20561 (-32.91%)
LOST: 0
GAINED: 148
r300:
total instructions in shared programs: 1472463 -> 1459286 (-0.89%)
instructions in affected programs: 507009 -> 493832 (-2.60%)
total temps in shared programs: 212143 -> 201678 (-4.93%)
temps in affected programs: 78007 -> 67542 (-13.42%)
softpipe:
total temps in shared programs: 517071 -> 294387 (-43.07%)
temps in affected programs: 509324 -> 286640 (-43.72%)
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14404>
To do register allocation well, we want to have a point before
ureg_insn_emit() to look at the liveness of the values and allocate them
to TGSI temporaries. In order to do that, we have to switch from
ureg_OPCODE() emitting TGSI tokens directly to a new ntt_OPCODE() that
stores the ureg args in a block structure.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14404>
PIPE_CAP_FORCE_PERSAMPLE_INTERP is broken for the no-attachment case, so
this is the only option
fixes (lavapipe):
KHR-GL46.sample_shading.render*
dEQP-GLES31.functional.sample_shading.min_sample_shading*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14852>
This cap bit only affects DRI_PRIME setups. Since iris now uses the
blitter to perform dGPU -> iGPU copies asynchronously, it's better to
always use at least two backbuffers so the 3D engine can start rendering
the next frame during the copy.
See commit d17e752857 where this change
was made for radeonsi.
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13877>
In a hybrid graphics setup, Mesa allocates two buffers for the window
surface. The first is what the discrete card renders to; it lives in
VRAM and is usually tiled and possibly compressed. The second is a
shadow copy that lives in system memory (readable by the integrated
card with the displays); it's usually linear and uncompressed.
Mesa's window system code schedules blits to update the shadow copy
when needed, typically at the end of a frame. These can be fairly
costly when running a full-screen application at high resolutions.
We'd like to use the blitter for these copies, as it lets us perform
the copy asynchronously, letting the 3D engine race ahead and start
rendering the next frame. If we used the 3D engine, the next frame
could not start rendering until the PRIME blit finishes, giving us
less time to draw the frame. Fortunately, Tigerlake introduced new
blitter commands which can operate at full memory bandwidth.
DRI PRIME blits happen via the Gallium blit() hook. We can detect that
case by looking for the PIPE_BIND_PRIME_BLIT_DST flag on the destination
resource. This patch detects that case and calls iris_copy_region() on
IRIS_BATCH_BLITTER to handle it. We know a priori that the blitter can
handle this operation (it's not a scaled blit, the formats match and
should not be 96bpp, there's no combined depth stencil, or other weird
edge cases). blorp_copy() will also assert that edge cases don't occur.
Together with the next patch, this improves performance on DG1 Hybrid
scenarios by about 5-6%.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13877>
Use drmSyncobjSignal to signal out_syncobjs when a GPU job submission
ends in the simulator. With this, we can enable multisync support in the
simulator and keep the multisync approach to process fence by submitting
a serialized no-op job that adds the fence to the array of out syncobjs,
i.e. syncobjs to be signaled in the kernel when a job completes (job
post deps).
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14768>
The existing logic would drop the low bit. Instead, let's drop the high
bit, do the conversion, and then add the fixed constant back in if the
value had the high bit set originally.
Fixes KHR-GL45.direct_state_access.vertex_arrays_attribute_format on
drivers that use this module to handle the format conversion.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Emma Anholt <emma@anholt.net>
Tested-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14922>
Saves instructions if the same fabs value is used multiple times.
i915g:
total instructions in shared programs: 397005 -> 396525 (-0.12%)
instructions in affected programs: 11061 -> 10581 (-4.34%)
LOST: 0
GAINED: 22
r300 (not r500):
total instructions in shared programs: 180286 -> 179767 (-0.29%)
instructions in affected programs: 27102 -> 26583 (-1.91%)
total temps in shared programs: 29692 -> 29638 (-0.18%)
temps in affected programs: 356 -> 302 (-15.17%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14938>
Given that our fcsels are on float-bools, we can emit the LRP directly and
save the backend having to emit a SLT to turn the CMP src[0] into a bool.
This required passing a codegen flags struct for nir-to-tgsi. I think
this is a good way forward for it, as the alternative I think has mostly
been adding flags to nir_shader_compiler_options (since adding
PIPE_SHADER_CAPs is an unreasonable amount of pain).
r300 shader-db:
total instructions in shared programs: 1484320 -> 1472463 (-0.80%)
instructions in affected programs: 243588 -> 231731 (-4.87%)
total temps in shared programs: 212485 -> 212143 (-0.16%)
temps in affected programs: 3845 -> 3503 (-8.89%)
Acked-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14886>
The main thing is VK 1.3 testing, but also includes test bugfixes. The
1.3 CTS required an uprev of deqp-runner to handle a new style of test
output, and that deqp-runner brings in some neat new features, too (piglit
in your deqp-runner suite, and extension list checking).
A bunch of VK tests got renamed, so I replaced panvk's custom test list
with simple include filters on the main test list.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (panvk)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14920>
Block compressed formats like ETC2 are now indicated in the plane descriptor,
rather than the pixel format descriptor. Various other minor formats were
removed in Valhall; remove them from the XML so we don't accidentally try to use
them.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14935>
Remove a few that no longer exist, and rename IDVS helper to Malloc Vertex. The
distinction between Malloc Vertex jobs and regular Indexed Vertex jobs is that
the hardware allocates varying buffers dynamically for Malloc Vertex jobs.
Regular IDVS and even legacy tiler jobs are also supported where desired.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14935>
Merged with the Buffer descriptor, hence why it shares a type nibble. However,
Bifrost uses a dedicated tiler heap descriptor, and I see no benefit to merging.
So pretending it's a dedicated descriptor on Valhall too allows us to reuse the
Bifrost code with no modifications.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14935>
The Entrypoint class already has utilities for gettingt he parameter
list as either declarations or as comma-separated argument names for a
call. Use that instead of hand-rolling it. The only modification we
need to make is to add the ability to start the list somewhere other
than at the beginning.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14919>
This fixed a problem for Zink where uniform buffer alignment varies by
GPU, e.g. 64 bytes for an RTX 2070 SUPER but 256 bytes for a GTX 1070
Ti.
Tested running Superposition on Windows 10 with Nvidia 1070 Ti with
496.13 driver. Without the fix Superposition soft locks on its splash
screen. With the fix Superposition runs through its benchmark.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14674>
The bit fields in zink_framebuffer_state cause a false positive with
MSVC's run-time checks enabled. setting state.num_attachments in
zink_get_framebuffer_imageless(). Writing some bits of num_attachments
involves reading bits from layers and samples that haven't been
initialized.
Fixed by assigning to num_attachments earlier in the function. Not
quite sure why that makes a difference but at a guess there's a
heuristic that considers assignment close to declaration as
initialization.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14566>
In the future we'll want to reuse this intrinsic to deal with ray
queries. Ray queries will use a different global pointer and
programmatically change the control/level arguments of the trace send
instruction.
v2: Comment on barrier after sync trace instruction (Caio)
Generalize lsc helper (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
We'll want different types of IDs based on topology. Let's make this
more flexible and also move the bit shifting code a layer above where
it's easier to do bitshifting operations, especially if you need to
stash things into temporary registers.
v2: Keep previous comment.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
spec requires that the number of timeline waits/signals matches the
base number of waits/signals if there are any timeline semaphores
being processed by the submit, so asserting here is in line with what
validation will yield
failure to match these will also hang every driver I've tested, so asserting
here potentially saves some people their desktop session
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14741>
these regions might not have the coords in the correct order, which will
cause them to fail intersection tests, resulting in clears that are never
applied
cc: mesa-stable
fixes:
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_all_buffer_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_color_and_depth_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_color_and_stencil_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_linear_filter_color_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_magnifying_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_minifying_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_missing_buffers_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_nearest_filter_color_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_negative_dimensions_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_negative_height_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_negative_width_blit
GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_scissor_blit
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14867>
e9c3dbd046 added PIPE_BIND_DRI_PRIME but it was only set when
importing a prime buffer.
This commit adds handling of this flag in the other codepath = the
one where the prime buffer is allocated by the render GPU.
With this change PIPE_BIND_DRI_PRIME is still only set for the
render GPU - the display GPU will never see this flag; a future
commit will rename it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14615>
To flush the blitter, we need to use MI_FLUSH_DW rather than the usual
PIPE_CONTROL we use on the 3D engine. Most of our code is set up to
suggest flushes via PIPE_CONTROL commands, however, so we hackily just
emit MI_FLUSH_DW when they ask for any kind of PIPE_CONTROL flush.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14912>
Graphics Flight Recorder is:
"The Graphics Flight Recorder (GFR) is a Vulkan layer to help
trackdown and identify the cause of GPU hangs and crashes.
It works by instrumenting command buffers with completion tags."
This is a nice little tool which could help quickly identify the call
which hanged. Or if command buffer is executed for too long.
The tiling nature of our GPU shouldn't be a big issue aside from
lower performance.
For non-segfault case, if:
- Hang happens at the same place in cmdbuf and draw/dispatch is not
finished at that point - it is likely that there is an infinite
loop in some of the shaders in this draw.
- Hang happens always in different place - likely there is nothing
wrong and command buffer just takes too long to execute and you
should try increasing hangcheck_period_ms. If it doesn't help
it is likely a synchronization issue.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13553>
If the surface already has a current backbuffer - say through a
buffer_age query - we do not want to replace it in get_buffers, because
it means the result we'd previously returned them is stale.
If we already have a backbuffer set on the surface, keep it locked in no
matter what until we hit SwapBuffers.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14873>
A buffer age of 0 means that the buffer is uninitialised or has unknown
content. We rely on the buffer age initially being 0 through zalloc when
the surface is first created; when they are first used for a swap, we
set their age to 1, and then we increment the age of every buffer in the
chain with a non-zero age when we swap.
Now that we can release buffers, both through dmabuf-feedback as well as
detecting when we're using a deeper swapchain than the compositor needs,
make sure to reset their age as they are released. Without doing this,
the age will stay as it was before it was released and be incremented,
returning the wrong age to the user the first time a previously-released
buffer slot has been reused.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5977
Fixes: 22d796feb8 ("egl/wayland: break double/tripple buffering feedback loops")
Fixes: b5848b2dac ("egl/wayland: use surface dma-buf feedback to allocate surface buffers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14873>
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that
operate on descriptor sets allocated using that layout, and those
descriptor sets must not be updated with vkUpdateDescriptorSets
after the descriptor set layout has been destroyed. Otherwise,
descriptor set layouts can be destroyed any time they are not in
use by an API command."
Based on ANV.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5893
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14621>
If we have batch a + b, and writing to batch b, causes batch a
to flush, all the bo->index get reset, and we try to submit a -1
to the kernel.
Look the bo index up when creating relocations.
Fixes crash seen in KHR-GL46.compute_shader.pipeline-post-fs
and a trace from Wasteland 3
Fixes: f3630548f1 ("crocus: initial gallium driver for Intel gfx 4-7")
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14905>
Like explicit LODs, biases must be 16-bit, so add a lowering rule for
this. With the LOD mode selection updated for txb, we can then ingest
biases like explicit LODs and allowlist txb. Passes:
dEQP-GLES2.functional.shaders.texture_functions.fragment.texture2d_bias
dEQP-GLES2.functional.texture.mipmap.2d.bias.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14899>
In cases where the to-be-allocated node size with padding exceeds BLOCK_SIZE
but without padding doesn't, a new block is not created and no padding is done
to the previous instruction, causing a misaligned pointer to be returned.
v2: Per Ilia Mirkin's suggestion, remove the extra condition in the first
if statement, let it unconditionally pad the last instruction if needed.
The updated currentPos will then be taken into account in the
block size checking.
This fixes crash seen with lightsmark and Optuma apitraces
Fixes: 05605d7f53 (' mesa: remove display list OPCODE_NOP')
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Tested-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14871>
When new context was created, shared_mem_size was getting overwritten.
This fixes glretrace failure seen with manhattan, aztec and BASS2_intro
apitraces
Fixes: 247c61f2d0 ('svga: Add support for compute shader, shader buffers and image views')
Tested with glretrace, piglit
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
(cherry picked from commit dd6793ec9218782b1b716a87582d7219bae4e75f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14870>
The limit here is from the RENDER_SURFACE_STATE height/width/depth
fields - it's 2^30 for ISL_FORMAT_RAW buffers, and 2^27 otherwise.
anv_isl_format_for_descriptor_type() uses ISL_FORMAT_R32G32B32A32_FLOAT
for uniform buffers when compiler->indirect_ubos_use_sampler is set
(Icelake and earlier), but ISL_FORMAT_RAW when it isn't (Tigerlake+).
So we can increase the limit on Tigerlake and later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14892>
We are updating the deadcode state while walking the program backwards.
When encountering ENDLOOP, we scan the loop, mark everything in the loop
as used and than continue as usuall. We were previously trying to be
smart with the breaks. This was however not working as expected.
Instead, save the most pesimistic deadcode state from the ENDLOOP and
just restore it anytime we see a break.
This keeps the code simple and more importantly does not touch the flat
and IF(-ELSE)-ENDIF paths at all so reduces the chances of regression.
No changes with my shader-db.
Fixes piglits on RV530:
shaders/ssa/fs-if-def-else-break.shader_test
spec/glsl-1.10/execution/vs-loop-array-index-unroll.shader_test
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5832
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14661>
Compute shaders might do vec3(Xbits) loads which are translated
to LD.<next_pow2(3 * X)> by the midgard compiler. This might cause
out-of-bound accesses potentially leading to pagefaults if the
access is at the end of a BO. One solution to avoid that (maybe not
the best) is to split non-log2 loads to make sure we only read what's
requested.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14885>
Swizzling happens in 2 steps on Midgard:
1. Vector expansion/shuffling
2. Swizzling at the instruction-size granularity, but defined using
the source size. Those size are different if the source is expanded.
So, when we print 64 bit swizzles on an expanded source, we first need
to apply an offset if the high part of the 32bit vector was selected,
and then divide the result by 2 to account for vector expansion.
To sum-up, swizzling on midgard is complicated, and I'm not sure I got
it right, but it seems to print what I expect on the few compute
shaders using 64bit arithmetic I debugged.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14885>
Even though 8-bit ALUs are not supported, we can have [un]pack_32_4x8
instructions which translate to IMOVs, and those operate on 8-bit
vectors. The problem is, the swizzling granularity is 16 bit, which
means we don't support
MOV.i8 R0.xyzw, TMP0.xxxx, R1.zyxw
and the compiler doesn't even complain, it just applies 8 bit
swizzling directly, which obviously doesn't work.
This is probably not the right way to fix that, but I thought I'd
raised the issue with a hack to fix, so we can get the discussion
started.
(Found while debugging FB store lowering on Midgard).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14885>
The register allocator assumes instructions are executed sequentially
and allows one instruction to overwrite a portion of a register written
by a previous instruction if this portion is never used. But scalar and
vector ALUs might be executed in parallel if they are part of the same
bundle, and when such instructions write to the same portion of the
register file, the result is undefined.
Let's add intra-bundle interferences to avoid this situation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14885>
We didn't remove desc set from the pool's list if pool was
host_memory_base. On the other hand in there is no point in removing
desc set from the list in DestroyDescriptorPool/ResetDescriptorPool.
Fixes: da7a4751
("turnip: Drop references to layout of all sets on pool reset/destruction")
Fixes cts tests:
dEQP-VK.api.buffer_marker.graphics.default_mem.bottom_of_pipe.memory_dep.draw
dEQP-VK.api.buffer_marker.graphics.default_mem.bottom_of_pipe.memory_dep.dispatch
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14855>
Turn the context state and shader key into a bitmask. When lowering
the depth invert into the shader, scan for writes to viewport index.
If found, move position to the end of the function (or current vertex)
and check if the current viewport needs the depth invert before applying.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14881>
This pass (lowering discard_if to control flow and unconditional
discard) was originally written for Zink, but is useful for hardware
that lacks conditional discard instructions like AGX. In theory AGX
could implement a conditional discard with CSEL, but the vendor
compiler uses a lowering like this one. Since I like not writing code,
I'd like to use the pass that's already in tree.
v2: Don't preserve dominance (Jason). Assert we don't see demotes or
terminates (Jason). Add Mike's ack.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14217>
The bug here is that the DRI context "flags" are intended to alias the
GLX context flag values, and they don't, DRI's no-error flag is GLX's
reset-isolation flag. GLX (and EGL!) treat no-error as a context
attribute, and reset isolation predates Mesa's no-error implementation
by several years. The GL_KHR_no_error spec does describe it as a
"context flag", though, so maybe that's why we do it as a (DRI) context
flag.
In order to unalias these we need a new contract with the loader. We
remove the old __DRI_NO_ERROR extension, and add a new
__DRI_RENDERER_HAS_CONTEXT_NO_ERROR value to query. Loaders can key on
that to know to pass no-error-ness through as a context attribute,
matching the GLX/EGL calling convention. We go ahead and define
__DRI_CTX_FLAG_RESET_ISOLATION as well, and update the drivers to refuse
it since we don't support it yet.
This means mismatched drivers/loaders will not be able to create
no-error contexts. Too bad. If you want performance that badly you can
build both things at once.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12474>
Meson devenv is a feature added in meson 0.58 (thus the features is
version guarded) that allows creating a shell environment with
environment variables automatically setup for running the project inside
the build dir. Some variables (such as LD_LIBRARY_PATH and PATH) are set
automatically, others must be added by the project.
For vulkan is is relativley simple, we create a new, uninstalled, icd
file for each driver and set the VK_ICD_FILENAMES variable
appropriately. This can be used with:
```sh
meson devenv -C $builddir
```
then, vulkan applications will automatically use the uninstall vulkan
driver, no need to install.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14826>
The syntax we're using doesn't work when included into C++ sources. So
let's make it C++ compabible.
It turns out, nobody needs this extra definition which is what's causing
issues. Let's just make the initializer trivial without casting the
struct.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14850>
We're about to need including this header from a C++ source, so let's
add some explicit casts for C++ compatibility.
In one case we can make things a bit cleaner by moving the
char-pointer-ism to the place that needs it, so let's clean that up
while we're at it.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14850>
Moved nir_lower_compute_system_values to lower
load_local_invocation_index which could be emitted by
nir_zero_initialize_shared_memory.
Relevant CTS tests:
dEQP-VK.compute.zero_initialize_workgroup_memory.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14829>
IRIS_BATCH_BLITTER isn't supported prior to Tigerlake; in general,
batches may not be supported on all hardware. In most cases, querying
them is harmless (if useless): they reference nothing, have no commands
to flush, and so on. However, the fence code does need to know that
certain batches don't exist, so it can avoid adding inter-batch fences
involving them.
This patch introduces a new iris_foreach_batch() iterator macro that
walks over all batches that are actually supported on the platform,
while skipping the others. It provides a central place to update should
we add or reorder more batches in the future.
Fixes various tests in the piglit.spec.ext_external_objects.* category.
Thanks to Tapani Pälli for catching this.
Fixes: a90a1f15 ("iris: Create an IRIS_BATCH_BLITTER for using the BLT command streamer")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14834>
viewport_index won't be >= 16
layer should be < 2048
view_index should be < 2048
this leaves the last 64-bits as padding, which something
expects, but not having to write to it means we have to write
less memory every triangle.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14846>
the bboxpos = bbox copy was visible overhead on perf traces,
but we don't need to really do it.
Extract all the things from bbox needed early, then don't copy
it just overwrite it.
use boolean to fix power build.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14846>
The simple-varyings piglit test attempts to use GL_MAX_VARYING_FLOATS
varyings, PLUS one additional vector for position (which is not used
as input to the PS). "Reserve" that additional position vector by
removing it from the max varyings and max PS inputs.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Reviewed-By: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14837>
When dealing with a vertex input that takes multiple rows, the value of
nir_intrinsic_base points to a driver-location-based index, but we need
to emit a location-based index (or more specifically, an index that
increments once per input, not once per register). Add a mapping to
the module of base -> ID.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Reviewed-By: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14837>
This brings in some interesting new vulkan tests and fixes for the
spurious KHR-GL TF failures. Also, reduces the runtime of
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36 so that it
should stop timing out.
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
It triggered on uprevving VK-GL-CTS, and @airlied says it's tripped
apparently spuriously before. There seems to be some interesting logic
behind it, so leave the big comment for whoever can revisit the issue some
day.
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
This ran into OOM-kills with the CTS uprev. Looking at caselists at the
time of fail, some had 500MB of system memory used by the CTS (mostly
spirv string codegen), plus whatever BOs were allocated, and the lazors
are only 4GB it looks like.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
Between adding features and increased test coverage, we're hitting the
1-hour job limit. !13441 tried to increase the full run timeout for LAVA,
but by having not bumped the gitlab-ci timeout value it ended up just
letting the job keep running in LAVA after gitlab had given up on it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
In the autotune merge, we added another 1/15th run of a configuration
knob, thinking that was small enough to be in the noise. But actually the
main run is only 1/9th, so another 1/15th took us from nearly hitting the
job runtime target, to totally missing it. Crank things back down to keep
MRs flowing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
Interleave mode specified per-plane on Valhall. The texture descriptor proper
merely has a flag specifying whether planes are somehow interleaved
(u-interleaved, AFBC, or block compressed formats) or whether they are all
linear (and uncompressed).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14851>
Flesh out the ZS/CRC XML, adding fields required for AFBC. Valhall allows AFBC
compressing stencil buffers independent of depth buffers, which is a new feature
since Bifrost. That results in a shuffling of the descriptor.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14851>
This looks superficially like the Bifrost "Surface" descriptor, but it
additionally specifies the in-memory representation of blocks (clumps). If I
understand correctly, decompression is controlled by the plane descriptor,
rather than the texture descriptor level. This is a bit more flexible than
Bifrost.
Once the new fields here are wired up to Mesa, my
dEQP-GLES2.functional.texture.* failures should go away... I hope!
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14851>
After starting to use a new version of the simulator, it got
outdated.
We made some initial effort to update it, but it was not
working. Taking into account that no one is using it, it is better to
just remove it.
We keep the noop drm drivers, as they could have some value for
developers that doesn't have access to the v3dv3 simulator.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14682>
Without this change LLVM 12 hits this error:
"""
LLVM ERROR: Error while trying to spill SGPR0_SGPR1 from class SReg_64:
Cannot scavenge register without an emergency spill slot!
"""
when running glcts KHR-GL46.arrays_of_arrays_gl.AtomicUsage test.
Fixes: 9ff086052a ("radeonsi: unroll loops of up to 128 iterations")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14848>
output mode is able to interleave across buffers. This is required for
ARB_transform_feedback3.
*``PIPE_CAP_TGSI_CAN_READ_OUTPUTS``: Whether every TGSI shader stage can read
*``PIPE_CAP_SHADER_CAN_READ_OUTPUTS``: Whether every TGSI shader stage can read
from the output file.
*``PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY``: Tell the GLSL compiler to use
the minimum amount of optimizations just to be able to do all the linking
@@ -426,7 +428,7 @@ The integer capabilities:
operations are supported.
*``PIPE_CAP_TGSI_TEX_TXF_LZ``: Whether TEX_LZ and TXF_LZ opcodes are
supported.
*``PIPE_CAP_TGSI_CLOCK``: Whether the CLOCK opcode is supported.
*``PIPE_CAP_SHADER_CLOCK``: Whether the CLOCK opcode is supported.
*``PIPE_CAP_POLYGON_MODE_FILL_RECTANGLE``: Whether the
PIPE_POLYGON_MODE_FILL_RECTANGLE mode is supported for
``pipe_rasterizer_state::fill_front`` and
@@ -434,10 +436,10 @@ The integer capabilities:
*``PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE``: The page size of sparse buffers in
bytes, or 0 if sparse buffers are not supported. The page size must be at
most 64KB.
*``PIPE_CAP_TGSI_BALLOT``: Whether the BALLOT and READ_* opcodes as well as
*``PIPE_CAP_SHADER_BALLOT``: Whether the BALLOT and READ_* opcodes as well as
the SUBGROUP_* semantics are supported.
*``PIPE_CAP_TGSI_TES_LAYER_VIEWPORT``: Whether ``TGSI_SEMANTIC_LAYER`` and
``TGSI_SEMANTIC_VIEWPORT_INDEX`` are supported as tessellation evaluation
*``PIPE_CAP_TES_LAYER_VIEWPORT``: Whether ``VARYING_SLOT_LAYER`` and
``VARYING_SLOT_VIEWPORT`` are supported as tessellation evaluation
shader outputs.
*``PIPE_CAP_CAN_BIND_CONST_BUFFER_AS_VERTEX``: Whether a buffer with just
PIPE_BIND_CONSTANT_BUFFER can be legally passed to set_vertex_buffers.
@@ -533,8 +535,8 @@ The integer capabilities:
*``PIPE_CAP_SURFACE_SAMPLE_COUNT``: Whether the driver
supports pipe_surface overrides of resource nr_samples. If set, will
enable EXT_multisampled_render_to_texture.
*``PIPE_CAP_TGSI_ATOMFADD``: Atomic floating point adds are supported on
images, buffers, and shared memory.
*``PIPE_CAP_IMAGE_ATOMIC_FLOAT_ADD``: Atomic floating point adds are
supported on images, buffers, and shared memory.
*``PIPE_CAP_RGB_OVERRIDE_DST_ALPHA_BLEND``: True if the driver needs blend state to use zero/one instead of destination alpha for RGB/XRGB formats.
*``PIPE_CAP_GLSL_TESS_LEVELS_AS_INPUTS``: True if the driver wants TESSINNER and TESSOUTER to be inputs (rather than system values) for tessellation evaluation shaders.
*``PIPE_CAP_DEST_SURFACE_SRGB_CONTROL``: Indicates whether the drivers
@@ -577,7 +579,8 @@ The integer capabilities:
*``PIPE_CAP_TEXTURE_SHADOW_LOD``: True if the driver supports shadow sampler
types with texture functions having interaction with LOD of texture lookup.
*``PIPE_CAP_SHADER_SAMPLES_IDENTICAL``: True if the driver supports a shader query to tell whether all samples of a multisampled surface are definitely identical.
*``PIPE_CAP_TGSI_ATOMINC_WRAP``: Atomic increment/decrement + wrap around are supported.
*``PIPE_CAP_IMAGE_ATOMIC_INC_WRAP``: Atomic increment/decrement + wrap around
are supported.
*``PIPE_CAP_PREFER_IMM_ARRAYS_AS_CONSTBUF``: True if gallium frontends should
turn arrays whose contents can be deduced at compile time into constant
buffer loads, or false if the driver can handle such arrays itself in a more
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.